Monday, June 27, 2005

Northrop Grumman Enters Semantic Arena

This is very important news for both Open Source and Semantic Web folks! Northrop Grumman has purchased the assets of Tucana and will continue fostering development of Kowari as an open-source platform.

Northrop has decided to continue development of TKS and to release a new version sometime in the future. We are still discussing feature sets, licensing and schedule, so don't bother to ask me yet. I'll post it here when I can.

The purchase also included the copyright to Kowari. Stunningly for a company their size, Northrop has not only agreed to support Kowari but rushed to do so. I certainly didn't expect a US federal systems integrator to "get" Open Source Software, but times have clearly changed. Their senior managers have made a legitimate effort to figure out the licensing and how to make it work within their business model. I have confidence that we can figure out a way to make it work for both the Kowari community and Northrop Grumman.

Via Vowel Movement

Oracle Adding Semantic Tech

I've been told that the feedback to Oracle about NDM went along the lines of "yeah, that's awesome, but when are you going to support the standards?" I haven't been able to find references to this news yet, but here it is anyway: 10g Release 2 will include an RDF layer built on NDM -- I'm hoping this means we'll have a bulletproof, vendor-supported metastore + inference engine for building the semantic web applications of the future. According to my source, Oracle will make a big marketing splash about this when 10g Release 2 is formally announced. Soon.

The Oracle Spatial Network Data Model (NDM) feature enables graph modeling and analysis in Oracle Database 10g. NDM explicitly stores and maintains connectivity (nodes, links, and paths) within networks and provides network analysis capabilities such as shortest-path and connectivity analysis. NDM includes a PL/SQL API package for network data query and management, and a Java API for network creation, representation, editing, and analysis.
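To picture what "explicitly stores connectivity" means, here is a toy network in Ruby -- emphatically not the NDM API, every name here is invented -- along with the kind of shortest-path question you ask of one:

    require 'set'

    # A network as adjacency lists: each node maps to its outgoing links.
    links = { "A" => ["B", "C"], "B" => ["D"], "C" => ["D"], "D" => [] }

    # Breadth-first search finds a shortest path in an unweighted network.
    def shortest_path(links, from, to)
      queue, seen = [[from]], Set.new([from])
      until queue.empty?
        path = queue.shift
        return path if path.last == to
        links[path.last].each do |nxt|
          next if seen.include?(nxt)
          seen.add(nxt)
          queue << (path + [nxt])
        end
      end
      nil
    end

    shortest_path(links, "A", "D")  # => ["A", "B", "D"]

NDM keeps that connectivity in persistent node and link tables inside the database and runs the analyses there; the toy version just shows the shape of the problem.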

Actually, I did find a reference to Oracle's RDF Data Model in 10g here. I'm going to try to get my hands on a copy of 10g soon and see what is provided.
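In the meantime, here is the core idea of an RDF layer in a dozen lines of Ruby -- nothing to do with Oracle's actual interfaces, just the data model itself: statements are (subject, predicate, object) triples, and a query is a pattern where nil acts as a wildcard:

    # A triple store in miniature: facts as (subject, predicate, object).
    triples = [
      ["ex:Oracle", "ex:ships",    "ex:NDM"],
      ["ex:NDM",    "ex:supports", "ex:RDF"],
      ["ex:Oracle", "ex:ships",    "ex:PLSQL"]
    ]

    # Match a pattern against the store; nil in any position matches anything.
    def match(triples, s, p, o)
      triples.select do |ts, tp, to|
        (s.nil? || s == ts) && (p.nil? || p == tp) && (o.nil? || o == to)
      end
    end

    match(triples, "ex:Oracle", "ex:ships", nil)
    # => [["ex:Oracle", "ex:ships", "ex:NDM"], ["ex:Oracle", "ex:ships", "ex:PLSQL"]]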

Updated: It's amazing what you can find when you use the right search terms.

Enterprise Ruby on Rails

[I recently had the following email exchange with a developer who was gracious enough to allow me to repost it here, appropriately anonymized of course. His questions are mostly italicized, my responses are mostly bold. I hope this gives you an idea of what we are facing in selling Ruby on Rails as a viable technology for corporate developers. - Obie]

What in particular would make Ruby on Rails a poor choice for a large production application? Speed? Scalability? Difficulty in dealing with complexity?

Speed and scalability are not issues as far as I'm concerned. I have discussed that topic in depth with David Heinemeier Hansson and I know the details of how they have the 37signals products set up. You can create clusters with HAProxy providing the glue between tiers and load balancers in front divvying up incoming traffic. The great thing about working with process-based platforms like Ruby is that you scale them the same way as Perl CGI apps, and there is a lot of wisdom on how to do that effectively thanks to years and years of practice and open-source involvement by mega-traffic sites such as LiveJournal and Slashdot.

The issue of dealing with complexity is multi-faceted. I’ve heard of some guys (via IRC) working with dozens of model objects and controllers. I think one of the biggest concerns is how to effectively share the work between more than a couple of developers working closely together -- all a matter of communication and coordinating approaches. It is so easy to do the same thing in Ruby about twenty different ways that I assume (perhaps prematurely!?) that a large team would be difficult to manage.
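To make that concrete, here are four interchangeable ways to pull the admins' names out of a collection, all perfectly idiomatic Ruby -- which is exactly why a bigger team needs agreed conventions:

    # A throwaway User stand-in just so the snippet runs on its own.
    User = Struct.new(:name, :admin) do
      def admin?; admin; end
    end
    users = [User.new("ann", true), User.new("bob", false)]

    # Four equivalent ways to collect the names of the admins:
    admins = users.select   { |u| u.admin? }.map     { |u| u.name }
    admins = users.find_all { |u| u.admin? }.collect { |u| u.name }
    admins = []; users.each { |u| admins << u.name if u.admin? }
    admins = users.inject([]) { |names, u| u.admin? ? names << u.name : names }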

What would make an application too large for Ruby on Rails? Basecamp (and even Backpack) seem to be decent in size (at least in target market, and Basecamp seems significant in the functionality provided).

They have significant functionality, but the scope is very focused. That is really the whole mentality behind "Web 2.0" style apps as I understand it: do one thing and do it very, very well.

Is it possible that if an application is too "large" for this sort of technology that perhaps it is poorly architected? Yes

(Some exceptions, perhaps few?) Probably

Is it possible that many web applications are far more complex and monolithic than they should be? Most definitely!

Perhaps web applications that provide large sets of functionality (like the one I work on) should be build(sic) as smaller semi-independent modules that depend either on the other modules for interoperability or on some shared data concepts? Sounds correct

As an example, we recently developed a new module for this site. The functional requirements actually turned out to be relatively basic. However, the project took significant time to spec and build, went over budget, and is not as easily maintainable as it should be given the depth of functionality provided. Much of the additional time spent seemed (in my opinion) due to trying to shoe-horn this new module into the existing massive structure of the site. The points of necessary interaction with existing site data were few. Seems like there might be a better way to develop new functionality while still maintaining the necessary interaction with the other areas of the site as well as the availability of the data for cross-module reporting?

Well, since you use Oracle, a lot of people are saying that you could design Oracle updatable views as the data sources for your Ruby on Rails apps. As for determining the suitability of Ruby on Rails, it is a productive enough environment that you could probably reimplement one or more of the modules of your web application without much effort, just to prove the concept.
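Here is roughly what that looks like in Rails -- a sketch only, with the view name and key column invented for the example:

    # An ActiveRecord model backed by an Oracle updatable view instead
    # of a physical table. customer_summary_v and customer_id are made
    # up here; substitute your own schema objects.
    class Customer < ActiveRecord::Base
      set_table_name "customer_summary_v"  # point the model at the view
      set_primary_key "customer_id"        # views rarely follow the id convention
    end

    # Because the view is updatable, Rails writes through it like a table:
    customer = Customer.find(1001)
    customer.update_attribute(:status, "inactive")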

Here is additional commentary on the topic that I shared with him later on...

Whytheluckystiff (silly name, but famous author in Ruby-land) discusses Ruby's memory model in more depth than has been attempted before. A small, highly competent team of developers can be very productive with RoR and still be mindful of things like respecting memory consumption; a larger team, probably not. Actually, I was thinking of my experience on large enterprise app development based on J2EE, and the same holds true! So the real barriers to adoption of Ruby on Rails are probably that 1) it's new, and 2) the productivity factor makes it a disruptive technology in most corporate settings.

Friday, June 24, 2005

Semantics at the Roundtable

I'm participating in a series of roundtable discussions with some IT analysts and ThoughtWorkers today. Martin led off with a long discussion of language-oriented programming and we've covered a few more topics already. The topics are disparate, but all have touched on semantics in some way -- most notably during the talk about language workbenches and domain-specific languages. Those technologies facilitate intentional programming and bring business people closer into the creative aspect of making software, but they still suffer the same problems of the past when it comes to establishing shared understanding of the business domain. I'm slated to talk about Ruby, but two-thirds of my presentation is actually about Semantic Enterprise Architecture.
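For anyone wondering why Ruby keeps coming up in these conversations, here is a toy internal DSL -- every name in it invented for the example -- showing the declarative, business-readable style that language-oriented programming is reaching for:

    # A toy rule book: a domain expert can almost read (and write) the
    # block at the bottom, while plain Ruby does the work underneath.
    class RuleBook
      def initialize(&block)
        @rules = []
        instance_eval(&block)
      end

      def rule(name, &test)
        @rules << [name, test]
      end

      # Returns the names of every rule that fires for the given order.
      def check(order)
        @rules.select { |name, test| test.call(order) }.map { |name, _| name }
      end
    end

    rules = RuleBook.new do
      rule(:free_shipping) { |o| o[:total] > 100 }
      rule(:needs_review)  { |o| o[:items] > 50 }
    end

    rules.check(:total => 250, :items => 3)  # => [:free_shipping]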

Thursday, June 23, 2005

Semantic SOA Billing Rates

The December 2004/January 2005 issue of Business Integration Journal had an article by John Schmidt called What the &%$! Are Ontologies? in which Schmidt says, "If your consultant starts talking about semantics, metadata, cononicals [sic] and ontologies, expect to pay an extra $100 per hour."

And at the conclusion of the Protégé Short Course, the organizers passed out a menu of service offerings which included consulting on ontology development at $500 per hour. (via Semantic Arts)

Wednesday, June 22, 2005

Metastore Scalability Concerns

I'm sure that I am not the only semhead concerned about scalability issues when we start to pump millions of RDF documents into our datastore. The company that was once behind the open-source Kowari metastore had a commercial offering described as...

The Tucana Knowledge Server has been developed to fill the market need for managing large quantities of RDF statements. Acknowledging the problems that traditional relational database management systems (RDBMSs) have with storing large quantities of RDF data, the Tucana Knowledge Server implements a native RDF database and consists of high-level APIs, a query engine and an underlying data store. TKS is implemented entirely in Java and is a scalable, distributed, secure, transaction-safe database built on a directed graph data model and optimized for searching metadata.

A single instance of the Knowledge Server has been used to store 350 million RDF statements, and Tucana continues to improve the engine to maintain its position as the most scalable RDF data store available. Multiple instances of the Tucana Knowledge Server can be combined and treated as a "virtual database", offering another approach to scalability. Any instance of the Knowledge Server may be used as the entry point for such a "federated" query, and will subsequently query any number of remote servers, collect their intermediate results and join on them to produce a single, coordinated result.
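The federation idea is easy to picture. Here is a back-of-the-napkin Ruby sketch -- not the TKS interface, and all of the names are mine -- of collecting intermediate results from several servers and joining them on a shared key:

    # Each "server" hands back rows as hashes of variable bindings.
    server_a = [{ :doc => "d1", :author => "alice" },
                { :doc => "d2", :author => "bob" }]
    server_b = [{ :doc => "d1", :year => 2005 }]

    # Hash-join the result sets pairwise on the shared key.
    def federated_join(result_sets, key)
      result_sets.inject do |joined, rows|
        index = rows.group_by { |r| r[key] }
        joined.map { |j| (index[j[key]] || []).map { |r| j.merge(r) } }.flatten
      end
    end

    federated_join([server_a, server_b], :doc)
    # => [{ :doc => "d1", :author => "alice", :year => 2005 }]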

The problem is that Tucana seems to have gone out of business. There are myriad reasons why they might have gone under; I'm trying to get some information about that, and about whether it is still viable to base a solution on Kowari. Another question I'd like to put out to the community at large is whether it makes sense to set up a hybrid architecture where low-volume data flows into the metastore, but high-volume data flows into a relational database. When we need to run a report based on a semantic query against the metastore, we first load it with relevant instance data culled from the relational database and transformed back into RDF dynamically.
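In Ruby, that dynamic transformation step might look something like this -- a sketch only, with the rows, vocabulary URIs, and metastore loader all stood in by invented names:

    # Pretend these rows were culled from the relational database
    # (in real life: a SELECT over the employees table).
    rows = [{ "id" => 7, "name" => "Marcus", "dept" => "R&D" }]

    # Re-express each row as RDF triples for loading into the metastore.
    triples = []
    rows.each do |row|
      subject = "http://example.com/employee/#{row['id']}"
      triples << [subject, "http://example.com/vocab#name", row["name"]]
      triples << [subject, "http://example.com/vocab#dept", row["dept"]]
    end

    # metastore.load(triples)  -- hypothetical loader; run the semantic query next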

The side-by-side metastore/db idea also stands to facilitate adoption of this architecture, since it is a lot easier to sell the powers-that-be on a combined solution than it is to convince them to bet everything on a (somewhat) unproven technology.

I had to hedge "unproven" with that parenthetical in the statement above because I believe this technology is only unproven in the wide commercial arena. It is my understanding that metastores and a lot of the technologies that underlie the Semantic Web are actually quite proven by medical research teams and defense contractors.

I've had this Blogger account for years and now it's time to put it to good use. JRoller has been a decent home, but the occasional bursts of crappiness finally got to me. My blog will cover topics that I find relevant in my professional life as a technologist. The subjects that excite me nowadays are dynamic languages, the Semantic Web, and intentional and language-oriented programming. I consider these technologies to be the foundation of agile enterprise development and I'm writing a book along that line of thinking, hence the new title. - Obie