SWI-Prolog

Report approved - preliminary version for download

My degree project, titled "SWI-Prolog as a Semantic Web tool for semantic querying in Bioclipse" is getting closer to finish. Now my report is approved by the Scientific Reviewer (thanks, Prof. Mats Gustafsson), so I wanted to make it available here (Download PDF). Reports on typos are welcome of course! :)

On larger datasets and more peaks in the query, Prolog is the fastest

In a previous performance test I compared a NMR Spectrum similarity search programmed in Prolog and run in SWI-Prolog integrated in Bioclipse, against a SPARQL Query doing the same task, run in Jena, and Pellet, also integrated in Bioclipse.

In that test Jena was the fastest, outperforming Prolog. The test had flaws though, as reported here with new tests added, though these new results were still not giving a clear winner.

This changed when I turned to larger datasets, testing on the range 2000-25000 spectra (where 25000 spectra corresponds to around a million RDF triples), instead of 10-100, which is a lot closer to the size of all spectra in NMRShiftDB (36419 searchable spectra as of March 29, 2010). I also used 16 instead fo 3 peak shift values in the search query. When testing with these changes, Prolog clearly took the lead.

(Bioclipse scripts, including the SPARQL Query, and the Prolog code, attached. See below).

This time I also included tests with Jena using both the in-memory RDF Store, and the Jena TDB disk based one. Interesting is that Jena run against the TDB disk based store is about double as fast as against the in-memory store! (I had to exclude the combination Pellet/TDB store, as it didn't complete for more than an hour, after which I stopped the run.)

To be noted, in these tests I did also take more measures as to avoid memory and/or caching related bias. For example, I did not rerun tests shortly after each other in a loop, but restarted Bioclipse after each run. This kind of testing took a lot more time, and I have so far only had time to do three iterataions for each specific size of dataset. Error bars, indicating the standard deviation, are included though to give an idea of the variation among the three. Hopefully I will have time to complement with a few more runs per measure point.

See figures below for results. I've included one figure where Pellet is included, and one where it is skipped, in order to get a zoom in on the Jena/Prolog difference.

 

...and, Pellet excluded:

 

Correction of flawed results: Close competition between Jena and Prolog

UPDATE 29/3: See new results here

I reported in a previous blog post (with a bit of surprise) that Jena clearly outperformed SWI-Prolog for a NMR Spectrum similarity search run inside Bioclipse. I have now realized that indeed these previous results were flawed for a number of reasons.

Need to add "colon dash" before rdf_register_ns

I could not get the rdf_db:rdf_register_ns/2 function of SWI-Prolog's semweb package to work ... getting errors like "No permission to redefine static method ..." etc.

Now I finally figured out one has to add ":-" before the line with the rdf_db:rdf_register_ns predicate, like so:

:- rdf_db:rdf_register_ns(nmr, 'http://www.nmrshiftdb.org/onto#').

or ... inside Bioclipse:
swipl.loadPrologCode(":- rdf_db:rdf_register_ns(nmr, 'http://www.nmrshiftdb.org/onto#').");

Screencast: Experimental Prolog integration in Bioclipse

I wanted to test out some screen casting, so I chose to demo the (still experimental) SWI-Prolog integration into Bioclipse, showing how Prolog code (or a "Prolog knowledge base") can conveniently be stored inside Bioclipse's JavaScript environment (in a JS variable), loaded into the prolog engine, and then queried, all from the JS environment, and finally the results can be returned as well to the Javascript environment for further processing or output.

Note that this is still at the experimental stage, so things are a bit rough around the edges!

A usage strategy emerges

A strategy for how to work with the Bioclipse/JPL/Prolog/Blipkit combination I'm setting up, is becoming clear.

The main idea with Bioclipse, as well as with having a prolog engine available in it, is for flexible and "interactive" knitting together of knowledge. One of the main questions regarding how to use a Bioclipse/JPL/Prolog/Blipkit combination, has been where to put the bulk of knowledge integration/reasoning code? There would in principle be three options for that:

  1. Bioclipse (Javascript environment)
  2. The Blipkit-Prolog/Bioclipse integration plugin (Java code, a.k.a. "Manager methods")
  3. The prolog engine (As a prolog file)

Converting RDF predicates to Prolog convenience methods with RegEx

Using SWI-Prolog's semweb package, I had extracted all predicates in a RDF source, containing some 1 million triples, into the following list:

Nice intro to RDF in Prolog (by Pellet author)

I found a nice introduction to the use of RDF in Prolog (SWI-Prolog). It contains short primers for both RDF and Prolog, so it should be accessible to anyone with a minimal programming background:

Accessing BLIPKIT from command line Java or Eclipse Plug-in in Ubuntu

For my own documentation I went ahead and summarized all the steps I had to take

  1. In order to start blipkit from java commandline, and
  2. In order to start blipkit from an Eclipse plugin.

Got blipkit started from inside Eclipse

I now also managed to start blipkit from inside Eclipse.

The trick was to start the whole eclipse (The eclipse using for building Bioclipse) preceded with LD_PRELOAD=...the path to libjpl.so , and before that adding the paths to where libjava.so and libjvm.so are located.