My degree project, titled "SWI-Prolog as a Semantic Web tool for semantic querying in Bioclipse" is getting closer to finish. Now my report is approved by the Scientific Reviewer (thanks, Prof. Mats Gustafsson), so I wanted to make it available here (Download PDF). Reports on typos are welcome of course! :)
In a previous performance test I compared a NMR Spectrum similarity search programmed in Prolog and run in SWI-Prolog integrated in Bioclipse, against a SPARQL Query doing the same task, run in Jena, and Pellet, also integrated in Bioclipse.
In that test Jena was the fastest, outperforming Prolog. The test had flaws though, as reported here with new tests added, though these new results were still not giving a clear winner.
This changed when I turned to larger datasets, testing on the range 2000-25000 spectra (where 25000 spectra corresponds to around a million RDF triples), instead of 10-100, which is a lot closer to the size of all spectra in NMRShiftDB (36419 searchable spectra as of March 29, 2010). I also used 16 instead fo 3 peak shift values in the search query. When testing with these changes, Prolog clearly took the lead.
(Bioclipse scripts, including the SPARQL Query, and the Prolog code, attached. See below).
This time I also included tests with Jena using both the in-memory RDF Store, and the Jena TDB disk based one. Interesting is that Jena run against the TDB disk based store is about double as fast as against the in-memory store! (I had to exclude the combination Pellet/TDB store, as it didn't complete for more than an hour, after which I stopped the run.)
To be noted, in these tests I did also take more measures as to avoid memory and/or caching related bias. For example, I did not rerun tests shortly after each other in a loop, but restarted Bioclipse after each run. This kind of testing took a lot more time, and I have so far only had time to do three iterataions for each specific size of dataset. Error bars, indicating the standard deviation, are included though to give an idea of the variation among the three. Hopefully I will have time to complement with a few more runs per measure point.
See figures below for results. I've included one figure where Pellet is included, and one where it is skipped, in order to get a zoom in on the Jena/Prolog difference.

...and, Pellet excluded:

I could not get the rdf_db:rdf_register_ns/2 function of SWI-Prolog's semweb package to work ... getting errors like "No permission to redefine static method ..." etc.
Now I finally figured out one has to add ":-" before the line with the rdf_db:rdf_register_ns predicate, like so:
:- rdf_db:rdf_register_ns(nmr, 'http://www.nmrshiftdb.org/onto#').swipl.loadPrologCode(":- rdf_db:rdf_register_ns(nmr, 'http://www.nmrshiftdb.org/onto#').");I wanted to test out some screen casting, so I chose to demo the (still experimental) SWI-Prolog integration into Bioclipse, showing how Prolog code (or a "Prolog knowledge base") can conveniently be stored inside Bioclipse's JavaScript environment (in a JS variable), loaded into the prolog engine, and then queried, all from the JS environment, and finally the results can be returned as well to the Javascript environment for further processing or output.
Note that this is still at the experimental stage, so things are a bit rough around the edges!
A strategy for how to work with the Bioclipse/JPL/Prolog/Blipkit combination I'm setting up, is becoming clear.
The main idea with Bioclipse, as well as with having a prolog engine available in it, is for flexible and "interactive" knitting together of knowledge. One of the main questions regarding how to use a Bioclipse/JPL/Prolog/Blipkit combination, has been where to put the bulk of knowledge integration/reasoning code? There would in principle be three options for that:
Using SWI-Prolog's semweb package, I had extracted all predicates in a RDF source, containing some 1 million triples, into the following list:
I found a nice introduction to the use of RDF in Prolog (SWI-Prolog). It contains short primers for both RDF and Prolog, so it should be accessible to anyone with a minimal programming background:
For my own documentation I went ahead and summarized all the steps I had to take
I now also managed to start blipkit from inside Eclipse.
The trick was to start the whole eclipse (The eclipse using for building Bioclipse) preceded with LD_PRELOAD=...the path to libjpl.so , and before that adding the paths to where libjava.so and libjvm.so are located.
Recent comments
2 weeks 3 days ago
2 weeks 3 days ago
2 weeks 3 days ago
5 weeks 12 hours ago
6 weeks 4 days ago
6 weeks 5 days ago
6 weeks 5 days ago
6 weeks 6 days ago
6 weeks 6 days ago
6 weeks 6 days ago