I'm a M.Sc. Student (grad in Aug 2010) in Biotechnology, at Uppsala University, focusing on Semantic Web, Bioinformatics and Systems Biology. I'm also a MediaWiki and Drupal geek, more or less =D.
This blog is currently used for documentation of the following projects:
-- Samuel Lampa - firstname.lastname@gmail.com
I presented my MSc thesis project "SWI-Prolog as a Semantic Web tool for semantic querying in Bioclipse" today. (Report for download here
Find the slides below. I expected a very non-informatics audience (though some of my fellow Bioclipse:rs showed up =) ), had only 20 minutes, and lots of non-common-knowledge things to introduce, so these are really mostly a bunch of pictures for talking through the basics of semantic web, prolog and Bioclipse.
Turning back to the GSoC project now!
Have started actual coding for GSoC this wednesday (start was 2 weeks delayed because of exams, which I'll catch up). Still just getting up to speed, but looking now into the PHP RDF framework ARC, whose RDF store will replace the currently used RAP store in Semantic Mediawiki. Usage ARC itself looks very straightforward. Just have to figure out the SMW Store API. Looking at the SMWRapStore2.php now, to get an idea.
If you want to follow my progress in (approximate) real-time, then see my twitter.
My degree project, titled "SWI-Prolog as a Semantic Web tool for semantic querying in Bioclipse" is getting closer to finish. Now my report is approved by the Scientific Reviewer (thanks, Prof. Mats Gustafsson), so I wanted to make it available here (Download PDF). Reports on typos are welcome of course! :)
Coding for my GSoC project will start for real in June, 9th or so, but I just had a first look at code, to start wrapping my head around the things involved. I installed the following on my local SMW:
I tested the RAP based SPARQL endpoint, played a bit with SMWWriter, and tried to get some grips of how to best use existing functionality for implementing RDF import/export.
Some questions that arose (for Denny in the first place, I guess, but feel free to comment):
We are probably going to use SMWWriter for extending the RDF import/export functionality of Semantic MediaWiki, so I wanted to test it out a bit.
With some copy and paste of code from this page, I quickly had a MediaWiki Special Page set up, where I could make use SMWWriters internal API to implement a crude form for adding or removing "triples" in my Semantic MediaWiki. See Screenshot:

And the result, on the Methane page:

Looks promising. Connecting this with some ARC functionality for parsing SPARQL and RDF/XML, should make a big step in the right direction.
(For my internal documentation, mostly)
Got Eclipse for PHP up running now, with XDebug. Yay :) It was a snap to install on my Ubuntu box. I basically followed this and this blog post. (The Ubuntu package for XDebug is php5-xdebug).
The Eclipse dialogs had changed location and structure a little, so for my documentation, I included a screenshot of the dialog under "Run > Debug configurations" below.

Interesting: "Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans". Is this a new trend? If so, it should be an area where Bioclipse can help, I gues, not least through the planned ability to export semantic data to a Semantic MediaWiki (That will require me to finish my GSoC first, though).
I reported earlier that Jena/SPARQL outperformed Prolog for a lookup query with some numerical value comparison. It later on turned out that the results were flawed and finally that Prolog indeed was the fastest as soon as turning to datasets with more than a few hundred peaks.
The Prolog program I was using was rather complicated with recursive operations on double lists etc. Then, some week ago, I tried, in order to highlight differences in expressivity between Prolog and SPARQL, to implement a Prolog query that mimicked the structure of the SPARQL query I used, as close as possible. Interestingly it turned out that this Prolog query can be optimized to become blazing fast by reversing the order of shift values to search for, so that the largest values are searched for first. With this optimization the query outperforms both the SPARQL and the earlier used prolog code. See figure 1 below for results (The new prolog query is named "SWI-Prolog Minimal"). It appears that the querying time does not even increase with the number of triples in the RDF store!
Figure 1: Spectrum similarity search comparison: SWI-Prolog vs. Jena
The explanation seems to stem from the fact that larger NMR Shift values are in general more unique than smaller values (see histogram of shift values in the full data set in figure 2 below). Thus, by testing for the largest value first, the query will be much less prone to get stuck in false leads. (Well, looking at the histogram, it appears that one could in fact do even better sorting than just from larger to smaller, like testing for values around 100 before values around 130 etc.)
Figure 2: Histogram of NMR Shift values in 25000 spectrum dataset
Recent comments
1 day 16 hours ago
4 days 4 hours ago
3 weeks 2 days ago
3 weeks 2 days ago
3 weeks 2 days ago
5 weeks 6 days ago
7 weeks 2 days ago
7 weeks 3 days ago
7 weeks 4 days ago
7 weeks 5 days ago