Le blog de pingou - Tag - rdf
Le blog de pingou, ses actualités sur Fedora, ses RPMs, ses tests, son Linux... :-)
Pingou's weblog, his fedora's news, his RPMs, his tests, his Linux... :-)
2022-02-17T10:46:15+01:00
pingou
urn:md5:66db5ce1ed1a80cb2f424695b4bb7780
Dotclear
Jena, virtuoso and option transitive
urn:md5:71fe78b8134a7224e9df07b6489537ee
2011-10-12T09:34:00+01:00
2011-10-12T09:34:00+01:00
Pierre-Yves
Bioinformatique
Fedora-planet
jena
rdf
Semantic Web
virtuoso
<p>Virtuoso has a specific syntax, <q><em>option(transitive)</em></q>, which needs a little trick to work with Jena.</p> <p><a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/">Virtuoso</a> is a database management system that can store <a href="http://en.wikipedia.org/wiki/RDF">RDF</a> and provides a <a href="http://en.wikipedia.org/wiki/Sparql">SPARQL</a> endpoint.</p>
<p>I am playing around with it for work and recently I have been facing a small problem which I thought I would document here.</p>
<p>The dataset I am using is the <a href="http://www.geneontology.org/">Gene Ontology</a>, which can be <a href="http://archive.geneontology.org/latest-termdb/go_daily-termdb.owl.gz">downloaded in OWL format</a>.</p>
<p>You may know that the Gene Ontology has a <a href="http://amigo.geneontology.org/cgi-bin/amigo/browse.cgi">tree structure</a>. This is represented in the ontology as:</p>
<pre> <owl:Class rdf:about="http://purl.org/obo/owl/GO#GO_0042254">
<rdfs:label xml:lang="en">ribosome biogenesis</rdfs:label>
...
<rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0022613"/>
</owl:Class>
<owl:Class rdf:about="http://purl.org/obo/owl/GO#GO_0022613">
<rdfs:label xml:lang="en">ribonucleoprotein complex biogenesis</rdfs:label>
...
<rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0071843"/>
</owl:Class>
<owl:Class rdf:about="http://purl.org/obo/owl/GO#GO_0071843">
<rdfs:label xml:lang="en">cellular component biogenesis at cellular level</rdfs:label>
...
<rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0022411"/>
<rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0071842"/>
</owl:Class>
....</pre>
<p>So as you can see, the tree structure is represented using the "rdfs:subClassOf" predicate on each element.</p>
<p>My problem was rather simple: how can I retrieve all the GO terms that are subclasses of a given GO term, at any depth?</p>
<p>The base of the SPARQL query is:</p>
<pre>PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
?s rdfs:subClassOf <http://purl.org/obo/owl/GO#GO_0015995> .
}</pre>
<p>This returns the GO terms directly below our GO term (here 0015995), but not their children.</p>
<p>Using Virtuoso's SPARQL endpoint, we can run the query:</p>
<pre>PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
?s rdfs:subClassOf <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) .
}</pre>
<p>This query does what we want and returns the whole tree below the specified GO term.</p>
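<p>To make the difference between the two queries concrete, here is a minimal sketch in plain Java (no Jena, no Virtuoso) that mimics both behaviours on the three classes from the OWL excerpt above. The class name <code>SubClassClosure</code> and its two methods are hypothetical names made up for this illustration; this is not how Virtuoso implements <em>option(transitive)</em>, only a way to picture what it returns.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
class SubClassClosure {

    /** Terms whose rdfs:subClassOf points directly at 'term' (the plain query). */
    static Set<String> directChildren(Map<String, List<String>> parentsOf, String term) {
        Set<String> children = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> entry : parentsOf.entrySet()) {
            if (entry.getValue().contains(term)) {
                children.add(entry.getKey());
            }
        }
        return children;
    }

    /** Every term below 'term', at any depth (what the transitive query returns). */
    static Set<String> allDescendants(Map<String, List<String>> parentsOf, String term) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(term);
        while (!todo.isEmpty()) {
            String current = todo.poll();
            for (String child : directChildren(parentsOf, current)) {
                // Only queue terms we have not visited yet.
                if (seen.add(child)) {
                    todo.add(child);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // child -> parents, copied from the rdfs:subClassOf lines of the excerpt
        Map<String, List<String>> parentsOf = Map.of(
            "GO_0042254", List.of("GO_0022613"),
            "GO_0022613", List.of("GO_0071843"),
            "GO_0071843", List.of("GO_0022411", "GO_0071842"));

        System.out.println(directChildren(parentsOf, "GO_0071843"));
        System.out.println(allDescendants(parentsOf, "GO_0071843"));
    }
}
```

<p>On this tiny sample, the first call only finds GO_0022613, while the second one also reaches GO_0042254 one level further down.</p>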
<p>However, porting this SPARQL query to <a href="http://incubator.apache.org/jena/">Jena</a> (a Java framework for semantic web applications) might not be that easy.</p>
<p>If you try to use:</p>
<pre class="java java" style="font-family:inherit"><span style="color: #723F12; font-style: italic; font-weight: bold;">/** Logger. */</span>
<span style="color: #279AC4; font-weight: bold;">private</span> <span style="color: #279AC4; font-weight: bold;">static</span> <span style="color: #279AC4; font-weight: bold;">final</span> Logger LOG <span style="color: #339933;">=</span> Logger.<span style="color: #05A550;">getLogger</span><span style="color: #009900;">(</span>App.<span style="color: #279AC4; font-weight: bold;">class</span>.<span style="color: #05A550;">getName</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span>
<span style="color: #723F12; font-style: italic; font-weight: bold;">/** Default URL to virtuoso. */</span>
<span style="color: #279AC4; font-weight: bold;">protected</span> <span style="color: #0058FC;">String</span> endpoint <span style="color: #339933;">=</span> <span style="color: #FF7700;">"http://localhost:8890/sparql/"</span><span style="color: #339933;">;</span>
<span style="color: #723F12; font-style: italic; font-weight: bold;">/**
* Main function.
* @param args an array of String representing the command line arguments.
*/</span>
<span style="color: #279AC4; font-weight: bold;">public</span> <span style="color: #787AFB; font-weight: bold;">void</span> runQueries<span style="color: #009900;">(</span> <span style="color: #0058FC;">String</span><span style="color: #009900;">[</span><span style="color: #009900;">]</span> args <span style="color: #009900;">)</span>
<span style="color: #009900;">{</span>
<span style="color: #0058FC;">String</span> querystring <span style="color: #339933;">=</span> <span style="color: #FF7700;">""</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"SELECT * "</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"WHERE {"</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">" ?s rdfs:subClassOf <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) . "</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"}"</span><span style="color: #339933;">;</span>
QueryExecution qexec <span style="color: #339933;">=</span> <span style="color: #787AFB; font-weight: bold;">null</span><span style="color: #339933;">;</span>
<span style="color: #0058FC;">ResultSet</span> results<span style="color: #339933;">;</span>
<span style="color: #279AC4; font-weight: bold;">try</span> <span style="color: #009900;">{</span>
qexec <span style="color: #339933;">=</span> QueryExecutionFactory.<span style="color: #05A550;">sparqlService</span><span style="color: #009900;">(</span>endpoint, querystring<span style="color: #009900;">)</span><span style="color: #339933;">;</span>
results <span style="color: #339933;">=</span> qexec.<span style="color: #05A550;">execSelect</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span>
<span style="color: #009900;">}</span>
<span style="color: #279AC4; font-weight: bold;">catch</span> <span style="color: #009900;">(</span><span style="color: #0058FC;">Exception</span> ex<span style="color: #009900;">)</span> <span style="color: #009900;">{</span>
LOG.<span style="color: #05A550;">log</span><span style="color: #009900;">(</span>Level.<span style="color: #05A550;">SEVERE</span>, ex.<span style="color: #05A550;">getMessage</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span>
<span style="color: #009900;">}</span>
<span style="color: #279AC4; font-weight: bold;">finally</span> <span style="color: #009900;">{</span>
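<span style="color: #279AC4; font-weight: bold;">if</span> <span style="color: #009900;">(</span>qexec <span style="color: #339933;">!=</span> <span style="color: #787AFB; font-weight: bold;">null</span><span style="color: #009900;">)</span> <span style="color: #009900;">{</span> qexec.<span style="color: #05A550;">close</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span> <span style="color: #009900;">}</span>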
<span style="color: #009900;">}</span>
<span style="color: #666666; font-style: italic;">// do something with the ResultSet...</span>
<span style="color: #009900;">}</span>
</pre>
<p>That will not work. The trick is to use:</p>
<pre class="java java" style="font-family:inherit"><span style="color: #723F12; font-style: italic; font-weight: bold;">/** Logger. */</span>
<span style="color: #279AC4; font-weight: bold;">private</span> <span style="color: #279AC4; font-weight: bold;">static</span> <span style="color: #279AC4; font-weight: bold;">final</span> Logger LOG <span style="color: #339933;">=</span> Logger.<span style="color: #05A550;">getLogger</span><span style="color: #009900;">(</span>App.<span style="color: #279AC4; font-weight: bold;">class</span>.<span style="color: #05A550;">getName</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span>
<span style="color: #723F12; font-style: italic; font-weight: bold;">/** Default URL to virtuoso. */</span>
<span style="color: #279AC4; font-weight: bold;">protected</span> <span style="color: #0058FC;">String</span> endpoint <span style="color: #339933;">=</span> <span style="color: #FF7700;">"http://localhost:8890/sparql/"</span><span style="color: #339933;">;</span>
<span style="color: #723F12; font-style: italic; font-weight: bold;">/**
* Main function.
* @param args an array of String representing the command line arguments.
*/</span>
<span style="color: #279AC4; font-weight: bold;">public</span> <span style="color: #787AFB; font-weight: bold;">void</span> runQueries<span style="color: #009900;">(</span> <span style="color: #0058FC;">String</span><span style="color: #009900;">[</span><span style="color: #009900;">]</span> args <span style="color: #009900;">)</span>
<span style="color: #009900;">{</span>
<span style="color: #0058FC;">String</span> querystring <span style="color: #339933;">=</span> <span style="color: #FF7700;">""</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"SELECT * "</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"WHERE {"</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">" ?s rdfs:subClassOf <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) . "</span>
<span style="color: #339933;">+</span> <span style="color: #FF7700;">"}"</span><span style="color: #339933;">;</span>
QueryExecution qexec <span style="color: #339933;">=</span> <span style="color: #787AFB; font-weight: bold;">null</span><span style="color: #339933;">;</span>
<span style="color: #0058FC;">ResultSet</span> results<span style="color: #339933;">;</span>
<span style="color: #279AC4; font-weight: bold;">try</span> <span style="color: #009900;">{</span>
qexec <span style="color: #339933;">=</span> <span style="color: #279AC4; font-weight: bold;">new</span> QueryEngineHTTP<span style="color: #009900;">(</span>endpoint, querystring<span style="color: #009900;">)</span><span style="color: #339933;">;</span>
results <span style="color: #339933;">=</span> qexec.<span style="color: #05A550;">execSelect</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span>
<span style="color: #009900;">}</span>
<span style="color: #279AC4; font-weight: bold;">catch</span> <span style="color: #009900;">(</span><span style="color: #0058FC;">Exception</span> ex<span style="color: #009900;">)</span> <span style="color: #009900;">{</span>
LOG.<span style="color: #05A550;">log</span><span style="color: #009900;">(</span>Level.<span style="color: #05A550;">SEVERE</span>, ex.<span style="color: #05A550;">getMessage</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span>
<span style="color: #009900;">}</span>
<span style="color: #279AC4; font-weight: bold;">finally</span> <span style="color: #009900;">{</span>
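<span style="color: #279AC4; font-weight: bold;">if</span> <span style="color: #009900;">(</span>qexec <span style="color: #339933;">!=</span> <span style="color: #787AFB; font-weight: bold;">null</span><span style="color: #009900;">)</span> <span style="color: #009900;">{</span> qexec.<span style="color: #05A550;">close</span><span style="color: #009900;">(</span><span style="color: #009900;">)</span><span style="color: #339933;">;</span> <span style="color: #009900;">}</span>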
<span style="color: #009900;">}</span>
<span style="color: #666666; font-style: italic;">// do something with the ResultSet...</span>
<span style="color: #009900;">}</span></pre>
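<p>Note the way the QueryExecution object is created. In the first case, it fails with a complaint about the syntax of the query; in the second case it does not :-) The likely reason is that QueryExecutionFactory.sparqlService parses the query string locally with Jena's own SPARQL parser, which does not know Virtuoso's <em>option(transitive)</em> extension, whereas a QueryEngineHTTP built from a plain string sends it to the endpoint without parsing it first.</p>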
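<p>One way to picture the second case: if the query text is passed through untouched, all the client has to do is URL-encode it and hand it to the endpoint, which then does the parsing itself. The sketch below shows that idea with the standard library only. It is not Jena's actual code; the class <code>RawSparqlRequest</code> and its method are hypothetical, and it assumes a GET request using the <code>query</code> parameter defined by the SPARQL protocol.</p>

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of Jena).
class RawSparqlRequest {

    /** Builds the GET URL for a query string without parsing or validating it. */
    static String buildRequestUrl(String endpoint, String queryString) {
        // The query is treated as opaque text, so vendor extensions survive.
        String encoded = URLEncoder.encode(queryString, StandardCharsets.UTF_8);
        return endpoint + "?query=" + encoded;
    }

    public static void main(String[] args) {
        String querystring = ""
            + "PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "
            + "SELECT * "
            + "WHERE {"
            + " ?s rdfs:subClassOf <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) . "
            + "}";
        System.out.println(buildRequestUrl("http://localhost:8890/sparql/", querystring));
    }
}
```

<p>Nothing on the client side ever looks inside the string here, so the Virtuoso-specific clause reaches the server intact.</p>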
<p>I could not find this documented on the web, so there it is :)</p>
<p>Thanks to shellac on #jena (freenode) for helping me/finding it.</p>