Virtuoso is a database management system that can store RDF and provide a SPARQL endpoint.

I am playing around with it for work and recently I have been facing a small problem which I thought I would document here.

The dataset I am using is the Gene Ontology, which can be downloaded in OWL.

You may know that the Gene Ontology has a tree structure. This is represented in the ontology as:

 <owl:Class rdf:about="http://purl.org/obo/owl/GO#GO_0042254">
   <rdfs:label xml:lang="en">ribosome biogenesis</rdfs:label>
   ...
   <rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0022613"/>
 </owl:Class>
 <owl:Class rdf:about="http://purl.org/obo/owl/GO#GO_0022613">
   <rdfs:label xml:lang="en">ribonucleoprotein complex biogenesis</rdfs:label>
   ...
   <rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0071843"/>
 </owl:Class>
 <owl:Class rdf:about="http://purl.org/obo/owl/GO#GO_0071843">
   <rdfs:label xml:lang="en">cellular component biogenesis at cellular level</rdfs:label>
   ...
   <rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0022411"/>
   <rdfs:subClassOf rdf:resource="http://purl.org/obo/owl/GO#GO_0071842"/>
 </owl:Class>
 ....

So as you can see, the tree structure is represented using the "rdfs:subClassOf" predicate on each element.

My problem was rather simple: how can I retrieve all the GO terms that are subclasses, direct or indirect, of a given GO term?

The base of the SPARQL query is:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
SELECT * 
WHERE {
  ?s rdfs:subClassOf  <http://purl.org/obo/owl/GO#GO_0015995> . 
}

This returns the GO terms directly below our GO term (here 0015995), but not their children.

Using Virtuoso's SPARQL endpoint, we can run the query:

PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
SELECT * 
WHERE {
  ?s rdfs:subClassOf  <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) . 
}

This query does what we want and returns the whole tree below the specified GO term.

However, porting this SPARQL query to Jena (a Java framework for semantic web applications) might not be that easy.
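To make the difference between the two queries concrete, here is a minimal in-memory sketch in plain Java (no Virtuoso or Jena involved). It only uses the three classes from the OWL excerpt above; the class and method names are made up for the illustration, and it is only meant to show the idea of following rdfs:subClassOf repeatedly, not how Virtuoso implements option(transitive).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, for illustration only.
final class SubClassClosureSketch {

    // child -> direct parents, copied from the OWL excerpt above
    static final Map<String, List<String>> SUBCLASS_OF = Map.of(
            "GO_0042254", List.of("GO_0022613"),
            "GO_0022613", List.of("GO_0071843"),
            "GO_0071843", List.of("GO_0022411", "GO_0071842"));

    /** Terms that are a direct subclass of the given term (the plain query). */
    static Set<String> directChildren(String term) {
        Set<String> children = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : SUBCLASS_OF.entrySet()) {
            if (entry.getValue().contains(term)) {
                children.add(entry.getKey());
            }
        }
        return children;
    }

    /** All terms below the given term, following subClassOf level by level. */
    static Set<String> allDescendants(String term) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(term);
        while (!todo.isEmpty()) {
            for (String child : directChildren(todo.poll())) {
                // only queue terms we have not visited yet
                if (seen.add(child)) {
                    todo.add(child);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(directChildren("GO_0071843")); // prints [GO_0022613]
        System.out.println(allDescendants("GO_0071843")); // prints [GO_0022613, GO_0042254]
    }
}
```

The first call corresponds to the plain query, the second to the transitive one: GO_0042254 only shows up once the subClassOf link is followed a second time.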

If you try to use:

// Needs java.util.logging.Level and Logger, plus Jena's QueryExecution,
// QueryExecutionFactory and ResultSet (the package prefix depends on the Jena version).

/** Logger. */
private static final Logger LOG = Logger.getLogger(App.class.getName());
/** Default URL to virtuoso. */
protected String endpoint = "http://localhost:8890/sparql/";
 
/**
 * Runs the transitive query against the endpoint.
 * @param args an array of String representing the command line arguments.
 */
public void runQueries( String[] args )
{
    String querystring = ""
            + "PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "
            + "SELECT * "
            + "WHERE {"
            + "  ?s rdfs:subClassOf  <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) . "
            + "}";
 
    QueryExecution qexec = null;
    try {
        qexec = QueryExecutionFactory.sparqlService(endpoint, querystring);
        ResultSet results = qexec.execSelect();
        // do something with the ResultSet here, before the execution is closed...
    }
    catch (Exception ex) {
        LOG.log(Level.SEVERE, ex.getMessage(), ex);
    }
    finally {
        // qexec is still null if creating it threw an exception
        if (qexec != null) {
            qexec.close();
        }
    }
}

That will not work. The trick is to use:

// Same imports as before, plus Jena's QueryEngineHTTP.

/** Logger. */
private static final Logger LOG = Logger.getLogger(App.class.getName());
/** Default URL to virtuoso. */
protected String endpoint = "http://localhost:8890/sparql/";
 
/**
 * Runs the transitive query against the endpoint.
 * @param args an array of String representing the command line arguments.
 */
public void runQueries( String[] args )
{
    String querystring = ""
            + "PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "
            + "SELECT * "
            + "WHERE {"
            + "  ?s rdfs:subClassOf  <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) . "
            + "}";
 
    QueryExecution qexec = null;
    try {
        qexec = new QueryEngineHTTP(endpoint, querystring);
        ResultSet results = qexec.execSelect();
        // do something with the ResultSet here, before the execution is closed...
    }
    catch (Exception ex) {
        LOG.log(Level.SEVERE, ex.getMessage(), ex);
    }
    finally {
        // qexec is still null if creating it threw an exception
        if (qexec != null) {
            qexec.close();
        }
    }
}

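Note the way the QueryExecution object is created. In the first case it fails, complaining about the syntax of the query, but not in the second case :-) As far as I understand, QueryExecutionFactory.sparqlService() parses the query string locally with Jena's own SPARQL parser, which does not know Virtuoso's option(transitive) extension, whereas QueryEngineHTTP sends the string to the endpoint as is and leaves the parsing to Virtuoso.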
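As an illustration of what sending the query text to Virtuoso untouched amounts to, here is a rough sketch that uses only the JDK's HTTP client (Java 11 or later) and no Jena at all. It assumes the endpoint accepts the query in a "query" GET parameter, as the SPARQL protocol describes, and that it honours the Accept header used here; the class and method names are invented for the example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class RawSparqlSketch {

    static final String QUERY = ""
            + "PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> "
            + "SELECT * "
            + "WHERE {"
            + "  ?s rdfs:subClassOf  <http://purl.org/obo/owl/GO#GO_0015995> option(transitive) . "
            + "}";

    /** Builds the GET URL; the query is only URL-encoded, never parsed. */
    static URI buildUri(String endpoint, String query) {
        return URI.create(endpoint + "?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Virtuoso instance is listening on the default local URL.
        URI uri = buildUri("http://localhost:8890/sparql/", QUERY);
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Raw result document; parsing it is left out of this sketch.
        System.out.println(response.body());
    }
}
```

Since nothing on the client side looks inside the query, the Virtuoso-specific option(transitive) reaches the server unchanged. The price is that you have to parse the result document yourself, which is why staying within Jena is more convenient.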

I could not find this documented on the web, so there it is :)

Thanks to shellac on #jena (freenode) for helping me/finding it.