English version

A while ago I blogged about using pyRdfa to parse RDFa meta information from a web page.

The pyRdfa library has since been merged into the great rdflib library.

Today I wanted to integrate DOAP information into a small project I have been working on recently, cnucnu web. I decided to embed the meta information using RDFa, and thus needed a way to check that it had been integrated correctly.

So here are the updated instructions to get rdflib to parse the RDFa tags in an HTML page.

  • Create a virtualenv (required as I still haven't updated python-rdflib in Fedora :-s)
virtualenv rdflib
  • Activate the virtualenv
source rdflib/bin/activate
  • Install rdflib
pip install rdflib

You can then run the following code in a python console:

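# print_function keeps the print() call below working on Python 2 and 3
from __future__ import print_function

import rdflib
from rdflib.plugins.parsers.pyRdfa import pyRdfa

proc = pyRdfa()
graph = rdflib.Graph()
url = ""  # address of the page to parse

# graph_from_source() fills the graph with the triples found in the page
for s, p, o in proc.graph_from_source(url, graph):
    print(s, p, o)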

This should print:

N5574ec1c93054b898e44d4ab7f9df431 http://usefulinc.com/ns/doap#revision 2.5.4
N5574ec1c93054b898e44d4ab7f9df431 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://usefulinc.com/ns/doap#Version
http://code.google.com/p/abcde/ http://usefulinc.com/ns/doap#release N5574ec1c93054b898e44d4ab7f9df431
http://code.google.com/p/abcde/ http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://usefulinc.com/ns/doap#Project
http://code.google.com/p/abcde/ http://usefulinc.com/ns/doap#homepage http://code.google.com/p/abcde/
http://code.google.com/p/abcde/ http://usefulinc.com/ns/doap#name abcde

This is the RDF version of the information the HTML page contains about the project: its name, its homepage and the version available.
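
Reading the triples by eye is fine for a one-off, but the check could also be automated. The following is only a minimal sketch under a few assumptions: the helper names `is_doap_project` and `missing_doap_properties` and the choice of "required" properties are made up for illustration and are not part of rdflib. The helpers accept any iterable of (subject, predicate, object) triples and compare them as strings, so the graph built above should be usable in place of the hand-written sample list.

```python
DOAP = "http://usefulinc.com/ns/doap#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def is_doap_project(triples, project):
    """Return True if `project` is typed as doap:Project."""
    return any(
        (str(s), str(p), str(o)) == (project, RDF_TYPE, DOAP + "Project")
        for s, p, o in triples
    )


def missing_doap_properties(triples, project,
                            required=("name", "homepage", "release")):
    """Return the DOAP properties from `required` that `project` lacks."""
    # str() lets plain strings and string-like RDF terms be compared alike.
    found = {str(p) for s, p, o in triples if str(s) == project}
    return [name for name in required if DOAP + name not in found]


# Sample triples copied from the output shown above (project subject only).
project = "http://code.google.com/p/abcde/"
sample = [
    (project, DOAP + "release", "N5574ec1c93054b898e44d4ab7f9df431"),
    (project, RDF_TYPE, DOAP + "Project"),
    (project, DOAP + "homepage", "http://code.google.com/p/abcde/"),
    (project, DOAP + "name", "abcde"),
]

print(is_doap_project(sample, project))
print(missing_doap_properties(sample, project))
```

With the sample data every required property is present, so the second call returns an empty list. A page whose RDFa markup lacks, say, the project name would get `name` back in that list.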