I just started to play around with the pyRdfa library.

Its really cool feature is that it can generate a complete RDF document from RDFa (which is basically RDF embedded within HTML).

With it you can add RDFa to your website, extract the RDF content embedded in it, and reason on that RDF as you normally would.

A quick demo:
  • Install the dependencies:
yum install python-isodate python-html5lib python-rdflib
  • Clone the sources:
git clone https://github.com/RDFLib/pyrdfa3.git
  • Go into the sources:
cd pyrdfa3
  • Open a Python shell and play:
from pyRdfa import pyRdfa
proc = pyRdfa()
print proc.rdf_from_source('https://github.com/RDFLib/pyrdfa3', 'nt')

This prints the RDF retrieved from the GitHub project page (I discovered along the way that GitHub uses RDFa).

Output:

>>> from pyRdfa import pyRdfa
>>> proc = pyRdfa()
>>> print proc.rdf_from_source('https://github.com/RDFLib/pyrdfa3', 'nt')
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#url> "https://github.com/RDFLib/pyrdfa3" .
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#site_name> "GitHub" .
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#type> "githubog:gitrepository" .
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#description> "pyrdfa3 - RDFa 1.1 distiller/parser library: can extract RDFa 1.1 (and RDFa 1.0, if properly set via a @version attribute) from (X)HTML, SVG, or XML in general. The module can be used to produce serialized versions of the extracted graph, or simply an RDFLib Graph." .
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#image> "https://a248.e.akamai.net/assets.github.com/images/gravatars/gravatar-140.png?1329275856" .
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#title> "pyrdfa3" .

So rdf_from_source nicely gives you your RDF document as a string (note: this method also accepts text files as input).
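Since the result is just a string, it can be handy to break it into tuples for a quick look. The following is a minimal sketch and not part of pyRdfa: the split_triples helper and its regular expression are made up for illustration, and they only cover the simple shape of the output above (IRI subject and predicate, IRI or plain string literal as object). For anything beyond that, a real N-Triples parser such as the one in RDFLib is the safer choice.

```python
import re

# Two lines copied from the N-Triples output shown above.
NT_OUTPUT = """\
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#site_name> "GitHub" .
<https://github.com/RDFLib/pyrdfa3> <http://ogp.me/ns#title> "pyrdfa3" .
"""

# Simplified pattern: IRI subject, IRI predicate, then either an IRI or a
# plain string literal as object. Escapes, language tags, datatypes and
# blank nodes are deliberately not handled.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"(.*)")\s*\.\s*$')


def split_triples(nt_text):
    """Return (subject, predicate, object) string tuples from simple N-Triples."""
    triples = []
    for line in nt_text.splitlines():
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # blank line, comment, or a form this sketch does not cover
        subject, predicate, iri_object, literal_object = match.groups()
        obj = iri_object if iri_object is not None else literal_object
        triples.append((subject, predicate, obj))
    return triples


for subject, predicate, obj in split_triples(NT_OUTPUT):
    # Show only the local part of the predicate, e.g. "title".
    print('%s = %s' % (predicate.rsplit('#', 1)[-1], obj))
```

In a real session you would pass the string returned by rdf_from_source(..., 'nt') instead of the hard-coded NT_OUTPUT.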


You can of course also change the format:

print proc.rdf_from_source('https://github.com/RDFLib/pyrdfa3', 'xml')

Which returns:

>>> from pyRdfa import pyRdfa
>>> print proc.rdf_from_source('https://github.com/RDFLib/pyrdfa3', 'xml')
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:og="http://ogp.me/ns#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="https://github.com/RDFLib/pyrdfa3">
    <og:url>https://github.com/RDFLib/pyrdfa3</og:url>
    <og:image>https://a248.e.akamai.net/assets.github.com/images/gravatars/gravatar-140.png?1334862345</og:image>
    <og:site_name>GitHub</og:site_name>
    <og:description>pyrdfa3 - RDFa 1.1 distiller/parser library: can extract RDFa 1.1 (and RDFa 1.0, if properly set via a @version attribute) from (X)HTML, SVG, or XML in general. The module can be used to produce serialized versions of the extracted graph, or simply an RDFLib Graph.</og:description>
    <og:title>pyrdfa3</og:title>
    <og:type>githubog:gitrepository</og:type>
  </rdf:Description>
</rdf:RDF>
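If you only need a couple of values out of that XML, the standard library can read it directly. This is a rough sketch under a strong assumption: it expects exactly the flat rdf:Description layout shown above, whereas RDF/XML allows many other equivalent layouts. The og_properties helper is a hypothetical name, not something pyRdfa provides.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the RDF/XML output shown above.
RDF_XML = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:og="http://ogp.me/ns#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
  <rdf:Description rdf:about="https://github.com/RDFLib/pyrdfa3">
    <og:site_name>GitHub</og:site_name>
    <og:title>pyrdfa3</og:title>
  </rdf:Description>
</rdf:RDF>
"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OG_NS = "http://ogp.me/ns#"


def og_properties(rdf_xml):
    """Map each described resource to a dict of its Open Graph properties."""
    # Parse from bytes so the encoding declaration in the XML is honoured.
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    result = {}
    for description in root.findall("{%s}Description" % RDF_NS):
        about = description.get("{%s}about" % RDF_NS)
        props = result.setdefault(about, {})
        for child in description:
            # ElementTree writes qualified tags as "{namespace}localname".
            if child.tag.startswith("{%s}" % OG_NS):
                props[child.tag[len(OG_NS) + 2:]] = child.text
    return result


print(og_properties(RDF_XML))
```

For anything more serious than a quick check, loading the document into an RDF graph is more robust than walking the XML tree.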



Finally, getting a string is nice, but if you are already familiar with RDF you will likely want a graph instead. For that, have a look at graph_from_source, which returns an RDFLib graph:

>>> print proc.graph_from_source('https://github.com/RDFLib/pyrdfa3')
[a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory']].



Hope this helps. At least I am quite happy to have found a library to check my RDFa!