A pinger for the semantic web: the export feature

ping.semanticweb.org is a repository that indexes updated SIOC, DOAP and FOAF files on the web. You may notify the service that you have updated one of those documents on your web server by pinging it through an XML-RPC or REST interface. This part is common to all pingers.
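The origin URL in the example export further down has the form `http://ping.semanticweb.org/ping/<document URL>`. That suggests a REST ping could be a plain HTTP GET on such a URL. The Python sketch below assumes exactly that. The `PING_BASE` constant and the URL layout are inferred from the example rather than taken from any documentation of the service, the function names are hypothetical, and the XML-RPC interface is not covered.

```python
from urllib.request import urlopen

# Assumption: REST ping endpoint inferred from the origin URLs in the export example.
PING_BASE = "http://ping.semanticweb.org/ping/"


def build_ping_url(document_url: str) -> str:
    """Append the document URL verbatim, matching the form seen in the export.

    The URL is deliberately not percent-encoded because the example origin
    is not; if the service expects an encoded parameter, change this.
    """
    return PING_BASE + document_url


def ping(document_url: str, timeout: float = 10.0) -> int:
    """Ping the service with an HTTP GET and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urlopen(build_ping_url(document_url), timeout=timeout) as response:
        return response.status
```

Calling `ping("http://b4mad.net/datenbrei/index.php?sioc_type=site")` would then request the same URL that appears as the origin in the example.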

The other part common to all pingers is that they expose the URLs that have pinged the service. ping.semanticweb.org provides an interface to that data too, at /export. The export returns the results of a query as an XML file with two elements, pingthesemanticwebUpdate and rdfdocument (both carrying some attributes). This is a very efficient way to expose the URLs, but most other consumers (like slug) require a different format, one based on the [scutter vocabulary][3].
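To make the target format concrete, here is a minimal Python sketch, using only the standard library's `xml.etree.ElementTree`, that emits one scutter Representation together with its fetch history. The `Fetch` record and the `representation()` function are hypothetical names. The element layout mirrors the example export further down, and how the service would pull the fetch data out of its own database is left open.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCUTTER = "http://purl.org/net/scutter/"
DC = "http://purl.org/dc/elements/1.1/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

# Serialize with the familiar prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("scutter", SCUTTER)
ET.register_namespace("dc", DC)


@dataclass
class Fetch:
    """One fetch of a document (hypothetical record type)."""
    date: str              # xsd:dateTime literal, e.g. "2006-08-06T14:35:11+02:00"
    content_type: str
    raw_triple_count: int
    status: int            # HTTP status code


def rdf_res(uri: str) -> dict:
    """Attribute dict for an rdf:resource reference."""
    return {f"{{{RDF}}}resource": uri}


def representation(source, origin, local_copy, fetches):
    """Return RDF/XML describing one scutter:Representation and its fetches."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description")
    ET.SubElement(desc, f"{{{RDF}}}type", rdf_res(SCUTTER + "Representation"))
    ET.SubElement(desc, f"{{{SCUTTER}}}localCopy").text = local_copy
    ET.SubElement(desc, f"{{{SCUTTER}}}source", rdf_res(source))
    ET.SubElement(desc, f"{{{SCUTTER}}}origin", rdf_res(origin))
    for item in fetches:
        # rdf:parseType="Resource" turns each fetch into a blank node
        fetch = ET.SubElement(desc, f"{{{SCUTTER}}}fetch", {f"{{{RDF}}}parseType": "Resource"})
        ET.SubElement(fetch, f"{{{RDF}}}type", rdf_res(SCUTTER + "Fetch"))
        ET.SubElement(fetch, f"{{{DC}}}date", {f"{{{RDF}}}datatype": XSD_DATETIME}).text = item.date
        ET.SubElement(fetch, f"{{{SCUTTER}}}contentType").text = item.content_type
        ET.SubElement(fetch, f"{{{SCUTTER}}}rawTripleCount").text = str(item.raw_triple_count)
        ET.SubElement(fetch, f"{{{SCUTTER}}}status").text = str(item.status)
    return ET.tostring(root, encoding="unicode")
```

ElementTree writes the scutter terms with a `scutter:` prefix rather than as the default namespace used in the example; both serializations describe the same triples.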

So what might an export of the URLs that have pinged in the last 2 hours look like?

```xml
<rdf:RDF xmlns="http://purl.org/net/scutter/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:scutter="http://purl.org/net/scutter/">

 <rdf:Description>
  <rdf:type rdf:resource="http://purl.org/net/scutter/Representation"/>
  <localCopy>cache/b4mad.net/datenbrei/index.php_sioc_type=post_sioc_id=300</localCopy>
  <source rdf:resource="http://b4mad.net/datenbrei/index.php?sioc_type=post&#38;sioc_id=300"/>
  <origin rdf:resource="http://ping.semanticweb.org/ping/http://b4mad.net/datenbrei/index.php?sioc_type=site"/>
  <fetch rdf:parseType="Resource">
   <rdf:type rdf:resource="http://purl.org/net/scutter/Fetch"/>
   <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2006-08-06T14:35:11+02:00</dc:date>
   <contentType>application/rdf+xml</contentType>
   <rawTripleCount>68</rawTripleCount>
   <status>200</status>
  </fetch>
  <fetch rdf:parseType="Resource">
   <rdf:type rdf:resource="http://purl.org/net/scutter/Fetch"/>
   <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2006-08-05T19:23:16+02:00</dc:date>
   <contentType>application/rdf+xml</contentType>
   <rawTripleCount>68</rawTripleCount>
   <status>200</status>
  </fetch>
 </rdf:Description>
</rdf:RDF>
```

And what does it all mean? The export above says:

http://b4mad.net/datenbrei/index.php?sioc_type=post&sioc_id=300 has been fetched twice, on 2006-08-06 and on 2006-08-05, each time yielding 68 statements and HTTP status code 200. A local copy has been stored at cache/b4mad.net/datenbrei/index.php_sioc_type=post_sioc_id=300. The origin records how the crawler learned about the document: through the ping for http://b4mad.net/datenbrei/index.php?sioc_type=site.
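A consumer could recover that same summary programmatically. The following Python sketch, again using only `xml.etree.ElementTree`, reads an export laid out like the one above and lists, per document, its local copy and every fetch with date, triple count and status. The function name `summarize` is hypothetical, and the code only understands this exact layout (plain `rdf:Description` elements with nested `fetch` resources); a general RDF/XML export would be better read with a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCUTTER = "http://purl.org/net/scutter/"
DC = "http://purl.org/dc/elements/1.1/"


def summarize(export_xml: str) -> list[dict]:
    """Return one summary dict per rdf:Description in a scutter export.

    Assumes the layout of the example export: each fetch element has
    dc:date, rawTripleCount and status children.
    """
    root = ET.fromstring(export_xml)  # also decodes &#38; to "&"
    summaries = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        source = desc.find(f"{{{SCUTTER}}}source")
        fetches = []
        for fetch in desc.findall(f"{{{SCUTTER}}}fetch"):
            fetches.append({
                "date": fetch.findtext(f"{{{DC}}}date"),
                "triples": int(fetch.findtext(f"{{{SCUTTER}}}rawTripleCount")),
                "status": int(fetch.findtext(f"{{{SCUTTER}}}status")),
            })
        summaries.append({
            "source": source.get(f"{{{RDF}}}resource") if source is not None else None,
            "local_copy": desc.findtext(f"{{{SCUTTER}}}localCopy"),
            "fetches": fetches,
        })
    return summaries
```

For the example export, `summarize()` yields a single entry whose two fetches both report 68 triples and status 200.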

Another valuable extension would be an RSS feed containing all fetched items.
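Such a feed could be generated from the same fetch records. A minimal Python sketch producing RSS 2.0 with the standard library follows. The channel title and description, the `fetched_items_feed` name, the feed link and the `(url, fetched_at, triples)` tuple layout are all assumptions for illustration. Each fetch gets its own non-permalink `guid`, so repeated fetches of the same URL show up as separate items.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def fetched_items_feed(items, channel_link):
    """Build an RSS 2.0 document with one <item> per fetch.

    items: iterable of (source_url, fetched_at, raw_triple_count) tuples,
    where fetched_at is a timezone-aware datetime (assumed layout).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Placeholder channel metadata; RSS 2.0 requires title, link and description.
    ET.SubElement(channel, "title").text = "Recently fetched semantic web documents"
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Documents fetched after a ping"
    for url, fetched_at, triples in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = f"{triples} triples"
        # RSS dates use the RFC 822/2822 format, e.g. "Sun, 06 Aug 2006 14:35:11 +0200".
        ET.SubElement(item, "pubDate").text = format_datetime(fetched_at)
        # One guid per fetch (URL plus timestamp), so refetches are distinct items.
        ET.SubElement(item, "guid", isPermaLink="false").text = f"{url}#{fetched_at.isoformat()}"
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    cest = timezone(timedelta(hours=2))
    print(fetched_items_feed(
        [("http://b4mad.net/datenbrei/index.php?sioc_type=post&sioc_id=300",
          datetime(2006, 8, 6, 14, 35, 11, tzinfo=cest), 68)],
        channel_link="http://ping.semanticweb.org/export",  # hypothetical feed link
    ))
```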

[3]: http://rdfweb.org/topic/ScutterVocab