scrappy 0.1
Scrappy is a tool that extracts information from web pages and produces RDF data. It uses the scraping ontology to define the mappings between HTML contents and RDF data.

An example mapping is shown next. It extracts all titles from http://www.elmundo.es:

  dc: http://purl.org/dc/elements/1.1/
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  sioc: http://rdfs.org/sioc/ns#
  sc: http://lab.gsi.dit.upm.es/scraping.rdf#
  *:
    rdf:type: sc:Fragment
    sc:selector:
      *:
        rdf:type: sc:UriSelector
        rdf:value: "http://www.elmundo.es/"
    sc:identifier:
      *:
        rdf:type: sc:BaseUriSelector
    sc:subfragment:
      *:
        sc:type: sioc:Post
        sc:selector:
          *:
            rdf:type: sc:CssSelector
            rdf:value: ".noticia h2, .noticia h3, .noticia h4"
        sc:identifier:
          *:
            rdf:type: sc:CssSelector
            rdf:value: "a"
            sc:attribute: "href"
        sc:subfragment:
          *:
            sc:type: rdf:Literal
            sc:relation: dc:title
            sc:selector:
              *:
                rdf:type: sc:CssSelector
                rdf:value: "a"

(The code above is serialized in the YARF format, which is supported by the LightRDF gem. The RDF/XML, JSON and N-Triples formats are also supported and can be used to define the mappings.)
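To make the mapping concrete, the following is a minimal Python sketch of what it describes. It is not Scrappy's implementation and does not use Scrappy or LightRDF. The sample HTML, the function name extract_posts and the tuple representation of triples are all hypothetical. The sketch also assumes that sc:type becomes an rdf:type statement and that relative href values are resolved against the page URI; neither detail is stated above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC_POST = "http://rdfs.org/sioc/ns#Post"

# Hypothetical, well-formed stand-in for the page (not real site content).
SAMPLE_HTML = """
<html><body>
  <div class="noticia"><h2><a href="/a/first.html">First headline</a></h2></div>
  <div class="noticia"><h3><a href="/a/second.html">Second headline</a></h3></div>
  <div class="other"><h2><a href="/a/ignored.html">Not a news item</a></h2></div>
</body></html>
"""

def extract_posts(html, base_uri):
    """Return (subject, predicate, object) tuples for each matched headline."""
    root = ET.fromstring(html)
    triples = []
    for container in root.iter():
        # Emulates the ".noticia" part of the CSS selector.
        if "noticia" not in container.get("class", "").split():
            continue
        for heading in container.iter():
            # Emulates the descendant "h2", "h3", "h4" part of the selector.
            if heading is container or heading.tag not in ("h2", "h3", "h4"):
                continue
            link = heading.find(".//a")
            if link is None or link.get("href") is None:
                continue
            # sc:identifier: the "href" attribute of "a" names the resource
            # (resolving it against base_uri is an assumption).
            subject = urljoin(base_uri, link.get("href"))
            # sc:subfragment: the text of "a" becomes the dc:title literal.
            title = "".join(link.itertext()).strip()
            triples.append((subject, RDF_TYPE, SIOC_POST))
            triples.append((subject, DC_TITLE, title))
    return triples

for triple in extract_posts(SAMPLE_HTML, "http://www.elmundo.es/"):
    print(triple)
```

Each matched heading yields two statements: one typing the linked resource as a sioc:Post and one attaching its dc:title. The heading inside the "other" container is skipped because it does not match the fragment selector.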