Monthly Archives: June 2014

Yahoo7 using bigdata

One of the major successes that people point to for the semantic web is the semantic publishing platform at the BBC. We are pleased to announce that Yahoo7 has rolled out a semantic publishing platform based on bigdata. Read more about the Yahoo7 experience and how they have doubled their users' time on site using semantic publishing and bigdata.

http://www.itnews.com.au/News/388296,yahoo7-swaps-sql-datastore-for-graph.aspx

This is just one more in a list of major semantic web success stories built around the bigdata platform:

  • EMC – data and host management solutions in data centers around the world (slides from SEMTECH 2012, NYC)
  • Autodesk –  graph management for the Autodesk cloud ecosystem (SEMTECH 2013, SF)
  • Yahoo7 – semantic publishing (today)

Contact us if you want to be the next success.

 


Formalized model for Reification done Right: RDF* and SPARQL* semantics.

Olaf Hartig has developed a formal model of the “Reification Done Right” concepts [1]. The model formalizes an extension to both RDF (known as RDF*) and SPARQL (known as SPARQL*). These extensions are backwards compatible with the RDF data model and the SPARQL query language, and they offer an alternative perspective on RDF Reification. The RDF* and SPARQL* models are introduced and formally described in “Foundations of an Alternative Approach to Reification in RDF”.

The key contributions of this paper are:

  • Formal extensions of the RDF data model and the SPARQL algebra that reconcile RDF Reification with statement-level metadata;
  • An extended syntax for TURTLE that permits easy interchange of statements about statements;
  • An extended syntax for SPARQL that makes it easy to express queries and data for statements about statements; and
  • Rewrite rules that may be used to translate RDF* into RDF and SPARQL* into SPARQL.

RDF* and SPARQL* allow statements to appear as Subjects or Objects in other statements.  Statements about these “inline” statements can be interpreted as if they were statements about statements.  The paper shows that this is equivalent to statements about reified RDF statement models. For example, the following statements declare a name for some resource “:bob”, an age for :bob, and provide assertions about how and where that age was obtained:

:bob foaf:name "Bob" .
<<:bob foaf:age 23>> dct:creator <http://example.com/crawlers#c1> ;
                     dct:source <http://example.net/homepage-listing.html> .

and then queried using:

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   <<?bob foaf:age ?age>> dct:source ?src .
}

In both cases the << >> notation denotes a statement appearing as the Subject or Object of another statement.  Further, statements may become bound to variables as shown in this alternative syntax:

SELECT ?age ?src WHERE {
   ?bob foaf:name "Bob" .
   BIND( <<?bob foaf:age ?age>> AS ?t ) .
   ?t dct:source ?src .
}
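To make the nested pattern concrete, here is a minimal Python sketch of how a triple pattern containing an embedded pattern can be matched against data containing embedded triples. It is illustrative only: the tuple encoding, the "?name" convention for variables, and the unify and evaluate helpers are assumptions made for this example. It is not how bigdata evaluates SPARQL*, and it does not reproduce the algebra defined in the paper.

```python
def unify(pattern, term, binding):
    """Return an extended binding if pattern matches term, else None."""
    if isinstance(pattern, tuple):
        # An embedded pattern only matches an embedded triple of the same shape.
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for p, t in zip(pattern, term):
            binding = unify(p, t, binding)
            if binding is None:
                return None
        return binding
    if isinstance(pattern, str) and pattern.startswith("?"):
        # A variable: either check the existing binding or add a new one.
        if pattern in binding:
            return binding if binding[pattern] == term else None
        return {**binding, pattern: term}
    # A constant must match exactly.
    return binding if pattern == term else None


def evaluate(patterns, triples):
    """Join the triple patterns left to right over the given triples."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# The data from the Turtle example, with the embedded statement as a nested tuple.
AGE = (":bob", "foaf:age", 23)
data = [
    (":bob", "foaf:name", "Bob"),
    (AGE, "dct:creator", "http://example.com/crawlers#c1"),
    (AGE, "dct:source", "http://example.net/homepage-listing.html"),
]

# The first query above: one plain pattern and one pattern with an embedded pattern.
query = [
    ("?bob", "foaf:name", "Bob"),
    (("?bob", "foaf:age", "?age"), "dct:source", "?src"),
]

results = evaluate(query, data)
```

The BIND form would bind the whole nested tuple to a variable such as ?t before joining on it, and it yields the same ?age and ?src values.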

The paper proves that these examples are equivalent using RDF Reification. That is, RDF Reification already gives us a mechanism to represent, interchange, and query statements about statements. However, the paper also shows that statements about statements may be modeled and queried within the database using a wide variety of physical schemas that allow great efficiency and data density when compared to naive indexing of RDF statement models. This gives database designers enormous freedom in how they choose to represent those statements about statements and helps to counter the impression that RDF databases are necessarily bad for problems requiring link attributes. For example, any of the following physical schemas could be used to represent these statements about statements:

  • Explicitly modeling the statements about statements as reified RDF statement models;
  • Associating a “statement identifier” with each statement in the database and then using it to represent statements about statements;
  • Directly embedding the statement “:bob foaf:age 23” into the representation of each statement about that statement (inlining within the statement indices using variable length and recursively embedded encodings of the Subject and Object of a statement); and
  • Extending the (s,p,o) table to include additional columns, in this case dct:creator and dct:source. This can be advantageous when some metadata predicate has a maximum cardinality of one and is used for most statements in the database (for example, this could be used to create an efficient bi-temporal database with statement-level metadata columns for the business-start-time, business-end-time, and transaction-time for each assertion).
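As a rough illustration of the first option, the following Python sketch unfolds triples that embed other triples into plain triples using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The tuple encoding, the unfold function, and the "_:stmtN" blank node labels are assumptions made for this example. The sketch emits only the reification triples for an embedded statement and does not claim to reproduce the paper's rewrite rules exactly.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def unfold(triples):
    """Rewrite triples that embed other triples into plain triples
    that use the standard RDF reification vocabulary."""
    ids = count(1)
    nodes = {}  # embedded triple -> blank node label (hypothetical naming)
    out = []

    def node_for(term):
        # Plain terms pass through unchanged.
        if not isinstance(term, tuple):
            return term
        # Reuse one blank node per distinct embedded triple.
        if term not in nodes:
            s = node_for(term[0])
            p = term[1]
            o = node_for(term[2])
            label = f"_:stmt{next(ids)}"
            nodes[term] = label
            out.extend([
                (label, RDF + "type", RDF + "Statement"),
                (label, RDF + "subject", s),
                (label, RDF + "predicate", p),
                (label, RDF + "object", o),
            ])
        return nodes[term]

    for s, p, o in triples:
        out.append((node_for(s), p, node_for(o)))
    return out


# The example from this post, with the embedded statement as a nested tuple.
AGE = (":bob", "foaf:age", "23")
nested = [
    (":bob", "foaf:name", '"Bob"'),
    (AGE, "dct:creator", "<http://example.com/crawlers#c1>"),
    (AGE, "dct:source", "<http://example.net/homepage-listing.html>"),
]

plain = unfold(nested)
for triple in plain:
    print(" ".join(triple), ".")
```

Both metadata statements end up attached to the same blank node, which is what makes the reified form verbose compared with the statement identifier and inlining options.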

By clarifying the formal semantics of RDF Reification and offering a simplified syntax for data interchange, query, and update, database designers and database users can now more easily and confidently model domains that require statement-level metadata. There is a long list of such domains, including domains that model events, domains that require link attributes, sparse matrices, the property graph model, etc.

Bigdata supports RDF* and SPARQL* for the efficient interchange, query, and update of statements about statements. Today, this is enabled through the “SIDS” option:

com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=true

This enables the historical mechanism for efficient statements about statements in bigdata. In the future, we plan to add support for RDF* and SPARQL* to the quads mode of the platform as well. This will allow statement-level metadata to co-exist seamlessly with the named graphs model.

Thanks,
Bryan

[1] http://arxiv.org/abs/1406.3399
