Category Archives: Blazegraph

Blazegraph 1.5.2 Release!

We’re very pleased to announce that Blazegraph 1.5.2 is now available for download. This is a significant release of Blazegraph featuring a large number of performance improvements and new features. The full set of changes is available on JIRA. Highlights are below:

Have you seen our new website? Did you know you can now purchase commercial licensing and support for Blazegraph online?

Will you be in San Jose at the SmartData or the NoSQLNow conference? We’re a Gold sponsor and will be speaking and exhibiting. Come see us and get a discounted registration.

Do you have a great success story with Blazegraph? Want to find out more? Check us out below or contact us.  We’d love to hear from you.


In SPARQL, Order Matters

One common mistake when writing SPARQL queries is the implicit assumption that, within SPARQL join groups, the order in which components of the group are specified does not matter. By definition, SPARQL follows a sequential semantics when evaluating join groups and, although in many cases the order of evaluating patterns does not change the outcome of the query, there are some situations where the order is crucial to getting the expected query result.

In Blazegraph 1.5.2 we’ve fixed some internal issues regarding the evaluation order within join groups, making Blazegraph’s join group evaluation strategy compliant with the official W3C SPARQL semantics. As a consequence, for some query patterns the behavior of Blazegraph has changed; if you’re upgrading your Blazegraph installation, it makes sense to review your queries for such patterns in order to avoid regressions.

A Simple Example

To illustrate why order in SPARQL indeed matters, assume we want to find all person URIs and, if present, their associated images. Consider the following query

### Q1a:
SELECT * WHERE {
  ?person rdf:type <http://example.com/Person>
  OPTIONAL { ?person <http://example.com/image> ?image }
}

over a database containing the following triples:

<http://example.com/Alice> rdf:type                    <http://example.com/Person> .
<http://example.com/Alice> <http://example.com/image>  "Alice.jpg" .
<http://example.com/Bob>   rdf:type                    <http://example.com/Person> .

The result of this query is quite obvious and in line with our expectations: We get one result row for Alice, including the existing image, plus one result row for Bob, without an image:

?person                    | ?image
---------------------------|------------
<http://example.com/Alice> | "Alice.jpg"
<http://example.com/Bob>   |

Now let’s see what happens when swapping the order of the triple pattern and the OPTIONAL block in the query:

### Q1b:
SELECT * WHERE {
  OPTIONAL { ?person <http://example.com/image> ?image } .
  ?person rdf:type <http://example.com/Person>
}

The result of Q1b may come as a surprise:

?person                    | ?image
---------------------------|------------
<http://example.com/Alice> | "Alice.jpg"

Where’s Bob gone?

As mentioned before, SPARQL evaluates the query sequentially. That is, in a first step it evaluates the OPTIONAL { ?person <http://example.com/image> ?image } block. As a result of this first step, we obtain:

?person                    | ?image
---------------------------|------------
<http://example.com/Alice> | "Alice.jpg"

No Bob in sight.

In a second step, this partial result is joined with the (non-optional) triple pattern ?person rdf:type <http://example.com/Person>. Intuitively speaking, the join with the triple pattern now serves as an assertion, retaining those result rows for which the value of variable ?person can be shown to be of rdf:type <http://example.com/Person>. Obviously, this assertion holds for our (single) result, the URI of Alice, so the previous intermediate result is left unmodified.

So what we get in this case is the “set of all persons that have an image associated, including the respective images”. But is this informal description really capturing what Q1b does? Consider the evaluation of Q1b over a database where none of the persons has an image associated, say:

<http://example.com/Alice> rdf:type <http://example.com/Person> .
<http://example.com/Bob>   rdf:type <http://example.com/Person> .

Here’s the result:

?person                    | ?image
---------------------------|------------------------------------
<http://example.com/Alice> | 
<http://example.com/Bob>   | 

Surprised again?

Well, here’s the explanation: SPARQL evaluation starts out from the so-called “empty mapping” (also called the “universal mapping”). The OPTIONAL block is not able to retrieve any results and, given the semantics of OPTIONAL, this step simply retains the empty mapping. The empty mapping is then joined with the non-optional triple pattern, giving us the two bindings for ?person as result, with ?image being unbound in both.

Putting our observations together, the semantics of Q1b could be phrased as: “If there is at least one triple with predicate <http://example.com/image> in our database, return all persons that have an image, including the image; otherwise, return all persons.”

That’s quite different from our intended search request, isn’t it?
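The sequential semantics behind these examples can be made concrete in a few lines of code. The sketch below (plain Python for illustration only; this is not Blazegraph’s implementation) models solution mappings as dicts and evaluates both orderings over the sample triples:

```python
# Illustration of SPARQL's sequential join-group semantics (not Blazegraph code).
# Solution mappings are modeled as plain dicts.

DATA = [
    ("http://example.com/Alice", "rdf:type", "http://example.com/Person"),
    ("http://example.com/Alice", "http://example.com/image", "Alice.jpg"),
    ("http://example.com/Bob", "rdf:type", "http://example.com/Person"),
]

def match(pattern):
    """Evaluate a single triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in DATA:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions

def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(left, right):
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]

def left_join(left, right):
    """OPTIONAL semantics: keep the left mapping if nothing on the right matches."""
    result = []
    for m1 in left:
        extended = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        result.extend(extended if extended else [m1])
    return result

start = [{}]  # evaluation starts from the empty mapping
person = match(("?person", "rdf:type", "http://example.com/Person"))
image = match(("?person", "http://example.com/image", "?image"))

# Q1a: non-optional pattern first, then OPTIONAL -> Bob survives without an image.
q1a = left_join(join(start, person), image)

# Q1b: OPTIONAL first -> the left join over the empty mapping yields only
# Alice's image row, and the subsequent join can never bring Bob back.
q1b = join(left_join(start, image), person)

print(len(q1a))  # 2 rows: Alice with image, Bob without
print(len(q1b))  # 1 row: Alice only
```

Running the same sketch with the image triple removed from DATA reproduces the second surprise: the left join over the empty mapping then simply retains the empty mapping, so the final join returns both persons with ?image unbound.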

Practical Implications

The example above illustrates an interesting edge case which, given that the informally sketched semantics is somewhat odd, does not frequently show up in practice. For join groups composed solely of (non-optional) triple patterns, for instance, the order in which you write the patterns does not matter at all. But with the OPTIONAL and MINUS operators we may run into such ordering problems.

Interestingly, even for patterns involving OPTIONAL (and MINUS), in many cases there are different orders that lead to the same result (independently of the data in the underlying database). Let’s look at the following variant of our initial query Q1a which, in addition, extracts the person’s label (assume our sample database also contains labels for the persons):

### Q2a
SELECT * WHERE {
  ?person rdf:type <http://example.com/Person> .
  ?person rdfs:label ?label .
  OPTIONAL { ?person <http://example.com/image> ?image } .
}

Following the same arguments as in our example before, the OPTIONAL must not be moved to the beginning, but the following query

### Q2b
SELECT * WHERE {
  ?person rdf:type <http://example.com/Person> .
  OPTIONAL { ?person <http://example.com/image> ?image } .
  ?person rdfs:label ?label .
}

can be shown to be semantically equivalent. Without going into details, the key difference from our initial example is that variable ?person has definitely been bound (in both Q2a and Q2b) by the time the OPTIONAL is evaluated; this rules out the strange effects discussed in the beginning.

If you’re now confused about where to place your OPTIONALs, here are some rules of thumb:

  1. Be aware that order matters.
  2. First specify the non-optional parts of your query (unless there’s a good reason not to).
  3. Then specify your OPTIONAL clauses to include the optional patterns of your graph.

In short, unless you want to encode somewhat odd patterns in SPARQL, it usually makes sense to put OPTIONAL patterns at the end of your join groups. If you’re still unsure about some of your queries, have a look at the semantics section of the official W3C specs.

Optimizations within Blazegraph 1.5.2

While the official SPARQL semantics uses a sequential approach, one key task of the Blazegraph query optimizer is to identify a semantics-preserving reordering of join groups that minimizes the amount of work spent in evaluating the query. Among other heuristics (such as cardinality estimations for the individual triple patterns), optimizers typically try to evaluate non-optional patterns first; this holds particularly in the presence of more complex OPTIONAL expressions.

As a consequence, queries like those sketched above challenge the Blazegraph query optimizer: reordering must be done carefully and on sound theoretical foundations in order to retain the semantics of the original query through the rewriting process. In Blazegraph 1.5.2, we’ve reworked Blazegraph’s internal join group reordering strategy.

For instance, Blazegraph 1.5.2 is able to detect equivalence transformations (as sketched for Q2a/Q2b) when evaluating queries and uses this knowledge to find more efficient query execution plans: if you run query Q2b above in Blazegraph with the EXPLAIN mode turned on, you’ll see that the optimizer moves the OPTIONAL clause to the end in the optimized abstract syntax tree.

We’d love to hear from you.

Do you have a cool new application using Blazegraph, or are you interested in understanding how to make Blazegraph work best for your application? Get in touch or send us an email at blazegraph at blazegraph.com.


Blazegraph 1.5.2 to Support Hybrid Search using External Solr Indices

While graph databases are a perfect fit for storing and querying structured data, they are not primarily designed to deal efficiently with unstructured data and keyword queries. Therefore, such unstructured data is often kept in dedicated systems that are laid out to tackle the specific challenges for evaluating keyword queries in an efficient way — including advanced techniques such as stemming, TF-IDF indexing, support for complex keyword search requests, scoring, etc.

Graph databases, on the other hand, are about connecting things, so in many scenarios we want to combine the capabilities of structured queries with those of queries against a fulltext index. To give just one simple example, assume we have a structured graph database with data about historical characters and a complementary keyword index over a corpus of historical texts (which may or may not be under our control). Assume we now want to combine structured queries  — asking, e.g., for persons that fall into certain categories such as epochs or countries they lived in  — with historical texts from the index that prominently feature these persons.

The upcoming Blazegraph 1.5.2 release will support such hybrid queries against external Solr fulltext search indices. The fulltext search feature has been implemented as a Blazegraph custom service: using a standards-compliant SPARQL SERVICE call with a reserved service URI, you can now easily combine structured search capabilities over the graph database with information held in an external Solr index.


Blazegraph’s hybrid search capabilities are currently used by the British Museum in the ResearchSpace project, which aims at building a collaborative environment for humanities and cultural heritage research using knowledge representation and Semantic Web technologies. In this context, Blazegraph’s hybrid search feature supports users in expressing complex search requests for cultural heritage objects. Hybrid SPARQL queries utilizing a Solr index are used to support semantic autocompletion: as the user types a keyword, hybrid queries are issued in real time to match keywords against entities in a cultural heritage knowledge graph. Depending on the current context of the search, persons, objects, or places are suggested, providing a user-friendly means to disambiguate terms as the user types. If you’re going to be in San Jose for the Smart Data conference, we’re giving a tutorial on the approach.

To illustrate the new hybrid search feature by example, a single SPARQL query like

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://www.example.com/>
PREFIX fts: <http://www.bigdata.com/rdf/fts#>
SELECT ?person ?kwDoc ?snippet WHERE {
        ?person rdf:type ex:Artist .
        ?person ex:associatedWith ex:TheBlueRider .
        ?person ex:bornIn ex:Germany .
        ?person rdfs:label ?label .
  SERVICE <http://www.bigdata.com/rdf/fts#search> {
        ?kwDoc fts:search ?label .
        ?kwDoc fts:endpoint "http://my.solr.server/solr/select" .
        ?kwDoc fts:params "fl=id,score,snippet" .
        ?kwDoc fts:scoreField "score" .
        ?kwDoc fts:score ?score .
        ?kwDoc fts:snippetField "snippet" .
        ?kwDoc fts:snippet ?snippet .
  }
} ORDER BY ?person ?score

would

  • first extract all persons associated with the group “The Blue Rider” that were born in Germany, then
  • take the label of these persons as search string and send a request against a Solr server, in order to extract a ranked list of articles for the respective persons (including text snippets where these persons are mentioned), next
  • order the results by person and relevance as requested by the ORDER BY, and finally
  • return the identified person URIs (variable ?person, from the graph database), the ID of the keyword index document (variable ?kwDoc, from the fulltext index), and the associated text snippet provided by the keyword index (variable ?snippet).

As the example illustrates, parameterization of the keyword index is done via a reserved “magic vocabulary”: for instance, within the SERVICE clause, the object linked through fts:search identifies the search string to be submitted to the keyword index, while fts:endpoint refers to the address of the Solr server.

Of course, the hybrid search feature is not domain dependent: no matter what data has been loaded into your database and no matter what the keyword index looks like, you can now pose hybrid search queries against your data and the external index. The implementation even allows you to query multiple keyword indices within one query and, by use of SPARQL 1.1 federation, to combine this with requests against multiple SPARQL endpoints at a time. The search string can be dynamically extracted from the database (as in the example above, where we bind variable ?label through a structured query and use it as a search string) or can be a static search string. Moreover, nothing prevents you from using more complex Solr search strings with boolean connectives such as AND, OR, or negation: in SPARQL, these complex search strings can easily be composed by use of BIND in combination with string concatenation. For instance, we may modify the first part of our example as

...
        ?person ex:associatedWith ex:TheBlueRider .
        ?person ex:bornIn ex:Germany .
        ?person rdfs:label ?label .
        BIND(CONCAT("\"", ?label, "\" AND -\"expressionism\"") AS ?search)
        SERVICE <http://www.bigdata.com/rdf/fts#search> {
                ?kwDoc fts:search ?search .
                ...
        }
...

in order to search for keyword index documents mentioning these persons without explicitly mentioning “expressionism” (the “-” in Solr is used to express negation).
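For a concrete picture of what the BIND above produces, here is the composed search string for a hypothetical label value “Franz Marc” (the label value is purely illustrative, sketched in Python):

```python
# What the CONCAT in the BIND above composes, sketched outside SPARQL.
# "Franz Marc" is a hypothetical value bound to ?label by the structured part.
label = "Franz Marc"
search = '"' + label + '" AND -"expressionism"'
print(search)  # "Franz Marc" AND -"expressionism"
```

This quoted-phrase-plus-negation string is then submitted to Solr as a single keyword query via fts:search.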

If you want to learn more about Blazegraph’s upcoming Solr index support, please check out the documentation in our Wiki.



Blazegraph 1.5.2 Preview and Bloor Group Briefing Room

Blazegraph 1.5.2 Release Preview

We’re in the final stages of the Blazegraph 1.5.2 release. This is a major release for Blazegraph, bringing exciting new features as well as important performance and query optimizations. You can see the full set of tickets for the release here (it’s our first release since we migrated to JIRA).

Check it out when the release goes final (we’ll send a mailing list announcement), but here are some of the new features and improvements:

  • External Solr Search Integration: Use the SPARQL SERVICE keyword to search content stored externally in Solr.
  • Substantial refactoring of the Query Plan Generator in close collaboration with our partner, Metaphacts.  This both fixes a number of bugs in the query optimizer and improves performance.
  • Fixes and improvements for embedded and remote clients including Blueprints and Rexster.
  • Online Backup for the non-HA Blazegraph Server.

Did you see Blazegraph featured in the Bloor Group Briefing room?

Blazegraph was recently featured in the Bloor Group Briefing room. Check out the video presentation below, if you missed it.



What pairs well with Veuve Clicquot and Big Data for Graphs?

Have you ever wondered what to pair with Veuve Clicquot and your Big Data graph challenge? The 2015 Big Data Innovations Summit has the answer:  Blazegraph and Mapgraph.   We won the Big Data Startup Award at the 2015 Big Data Innovations Summit in San Jose.

SYSTAP wins Big Data Startup award at the 2015 Big Data Innovations conference.


Our solutions for scalable graph technologies were recognized with the award. Naturally, the champagne pairs perfectly with our Blazegraph database platform and Mapgraph technology for GPU-accelerated graph analytics. Don’t forget our Blazegraph SWAG (must be present to appreciate…), because in graphs, size matters.

SYSTAP's Brad Bebee receives the Big Data Innovation 2015 award for Big Data Startup.


Blazegraph™ is our ultra high-performance graph database supporting Blueprints and RDF/SPARQL APIs. It supports up to 50 Billion edges on a single machine and has a High Availability and Scale-out architecture. It is in production use for Fortune 500 customers such as EMC, Autodesk, and many others.  The Wikimedia Foundation recently chose Blazegraph to power the Wikidata Query Service.  Mapgraph™ is our disruptive new technology to use GPUs to accelerate data-parallel graph analytics up to 10,000X faster than other approaches. It can traverse billions of edges in milliseconds.

Whether you need an embedded graph database, a 1 Trillion Edge Graph Database, or the ability to traverse billions of edges in milliseconds, SYSTAP’s award-winning graph solutions can meet your needs.  We’d love to hear about your success with our technology or talk to you about how we can help scale your graph challenge.  Via Twitter:  @Blazegraph or via email.

 


Blazegraph™ Selected by Wikimedia Foundation to Power the Wikidata Query Service

Blazegraph™ has been selected by the Wikimedia Foundation to be the graph database platform for the Wikidata Query Service. Read the Wikidata announcement here. Blazegraph™ was chosen over Titan, Neo4j, Graph-X, and others in Wikimedia’s evaluation. There’s a spreadsheet link in the selection message with quite an interesting comparison of graph database platforms.

Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others.  The Wikidata Query Service is a new capability being developed to allow users to be able to query and curate the knowledge base contained in Wikidata.

We’re super-psyched to be working with Wikidata and think it will be a great thing for Wikidata and Blazegraph™.


Mapgraph™ GPU Acceleration for Blazegraph™: Launch Preview

We’re going to be formally launching our GPU-based graph analytics acceleration products, Mapgraph™ Accelerator and Mapgraph™ HPC, at the NVIDIA GTC conference in San Jose the week of 16 March.   We will also be competing as one of 12 finalists for NVIDIA’s early stage competition for a $100,000 prize.  If you’re in the area, come to GTC on Wednesday, March 18 and vote for us!

Mapgraph™ Accelerator (Beta) serves as a single-GPU graph accelerator for Blazegraph™. We believe it will provide the world’s first and best platform for building graph applications with GPU acceleration. It bridges the gap between our Blazegraph™ database platform and GPU acceleration for graph analytics. Users of the Blazegraph™ platform will be able to leverage GPU-accelerated graph analytics via a Java Native Interface (JNI) and via predicates in SPARQL queries, similar to our current RDF GAS API, which provides Breadth-First Search (BFS), Single Source Shortest Path (SSSP), Connected Components (CC), and PageRank (PR) implementations.

Mapgraph™ HPC is a new and disruptive technology for organizations that need to process very large graphs in near-real time. It uses GPU clusters to deliver High Performance Computing (HPC) for your organization’s biggest and most time critical graph challenges.

  • Up to 10,000X Faster for graph analytics than Hadoop technologies
  • 10X Price-Performance advantage over supercomputer solutions
  • Familiar Vertex-Centric Graph Programming Model
  • Demonstrated performance of 32 Billion Traversed Edges Per Second (GTEPS) using 64 NVIDIA K40s on Scale-free Graphs

We are currently enrolling Beta customers for Mapgraph™ Accelerator and Mapgraph™ HPC. Chesapeake Technologies International has already accelerated a military planning application, seeing computation times drop from minutes for a single solution to seconds for the generation of multiple scenarios. We’re doing a session on it at GTC. Contact us if you’re interested in finding out more.



