Category Archives: Blazegraph Release

Blazegraph 2.0.0 Release!

We’re very pleased to announce the release of Blazegraph 2.0.0.  2.0.0 is a major release of Blazegraph.  There are some very exciting changes: improved query performance, improved load performance, new deployment options, migration to Maven, a move to GitHub, and many more.  This lays the foundation for new features such as TinkerPop3 support and GPU acceleration.  Download it, clone it, have it sent via carrier pigeon (transportation charges may apply).  Found a bug?  File it in JIRA.  Have a question?  Try the mailing list or contact us.

Github

We love SourceForge, but today it’s important to be on GitHub as well.  This release is available on both GitHub and SourceForge.  We’ll continue to use SourceForge for distribution, mailing lists, etc., but will migrate toward GitHub for the open source releases.

git clone -b BLAZEGRAPH_RELEASE_2_0_0 --single-branch https://github.com/blazegraph/database.git BLAZEGRAPH_RELEASE_2_0_0
cd BLAZEGRAPH_RELEASE_2_0_0
./scripts/mavenInstall.sh
./scripts/startBlazegraph.sh
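Once the server is up, you can sanity-check it over the SPARQL protocol. Here is a minimal Python sketch; it assumes the 2.0.0 default endpoint at http://localhost:9999/blazegraph/sparql, and the `ASK {}` query is just an illustrative ping:

```python
from urllib.parse import urlencode

# Default endpoint for the 2.0.0 /blazegraph service context;
# adjust if you have customized jetty.xml.
ENDPOINT = "http://localhost:9999/blazegraph/sparql"

def query_url(endpoint, sparql):
    """Build a SPARQL-protocol GET URL for the given query string."""
    return endpoint + "?" + urlencode({"query": sparql})

url = query_url(ENDPOINT, "ASK {}")
print(url)

# Against a running server, issue the request like this:
# from urllib.request import urlopen
# print(urlopen(url).read())
```

The request itself is left commented out so the snippet also works offline; any HTTP client (curl, a browser) against the same URL works just as well.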

Query Performance

We’ve implemented symmetric hash joins and provided a number of improvements and new features for query performance and optimization.  For those of you who followed Dr. Michael Schmidt’s excellent posts on the 1.5.2 features, these improvements are a continuation of that work.  Let us know how it works for you.  As always, query optimization is an area where we’d love to help you tune your application for the best performance.

Maven Central

Blazegraph 2.0.0 is now on Maven Central.  You can also get the TinkerPop3 API and the new BlazegraphTPFServer.

    <dependency>
        <groupId>com.blazegraph</groupId>
        <artifactId>bigdata-core</artifactId>
        <version>2.0.0</version>
    </dependency>
    <!-- Use if TinkerPop 2.5 support is needed; see also TinkerPop3 below. -->
    <dependency>
        <groupId>com.blazegraph</groupId>
        <artifactId>bigdata-blueprints</artifactId>
        <version>2.0.0</version>
    </dependency>

TinkerPop3


TinkerPop3 is here!  Get it from Maven Central.

   <dependency>
      <groupId>com.blazegraph</groupId>
      <artifactId>blazegraph-gremlin</artifactId>
      <version>1.0.0</version>
   </dependency>

Blazegraph-Based TPF Server

The Blazegraph-Based TPF Server is a Linked Data Fragments (LDF) server that provides a Triple Pattern Fragments (TPF) interface using the Blazegraph graph database as the backend.  It was originally developed by Olaf Hartig and is being released via Blazegraph under the Apache 2 license.  See here to get started.

Mavenization

Everybody loves (and hates) Maven.  Starting with this release, Blazegraph has been broken into a collection of Maven artifacts.  This has enabled us to work on new features like TinkerPop3, which requires Java 8, while keeping the core platform at Java 7 to support users who are still there.  Check out the Maven Notes on the wiki for full details on the architecture, getting started with development, building snapshots, etc.  If you have a 1.5.3 version checked out in Eclipse, you will want to pay attention to Getting Started with Eclipse and allocate a little extra time for the transition.

So long /bigdata, nice to have known you…

Starting with the 2.0 release, the default service context is /blazegraph, i.e., http://localhost:9999/blazegraph/.  It is set up to redirect from /bigdata.  We are continuing to provide the bigdata.war and bigdata.jar archives, which by default point to /bigdata, as part of the distribution, but these will be discontinued at some point in the future.  We encourage all users to migrate to the /blazegraph context with this release.

If you are currently using a serviceURL like http://localhost:9999/bigdata/sparql and want to continue doing so, you’ll need to use bigdata.war, bigdata.jar, or customize the jetty.xml for your deployment.  Otherwise, the new default serviceURL is http://localhost:9999/blazegraph/sparql.

Deployers

2.0.0 provides a Debian deployer, an RPM deployer, and a tarball, in addition to the blazegraph.war and blazegraph.jar archives.

Blazegraph Benchmarking

We will be publishing benchmarks for LUBM and BSBM for the 2.0.0 release.

Enterprise Features (HA and Scale-out)

Starting in release 2.0.0, the Scale-out and HA capabilities have moved to Enterprise features. These are available to users with a support and/or license subscription. If you are an existing GPLv2 user of these features, we have some easy ways to migrate; contact us for more information. We’d like to make it as easy as possible.

Much, much, more….

There are also many other features, including improved data loading and support for building custom vocabularies, that will be covered in a series of blog posts over the next month or so.  Please check back.

Stay in touch, we’d love to hear from you.


Blazegraph 1.5.3 Release!

We’re very pleased to announce that Blazegraph 1.5.3 is now available for download.  1.5.3 is a minor release of Blazegraph with some important bug fixes.  The full set of changes is available on JIRA.  We recommend upgrading to this release.  If you’re upgrading from a version prior to 1.5.2, be sure to check out the updated documentation on SPARQL and Dr. Michael Schmidt‘s excellent blog posts ([1] and [2]) as well as the Wiki documentation:  Sparql Order Matters and Sparql Bottom Up Semantics.

We have some exciting things planned for our 1.6 release coming later this year including GPU acceleration for SPARQL, deployment on Maven Central, and continued improvements in query optimization and performance.  Don’t miss it.  Sign up to stay in touch with Blazegraph.

Have you seen our new website? Did you know you can now purchase commercial licensing and support for Blazegraph online?

Will you be at the International Semantic Web Conference (ISWC) this year? We’re a Gold sponsor along with our partner Metaphacts and will be speaking and exhibiting.  Come see us.

Do you have a great success story with Blazegraph? Want to find out more? Check us out below or contact us.  We’d love to hear from you.

Ready to get started?  Check out:


Blazegraph 1.5.2 Release!

We’re very pleased to announce that Blazegraph 1.5.2 is now available for download. This is a significant release of Blazegraph featuring a large number of performance improvements and new features. The full set of changes is available on JIRA. Highlights are below:

Have you seen our new website? Did you know you can now purchase commercial licensing and support for Blazegraph online?


Will you be in San Jose at the SmartData or the NoSQLNow conference? We’re a Gold sponsor and will be speaking and exhibiting. Come see us and get a discounted registration.

Do you have a great success story with Blazegraph? Want to find out more? Check us out below or contact us.  We’d love to hear from you.


In SPARQL, Order Matters

One common mistake made when writing SPARQL queries is the implicit assumption that, within SPARQL join groups, the order in which components of the group are specified does not matter. The point here is that, by definition, SPARQL follows a sequential semantics when evaluating join groups and, although in many cases the order of evaluating patterns does not change the outcome of the query, there are some situations where the order is crucial to getting the expected query result.

In Blazegraph 1.5.2 we’ve fixed some internal issues regarding the evaluation order within join groups, making Blazegraph’s join group evaluation strategy compliant with the official SPARQL W3C semantics. As a consequence, for some query patterns, the behavior of Blazegraph changed and if you’re upgrading your Blazegraph installation it might make sense to review your queries for such patterns, in order to avoid regressions.

A Simple Example

To illustrate why order in SPARQL indeed matters, assume we want to find all person URIs and, if present, their associated images.  To instantiate this scenario, let us consider the query

### Q1a:
SELECT * WHERE {
  ?person rdf:type <http://example.com/Person>
  OPTIONAL { ?person <http://example.com/image> ?image }
}

over a database containing the following triples:

<http://example.com/Alice> rdf:type                    <http://example.com/Person> .
<http://example.com/Alice> <http://example.com/image>  "Alice.jpg" .
<http://example.com/Bob>   rdf:type                    <http://example.com/Person> .

The result of this query is quite obvious and in line with our expectations: We get one result row for Alice, including the existing image, plus one result row for Bob, without an image:

?person                    | ?image
---------------------------|------------
<http://example.com/Alice> | "Alice.jpg"
<http://example.com/Bob>   |

Now let’s see what happens when swapping the order of the triple pattern and the OPTIONAL block in the query:

### Q1b:
SELECT * WHERE {
  OPTIONAL { ?person <http://example.com/image> ?image } .
  ?person rdf:type <http://example.com/Person>
}

The result of Q1b may come as a surprise:

?person                    | ?image
---------------------------|------------
<http://example.com/Alice> | "Alice.jpg"

Where’s Bob gone?

As mentioned before, SPARQL evaluates the query sequentially. That is, in the first step it evaluates the OPTIONAL { ?person :image ?image } block.  As a result of this first step, we obtain:

?person                    | ?image
---------------------------|------------
<http://example.com/Alice> | "Alice.jpg"

No Bob in sight.

In a second step, this partial result is joined with the (non-optional) triple pattern ?person rdf:type <http://example.com/Person>. Intuitively speaking, the join with the triple pattern now serves as an assertion, retaining those result rows for which the value of variable ?person can be shown to be of rdf:type <http://example.com/Person>. Obviously, this assertion holds for our (single) result row, the URI of Alice, so the previous intermediate result is left unmodified.

So what we get in this case is the “set of all persons that have an image associated, including the respective images”. But does this informal description really capture what Q1b does? Consider the evaluation of Q1b over a database in which no person has an image associated, say:

<http://example.com/Alice> rdf:type <http://example.com/Person> .
<http://example.com/Bob>   rdf:type <http://example.com/Person> .

Here’s the result:

?person                    | ?image
---------------------------|------------------------------------
<http://example.com/Alice> | 
<http://example.com/Bob>   | 

Surprised again?

Well, here’s the explanation: SPARQL evaluation starts out over the so-called “empty mapping” (also called “universal mapping”): the OPTIONAL block is not able to retrieve any results, and given the semantics of OPTIONAL, this step simply retains the empty mapping. This empty mapping is then joined with the non-optional triple pattern, giving us the two bindings for  ?person as result, with the ?image being unbound in both.

Taking our observations together, the semantics of Q1b could be phrased as: “If there is at least one triple with predicate <http://example.com/image> in our database, return all persons that have an image, including the image; otherwise, return all persons.”

That’s quite different from our intended search request, isn’t it?
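To make the sequential semantics tangible, here is a small, self-contained Python sketch that mimics SPARQL’s per-pattern evaluation over the sample data above: a regular pattern is a join, an OPTIONAL is a left join, and evaluation starts from the empty mapping. It illustrates the semantics only; it is not how Blazegraph is actually implemented:

```python
# Sample data from the example above (ex: and rdf: abbreviate the full URIs).
DATA = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:image", '"Alice.jpg"'),
    ("ex:Bob",   "rdf:type", "ex:Person"),
]

def match(pattern, triple):
    """Match a triple pattern (terms starting with '?' are variables)
    against one triple; return a binding dict or None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.get(p, t) != t:
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding

def eval_pattern(pattern):
    return [b for t in DATA if (b := match(pattern, t)) is not None]

def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(left, right):
    return [{**l, **r} for l in left for r in right if compatible(l, r)]

def left_join(left, right):
    """OPTIONAL: keep the left row unchanged when nothing joins."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(matches if matches else [l])
    return out

# Q1a: non-optional pattern first, then OPTIONAL.
q1a = join([{}], eval_pattern(("?person", "rdf:type", "ex:Person")))
q1a = left_join(q1a, eval_pattern(("?person", "ex:image", "?image")))

# Q1b: OPTIONAL first -- the empty mapping joins with every image binding.
q1b = left_join([{}], eval_pattern(("?person", "ex:image", "?image")))
q1b = join(q1b, eval_pattern(("?person", "rdf:type", "ex:Person")))

print(len(q1a), len(q1b))  # -> 2 1  (Q1a has Bob's row, Q1b does not)
```

Running Q1b over the second dataset (no image triples at all) with this sketch likewise yields both persons with ?image unbound, matching the explanation above.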

Practical Implications

The example above illustrates an interesting edge case which (given that the semantics sketched informally above is somewhat odd) does not frequently show up in practice, though. For join groups composed only of (non-optional) statement patterns, for instance, the order in which you write the triple patterns does not matter at all. But with the operators OPTIONAL and MINUS we may run into such ordering problems.

Interestingly, even for patterns involving OPTIONAL (and MINUS), in many cases different orders lead to the same result (independently of the data in the underlying database). Let’s look at the following variant of our initial query Q1a which, in addition, extracts the person’s label (and assume we have labels for the persons in our sample database as well):

### Q2a
SELECT * WHERE {
  ?person rdf:type <http://example.com/Person> .
  ?person rdfs:label ?label .
  OPTIONAL { ?person <http://example.com/image> ?image } .
}

Following the same arguments as in our example before, it is not allowed to move the OPTIONAL to the beginning, but the following query

### Q2b
SELECT * WHERE {
  ?person rdf:type <http://example.com/Person> .
  OPTIONAL { ?person <http://example.com/image> ?image } .
  ?person rdfs:label ?label .
}

can be shown to be semantically equivalent. Without going into details, the key difference from our example at the beginning is that variable ?person has definitely been bound (in both Q2a and Q2b) by the time the OPTIONAL is evaluated; this rules out the strange effects discussed at the beginning.

If you’re confused now about where to place your OPTIONAL’s, here are some rules of thumb:

  1. Be aware that order matters.
  2. First specify the non-optional parts of your query (unless there’s a good reason not to).
  3. Then specify your OPTIONAL clauses to include the optional patterns of your graph.

In short, unless you want to encode somewhat odd patterns in SPARQL, it usually makes sense to put OPTIONAL patterns at the end of your join groups. If you’re still unsure about some of your queries, have a look at the semantics section in the official W3C specs.

Optimizations within Blazegraph 1.5.2

While the official SPARQL semantics uses a sequential approach, one key optimization in the Blazegraph query optimizer is to identify a semantics-preserving reordering of the join groups in a query that minimizes the amount of work spent evaluating it. Among other heuristics (such as cardinality estimations for the individual triple patterns), optimizers typically try to evaluate non-optional patterns first (particularly in the presence of more complex OPTIONAL expressions).

As a consequence, queries like those sketched above challenge the Blazegraph query optimizer: reordering must be done carefully and based on sound theoretical foundations, in order to retain the semantics of the original query within the rewriting process. In Blazegraph 1.5.2, we’ve reworked Blazegraph’s internal join group reordering strategy.

For instance, Blazegraph 1.5.2 is able to detect equivalence transformations (as sketched in Q2a/Q2b) when evaluating queries and uses this knowledge to find more efficient query execution plans: if you run the Q2b query above in Blazegraph with the EXPLAIN mode turned on, you’ll see that the optimizer moves the OPTIONAL clause to the end in the optimized abstract syntax tree.

We’d love to hear from you.

Do you have a cool new application using Blazegraph or are interested in understanding how to make Blazegraph work best for your application?   Get in touch or send us an email at blazegraph at blazegraph.com.


Blazegraph 1.5.2 to Support Hybrid Search using External Solr Indices

While graph databases are a perfect fit for storing and querying structured data, they are not primarily designed to deal efficiently with unstructured data and keyword queries. Therefore, such unstructured data is often kept in dedicated systems that are laid out to tackle the specific challenges for evaluating keyword queries in an efficient way — including advanced techniques such as stemming, TF-IDF indexing, support for complex keyword search requests, scoring, etc.

Graph databases, on the other hand, are about connecting things, so in many scenarios we want to combine the capabilities of structured queries with those of queries against a fulltext index. To give just one simple example, assume we have a structured graph database with data about historical characters and a complementary keyword index over a corpus of historical texts (which may or may not be under our control). Assume we now want to combine structured queries  — asking, e.g., for persons that fall into certain categories such as epochs or countries they lived in  — with historical texts from the index that prominently feature these persons.

The upcoming Blazegraph 1.5.2 release will support such hybrid queries against external Solr fulltext search indices. The fulltext search feature has been implemented as a Blazegraph custom service: using a standards-compliant SPARQL SERVICE call with a reserved service URI, you can now easily combine structured search capabilities over the graph database with information held in an external Solr index.


Blazegraph’s hybrid search capabilities are currently used by the British Museum in the ResearchSpace project, which aims at building a collaborative environment for humanities and cultural heritage research using knowledge representation and Semantic Web technologies.  In this context, Blazegraph’s hybrid search feature supports users in expressing complex search requests for cultural heritage objects. Hybrid SPARQL queries utilizing a Solr index are used to support semantic autocompletion: as the user types a keyword, hybrid queries are issued in real time to match keywords against entities in a cultural heritage knowledge graph. Depending on the current context of the search, persons, objects, or places are suggested, providing a user-friendly means to disambiguate terms as the user types.  If you’re going to be in San Jose for the Smart Data conference, we’re giving a tutorial on the approach.

To illustrate the new hybrid search feature by example, a single SPARQL query like

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://www.example.com/>
PREFIX fts: <http://www.bigdata.com/rdf/fts#>
SELECT ?person ?kwDoc ?snippet WHERE {
        ?person rdf:type ex:Artist .
        ?person ex:associatedWith ex:TheBlueRider .
        ?person ex:bornIn ex:Germany .
        ?person rdfs:label ?label .
  SERVICE <http://www.bigdata.com/rdf/fts#search> {
        ?kwDoc fts:search ?label .
        ?kwDoc fts:endpoint "http://my.solr.server/solr/select" .
        ?kwDoc fts:params "fl=id,score,snippet" .
        ?kwDoc fts:scoreField "score" .
        ?kwDoc fts:score ?score .
        ?kwDoc fts:snippetField "snippet" .
        ?kwDoc fts:snippet ?snippet .
  }
} ORDER BY ?person ?score

would

  • first extract all persons associated with the group “The Blue Rider” that were born in Germany, then
  • take the label of these persons as search string and send a request against a Solr server, in order to extract a ranked list of articles for the respective persons (including text snippets where these persons are mentioned), next
  • order the results by person and relevance as requested by the ORDER BY, and finally
  • return the identified person URIs (variable ?person, from the graph database), the ID of the keyword index document (variable ?kwDoc, from the fulltext index), and the associated text snippet provided by the keyword index (variable ?snippet).

As the example illustrates, the keyword index is parameterized via a reserved “magic vocabulary”: for instance, within the SERVICE keyword, the object linked through fts:search identifies the search string to be submitted to the keyword index, while fts:endpoint refers to the address of the Solr server.

Of course, the hybrid search feature is not domain dependent: no matter what data has been loaded into your database and no matter what the keyword index looks like, you can now post hybrid search queries against your data and the external index. The implementation even allows you to query multiple keyword indices within one query and, by use of SPARQL 1.1 federation, combine this with requests against multiple SPARQL endpoints at a time. The search string can be dynamically extracted from the database (as in the example above, where we bind variable ?label through a structured query and use it as a search string) or can be a static search string. Moreover, nothing prevents you from using more complex Solr keyword search strings with boolean connectives such as AND, OR, or negation: in SPARQL, these complex search strings can easily be composed using BIND in combination with string concatenation. For instance, we may modify the first part of our example as

...
        ?person ex:associatedWith ex:TheBlueRider .
        ?person ex:bornIn ex:Germany .
        ?person rdfs:label ?label .
        BIND(CONCAT("\"", ?label, "\" AND -\"expressionism\"") AS ?search)
        SERVICE <http://www.bigdata.com/rdf/fts#search> {
                ?kwDoc fts:search ?search .
                ...
        }
...

in order to search for keyword index documents mentioning these persons without explicitly mentioning “expressionism” (the “-” in Solr is used to express negation).
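The BIND/CONCAT step above is plain string assembly, so the same composition can be sketched in a few lines of Python. The helper name and the sample label ("Franz Marc") are illustrative, and Solr’s full query-escaping rules go beyond the minimal quote escaping shown here:

```python
def solr_search_string(label, exclude=None):
    """Quote the label as a Solr phrase query; optionally AND in a
    negated phrase ('-' expresses negation in Solr)."""
    escaped = label.replace('"', '\\"')   # escape embedded quotes
    s = f'"{escaped}"'
    if exclude:
        s += f' AND -"{exclude}"'
    return s

print(solr_search_string("Franz Marc", exclude="expressionism"))
# -> "Franz Marc" AND -"expressionism"
```

The resulting string is exactly what the SPARQL BIND produces for a given ?label, ready to be passed through fts:search.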

If you want to learn more about Blazegraph’s upcoming Solr index support, please check out the documentation in our Wiki.

We’d love to hear from you.

Do you have a cool new application using Blazegraph or are interested in understanding how to make Blazegraph work best for your application?    Get in touch or send us an email at blazegraph at blazegraph.com.


Blazegraph 1.5.2 Preview and Bloor Group Briefing Room

Blazegraph 1.5.2 Release Preview

We’re in the final stages of the Blazegraph 1.5.2 release.  This is a major release for Blazegraph that brings exciting new features as well as important performance and query optimizations.  You can see the full set of tickets for the release here (it’s our first release since we migrated to JIRA).

Check it out when the release goes final (we’ll send a mailing list announcement), but here are some of the new features and improvements:

  • External Solr Search Integration:  Use the SPARQL SERVICE keyword to search content stored externally in Solr.
  • Substantial refactoring of the Query Plan Generator in close collaboration with our partner, Metaphacts.  This both fixes a number of bugs in the query optimizer and improves performance.
  • Fixes and improvements for embedded and remote clients including Blueprints and Rexster.
  • Online Backup for the non-HA Blazegraph Server.

Did you see Blazegraph featured in the Bloor Group Briefing Room?

Blazegraph was recently featured in the Bloor Group Briefing Room. If you missed it, check out the video presentation below.

We’d love to hear from you.

Do you have a cool new application using Blazegraph or are interested in understanding how to make Blazegraph work best for your application?    Get in touch or send us an email at blazegraph at blazegraph.com.
