Monthly Archives: April 2015

What pairs well with Veuve Clicquot and Big Data for Graphs?

Have you ever wondered what to pair with Veuve Clicquot and your Big Data graph challenge? The 2015 Big Data Innovations Summit has the answer:  Blazegraph and Mapgraph.   We won the Big Data Startup Award at the 2015 Big Data Innovations Summit in San Jose.

SYSTAP wins Big Data Startup award at the 2015 Big Data Innovations conference.

SYSTAP wins Big Data Startup award at the 2015 Big Data Innovations conference.

Our solutions for scalable graph technologies were recognized with the award.  Naturally, the champagne pairs perfectly with our Blazegraph database platform and Mapgraph technology for GPU-accelerated graph analytics.  Don’t forget our Blazegraph SWAG (must be present to appreciate…) because In graphs, size matters.

SYSTAP's Brad Bebee receives the Big Data Innovation 2015 award for Big Data Startup.

SYSTAP’s Brad Bebee receives the Big Data Innovation 2015 award for Big Data Startup

Blazegraph™ is our ultra high-performance graph database supporting Blueprints and RDF/SPARQL APIs. It supports up to 50 Billion edges on a single machine and has a High Availability and Scale-out architecture. It is in production use for Fortune 500 customers such as EMC, Autodesk, and many others.  The Wikimedia Foundation recently chose Blazegraph to power the Wikidata Query Service.  Mapgraph™ is our disruptive new technology to use GPUs to accelerate data-parallel graph analytics up to 10,000X faster than other approaches. It can traverse billions of edges in milliseconds.

Whether you need an embedded graph database, a 1 Trillion Edge Graph Database, or the ability to traverse billions of edges in milliseconds, SYSTAP’s award-winning graph solutions can meet your needs.  We’d love to hear about your success with our technology or talk to you about how we can help scale your graph challenge.  Via Twitter:  @Blazegraph or via email.



CPU, Disk, Main Memory for Graphs

Bryan and I were chatting back on the train from NYC on CPU memory bandwidth. Here’s a quick write-up of the discussion.

15 years ago database researchers recognized that CPU memory bandwidth was the limiting factor for relational database performance. This observation was made in the context of relatively wide tables in RDBMS platforms that were heavily oriented to key range scan on a primary key. The resulting architectures are similar to the structure of array (SOA) pattern used by the high performance computing community and within our Mapgraph platform.

Graphs are commonly modeled as 3-column tables. These tables intrinsically have a very narrow stride, similar to column stores for relational data. Whether organized for main memory or disk, the goal of the query planner is to generate an execution strategy that is the most efficient for a given query.

Graph stores that are organized for disk maintain multiple indices with a log structured cost (e.g., a B+Tree) that allow them to jump to the page on the disk that has the relevant tuples for any given access path. Due to the relative cost of memory and disk, main memory systems (such as SPARQLcity) sometimes choose a single index and resort to full table scans when the default index is not useful for a query or access pattern. In making such decisions, main memory databases are trading off memory for selectivity. These designs can consume fewer resources, but it can be much more efficient to have a better index for a complex query plan. Thus, a disk based database can often do as well as or better than a memory based database of the disk based system has a more appropriate index or family of indices. In fact, 90%+ of the performance of a database platform comes from the query optimizer. The difference in performance between a good query plan and a bad query plan for the same database and hardware can be easily 10x, 100x, or 10000x depending on the query. A main memory system with a bad query plan can easily be beaten by a disk -based system with a good query plan. This is why we like to work closely with our customers. For example, one of our long term customers recently upgraded from 1.2.x (under a long term support contract) to 1.5.x and obtained a 100% performance improvement without changing a single line of their code.

Main memory systems become critical when queries much touch a substantial portion of the data. This is true for most graph algorithms that are not hop constrained. For example, an unconstrained breadth first search on a scale free graph will tend to visit all vertices in the graph during the traversal. A page rank or connected components computation will tend to visit all vertices on each iteration and may required up to 50 iterations to converge page rank to a satisfactory epsilon. In such cases, CPU memory architectures will spend most of the wall clock time blocked on memory fetches due to the inherent non-local access patterns during graph traversal. Architectures such as the XMT/XMT-2 (the Urika appliance) handle this problem by using very slow cores, zero latency thread switching, a fast interconnect and hash partitioned memory allocations. The bet of the XMT architecture is that non-locality dominates so you might as well spread all data out everywhere and hide the latency by having a large number of memory transactions in flight. We take a different approach with GPUs and achieve a 10x price/performance benefit over the XMT-2 and a 3x cost savings. This savings will increase substantially when the Pascal GPU is released in Q1 2016 due to an additional 4x gain in memory bandwidth driven by the breadth of the commodity market for GPUs. We obtain this dramatic price/performance and actual performance advantage using zero overhead context switching, fast memory, 1000s of threads to get a large number of in flight memory transactions, and paying attention to locality. The XMT-2 is a beautiful architecture, but locality *always* matters at every level of the memory hierarchy.