Monthly Archives: January 2014

MPGraph v2 – pre-release and documentation available

MPGraph v2 is now in pre-release. You can obtain the pre-release by checking out MPGraph from the SVN trunk:

svn checkout svn:// mpgraph-code

The new MPGraph API makes it easy to develop high performance graph analytics on GPUs. The API is based on the Gather-Apply-Scatter (GAS) model as used in GraphLab. To deliver high performance computation and efficiently utilize the high memory bandwidth of GPUs, MPGraph’s CUDA kernels use multiple sophisticated strategies, such as vertex-degree-dependent dynamic parallelism granularity and frontier compaction.
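The GAS model described above can be sketched on the CPU with plain Python. This is only an illustration of the gather/apply/scatter phases (here computing PageRank on a toy graph); the function and variable names are not the actual MPGraph API, and a real MPGraph program runs these phases as CUDA kernels over a frontier.

```python
# Minimal CPU sketch of the Gather-Apply-Scatter (GAS) model,
# computing PageRank on a toy directed graph. Illustrative only;
# not the MPGraph API.

def gas_pagerank(edges, num_vertices, damping=0.85, iterations=20):
    # Build adjacency: in-edges drive GATHER, out-degree drives SCATTER.
    out_deg = [0] * num_vertices
    in_nbrs = [[] for _ in range(num_vertices)]
    for (u, v) in edges:
        out_deg[u] += 1
        in_nbrs[v].append(u)

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        new_rank = []
        for v in range(num_vertices):
            # GATHER: sum contributions from in-neighbours.
            acc = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
            # APPLY: update the vertex value.
            new_rank.append((1.0 - damping) / num_vertices + damping * acc)
        # SCATTER: a frontier-based engine would activate only the
        # neighbours whose inputs changed; this dense sketch just
        # recomputes every vertex each round.
        rank = new_rank
    return rank

ranks = gas_pagerank([(0, 1), (1, 2), (2, 0), (0, 2)], 3)
```

A GPU engine such as MPGraph parallelizes the gather and scatter phases across edges, which is where strategies like degree-dependent parallelism granularity and frontier compaction pay off.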

New algorithms that fully exploit the data-level parallelism of the GPU can be implemented in a few hours, offering throughput of up to 3 billion traversed edges per second on a single GPU.

Future releases will include topology compression, compressed vertex and link attributes, partitioned graphs, and multi-GPU support.


Runtime Query Optimizer

The Runtime Query Optimizer (RTO) provides robust, deep sampling of queries and can detect hidden correlations between your queries and the data. For difficult join optimization problems and high latency queries, the RTO can often deliver query plans that are substantially faster than the standard query optimizer.

The basic problem in join order optimization is to find an ordering of the JOINs in the query that does the least work. The standard query optimizer does this by estimating the number of triples in the database (aka the “cardinality”) for each triple pattern in the query and then analyzing the manner in which variables are shared by triple patterns and filters. For selective, low-latency queries, the standard query optimizer does a good job and does it very quickly. This is called “static analysis” because it is done before the query is executed.
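The static approach can be sketched as a greedy ordering over estimated cardinalities: always pick the cheapest remaining triple pattern that shares a variable with the patterns already ordered. This is only an illustration of the idea, not bigdata's actual optimizer.

```python
# Illustrative sketch of static join ordering (not bigdata's actual
# implementation): greedily pick the triple pattern with the smallest
# estimated cardinality, preferring patterns that share a variable
# with those already ordered so that each join is constrained.

def static_join_order(patterns):
    # patterns: list of (estimated_cardinality, set_of_variables)
    remaining = list(range(len(patterns)))
    bound = set()          # variables bound by the prefix so far
    order = []
    while remaining:
        # Prefer patterns connected to the prefix via a shared variable.
        joined = [i for i in remaining if patterns[i][1] & bound]
        candidates = joined if joined else remaining
        best = min(candidates, key=lambda i: patterns[i][0])
        order.append(best)
        bound |= patterns[best][1]
        remaining.remove(best)
    return order

# ?s :worksFor ?org (est. 10k), ?s :name ?n (est. 1M), ?org :label "ACME" (est. 5)
order = static_join_order([(10000, {"s", "org"}),
                           (1000000, {"s", "n"}),
                           (5, {"org"})])
# Starts with the most selective pattern and stays connected.
```

The weakness is visible here: the ordering is only as good as the per-pattern estimates, and errors compound as the prefix grows.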

The problem with static analysis is that the error in the estimated cardinality of the joins can grow exponentially with the number of joins in the query. If your query needs to read a lot of data, has a lot of joins, and uses joins that have a significant cardinality (that is, the joins produce a lot of solutions), then it becomes critical to have the right join ordering. This is the job of the RTO.

The RTO creates one (or more) join graphs from your query. The vertices in the join graph are the triple patterns in your query. The shared variables appearing in those triple patterns and in FILTERs define the possible joins. The job of the RTO is to find a join path (a specific sequence of joins) that visits all vertices in the join graph with the minimum total cost. This is the best join order.
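The join graph construction above can be sketched directly: one vertex per triple pattern, and an edge between two patterns whenever they share a variable. This is an illustrative data structure, not bigdata's classes.

```python
# Sketch of building a join graph (illustrative, not bigdata's code):
# vertices are triple patterns; an edge connects two patterns
# whenever they share at least one variable.

def build_join_graph(patterns):
    # patterns: list of sets of variable names, one set per triple pattern
    edges = set()
    for i in range(len(patterns)):
        for j in range(i + 1, len(patterns)):
            shared = patterns[i] & patterns[j]
            if shared:
                # Record which variables this join would bind on.
                edges.add((i, j, frozenset(shared)))
    return edges

# ?s :p ?o . ?o :q ?x . ?s :r ?y
g = build_join_graph([{"s", "o"}, {"o", "x"}, {"s", "y"}])
```

Each edge in this graph is a candidate join; a join path is then a walk through the graph that eventually covers every vertex.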

The RTO uses statistical sampling to directly estimate the actual cardinality of the joins. It begins with the most selective vertex in the join graph — the triple pattern with the smallest cardinality. It then expands from that vertex by one step along each possible edge. Each edge represents a join. Rather than fully executing these joins, the RTO performs cutoff evaluation. It pushes a sample of source solutions into the join and counts the number of solutions that are produced. This gives it a direct measurement of the actual “hit ratio” of the join (for the input sample), which can be used to estimate the cardinality of that join when executed in a specific join order. If you use the Explain mode, you can see these estimates in a neatly formatted table.
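Cutoff evaluation can be sketched as follows: run a bounded sample of input solutions through the join, measure the average number of outputs per input, and scale up. The helper names here are hypothetical; bigdata's actual cutoff joins run inside the query engine.

```python
import random

# Sketch of cutoff evaluation (hypothetical helper names): push a
# small sample of source solutions into a join, count the outputs,
# and scale the measured hit ratio up to an estimated cardinality.

def cutoff_join_estimate(source_solutions, join_fn, limit=100, seed=0):
    rng = random.Random(seed)
    n = len(source_solutions)
    sample = source_solutions if n <= limit else rng.sample(source_solutions, limit)
    out = 0
    for s in sample:
        out += len(join_fn(s))        # solutions produced for this input
    hit_ratio = out / len(sample)     # average outputs per input solution
    return hit_ratio * n              # estimated cardinality of the join

# Toy join: each person solution joins to that person's known emails.
emails = {"alice": ["a@x", "a@y"], "bob": ["b@x"], "carol": []}
people = list(emails)
est = cutoff_join_estimate(people, lambda p: emails[p])
```

Because the hit ratio is measured rather than estimated, the error does not compound the way static cardinality estimates do.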

This process of sampling cutoff joins continues until the RTO identifies join paths (aka join orders) that dominate (are strictly better than) the alternatives for the same set of vertices. It then drops the alternatives and continues to extend the join paths until all joins have been ordered. The result is a highly optimized join ordering for your query.
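The pruning rule above — keep only the cheapest path over each set of vertices — can be sketched like this. The data structures are illustrative, not bigdata's.

```python
# Sketch of the RTO's pruning rule (illustrative data structures):
# among candidate join paths that cover the SAME set of vertices,
# keep only the one with the lowest estimated cumulative cost;
# the alternatives are dominated and dropped.

def prune_dominated(paths):
    # paths: list of (ordered_vertices_tuple, estimated_cost)
    best = {}
    for vertices, cost in paths:
        key = frozenset(vertices)     # same vertex set => comparable paths
        if key not in best or cost < best[key][1]:
            best[key] = (vertices, cost)
    return list(best.values())

survivors = prune_dominated([(("A", "B", "C"), 1200.0),
                             (("B", "A", "C"), 450.0),   # dominates the first
                             (("A", "B", "D"), 900.0)])  # different vertex set
```

Paths over different vertex sets are not comparable, so both the cheap ABC ordering and the ABD path survive; only the dominated ABC alternative is dropped. Extending only the surviving paths keeps the search tractable.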

Visit the Query Optimization page on the bigdata wiki for more information on how to understand and optimize your queries and how to get started with the RTO. The RTO is available in the 1.3.x development and maintenance branch. It is alpha code and must be enabled by a query hint. However, it already handles many interesting queries and can provide valuable insight into join orderings that will make your heavy queries much lighter and faster.