Monthly Archives: February 2014

MPGraph – new performance enhancements

It has been gratifying to see the level of interest in MPGraph, judging by downloads alone, since our v2 release. The v3 release should be out next month and will include 5x to 10x performance gains in algorithms that have large frontiers (Connected Components, PageRank, etc.). The speedup comes from a different kernel that solves a thread scheduling problem and then turns the computation into an embarrassingly parallel one. This technique has more overhead for small frontiers, but outperforms the existing kernels when the frontier becomes large.
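For the curious, here is a minimal CUDA sketch of the general technique (load-balanced, edge-parallel expansion), not MPGraph's actual v3 kernel; all names are hypothetical. An exclusive prefix sum over the frontier's vertex degrees assigns exactly one edge to each thread, and a binary search over the scanned degrees tells each thread which vertex its edge belongs to. Once the scheduling is solved this way, the per-edge work is embarrassingly parallel no matter how skewed the degree distribution is.

// Sketch only: one thread per frontier *edge*, not per vertex.
// scan[] holds the exclusive prefix sum of the frontier's degrees.
__global__ void expandByEdge(const int* frontier,   // active vertex ids
                             const int* scan,       // exclusive scan of degrees
                             int frontierSize,
                             int totalEdges,        // scan[n-1] + last degree
                             const int* rowOffsets, // CSR row pointers
                             const int* colIndices, // CSR adjacency
                             int* output)           // one slot per edge
{
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= totalEdges) return;

    // Binary search: largest i with scan[i] <= e. This is the
    // "thread scheduling" step that maps edge slots back to vertices.
    int lo = 0, hi = frontierSize - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) >> 1;
        if (scan[mid] <= e) lo = mid; else hi = mid - 1;
    }
    int v = frontier[lo];      // source vertex for this edge slot
    int rank = e - scan[lo];   // which of v's edges this thread owns
    output[e] = colIndices[rowOffsets[v] + rank];
}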

Beyond that, our short term goal is to increase the data density on the GPU. This will stretch the resources of a single GPU to provide support for graphs with up to 1 billion edges. Increased data density will also work in our favor as we move into multi-GPU support.
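Some back-of-envelope arithmetic (ours, not a statement about MPGraph's internal layout) shows why a billion edges is about where an uncompressed topology starts to stretch a 12 GB card such as the K40:

#include <cstdio>

int main() {
    // Uncompressed CSR with 32-bit ids: 1B edges, 100M vertices.
    const double colBytes = 1e9 * 4;   // one 4-byte target id per edge
    const double rowBytes = 1e8 * 4;   // one 4-byte offset per vertex
    const double gb = (colBytes + rowBytes) / (1 << 30);
    // ~4.1 GB for the topology alone; frontier queues, per-vertex
    // state, and scan scratch space must share the remaining memory.
    // With 64-bit ids the topology alone would double to ~8.2 GB.
    printf("CSR topology: %.1f GB\n", gb);
}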

We will be talking about MPGraph at CSHALS in Boston next week (2/26/2014) and at the GPU Technology Conference (GTC) in San Jose on March 24th (10am, in the DARPA/XDATA track).

We will also be talking about the new HA replication cluster at CSHALS.


New trac, wiki, and SVN links

We are following a mandatory migration to the SourceForge Allura platform. As part of this migration, the links for the issue tracker, wiki, and SVN will all change. SourceForge will continue to host the bigdata (and MPGraph) projects, but starting today we will be hosting the issue tracker and wiki ourselves. This decision was made to preserve the existing features and data in our Trac and MediaWiki instances.

The new URLs are:


MPGraph + Bigdata

We have received several questions about the roadmap for MPGraph and bigdata. We are working on a simple integration pattern now. This integration will be based on exporting a topology projection onto the GPU and will support ultra fast graph mining as a SPARQL SERVICE against the exported topology. We are also developing a similar SERVICE for graph mining on typed graphs over the database indices. Eventually these will converge.

The roadmap for MPGraph includes topology compression to host large graphs on the GPU (up to 2B edges on a single K40 GPU), column-wise compression to capture schema-flexible vertex and link property sets, and a multi-GPU implementation that will target GPU clusters (including EC2).

Schema-flexible property and link sets will allow us to capture the RDF data model on the GPU. At that point we can also implement the world's fastest SPARQL endpoint, at up to 3 billion edges per second. The GPU is the world's fastest single-node graph processing platform, much faster than the Cray XMT or the Urika appliance.

In the scale-out architecture, we plan to support GPU accelerated SPARQL and GPU accelerated graph mining using a dedicated 2D decomposition of the graph (vertex cuts). The 2D index decomposition will exist alongside the existing 1D scale-out indices for SPARQL, because different index partitioning strategies are needed for different workloads: selective SPARQL queries want 1D partitioning, while graph mining operations want 2D partitioning. The additional index will provide a 2D partitioning of the edges of the graph.
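To make the 1D/2D distinction concrete, here is a hypothetical sketch (illustrative only, not bigdata's actual partitioning code) of how a 2D vertex-cut decomposition assigns each edge to one cell of a P x Q grid. All of a vertex's out-edges land in a single row and all of its in-edges in a single column, so any vertex touches at most P + Q - 1 partitions; a 1D decomposition instead hashes each key to one of N partitions, which is what selective index lookups want.

#include <cstdint>
#include <functional>

struct Grid2D {
    int P, Q;  // grid dimensions; P * Q = number of partitions

    // Edge (u, v) is stored on cell (hash(u) mod P, hash(v) mod Q).
    int cellOf(uint64_t u, uint64_t v) const {
        int row = static_cast<int>(std::hash<uint64_t>{}(u) % P);
        int col = static_cast<int>(std::hash<uint64_t>{}(v) % Q);
        return row * Q + col;  // linearized partition id
    }
};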

So, look for an initial integration between MPGraph and bigdata in Q1. We will continue to do MPGraph releases with new capabilities on a regular basis. The next MPGraph release will include a very substantial speedup for algorithms with large frontiers, such as PageRank and Connected Components.


MPGraph v2 – release

Download MPGraph v2 from SourceForge now. Or you can get the latest development version from SVN:

svn checkout svn://svn.code.sf.net/p/mpgraph/code/trunk

The new MPGraph API makes it easy to develop high performance graph analytics on GPUs. The API is based on the Gather-Apply-Scatter (GAS) model as used in GraphLab. To deliver high performance and efficiently utilize the high memory bandwidth of GPUs, MPGraph's CUDA kernels use several sophisticated strategies, such as vertex-degree-dependent dynamic parallelism granularity and frontier compaction.

New algorithms can be implemented in a few hours, fully exploiting the data-level parallelism of the GPU and offering throughput of up to 3 billion traversed edges per second on a single GPU.
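To give a flavor of the programming model, here is a rough sketch of a GAS vertex program for PageRank. The names and signatures below are hypothetical, not MPGraph's actual API, but the shape is the same: gather accumulates contributions over in-edges, an associative sum reduces them in parallel, apply folds the result into the vertex state, and scatter decides which neighbors to activate next.

// Hypothetical GAS vertex program; MPGraph's real API differs.
struct PageRankProgram {
    struct VertexData { float rank; int outDegree; };

    // GATHER: each in-edge contributes its source's rank share.
    __device__ static float gather(const VertexData& src) {
        return src.rank / src.outDegree;
    }

    // SUM: combine gathered values (associative, so the runtime
    // can reduce contributions in parallel).
    __device__ static float sum(float a, float b) { return a + b; }

    // APPLY: fold the reduced value into the new vertex state.
    __device__ static void apply(VertexData& v, float acc) {
        const float d = 0.85f;  // damping factor
        v.rank = (1.0f - d) + d * acc;
    }

    // SCATTER: activate neighbors for the next iteration (here:
    // always, until the outer loop detects convergence).
    __device__ static bool scatter(const VertexData& v) { return true; }
};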

Future releases will include topology compression, compressed vertex and link attributes, partitioned graphs, and Multi-GPU support.

Enjoy!
