B+Tree compression and buffering

We’ve been working on major performance improvements for the B+Tree. The goal is to drastically increase the amount of data that bigdata(R) can buffer in memory, so we can take full advantage of 64-bit JVMs, while making search faster, reducing the in-memory footprint and GC churn, and shrinking the RMI payloads that go over the wire.

First, we have a new binary image for the B+Tree nodes and leaves with pluggable compression schemes that support search and value retrieval directly on the coded (compressed) representation. This will reduce the footprint for data on the disk, in memory, and on the wire, and should boost search performance. We will roll out with prefix compression (front-coding) for keys and with canonical Huffman encoding options for keys and values. These compression schemes are all based on fastutil[1] and dsiutils[2], which are fantastic libraries.
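To make the front-coding idea concrete, here is a minimal sketch of prefix compression over a sorted run of keys, with binary search operating on the coded form by decoding probes on demand. The class and method names are illustrative only, not bigdata's actual API, and real implementations work over byte[] keys rather than Strings.

```java
// Hypothetical sketch of front-coding: each key after the first is stored as
// (length of prefix shared with the previous key, remaining suffix).
public class FrontCodedKeys {
    private final int[] prefixLen;   // chars shared with the previous key
    private final String[] suffix;   // remainder of each key

    public FrontCodedKeys(String[] sortedKeys) {
        prefixLen = new int[sortedKeys.length];
        suffix = new String[sortedKeys.length];
        String prev = "";
        for (int i = 0; i < sortedKeys.length; i++) {
            String k = sortedKeys[i];
            int p = 0, max = Math.min(prev.length(), k.length());
            while (p < max && prev.charAt(p) == k.charAt(p)) p++;
            prefixLen[i] = p;
            suffix[i] = k.substring(p);
            prev = k;
        }
    }

    // Decode the i-th key by replaying the prefix chain from the start of the run.
    public String get(int i) {
        String key = "";
        for (int j = 0; j <= i; j++) {
            key = key.substring(0, prefixLen[j]) + suffix[j];
        }
        return key;
    }

    // Binary search directly over the coded keys, decoding only the probed slots.
    public int search(String target) {
        int lo = 0, hi = prefixLen.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = get(mid).compareTo(target);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -(lo + 1); // negative insertion point, as in Arrays.binarySearch
    }
}
```

The point of the layout is that B+Tree keys in a leaf are sorted and typically share long prefixes, so most of each key collapses to a small integer plus a short suffix, and search never needs to materialize the whole key set.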

Second, we are moving away from a per-B+Tree buffer for persistent nodes and leaves to a shared buffer using a mixture of an LRU and a hash map with weak reference values. Combined with the smaller in-memory footprint for nodes and leaves, this will allow bigdata(R) to buffer vastly more data in RAM.
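The combination described above can be sketched roughly as follows: a bounded LRU holds hard references to the hottest records, while a weak-value map lets everything else remain findable until the garbage collector actually reclaims it. This is an illustrative sketch under those assumptions, not bigdata's actual cache classes, and it omits the reference-queue cleanup a production cache would need.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative shared record cache (names are hypothetical): LRU of hard
// references plus a weak-value index that survives LRU eviction.
public class SharedRecordCache<K, V> {
    private final int capacity;
    private final LinkedHashMap<K, V> lru;                  // hard refs, access order
    private final Map<K, WeakReference<V>> weak = new HashMap<>();

    public SharedRecordCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true turns LinkedHashMap into an LRU; the eldest entry is
        // dropped once we exceed capacity.
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> e) {
                return size() > SharedRecordCache.this.capacity;
            }
        };
    }

    public synchronized void put(K key, V value) {
        lru.put(key, value);                         // hard reference keeps it hot
        weak.put(key, new WeakReference<>(value));   // findable after eviction, GC willing
    }

    public synchronized V get(K key) {
        V v = lru.get(key);                          // a hit also promotes in LRU order
        if (v != null) return v;
        WeakReference<V> ref = weak.get(key);
        v = (ref == null) ? null : ref.get();
        if (v != null) lru.put(key, v);              // re-promote a surviving record
        return v;
    }
}
```

Because the buffer is shared across all B+Trees rather than partitioned per index, hot indices can claim a larger fraction of RAM, and records evicted from the LRU still serve hits for as long as the collector leaves them alone.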

Finally, we are examining options for record-level compression to reduce the on-disk footprint even further. Our thinking here is to offer a pluggable scheme, with initial support for Deflate, zip, gzip, and bmz[3]. We are already coding (compressing) the keys and values, but there is an additional win to be had from a fast compression pass over the entire B+Tree node or leaf. Most compression libraries are accessed via JNI and generally do better on larger records, so we may wind up increasing the default branching factor for the index segments to compensate. You should be able to choose a record-level compression provider for the Journal, another for the index segments, and have the option to specify the compression provider separately for all shards of a specific index. Different compression schemes make sense for the journal and the index segments because the journal is geared towards fast write absorption while the index segments are geared towards read-only data. Likewise, some indices may derive more benefit from specific compression schemes.
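A pluggable record-level scheme might look something like the sketch below, shown here with the JDK's own Deflate codec. The RecordCompressor interface and class names are hypothetical, invented for illustration; they are not bigdata's actual SPI.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical provider interface: a record (a whole serialized node or leaf)
// goes in, a coded record comes out, and the caller remembers the raw length.
interface RecordCompressor {
    byte[] compress(byte[] record);
    byte[] decompress(byte[] coded, int rawLength);
}

// One pluggable implementation backed by the JDK's Deflate codec.
class DeflateCompressor implements RecordCompressor {
    public byte[] compress(byte[] record) {
        Deflater d = new Deflater(Deflater.BEST_SPEED); // favor write throughput
        d.setInput(record);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(record.length);
        byte[] buf = new byte[4096];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    public byte[] decompress(byte[] coded, int rawLength) {
        try {
            Inflater inf = new Inflater();
            inf.setInput(coded);
            byte[] raw = new byte[rawLength];
            int n = 0;
            while (n < rawLength) n += inf.inflate(raw, n, rawLength - n);
            inf.end();
            return raw;
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Configuration would then be a matter of binding one provider to the Journal (a fast codec for write absorption) and another to the index segments (a denser codec for read-only data), per the split described above.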

We should have some new performance results based on these changes in a few weeks.

[1] http://fastutil.dsi.unimi.it/
[2] http://dsiutils.dsi.unimi.it/
[3] http://www.hypertable.org/doxygen/dir_636d9f0bc773a215f705e2de9f182c4e.html

