Bigdata 1.0 Release

This is a bigdata(R) release. This release is capable of loading 1B triples in
under one hour on a 15 node cluster. JDK 1.6 is required.

Bigdata(R) is a horizontally scaled open source architecture for indexed data
with an emphasis on semantic web data architectures. Bigdata operates in both
a single machine mode (Journal) and a cluster mode (Federation). The Journal
provides fast scalable ACID indexed storage for very large data sets. The
federation provides fast scalable shard-wise parallel indexed storage using
dynamic sharding and shard-wise ACID updates. Both platforms support fully
concurrent readers with snapshot isolation.
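The snapshot-isolation model described above can be illustrated with a toy versioned store. This is a sketch of the general multi-version (MVCC) idea, not bigdata's actual storage layer: readers are pinned to the commit point that was current when they started, so concurrent writes never change what they see.

```python
class VersionedStore:
    """Toy multi-version store: each commit publishes an immutable snapshot."""

    def __init__(self):
        # List of committed states; the list index is the commit point.
        self._commits = [{}]

    def commit(self, updates):
        # Writers never mutate a published snapshot; they copy and extend it.
        new_state = dict(self._commits[-1])
        new_state.update(updates)
        self._commits.append(new_state)

    def reader(self):
        # A reader is pinned to the most recent commit point at open time.
        snapshot = self._commits[-1]
        return lambda key: snapshot.get(key)

store = VersionedStore()
store.commit({"x": 1})
read = store.reader()      # opens against the commit where x == 1
store.commit({"x": 2})     # concurrent update by a writer
read("x")                  # still returns 1: the reader's snapshot is fixed
```

A real engine avoids copying the whole state per commit (e.g. via copy-on-write index nodes), but the visibility rule readers experience is the same.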

Distributed processing offers greater throughput but does not reduce query or
update latency. Choose the Journal when the anticipated scale and throughput
requirements permit. Choose the Federation when the administrative and machine
overhead associated with operating a cluster is an acceptable tradeoff to have
essentially unlimited data scaling and throughput.

See [1,2,8] for instructions on installing bigdata(R), [4] for the javadoc, and
[3,5,6] for news, questions, and the latest developments. For more information
about SYSTAP, LLC and bigdata, see [7].

Starting with this release, we offer a WAR artifact [8] for easy installation of
the Journal mode database. For custom development and cluster installations, we
recommend checking out the code from SVN using the tag for this release. The
code builds automatically under Eclipse; you can also build it using the ant
script. The cluster installer requires the ant script. You can check out this
release from the following URL:

New features:

– Single machine data storage to ~50B triples/quads (RWStore);
– Simple embedded and/or webapp deployment (NanoSparqlServer);
– 100% native SPARQL 1.0 evaluation with lots of query optimizations;
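Since the NanoSparqlServer exposes a SPARQL endpoint over HTTP, a client can query it with nothing but the standard SPARQL protocol. A minimal sketch using only the Python standard library follows; the endpoint URL and path are assumptions for illustration, not the documented deployment path.

```python
from urllib.parse import urlencode

def sparql_request_url(endpoint, query):
    # The SPARQL protocol accepts queries via HTTP GET with a
    # URL-encoded "query" parameter.
    return endpoint + "?" + urlencode({"query": query})

# The host and path below are hypothetical; check the NanoSparqlServer
# documentation for the actual endpoint of your deployment.
url = sparql_request_url("http://localhost:8080/sparql",
                         "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# urllib.request.urlopen(url) would then fetch the query results.
```

Any SPARQL-protocol-aware client library would work the same way; nothing bigdata-specific is required on the client side.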

Feature summary:

– Triples, quads, or triples with provenance (SIDs);
– Fast RDFS+ inference and truth maintenance;
– Clustered data storage is essentially unlimited;
– Fast statement level provenance mode (SIDs).

The road map [3] for the next releases includes:

– High-volume analytic query and SPARQL 1.1 query, including aggregations;
– Simplified deployment, configuration, and administration for clusters; and
– High availability for the journal and the cluster.

2 thoughts on "Bigdata 1.0 Release"

  1. William Sanchez

    This is great; we will install it and test it soon. I have a couple of questions about the Journal and the Federation. First, what are the hardware and software requirements to load 50B triples on one server? Second, what are the node configuration requirements? You mentioned a 15 node cluster to load 1B triples; what was the node configuration?

  2. Bryan Thompson (post author)


    The cluster configuration was 15 nodes, each with 32G of RAM, 8 cores, 512G of local disk, 1G Ethernet, a 64-bit OS, and a 64-bit JVM. We've done things like this on Rackspace as well as private clouds. Performance for the cluster (and for analytic query workloads) should improve sharply after our next milestone release, which will feature the memory manager to handle very large data sets in main memory without causing GC problems for the JVM.

    The main requirements to load 50B triples on a single node are plenty of disk space and time. A cluster is significantly faster at that data scale. I would recommend 32+G of RAM and SSD by preference. The indices will be pretty deep by the time you approach that limit. The 50B limit is based on the addressing scheme of the RWStore.
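The "pretty deep" index claim can be sanity-checked with a back-of-the-envelope calculation. A B+Tree with branching factor b holds roughly b^d entries at depth d, so depth grows as log_b(N). The branching factor used below is an assumed round number, not bigdata's actual configuration.

```python
import math

def btree_depth(entries, branching_factor):
    # Minimum number of levels a B+Tree needs to address `entries`
    # leaf tuples when every node fans out `branching_factor` ways.
    return math.ceil(math.log(entries) / math.log(branching_factor))

# Assumed branching factor of 256; the real value depends on page
# size and key width.
btree_depth(50_000_000_000, 256)   # -> 5 levels
btree_depth(1_000_000_000, 256)    # -> 4 levels
```

Each extra level is an extra page read on an uncached lookup, which is why SSDs help so much at the 50B scale.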
