Monthly Archives: August 2008

bigdata officially registered

Well, the catchy little name for our open-source project finally belongs to us in the eyes of the US government. The PTO accepted our Statement of Use, and the bigdata mark is now registered on the Principal Register! Very exciting.

Bryan and I (mostly Bryan) are working furiously towards another release. The June 30th alpha release included a stable bigdata core that can run distributed when different indices are manually placed on different machines in a cluster. For example, our RDF database application consists of around six indices to represent the lexicon and support the various access paths necessary for RDF (spo, osp, pos). We successfully split these indices across two machines and ran load tests that exhibited better-than-linear scale-out in throughput, due to increased parallelization. Quite exciting! However, this was really just an intermediate step.
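To make the access paths a little more concrete, here is a minimal sketch of why three key orderings (spo, pos, osp) are enough to answer any triple pattern with a prefix scan over a sorted index. This is an illustration only, not bigdata's code: the names `ORDERS`, `choose_index`, and `TripleStore` are hypothetical, the indices are plain in-memory sorted lists, and the lexicon indices are left out entirely.

```python
from bisect import bisect_left

# Key orders for the three statement indices named in the post.
# Each tuple lists which position of an (s, p, o) triple comes first, second, third.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


def choose_index(s=None, p=None, o=None):
    """Return (index name, prefix length) for the index whose key order
    starts with the longest run of bound positions."""
    bound = (s is not None, p is not None, o is not None)
    best_name, best_len = "spo", -1
    for name, order in ORDERS.items():
        prefix_len = 0
        for pos in order:
            if not bound[pos]:
                break
            prefix_len += 1
        if prefix_len > best_len:
            best_name, best_len = name, prefix_len
    return best_name, best_len


class TripleStore:
    """Hypothetical in-memory store: one sorted list of keys per ordering."""

    def __init__(self, triples):
        self.indices = {
            name: sorted(tuple(t[i] for i in order) for t in triples)
            for name, order in ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Answer a triple pattern with a single prefix scan."""
        name, n = choose_index(s, p, o)
        order = ORDERS[name]
        pattern = (s, p, o)
        prefix = tuple(pattern[i] for i in order[:n])
        keys = self.indices[name]
        results = []
        # Jump to the first key >= prefix, then read until the prefix changes.
        for key in keys[bisect_left(keys, prefix):]:
            if key[:n] != prefix:
                break
            triple = [None, None, None]
            for value, pos in zip(key, order):
                triple[pos] = value  # put the key back into (s, p, o) order
            results.append(tuple(triple))
        return results


store = TripleStore([("a", "knows", "b"), ("a", "likes", "c"), ("b", "knows", "c")])
by_predicate = store.match(p="knows")       # scans the pos index
by_subj_obj = store.match(s="a", o="c")     # scans the osp index
```

Because each ordering is an independent sorted index, each one can live on a different machine, which is the manual placement described above.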

The real test will be dynamic scale-out, where indices do not need to be manually positioned on different machines. What we are working towards now is full dynamic partitioning of indices. Just add a machine to a cluster and have it automatically start managing index segments for your application. The first application of this technology will obviously be our RDF database, but any application requiring ordered data can benefit. Bigdata can pick up where Hadoop leaves off.
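As a rough picture of what dynamic partitioning of an ordered index might look like, here is a toy sketch. It rests on several assumptions that are not taken from bigdata itself: the class `PartitionedIndex`, the `max_keys` split threshold, and the "give the new segment to the least-loaded machine" rule are all hypothetical, and everything lives in one process rather than on a real cluster.

```python
from bisect import bisect_right, insort


class PartitionedIndex:
    """Toy ordered index split into key-range segments owned by machines."""

    def __init__(self, machines, max_keys=4):
        self.machines = list(machines)
        self.max_keys = max_keys          # assumed split threshold
        # Segment i holds keys k with bounds[i] <= k < bounds[i + 1].
        # The empty string sorts before every other string key.
        self.bounds = [""]
        self.parts = [[]]
        self.owners = [self.machines[0]]

    def add_machine(self, name):
        """A new machine joins; it picks up segments on later splits."""
        self.machines.append(name)

    def _locate(self, key):
        return bisect_right(self.bounds, key) - 1

    def _least_loaded(self):
        return min(self.machines, key=self.owners.count)

    def insert(self, key):
        i = self._locate(key)
        if key in self.parts[i]:
            return                        # keys are unique
        insort(self.parts[i], key)
        if len(self.parts[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        """Cut an oversized segment in half and hand the upper half off."""
        keys = self.parts[i]
        mid = len(keys) // 2
        left, right = keys[:mid], keys[mid:]
        new_owner = self._least_loaded()
        self.parts[i] = left
        self.parts.insert(i + 1, right)
        self.bounds.insert(i + 1, right[0])
        self.owners.insert(i + 1, new_owner)

    def owner_of(self, key):
        return self.owners[self._locate(key)]

    def keys(self):
        """All keys in order, read segment by segment."""
        return [k for part in self.parts for k in part]


index = PartitionedIndex(["m1"], max_keys=2)
index.insert("a")
index.insert("b")
index.add_machine("m2")   # just add a machine to the cluster
index.insert("c")         # overflow triggers a split; m2 takes the new segment
```

The point of the sketch is only that key order is preserved across segments, so a range scan can still walk the segments in sequence no matter which machine holds each one.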