Monthly Archives: July 2008

If the shoe fits

Many open source projects dealing with cloud computing are knockoffs of systems described in Google's publications (the same appears to be true of commercial businesses trying to compete in the cloud-as-a-service space). To my mind, this lack of innovation is problematic. The systems Google developed (GFS, map/reduce and bigtable) are fairly specialized and are designed for specific classes of problems.

However, I saw little awareness among potential users of these systems of the tradeoffs that Google had considered and accepted in developing them. For example, map/reduce is suited to non-transactional, high-latency processing where there is good locality in the inputs, while Google…
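As a rough illustration of the map/reduce model named above, the following single-process sketch shows why it suits batch work rather than transactional updates: each map call sees one input record in isolation, and reducers see a key's values only after the entire map phase has finished. The function names and type aliases are hypothetical; this is not Google's or Hadoop's API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical signatures for user-supplied map and reduce functions.
Mapper = Callable[[Any], Iterable[Tuple[Hashable, Any]]]
Reducer = Callable[[Hashable, List[Any]], Any]


def map_reduce(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> Dict[Hashable, Any]:
    # Map phase: each record is processed independently, with no shared state,
    # so a real system can run mappers next to the data they read (locality).
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group intermediate values by key
    # Reduce phase: starts only after every map call has finished (a batch
    # barrier), which is one reason end-to-end latency is high.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum all partial counts collected for one word.
    return sum(counts)


if __name__ == "__main__":
    docs = ["the cloud", "the map and the reduce"]
    print(map_reduce(docs, word_count_mapper, word_count_reducer))
    # prints {'and': 1, 'cloud': 1, 'map': 1, 'reduce': 1, 'the': 3}
```

Nothing in this model lets one record's processing read or update another's state mid-run, which is the tradeoff that makes it a poor fit for transactional workloads.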


Cloud Computing with bigdata (OSCON 2008)

Bryan Thompson presented “Cloud Computing with bigdata” at OSCON 2008. The talk gave an overview of cloud computing and of the bigdata architecture and how it relates to similar systems. It then covered why RDF is an interesting technology (its fluid schema and declarative alignment of schema and instances make for great mashups and facilitate data reuse), some of the additional challenges that must be addressed to scale out RDF (secondary indices and distributed JOINs to support high-level query), and some of our work on scale-out RDF processing, including statement-level provenance and combining map/reduce processing with indexed data. See for a copy of the slides.
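The post does not say how bigdata lays out its indices, so the following is only a minimal sketch, assuming one common technique for secondary indices on RDF: store every statement in several sorted indices (here with SPO, POS and OSP key orders) so that any triple pattern with some positions bound becomes a prefix range scan, and answer a JOIN by probing one index once per binding produced by another. The names `TripleStore`, `ORDERS`, `match` and `names_of_friends` are hypothetical, sorted Python lists stand in for B+Trees, and everything runs in one process; a scale-out system would also partition each index across machines, which this sketch does not attempt.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical index layout: each index holds every triple, with its
# (subject, predicate, object) positions permuted into a different key order.
ORDERS: Dict[str, Tuple[int, int, int]] = {
    "spo": (0, 1, 2),
    "pos": (1, 2, 0),
    "osp": (2, 0, 1),
}


class TripleStore:
    def __init__(self) -> None:
        # Each index is a sorted list of permuted triples (stand-in for a B+Tree).
        self.indices: Dict[str, List[Triple]] = {name: [] for name in ORDERS}

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        for name, order in ORDERS.items():
            key = (triple[order[0]], triple[order[1]], triple[order[2]])
            index = self.indices[name]
            pos = bisect.bisect_left(index, key)
            if pos == len(index) or index[pos] != key:  # skip duplicates
                index.insert(pos, key)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        pattern = (s, p, o)
        # Pick the index whose key order puts all bound positions first;
        # for every pattern, one of SPO, POS, OSP qualifies.
        for name, order in ORDERS.items():
            permuted = [pattern[i] for i in order]
            prefix: List[str] = []
            for value in permuted:
                if value is None:
                    break
                prefix.append(value)
            if all(v is None for v in permuted[len(prefix):]):
                break
        index = self.indices[name]
        key_prefix = tuple(prefix)
        # Range scan: jump to the first key >= prefix, stop when it no longer matches.
        for k in range(bisect.bisect_left(index, key_prefix), len(index)):
            key = index[k]
            if key[:len(key_prefix)] != key_prefix:
                break
            triple = ["", "", ""]
            for j, i in enumerate(order):
                triple[i] = key[j]  # undo the permutation back to (s, p, o)
            yield (triple[0], triple[1], triple[2])


def names_of_friends(store: TripleStore, person: str) -> List[Tuple[str, str]]:
    # Index nested-loop JOIN of (person knows ?f) with (?f name ?n).
    results = []
    for _, _, friend in store.match(s=person, p="knows"):
        for _, _, name in store.match(s=friend, p="name"):
            results.append((friend, name))
    return results


if __name__ == "__main__":
    store = TripleStore()
    for t in [("alice", "knows", "bob"), ("alice", "knows", "carol"),
              ("bob", "name", "Bob"), ("carol", "name", "Carol")]:
        store.add(*t)
    print(names_of_friends(store, "alice"))
```

Keeping several key orders trades extra writes and storage for the ability to answer any access pattern with a range scan instead of a full scan, which is what makes JOINs over large statement sets tractable.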

bigdata(R) is a scale-out storage and computing fabric supporting optional transactions, very high concurrency, and very high aggregate IO rates. For more information, see,, and