Monthly Archives: January 2009

zookeeper integration, bigdata services manager

We are wrapping up an integration with Zookeeper. Zookeeper is a distributed system developed by Yahoo! and now living at The Apache Foundation.

bigdata is using zookeeper to provide global synchonous locks for coordinating distributed processes, for configuration management, and for service master elections — so we are that much closer to handling service failures. Using zookeeper we were able to write a services manager which accepts a configuration file in which you declare how many services you need (data services, metadata services, transaction service, load balancer, jini, and even zookeeper itself). You can also specify constraints on which machines may run which services. You then start up a services manager instance on each machine in the cluster and it will allocate services as necessary across the cluster. You can push an updated service configuration by sending SIGHUP to a services manager.

Cluster management is a bear. We’ve developed an init.d style script to make life easier on un*x platforms so now you can bigdata [startstopstatusetc]. We used puppet (but capistrano would be another choice) to poll a master for a commend to execute on each machine every 30 seconds. Using the services manager, a configuration file, the bigdata script, it is now easy (well, easier) to get bigdata up and running on a cluster.

The Zookeeper API is very simple, and you can do some very powerful things with it, but making use of it properly can be a little complex. You need to understand that events are asynchronous, that you may miss events, and that your clients may be disconnected and need to handle reconnects smoothly.

Zookeeper events are delivered to your application via the zookeeper client in a single thread – the zookeeper event thread. In order to get good performance you need to do as little work as possible in the zookeeper event thread since zookeeper is blocked for your application while it is in your Watcher. The basic paradigm for writing watchers is to do as little as possible in the Watcher interface and notify (or signal) a companion thread which reaches out to zookeeper and does any necessary synchronous queries.