André Schiper, Babak Kalantari
In this paper we discuss the problem of synchronization in ZooKeeper, a fault-tolerant distributed coordination framework. One of the key features of ZooKeeper is that it moves away from blocking APIs such as locks, in order to avoid problems with slow or faulty clients. Instead, it provides an event-like synchronization mechanism that allows clients to be notified of state changes on the server. However, such a mechanism leads to very inefficient implementations of synchronization objects such as queues or barriers. We propose a new solution to this problem: handling a sequence of client operations entirely on the server. The client expresses the required sequence of operations as a single request, which is sent to the server for execution via a generic API. We present a prototype that shares some of the concepts of ZooKeeper but, contrary to ZooKeeper, allows a very efficient implementation of synchronization objects. The solution requires a deterministic multi-threaded server, which we implement using a coroutine mechanism. Experiments show a significant gain in efficiency for our solution on producer-consumer queues and synchronization barriers.
2012
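
The idea of running a whole sequence of client operations on the server as one request can be illustrated with a toy, single-process sketch. It is not the authors' prototype: the names `Server`, `submit`, `produce`, and `consume` are hypothetical, Python generators stand in for the coroutine mechanism, and the in-memory dictionary stands in for replicated server state. Each request is a generator that yields a predicate when it must wait, and the server resumes waiting requests in arrival order, so the schedule is deterministic.

```python
from collections import deque


class Server:
    """Toy server (hypothetical): each client request runs as a coroutine."""

    def __init__(self):
        self.state = {}     # shared server-side data (assumed in-memory)
        self.waiting = []   # (req_id, coroutine, predicate) in arrival order
        self.results = {}   # req_id -> value returned by the request

    def submit(self, req_id, request):
        # Start the request, then resume any waiters it may have unblocked.
        self._advance(req_id, request(self.state))
        self._wake()

    def _advance(self, req_id, co):
        # Run the coroutine until it finishes or yields a wait predicate.
        try:
            predicate = next(co)
        except StopIteration as done:
            self.results[req_id] = done.value
        else:
            self.waiting.append((req_id, co, predicate))

    def _wake(self):
        # Resume the oldest waiter whose predicate holds; repeat until none can run.
        progressed = True
        while progressed:
            progressed = False
            for entry in list(self.waiting):
                req_id, co, predicate = entry
                if predicate(self.state):
                    self.waiting.remove(entry)
                    self._advance(req_id, co)
                    progressed = True
                    break


def produce(item, capacity=2):
    """Request that appends to a bounded queue, waiting while it is full."""
    def request(state):
        queue = state.setdefault("queue", deque())
        while len(queue) >= capacity:
            yield lambda s: len(s["queue"]) < capacity
        queue.append(item)
        return len(queue)
    return request


def consume():
    """Request that removes the head of the queue, waiting while it is empty."""
    def request(state):
        queue = state.setdefault("queue", deque())
        while not queue:
            yield lambda s: len(s["queue"]) > 0
        return queue.popleft()
    return request


server = Server()
server.submit("c1", consume())      # queue is empty, so c1 waits on the server
server.submit("p1", produce("a"))   # p1 completes and unblocks c1
server.submit("p2", produce("b"))
server.submit("c2", consume())
print(server.results)
```

In this sketch a consumer that finds the queue empty stays suspended on the server and is resumed there once a producer adds an item, so the client sends a single request instead of reacting to a notification and retrying.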