A distributed system is a set of resources that cooperate by sending messages in order to achieve a common goal. Leslie Lamport, a famous computer scientist, defines it as:
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
This sentence summarizes very well the issues we encounter in distributed systems. In the era of cloud computing, pooling resources has become a necessity to satisfy new needs in terms of computation and storage. These systems are characterized by the massive number of resources to manage, and the number of failures grows in proportion.
In distributed systems, failures are the rule, not the exception, and we must deal with them. This observation pushed the community to create specialized tools to make developers' lives easier, and Apache ZooKeeper is one of them :-).
ZooKeeper is a very useful, battle-tested and widely used middleware for building distributed applications. In OpenStack, ZooKeeper is one of the backends of the Nova ServiceGroup API. Recently, it has been integrated in Ceilometer to make the Central Agent highly available. We will study this case in another article ;-).
Why do you need ZooKeeper?
Generally, when you design a distributed application, you identify the processes which will cooperate together to perform the wanted task. In most cases, this cooperation relies on distributed coordination primitives.
Heat is the OpenStack Orchestration program. You can use it to create a set of cloud resources, called a stack, specified in a template file. Heat allows you to update a stack, but this must be done atomically, otherwise concurrent updates may create conflicts like duplicated resources or broken associations. To prevent this, Heat components first acquire a distributed mutex lock before updating the stack.
The development of this kind of primitive is an extremely hard exercise and a source of headaches, as it requires solving the well-known problem of consensus in distributed systems.
To make developers' lives easier, Yahoo! Labs initiated the Apache ZooKeeper project, which aims to provide a centralized API for these coordination primitives. Thanks to ZooKeeper's API we can easily implement different protocols like distributed locks, barriers, queues and so on.
Architecture and benefits of a ZooKeeper application
A ZooKeeper application is composed of one or several ZK servers, called an 'ensemble', and a set of ZK clients on the application side.
The idea is that every node of the distributed application relies on the ZooKeeper servers, using the exposed API through a ZK client.
There are several benefits of this architecture:
- We extracted most of the distributed synchronization burden from the application level so that we obtain a KISS (Keep It Simple, Stupid) architecture.
- The usual distributed coordination primitives are working out of the box so the developer doesn’t have to deal with them.
- The developer doesn’t need to handle the failures of such a service because it is very resilient. ZK is the nervous center of the application since it is in charge of the whole coordination and thus many components will depend on it. For these reasons, ZooKeeper has been designed upon well-tried distributed algorithms to provide the required high reliability and availability. A ZooKeeper ensemble is quorum-based and usually composed of three or five servers.
ZooKeeper may be used in various contexts; let's see how it works in practice.
ZooKeeper in practice
ZooKeeper's API aims to be very simple and straightforward. The data model is based on a hierarchical namespace stored as an in-memory tree. Elements of this tree are called znodes: like a file, a znode contains data, and like a directory, it can have child znodes.
First, make sure you satisfy the system requirements and let’s deploy a ZK server:
$ wget http://apache.crihan.fr/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
$ tar xzf zookeeper-3.4.6.tar.gz
$ cd zookeeper-3.4.6
$ cp conf/zoo_sample.cfg conf/zoo.cfg
$ ./bin/zkServer.sh start
The ZooKeeper server is now running in standalone mode, listening on 127.0.0.1:2181 by default. To deploy an ensemble of servers, take a look at the administration guide.
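As a rough idea of what an ensemble configuration looks like, here is a minimal sketch of a three-server zoo.cfg; the hostnames are hypothetical, and the exact tuning values should come from the administration guide:

```ini
# conf/zoo.cfg -- hypothetical three-server ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# each server listens on 2888 (peer traffic) and 3888 (leader election)
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each server additionally needs a `myid` file in its dataDir containing its own id (1, 2 or 3).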
ZooKeeper command line interface
We can use the ZooKeeper command line interface (./bin/zkCli.sh) to do some basic operations. It’s very similar to a shell console with a file system like behavior.
List the children znodes of the root znode “/”:
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
Create a znode with the path “/myZnode” and its associated data “myData”:
[zk: localhost:2181(CONNECTED) 1] create /myZnode myData
Created /myZnode
[zk: localhost:2181(CONNECTED) 2] ls /
[myZnode, zookeeper]
Delete a znode:
[zk: localhost:2181(CONNECTED) 3] delete /myZnode
You can see more operations by typing the “help” command. In our case, we will use the Application Programming Interface (API) to write a distributed application.
Python ZooKeeper API
The ZooKeeper server is built with the Java programming language and comes with a large collection of client bindings in various languages. In this article we will discover the API through Kazoo, the Python binding.
Kazoo is easily installable with pip in a virtual environment:
$ pip install kazoo
First we need to connect to a ZooKeeper ensemble:
from kazoo import client as kz_client

my_client = kz_client.KazooClient(hosts='127.0.0.1:2181')

def my_listener(state):
    if state == kz_client.KazooState.CONNECTED:
        print("Client connected !")

my_client.add_listener(my_listener)
my_client.start(timeout=5)
In the code above we created a ZK client with the KazooClient class. The "hosts" argument takes a comma-separated list of ZK server addresses, so if one server fails the client will automatically try to connect to another one.
Kazoo can notify us when the connection state changes, which is very useful to trigger actions depending on the current state. For instance, when the connection is lost the client should stop sending commands; this is the purpose of the add_listener() method.
The start() method establishes the connection between the client and a ZK server; once it is established, a session is created. Each server keeps track of a session with every client; we'll see that this is very important for implementing distributed coordination primitives.
CRUD the znodes
Interacting with the znodes is pretty straightforward:
# create a znode with data (data must be bytes)
my_client.create("/my_parent_znode")
my_client.create("/my_parent_znode/child_1", b"child_data_1")
my_client.create("/my_parent_znode/child_2", b"child_data_2")

# get the children of a znode
my_client.get_children("/my_parent_znode")

# get the data of a znode
my_client.get("/my_parent_znode/child_1")

# update a znode
my_client.set("/my_parent_znode/child_1", b"child_new_data_1")

# delete a znode
my_client.delete("/my_parent_znode/child_1")
The set() method accepts a version parameter, which allows us to perform compare-and-set (CAS) operations: the update only succeeds if the znode still has the version we read, so nobody can update the data without having read the latest version first.
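To see why the version check matters, here is a pure-Python sketch of the compare-and-set idea; `VersionedZnode` and `BadVersionError` are in-memory stand-ins for the server-side znode and Kazoo's exception of the same name, so no running ZooKeeper is needed:

```python
class BadVersionError(Exception):
    """Raised when the supplied version is stale (mirrors kazoo's exception)."""

class VersionedZnode:
    """Hypothetical in-memory stand-in for a znode with versioned data."""
    def __init__(self, data):
        self.data, self.version = data, 0

    def get(self):
        return self.data, self.version

    def set(self, data, version):
        if version != self.version:
            raise BadVersionError()  # someone updated since our read
        self.data, self.version = data, self.version + 1

store = VersionedZnode(b"old")
data, ver = store.get()
store.set(b"new", ver)          # succeeds: version matched
try:
    store.set(b"newer", ver)    # fails: our version is now stale
except BadVersionError:
    pass                        # a real client would re-read and retry
```

With the real API, the same pattern is `data, stat = my_client.get(path)` followed by `my_client.set(path, new_data, version=stat.version)` inside a retry loop.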
Sometimes you want to ensure that a znode name is unique. You can achieve this with a sequential znode, which tells the server to append a monotonically increasing counter to the end of the path.
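The naming scheme can be illustrated with a small pure-Python sketch; the server-side counter is simulated here, whereas real code would simply call `my_client.create("/queue/item-", sequence=True)`:

```python
import itertools

# simulated server-side counter; ZooKeeper appends a ten-digit,
# zero-padded, monotonically increasing suffix to the requested path
_counter = itertools.count()

def create_sequential(path_prefix):
    """Return path_prefix with the next sequence number appended."""
    return "%s%010d" % (path_prefix, next(_counter))

first = create_sequential("/queue/item-")    # "/queue/item-0000000000"
second = create_sequential("/queue/item-")   # "/queue/item-0000000001"
```

Because the suffixes are unique and ordered, sequential znodes are a natural building block for queues and for the lock recipe we will meet below.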
At this point ZooKeeper acts like a normal database but things get interesting thereafter :-).
Watchers
The watcher mechanism is a key feature of ZooKeeper: it allows clients to be notified of znode events. In other words, every client can subscribe to the events of a specific znode and be notified when its state changes. To be notified, the client must register a callback method, which is called (by a background thread) when an event of interest occurs; you can check the different event types in the documentation.
Here is an example for being notified when the children set of a znode has changed:
def my_func(event):
    # check to see what the children are now
    ...

# set a watcher on "/my_parent_znode"; my_func() is called when its children change
children = my_client.get_children("/my_parent_znode", watch=my_func)
It's worth noting that watchers are one-shot: once the callback has been executed, the client must set it again in order to be notified of the next event.
Ephemeral znodes
As said earlier, a session is established when the client connects to the server. This session is kept open by sending heartbeat messages to the server. If the server has not heard from the client after some amount of idle time, it closes the session. Thanks to the session, the server knows which clients are alive.
An ephemeral znode is like a normal znode, except that it is automatically deleted by the server as soon as its creator's session expires.
When we combine the watchers with ephemeral znodes we obtain the killer feature of ZooKeeper. In fact, these features open a lot of possibilities for implementing distributed coordination primitives. Let’s see the case of a distributed lock.
Distributed lock
The distributed lock is the most common primitive used in distributed applications because we often need to access some resources in a mutually exclusive manner.
ZooKeeper makes this task trivial:
my_lock = my_client.Lock("/lockpath", "my-identifier")

with my_lock:  # blocks waiting for lock acquisition
    # do something with the lock
    ...
It's the same API as using a local lock, but what happens underneath? Let's describe how such a distributed algorithm is designed.
A distributed algorithm must fulfill two properties: safety and liveness.
Safety ensures that the algorithm never deviates from its goal; for the distributed lock, it means that only one node can hold the lock at any time. Intuitively, two nodes must never acquire the lock simultaneously.
Liveness ensures that the algorithm makes progress; for the distributed lock, it means that if a node wants to acquire the lock, it will eventually acquire it.
Implementing a lock locally is a well-known problem; there are many algorithms – like Dekker's algorithm – and every modern programming language provides locks in its standard library. In a distributed environment, however, it's more complicated :-). These properties are hard to achieve since nodes can fail at any time, which creates a large number of possible failure scenarios.
ZooKeeper ensures these properties for us:
- The liveness property is ensured by combining the ephemeral znodes to detect a failed node and the watcher mechanism to notify the other nodes. Thus, if a node acquires the lock and fails, it’s detected by the others.
- The safety property is ensured by using sequential znodes, which give each contender a unique, ordered name, so that only one node at a time can acquire the lock.
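The classic ZooKeeper lock recipe combines these pieces: each contender creates an ephemeral sequential znode under the lock path, the lowest sequence number holds the lock, and each waiter watches the znode just below its own. Here is a pure-Python sketch of the ordering logic only (the ephemeral/watcher machinery is simulated by `release()`; real code would just use `my_client.Lock`):

```python
import itertools

contenders = []                    # sequence numbers alive under /lockpath
_counter = itertools.count()       # simulated server-side sequence counter

def acquire():
    """Create an 'ephemeral sequential znode': join the contender set."""
    seq = next(_counter)
    contenders.append(seq)
    return seq

def holds_lock(seq):
    """The contender with the lowest sequence number owns the lock."""
    return min(contenders) == seq

def release(seq):
    """Explicit release, or the session expiring and dropping the znode."""
    contenders.remove(seq)

a, b = acquire(), acquire()
assert holds_lock(a) and not holds_lock(b)
release(a)                         # a's ephemeral znode disappears
assert holds_lock(b)               # b is notified and becomes the holder
```

Watching only the next-lower znode (rather than the whole directory) is what avoids the "herd effect" where every waiter wakes up on each release.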
I strongly suggest you take a look at the lock recipe in the documentation; Kazoo comes with several implementations.
Conclusion
Building a distributed application can easily be a nightmare: we must expect the unexpected (failures happen randomly) and deal with combinatorial explosion (the bigger the system, the more states to handle). ZooKeeper is a handy tool and a wise choice in your infrastructure stack. It helps you focus on the application logic.
In OpenStack, we share the ZooKeeper philosophy: solve the hard problems of distributed systems in one common tool. Thus, we created a library called Tooz, which implements some common distributed coordination primitives. Tooz relies on different backend drivers – of course ZooKeeper is one of them – and aims to be used in all OpenStack projects.
In the next article we will study how OpenStack Ceilometer made the central agent highly available, by leveraging another distributed primitive: group membership. This will be our first development of a real application based on ZooKeeper!