Zookeeper: The Secret Weapon of Distributed Systems for Seamless Data Management — Part 1 (Basics)

Akhilesh Mahajan
4 min read · Apr 27, 2024

Introduction 👇:

In today’s fast-paced digital world, distributed systems have become the backbone of many modern applications, from social media platforms to financial systems. However, managing data in these systems can be a challenging task, as it involves coordinating multiple nodes and ensuring data consistency.

In this blog, we will explore the key features of Zookeeper, its architecture, and its use cases. We will also delve into best practices for using Zookeeper in production environments, and discuss how it can help you build scalable and reliable distributed systems. Whether you’re a seasoned developer or a newcomer to distributed systems, this guide will provide you with a solid understanding of Zookeeper and its role in modern data management.

Why ZooKeeper Is Needed 👇:

In a distributed system, multiple nodes or machines need to communicate with each other and coordinate their actions. ZooKeeper provides a way to ensure that these nodes are aware of each other and can coordinate their actions.
ZooKeeper is widely used in distributed systems such as Hadoop, Kafka, and HBase, and it has become an essential component of many distributed applications.

Problems that Zookeeper solves 👀👇:

  1. Configuration Management: Maintain a centralised configuration for different applications, so that any change to the configuration is visible to all servers.
  2. Leader Election: Elect a single leader among the nodes of a multi-node cluster.
  3. Locks in Distributed Systems: Let nodes work on shared resources in a mutually exclusive way.
  4. Manage Cluster Membership: Track when a node joins or leaves the cluster, and keep that membership information consistent across the cluster.

How does ZooKeeper solve these problems 👀👇:

To understand this, we need to understand the architecture and components of Zookeeper.

👉 Architecture:

[Figure: ZooKeeper architecture]

👉 Zookeeper Components:

  • Request Processor: Active in the leader node and responsible for processing write requests. After processing, it sends the changes to the follower nodes.
  • Atomic Broadcast: Present in both the leader node and the follower nodes. It is responsible for sending the changes to the other nodes (this is ZooKeeper's Zab protocol).
  • In-memory Database (Replicated Database): Responsible for storing ZooKeeper's data. Every node holds its own copy. Data is also written to the file system (transaction logs and snapshots), so it can be recovered if the cluster runs into problems.
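To make the write path a bit more concrete, here is a minimal Python sketch of the idea: the leader orders a write, asks every replica to acknowledge it, and commits only when a majority agrees. The names `Replica`, `ToyLeader`, `ack`, `commit` and the `up` flag are made up for illustration and are not ZooKeeper APIs. The real atomic broadcast also covers leader election, recovery and catching up lagging followers, none of which is modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server's in-memory copy of the data (hypothetical toy model)."""
    name: str
    up: bool = True                      # illustrative flag: is this server reachable?
    data: dict = field(default_factory=dict)
    last_zxid: int = 0                   # id of the last transaction applied

    def ack(self, zxid: int) -> bool:
        # Acknowledge only if reachable and the transaction is the next in order.
        return self.up and zxid == self.last_zxid + 1

    def commit(self, zxid: int, path: str, value: bytes) -> None:
        self.data[path] = value
        self.last_zxid = zxid


class ToyLeader:
    """Toy leader: orders writes and commits them once a majority acknowledges."""

    def __init__(self, followers):
        self.local = Replica("leader")
        self.followers = list(followers)

    def write(self, path: str, value: bytes) -> int:
        zxid = self.local.last_zxid + 1
        voters = [self.local] + self.followers
        acked = [r for r in voters if r.ack(zxid)]
        if len(acked) <= len(voters) // 2:
            raise RuntimeError("no quorum, write rejected")
        for replica in acked:
            replica.commit(zxid, path, value)
        return zxid


followers = [Replica("f1"), Replica("f2")]
leader = ToyLeader(followers)
leader.write("/config", b"v1")           # all three replicas now hold b"v1"
```

In this toy, a follower that missed a write simply stops acknowledging later ones; a real ensemble would bring it back in sync instead.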

👉 Zookeeper Data Model:

ZooKeeper solves these problems using a tree-structured namespace, somewhat similar to the Unix file system, whose nodes are called znodes. These znodes are analogous to folders and files in a Unix file system, with some additional magical abilities. ZooKeeper provides primitive operations to manipulate these znodes, and we will use those operations to solve our distributed system problems.

👉 Some Key Features of ZNodes:

  • A ZNode can store data and have child ZNodes at the same time.
  • A ZNode also stores metadata such as its current version and transaction ID.
  • Each ZNode has an Access Control List (ACL).
  • Username/password-based authentication (the digest ACL scheme) can also be applied to individual ZNodes.
  • Clients can set a watch on a ZNode and get notified when it changes.

[Figure: ZNode structure]
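Two of these features, versions and watches, are easier to see in code. The sketch below is a toy Python model, not the ZooKeeper client API: the class name `ToyZNode` and its methods are hypothetical. It assumes the behaviour of classic ZooKeeper watches, which fire once and must be set again, and it uses `-1` as the "any version" value for a conditional update, as ZooKeeper does.

```python
class ToyZNode:
    """Toy znode: data, children, a version counter and one-shot watches."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0          # bumped on every successful set()
        self.children = {}        # name -> ToyZNode (data and children coexist)
        self._watchers = []       # callbacks waiting for the next change

    def get(self, watch=None):
        # Reading can optionally register a watch for the next change.
        if watch is not None:
            self._watchers.append(watch)
        return self.data, self.version

    def set(self, data: bytes, expected_version: int = -1) -> int:
        # Conditional update: refuse if someone else changed the node first.
        if expected_version not in (-1, self.version):
            raise ValueError("version mismatch")
        self.data = data
        self.version += 1
        # One-shot: clear the watcher list before notifying.
        watchers, self._watchers = self._watchers, []
        for callback in watchers:
            callback(self)
        return self.version


config = ToyZNode(b"timeout=5")
config.children["db"] = ToyZNode(b"host=a")
seen = []
_, version = config.get(watch=lambda node: seen.append(node.data))
config.set(b"timeout=10", expected_version=version)   # watcher is notified once
```

The version check is what lets two clients update the same znode safely: the slower one gets an error instead of silently overwriting the other's change.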

👉 Types of ZNodes:

  1. Persistent Node: A node that stays even after the connection of the client that created it is closed.
  2. Ephemeral Node: A node that is deleted when the session of the client that created it ends. Ephemeral nodes cannot have children.
  3. Persistent Sequential Node: A Persistent Node with a 10-digit sequence number appended to the node name.
  4. Ephemeral Sequential Node: An Ephemeral Node with a 10-digit sequence number appended to the node name.
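The four node types can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ToyZooKeeper`, `close_session` and `leader` are hypothetical names, session ids are plain strings, and parent nodes are not checked. The `leader` helper shows a common recipe for leader election, where each candidate creates an ephemeral sequential node and the lowest sequence number wins.

```python
from collections import defaultdict


class ToyZooKeeper:
    """Toy store showing persistent, ephemeral and sequential znodes."""

    def __init__(self):
        self.owner = {}                 # path -> session id, or None if persistent
        self._seq = defaultdict(int)    # parent path -> next sequence number

    def create(self, path, session, ephemeral=False, sequential=False):
        if sequential:
            parent = path.rsplit("/", 1)[0]
            # Append a zero-padded 10-digit counter kept per parent.
            path = f"{path}{self._seq[parent]:010d}"
            self._seq[parent] += 1
        if path in self.owner:
            raise KeyError(f"{path} already exists")
        self.owner[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral nodes vanish with their session; persistent ones stay.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]

    def leader(self, prefix):
        # Lowest sequence number wins (zero padding makes string sort work).
        candidates = sorted(p for p in self.owner if p.startswith(prefix))
        return candidates[0] if candidates else None


zk = ToyZooKeeper()
zk.create("/election", "s0")
first = zk.create("/election/n_", "s1", ephemeral=True, sequential=True)
second = zk.create("/election/n_", "s2", ephemeral=True, sequential=True)
zk.close_session("s1")                  # the first candidate goes away
new_leader = zk.leader("/election/n_")  # the second candidate takes over
```

When the session that owns the lowest node ends, its node disappears and the next candidate becomes the leader, which is why ephemeral sequential nodes fit this problem well.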

👉 Zookeeper Operations:

  • Create znodes
  • Get data
  • Watch znode for changes
  • Set data
  • Create children of a znode
  • List children of a znode
  • Check Status
  • Remove / Delete a znode

The names themselves describe what each operation does 😅.
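Before talking to a real service, the operations can be pictured with a toy in-memory store. The Python sketch below is not a ZooKeeper client: `ToyStore` and its method names are hypothetical and simply mirror the list above (watches are left out to keep it short). It assumes two rules that ZooKeeper enforces: a znode can only be created under an existing parent, and a znode with children cannot be deleted.

```python
class ToyStore:
    """Toy path-based store mirroring the basic znode operations."""

    def __init__(self):
        self.data = {"/": b""}          # path -> data, root always exists

    def create(self, path: str, data: bytes = b"") -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = data

    def get(self, path: str) -> bytes:
        return self.data[path]

    def set(self, path: str, data: bytes) -> None:
        if path not in self.data:
            raise KeyError(f"{path} does not exist")
        self.data[path] = data

    def exists(self, path: str) -> bool:
        return path in self.data

    def get_children(self, path: str) -> list:
        prefix = path.rstrip("/") + "/"
        # Direct children only: nothing left after the prefix may contain "/".
        return sorted(
            p[len(prefix):] for p in self.data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def delete(self, path: str) -> None:
        if self.get_children(path):
            raise ValueError(f"{path} has children")
        del self.data[path]


store = ToyStore()
store.create("/app", b"config-root")      # create a znode
store.create("/app/db", b"host=a")        # create a child of a znode
store.set("/app/db", b"host=b")           # set data
children = store.get_children("/app")     # list children
```

Keeping the paths in a flat dictionary is just the simplest way to write the sketch; it says nothing about how ZooKeeper stores its tree internally.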

Now let’s implement some code 😅. The code makes a connection to a local ZooKeeper service and exercises ZooKeeper's basic functionality.

Don’t worry, I will share all the details on how to set up a local ZooKeeper service in the references section 😉.

Akhilesh Mahajan

Full-Stack Developer | Golang, Java, Rust, Node, React Developer | AWS☁️, Docker, Kubernetes | Passionate about distributed systems and cloud-native application