In this blog series, we will try to understand the design and working of Kafka. This is Part 1, where we will see what Kafka is, how it works, and what its components are.
WHAT IS KAFKA
Apache Kafka is an open-source distributed event streaming platform. It was originally developed at LinkedIn and later donated to the Apache Software Foundation.
Apache Kafka is designed to handle large-scale, high-throughput, and fault-tolerant real-time data streaming. It provides a messaging system that allows various components of an application or system to communicate and exchange data in a distributed and scalable manner.
APACHE KAFKA ARCHITECTURE
We will refer to the image below to understand the architecture of Kafka.
There are two major layers in Kafka.
- Compute Layer
- Storage Layer
There are two main APIs in Kafka.
- Producer API
- Consumer API
We will deep dive into each of these later in the series. In this blog, we will only cover them at a high level.
A) COMPUTE LAYER:
As the name suggests, the compute layer is the layer that does all the computation and processing of data.
B) STORAGE LAYER:
As the name suggests, this layer efficiently stores the data that producers write and consumers read.
C) KAFKA PRODUCER API:
This API lets client applications publish (push) events/data to Kafka topics.
D) KAFKA CONSUMER API:
This API lets client applications consume (read) events/data from Kafka topics.
BASIC WORKFLOW OF KAFKA
Event Source → Event Stream → Consumer
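This workflow can be sketched as a toy in-memory pipeline. This is only an illustration of the flow, not real Kafka code; all names here are made up for the example.

```python
# Toy sketch of the Kafka workflow: event source -> event stream -> consumer.
# A deque stands in for the event stream; real Kafka is distributed and durable.
from collections import deque

event_stream = deque()  # stands in for a Kafka topic

def event_source():
    """The producer side: push some events (e.g., user clicks) onto the stream."""
    for i in range(3):
        event_stream.append(f"click-{i}")

def consumer():
    """The consumer side: read events off the stream in the order they arrived."""
    consumed = []
    while event_stream:
        consumed.append(event_stream.popleft())
    return consumed

event_source()
print(consumer())  # -> ['click-0', 'click-1', 'click-2']
```

The key property the sketch shows is decoupling: the source only knows how to append to the stream, and the consumer only knows how to read from it; neither calls the other directly.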
CORE CONCEPTS OF KAFKA
The core concepts of Kafka include:
A) Topics: Topics are the categories or feeds of messages in Kafka. They represent a particular stream of data.
a) In simple words, think of a topic as a database table, where similar kinds of events are stored.
b) Whenever we push data to Kafka, we need to specify which topic the data should go to; when reading, we specify which topic (or set of topics) the data should be read from.
c) Kafka topics are immutable: once a record is written, it cannot be modified in place; data can only be appended.
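The points above can be sketched with a toy in-memory model, where each topic is a named, append-only list. The topic names and records here are hypothetical, chosen just for illustration.

```python
# Toy model: data pushed to Kafka must name a topic; reads name the
# topic(s) to read from. Each topic groups similar kinds of events,
# like a table, and is append-only (records are never edited in place).

topics = {"orders": [], "payments": []}  # each topic is like a table

def produce(topic, record):
    topics[topic].append(record)  # append-only: we only ever add to the end

def consume(topic_names):
    # Read from a set of topics (or a single one).
    return [r for t in topic_names for r in topics[t]]

produce("orders", "order-1")
produce("payments", "pay-1")
print(consume(["orders"]))               # -> ['order-1']
print(consume(["orders", "payments"]))   # -> ['order-1', 'pay-1']
```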
B) Producers: Producers are applications or processes that publish (produce) data to Kafka topics.
C) Consumers: Consumers are applications or processes that subscribe to Kafka topics and consume data from them.
D) Brokers: Brokers are the Kafka server instances responsible for storing and managing topics and messages.
E) Partitions: Topics are divided into partitions to allow for parallel processing and scalability. Each partition is an ordered, immutable sequence of messages.
a) A partition is the unit of parallelism.
b) When data is pushed, it lands in exactly one partition of the topic — chosen explicitly, derived from the record's key, or assigned round-robin when there is no key.
c) Each partition can be accessed in parallel and independently.
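Key-based partition assignment can be sketched as follows. Note this is a simplified illustration: real Kafka's default partitioner uses murmur2 hashing, while this sketch uses Python's `hashlib` for a deterministic stand-in, and the partition count and key names are made up.

```python
# Toy model: a topic split into partitions. A record's key determines
# which partition it lands in, so all records with the same key stay
# in order within a single partition.
import hashlib

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def partition_for(key: str) -> int:
    """Hash the key to pick a partition (real Kafka uses murmur2)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

def produce(key: str, value: str) -> int:
    p = partition_for(key)
    partitions[p].append((key, value))
    return p

# Same key -> same partition, so per-key ordering is preserved.
p1 = produce("user-42", "login")
p2 = produce("user-42", "logout")
assert p1 == p2
```

This is why choosing a good key matters: records that must be processed in order (e.g., all events for one user) should share a key, while a low-cardinality key can leave some partitions idle.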
F) Offsets: Each message within a partition is assigned a unique identifier called an offset. Offsets are used to maintain the ordering of messages within a partition.
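Offsets can be sketched the same way: each record appended to a partition gets the next sequential position, and a consumer remembers the offset it should read from next. The `ToyConsumer` name and the records are hypothetical, for illustration only.

```python
# Toy model: each record in a partition gets a sequential offset,
# and a consumer tracks the next offset it should read from.

partition = []  # one partition of a topic

def append(record) -> int:
    partition.append(record)
    return len(partition) - 1  # the new record's offset

class ToyConsumer:
    def __init__(self):
        self.position = 0  # next offset to read

    def poll(self):
        records = partition[self.position:]
        self.position = len(partition)  # remember where we stopped
        return records

assert append("a") == 0   # offsets start at 0 ...
assert append("b") == 1   # ... and increase by one per record

c = ToyConsumer()
assert c.poll() == ["a", "b"]
append("c")
assert c.poll() == ["c"]  # resumes from the remembered offset
```

Because offsets are just positions in the log, a consumer that crashes can resume from its last committed offset without re-reading everything, and two independent consumers can read the same partition at different positions.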
I hope this gives you a basic understanding of the design and workings of Kafka.