3 - Consumer and Consumer Groups
A Consumer reads events from Kafka topics. A Consumer Group is a collection of consumers working together to consume data from a topic.
For example, we can have multiple consumer groups like:
- notification group
- analytics group
Different applications can create their own consumer groups depending on their use case.
A notification service may process events to send emails or push notifications, while an analytics service may process the same events for reporting and dashboards.
Inside a single consumer group:
Each partition is assigned to at most one consumer at a time, so two consumers in the same group never read the same partition simultaneously.
But:
Consumers belonging to different groups can read the same partition.
This is one of the most fundamental ideas in Kafka.
Example
Imagine a topic with 3 partitions:
Partition 0
Partition 1
Partition 2
Now suppose there are two consumer groups:
- Notification Group
- Analytics Group
Both groups can independently read the same partitions.
This works because each group tracks its own reading progress (offsets) separately.
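To make the idea concrete, here is a toy in-memory model (not Kafka's API). `topic` is a hypothetical dict of partitions, and `positions` records, per group, the next offset to read in each partition. Reading advances only the reading group's position, so the two groups never interfere.

```python
from __future__ import annotations

from collections import defaultdict

# Toy model, not Kafka's API: a topic is a dict of partition -> list of events.
topic = {
    0: ["order-1", "order-4"],
    1: ["order-2", "order-5"],
    2: ["order-3"],
}

# group name -> partition -> next offset to read (0 until the group reads anything)
positions = defaultdict(lambda: defaultdict(int))


def read(group: str, partition: int, max_records: int = 10) -> list[str]:
    """Return up to max_records events, advancing only this group's position."""
    start = positions[group][partition]
    batch = topic[partition][start:start + max_records]
    positions[group][partition] = start + len(batch)
    return batch


# Both groups read partition 0; neither affects the other's progress.
print(read("notification", 0))              # ['order-1', 'order-4']
print(read("analytics", 0, max_records=1))  # ['order-1']
print(read("analytics", 0))                 # ['order-4']
print(read("notification", 0))              # []
```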
Partition Assignment Rules
Consumers Equal to Partitions
If there are 3 partitions and 3 consumers:
Consumer 1 -> Partition 0
Consumer 2 -> Partition 1
Consumer 3 -> Partition 2
Perfect distribution.
Fewer Consumers Than Partitions
A single consumer can handle multiple partitions. For example, with 3 partitions and 2 consumers:
Consumer 1 -> Partition 0, Partition 1
Consumer 2 -> Partition 2
Kafka distributes the work as evenly as possible.
More Consumers Than Partitions
Some consumers remain idle.
3 Partitions
5 Consumers
Only 3 consumers actively consume data.
The remaining 2 consumers sit idle until a rebalance gives them work, for example when an active consumer leaves the group or partitions are added to the topic.
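All three cases fall out of a simple range-style assignment. Kafka ships several pluggable assignors (for example RangeAssignor, RoundRobinAssignor and CooperativeStickyAssignor) whose exact rules differ, so `range_assign` below is only a simplified, single-topic sketch in the spirit of range assignment, with hypothetical consumer names.

```python
from __future__ import annotations


def range_assign(consumers: list[str], num_partitions: int) -> dict[str, list[int]]:
    """Simplified range-style assignment for a single topic.

    Consumers are sorted; each gets num_partitions // len(consumers) contiguous
    partitions, and the first (num_partitions % len(consumers)) get one extra.
    Consumers beyond the partition count receive nothing and sit idle.
    """
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(["c1", "c2", "c3"], 3))  # {'c1': [0], 'c2': [1], 'c3': [2]}
print(range_assign(["c1", "c2"], 3))        # {'c1': [0, 1], 'c2': [2]}
print(range_assign(["c1", "c2", "c3", "c4", "c5"], 3))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [], 'c5': []}
```

Sorting the members first makes the result deterministic, so every member computing the assignment from the same inputs reaches the same answer.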
Offsets
Each message inside a partition has an offset.
An offset is simply a sequential number representing the position of a message inside a partition.
Example:
Partition 0
Offset 0
Offset 1
Offset 2
...
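Offset 500
If a consumer says:
“I have processed up to offset 500”
It means:
- Messages 0 → 500 are completed
- The next read should start from offset 501
(Strictly, in Kafka's own API the value a consumer commits is the next offset to read, here 501; this section talks in terms of the last processed offset for readability.)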
Consumer Groups Maintain Offsets Independently
Kafka stores offsets group-wise, not consumer-wise: each consumer group tracks its own committed offset for every partition it reads.
For example,
| Consumer Group | Topic | Partition | Committed Offset (processed up to this offset) |
|---|---|---|---|
| notification | order-events | 0 | 510 |
| notification | order-events | 1 | 342 |
| analytics | order-events | 0 | 825 |
| analytics | order-events | 1 | 690 |
Why Group-Wise Instead of Consumer-Wise?
Suppose:
Notification Group
Consumer 1 -> Partition 0
Consumer 1 processes messages up to offset 500. Then Consumer 1 crashes, and Kafka may reassign Partition 0 to Consumer 2. Consumer 2 should continue from offset 501.
To make this possible, Kafka stores progress like this:
Group: notification
Topic: order-events
Partition: 0
Committed Offset: 500
Notice:
- Kafka does NOT care which consumer processed it
- Kafka only tracks the group’s progress
This allows seamless failover.
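A minimal sketch of this group-wise bookkeeping, assuming an in-memory dict in place of Kafka's real storage. The key is (group, topic, partition) with no consumer ID in it, which is exactly why Consumer 2 can pick up where Consumer 1 stopped. Like the rest of this section it stores the last processed offset; Kafka's own API stores the next offset to read instead. `commit` and `resume_offset` are hypothetical helpers.

```python
from __future__ import annotations

# Committed offsets keyed by (group, topic, partition). There is no consumer ID
# in the key, so any member of the group can resume a partition.
committed: dict[tuple[str, str, int], int] = {}


def commit(group: str, topic: str, partition: int, last_processed: int) -> None:
    """Record the last processed offset (this section's convention, not Kafka's +1)."""
    committed[(group, topic, partition)] = last_processed


def resume_offset(group: str, topic: str, partition: int) -> int:
    """Next offset to read: one past the last processed offset.

    Returns 0 when nothing is committed (a simplification; real consumers
    fall back to their auto.offset.reset setting).
    """
    last = committed.get((group, topic, partition))
    return 0 if last is None else last + 1


# Consumer 1 (notification group) processes partition 0 up to offset 500, then crashes.
commit("notification", "order-events", 0, 500)

# Consumer 2 of the same group takes over partition 0; only the group ID matters.
print(resume_offset("notification", "order-events", 0))  # 501

# The analytics group's progress on the same partition is unaffected.
print(resume_offset("analytics", "order-events", 0))  # 0
```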
Where Are Offsets Stored?
Kafka stores offsets inside an internal Kafka topic called:
__consumer_offsets
Kafka creates this topic automatically, so offset information is not stored in some separate, special database. Kafka reuses its own core mechanisms:
- Topics
- Partitions
- Logs
Even Kafka's internal metadata relies on partitions.
How Offset Storage Works
Suppose:
- Topic = order-events
- Consumer Group = notification
- Partition = 0
- Processed up to offset = 500
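The consumer sends an offset commit request to its group coordinator (the broker that leads the group's __consumer_offsets partition), and the coordinator appends a message to __consumer_offsets.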
That message contains:
Group ID
Topic
Partition
Committed Offset
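Conceptually, each commit is a key/value record: the key identifies (group, topic, partition) and the value carries the offset. The sketch below is a hypothetical model (the real record format carries additional fields, such as a commit timestamp): commits are appended to a list standing in for one __consumer_offsets partition, and replaying the list keeps only the latest offset per key. That latest-wins view is also why __consumer_offsets can be a compacted topic, where older records for the same key are eventually cleaned up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetKey:
    group_id: str
    topic: str
    partition: int


@dataclass(frozen=True)
class OffsetCommit:
    key: OffsetKey
    committed_offset: int


# Hypothetical stand-in for one partition of __consumer_offsets: an append-only list.
offsets_log: list[OffsetCommit] = []


def append_commit(group_id: str, topic: str, partition: int, offset: int) -> None:
    offsets_log.append(OffsetCommit(OffsetKey(group_id, topic, partition), offset))


def latest_offsets(log: list[OffsetCommit]) -> dict[OffsetKey, int]:
    """Replay the log; a later record for the same key overrides earlier ones."""
    latest: dict[OffsetKey, int] = {}
    for record in log:
        latest[record.key] = record.committed_offset
    return latest


append_commit("notification", "order-events", 0, 480)
append_commit("notification", "order-events", 0, 500)
append_commit("notification", "order-events", 1, 342)
print(latest_offsets(offsets_log)[OffsetKey("notification", "order-events", 0)])  # 500
```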
How Kafka Chooses the Offset Partition
The __consumer_offsets topic has many partitions (50 by default, controlled by the broker setting offsets.topic.num.partitions).
Kafka computes the partition by:
abs(hash(groupId)) % number_of_partitions
Example:
partition_number = abs(hash("notification_group_id")) % 50 = 23
So all offset data for notification_group_id goes into Partition 23 of __consumer_offsets.
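For illustration, the calculation can be sketched in Python, assuming the broker does roughly the following: take the Java `String.hashCode()` of the group ID, make it non-negative the way Kafka's `Utils.abs` does (mapping `Integer.MIN_VALUE` to 0), and reduce it modulo the partition count, here the default of 50. Treat it as a sketch rather than a reference implementation; the function names are hypothetical.

```python
from __future__ import annotations


def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): s[0]*31^(n-1) + ... + s[n-1] as a signed 32-bit int.

    Iterates over UTF-16 code units, as Java does.
    """
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like 32-bit int arithmetic
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed


def java_abs(n: int) -> int:
    # Like Kafka's Utils.abs: Integer.MIN_VALUE maps to 0 instead of overflowing.
    return 0 if n == -(1 << 31) else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """abs(groupId.hashCode()) % number_of_partitions."""
    return java_abs(java_string_hash(group_id)) % num_partitions


print(offsets_partition_for("notification_group_id"))
```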
Why This Partition Calculation Matters
In a real Kafka cluster:
- Different brokers host different partitions of topics
- No broker stores everything
So Kafka must determine:
- Which partition should store this offset
- Which broker hosts that partition
Example:
Broker 1 -> Partitions 0-19
Broker 2 -> Partitions 20-49
If offsets map to Partition 23:
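- Kafka knows Broker 2 hosts it (as that partition's leader), so Broker 2 acts as the group coordinator for this group
- The consumer sends its offset commits and offset fetches to Broker 2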
This is why partition calculation matters.
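Once the partition is known, finding the broker is a lookup in cluster metadata. The placement table below is hypothetical (0-indexed partitions, broker 1 leading the lower ones and broker 2 the rest, in the spirit of the example above); a real client learns the actual leaders from the cluster rather than from a fixed table.

```python
from __future__ import annotations

# Hypothetical leader placement for the 50 partitions of __consumer_offsets:
# partitions 0-19 led by broker 1, partitions 20-49 led by broker 2.
leader_of: dict[int, int] = {p: (1 if p < 20 else 2) for p in range(50)}


def coordinator_for(offsets_partition: int) -> int:
    """Broker leading the given __consumer_offsets partition (the group coordinator)."""
    return leader_of[offsets_partition]


print(coordinator_for(23))  # 2
```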
Consumer Recovery After Crash
Suppose Consumer 1 crashes. Consumer 2 takes over. What happens?
Consumer 2:
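- Determines the group's __consumer_offsets partition (in practice, the broker answering the consumer's FindCoordinator request does this calculation)
- Finds the broker leading that partition, the group coordinator
- Reads the committed offsets for its assigned partitions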
- Resumes from the next offset
Example:
Last committed offset = 500
Resume from = 501
This mechanism allows Kafka consumers to recover without losing their place.
Offset Commit Strategies
Kafka provides different strategies for committing offsets.
Auto Commit
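Kafka can automatically commit offsets periodically (enable.auto.commit=true, the default for the Java consumer).
Example:
auto.commit.interval.ms = 5000
This means that every 5 seconds (the default interval) the consumer automatically commits the offsets of messages it has received from poll(). Kafka cannot tell whether the application has actually finished processing them.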
Problem with Auto Commit
Suppose:
Consumer polled messages 0-99
Auto commit occurs.
Kafka records:
Offset 99 processed
But the consumer crashes while processing message 50.
After restart:
- Kafka thinks 99 was already processed
- Consumer resumes from 100
Messages 50-99 (the one in flight at the crash plus everything after it) are lost.
This is dangerous.
Manual Commit
The safer approach is manual commit.
The consumer:
- Polls a batch
- Processes the batch
- Commits offsets only after successful processing
Example:
Poll -> Process -> Commit
If a crash occurs before the commit:
- After restart, the consumer receives the uncommitted messages again and reprocesses them
- Duplicate processing may happen
- But messages are not lost
Kafka applications usually prefer:
Duplicate processing over data loss (at-least-once delivery).
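The difference between the two strategies can be reproduced with a small simulation. It is a model, not consumer code: `simulate` (a hypothetical helper) processes 100 messages, crashes on message 50, restarts from the committed offset, and reports which messages were lost or processed twice. Committing before processing reproduces the auto-commit problem; committing after processing trades the loss for duplicates. Message 50 was still in flight at the crash, so it counts among the lost ones.

```python
from __future__ import annotations

from collections import Counter


def simulate(commit_first: bool, num_messages: int = 100, crash_at: int = 50) -> tuple[set[int], set[int]]:
    """Return (lost, duplicated) offsets after one crash and restart.

    commit_first=True: the batch is committed before processing finishes
    (the auto-commit problem). commit_first=False: poll -> process -> commit.
    Uses this section's convention: committed = last processed offset.
    """
    polled = list(range(num_messages))
    committed = -1  # nothing committed yet, so a restart begins at offset 0
    processed: list[int] = []

    if commit_first:
        committed = polled[-1]  # the whole batch is marked done up front

    # First run: the consumer crashes while handling message crash_at.
    for offset in polled:
        if offset == crash_at:
            break  # the in-flight message and everything after it are unfinished
        processed.append(offset)
    # With commit_first=False, the commit after the batch never happens.

    # Restart: resume right after the committed offset and run to the end.
    processed.extend(range(committed + 1, num_messages))

    counts = Counter(processed)
    lost = set(polled) - set(counts)
    duplicated = {offset for offset, n in counts.items() if n > 1}
    return lost, duplicated


lost, dupes = simulate(commit_first=True)
print(min(lost), max(lost), len(dupes))   # 50 99 0
lost, dupes = simulate(commit_first=False)
print(len(lost), min(dupes), max(dupes))  # 0 0 49
```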
Why Not Commit After Every Message?
Technically possible, but inefficient. Kafka is designed for high throughput.
Committing after every event means:
- Constant network calls
- Reduced performance
- Higher overhead
Instead, applications choose a reasonable batch size. This becomes a tradeoff between:
- Performance
- Duplicate processing risk
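The trade-off can be put in rough numbers. Assuming poll, process, then one commit per batch, the number of commit requests is the message count divided by the batch size (rounded up), and a crash after processing a batch but before committing it means the whole batch is processed again. `commit_tradeoff` is a hypothetical back-of-the-envelope helper, not a Kafka API.

```python
from __future__ import annotations


def commit_tradeoff(total_messages: int, batch_size: int) -> tuple[int, int]:
    """Return (commit requests, worst-case messages processed twice after one crash).

    Assumes poll -> process -> commit once per batch: a crash after processing a
    batch but before committing it forces that whole batch to be processed again.
    """
    commits = (total_messages + batch_size - 1) // batch_size  # ceiling division
    worst_case_duplicates = min(batch_size, total_messages)
    return commits, worst_case_duplicates


for size in (1, 100, 1000):
    print(size, commit_tradeoff(10_000, size))
# 1 (10000, 1)
# 100 (100, 100)
# 1000 (10, 1000)
```

Larger batches cut commit traffic by orders of magnitude, while the worst-case reprocessing grows linearly with the batch size.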
Kafka Cluster
A Kafka Cluster is simply:
Multiple brokers working together.
Goals of clustering:
- Scalability
- Fault tolerance
- High availability
Scalability
Load is distributed across multiple servers.
Instead of one machine handling everything:
Broker 1 -> Some partitions
Broker 2 -> Some partitions
Broker 3 -> Some partitions
Work is shared.
Fault Tolerance
Kafka continues operating even if brokers fail.
This is achieved using replication.
High Availability
Kafka avoids a single point of failure.
Even if:
- A broker crashes
- A partition replica fails
Kafka can continue serving requests.
Leader and Follower Partitions
Each partition has:
- One Leader
- Zero or more Followers (one fewer than the replication factor)
Example:
Replication Factor = 2
Means:
- 1 Leader
- 1 Follower copy
Example Cluster
Topic: order-events
Partitions: P0, P1, P2
Replication Factor: 2
Total replicas: 6
- 3 leaders
- 3 followers
Distributed across brokers.
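One way to see how 6 replicas end up spread across brokers is a simple round-robin placement, sketched below for 3 brokers. Kafka's real assignment logic is more involved (it randomizes the starting point and can take racks into account), so `place_replicas` is a simplified, hypothetical model. It does share one real rule: a replication factor larger than the number of brokers is rejected.

```python
from __future__ import annotations


def place_replicas(num_partitions: int, replication_factor: int, brokers: list[int]) -> dict[int, list[int]]:
    """Simplified round-robin placement: replica r of partition p goes to broker (p + r) mod N.

    The first broker in each list is the leader; the rest are followers.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


layout = place_replicas(num_partitions=3, replication_factor=2, brokers=[1, 2, 3])
for partition, replicas in layout.items():
    leader, *followers = replicas
    print(f"P{partition}: leader=broker {leader}, followers={followers}")
# P0: leader=broker 1, followers=[2]
# P1: leader=broker 2, followers=[3]
# P2: leader=broker 3, followers=[1]
```

Each broker leads one partition and follows another, so losing any single broker still leaves a copy of every partition.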
Leader-Follower Architecture
Responsibilities of Leaders
Leaders handle:
- Producer writes
- Consumer reads
- Appending records to the partition log
- Serving replication fetches from followers and tracking which followers are in sync
By default, clients always communicate with leaders.
Responsibilities of Followers
Followers:
- Replicate data from leaders
- Stay synchronized
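- One of them becomes the leader if the current leader fails
By default, followers do NOT serve clients directly (since Kafka 2.4, consumers can optionally be configured to fetch from a nearby follower replica).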
Think of followers as hot standby replicas.
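The failover behavior can be sketched as a small state transition. In Kafka the controller performs leader election and, unless unclean leader election is enabled (it is off by default), picks the new leader from the in-sync replica set (ISR). `PartitionState` and `handle_broker_failure` are hypothetical names for a simplified model of that rule, not Kafka's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PartitionState:
    leader: int
    isr: list[int]  # in-sync replicas, leader included


def handle_broker_failure(state: PartitionState, failed: int) -> PartitionState:
    """Drop the failed broker; if it was the leader, promote the next in-sync follower."""
    remaining = [b for b in state.isr if b != failed]
    if not remaining:
        raise RuntimeError("no in-sync replica left to take over")
    leader = state.leader if state.leader != failed else remaining[0]
    return PartitionState(leader=leader, isr=remaining)


p0 = PartitionState(leader=1, isr=[1, 2])
p0 = handle_broker_failure(p0, failed=1)
print(p0)  # PartitionState(leader=2, isr=[2])
```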
How Producers Find the Correct Broker
Suppose a producer wants to write an event.
For a record with a key, it calculates:
hash(key) % number_of_partitions
Example:
hash(orderId) % 3 = Partition 1
Now the producer must determine:
Which broker hosts the leader of Partition 1?
Then it sends the data directly to that broker.
This is why Kafka clients need metadata about:
- Topics
- Partitions
- Leaders
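A sketch of this producer-side routing, assuming hypothetical metadata for order-events in which each partition is led by a different broker. The hash is a stand-in: `zlib.crc32` is used only because it is stable and in the standard library, whereas Kafka's default partitioner hashes the key bytes with murmur2, so the partition numbers here will not match a real producer's.

```python
from __future__ import annotations

import zlib

# Hypothetical cluster metadata: topic -> partition -> leader broker.
# A real client fetches this from the cluster and refreshes it when leaders move.
metadata = {"order-events": {0: 1, 1: 2, 2: 3}}


def choose_partition(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(topic: str, key: str) -> tuple[int, int]:
    """Return (partition, leader broker) for a record with this key."""
    leaders = metadata[topic]
    partition = choose_partition(key, len(leaders))
    return partition, leaders[partition]


partition, broker = route("order-events", "order-42")
print(f"order-42 -> partition {partition} on broker {broker}")
```

Because the hash is deterministic, every record with the same key lands in the same partition, which is what preserves per-key ordering.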