Basic concepts in Apache Kafka

To work with Apache Kafka, you need to understand a few basic concepts:

  • Producer
  • Consumer
  • Broker
  • Cluster
  • Topic
  • Topic Partitions
  • Partition Offset
  • Consumer Group

Producer

Producers are applications that create data and send it to the Apache Kafka server. The data is sent as messages, serialized to byte arrays. For example, if you have a .txt file, a Producer can read each line of the file and send it to the Apache Kafka server as a separate message.
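The .txt example can be sketched without a Kafka client at all, just to show what "sent as byte arrays" means. This is a minimal illustration using only the Python standard library; the function name lines_to_messages and the sample text are hypothetical, and the actual send call of a real client library is deliberately left out.

```python
from typing import List


def lines_to_messages(text: str, encoding: str = "utf-8") -> List[bytes]:
    """Turn each non-empty line of text into one message payload (bytes)."""
    return [line.encode(encoding) for line in text.splitlines() if line.strip()]


# Hypothetical file content; in practice this would come from reading the .txt file.
sample = "first line\nsecond line\n\nthird line\n"

messages = lines_to_messages(sample)
for payload in messages:
    # A real producer would hand each payload to a Kafka client library here.
    print(payload)
```

Each printed value is a bytes object, which is the form in which a message payload travels to the server.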

Consumer

The Consumer is an application that receives the messages that Producers have sent to the Apache Kafka server. A Consumer needs to subscribe to a topic on the Apache Kafka server in order to receive the messages published to that topic.

After receiving the data, the Consumer can process it according to the needs of the application.

Broker

A Broker is an Apache Kafka server. It acts as a bridge between Producers and Consumers, allowing them to exchange messages with each other.

Cluster

A cluster is a group of Brokers, in other words a group of Apache Kafka servers.

Topic

A topic is a named channel defined in the Apache Kafka server: Producers send data to it, and Consumers subscribe to it to receive that data. Topics let Apache Kafka classify messages and let Consumers know where to get their data from.

Topic Partitions

Apache Kafka is a distributed messaging system, and we can set it up as a cluster of servers. If a topic receives too many messages at the same time for one server to handle, we can divide the topic into partitions that are spread across the Apache Kafka servers in the cluster, so that the servers share the load.

Each partition holds only part of the topic's messages and is independent of the other partitions. We decide the number of partitions for each topic based on the needs of the application.
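One common way to spread messages over partitions is to hash a message key and take the result modulo the number of partitions. The sketch below illustrates that idea only. The function choose_partition is a hypothetical name, and zlib.crc32 is a stand-in hash chosen because it ships with Python; Kafka's own clients use a different hash function (murmur2 in the Java client), so the partition numbers produced here will not match a real cluster.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in the range [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is an illustrative stand-in, not the hash a Kafka client uses.
    return zlib.crc32(key) % num_partitions


# Count how 300 hypothetical keys are spread over 3 partitions.
counts = [0, 0, 0]
for i in range(300):
    partition = choose_partition(f"user-{i}".encode("utf-8"), 3)
    counts[partition] += 1

print(counts)
```

Because the mapping depends only on the key, messages with the same key always land in the same partition, as long as the number of partitions does not change.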

Partition Offset

In a topic partition, messages are stored in order and each one is marked with an offset. Every time a new message enters a topic, it is assigned to one of the topic's partitions and given the next offset in that partition. Offset values increase within a partition and are only meaningful within that partition, so the same offset number can appear in different partitions.

To retrieve a specific message from the Apache Kafka server, we need to specify the topic name, the partition number, and the offset.
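The relationship between topic, partition, and offset can be modeled with a small in-memory structure. The class ToyLog below is a hypothetical teaching aid, not part of any Kafka library: it keeps one list per (topic, partition) pair and uses the list index as the offset.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for partitioned topic storage (illustration only)."""

    def __init__(self):
        # Maps (topic, partition) to the ordered list of messages in that partition.
        self._partitions = defaultdict(list)

    def append(self, topic: str, partition: int, message: bytes) -> int:
        """Store a message and return the offset it was given."""
        log = self._partitions[(topic, partition)]
        log.append(message)
        return len(log) - 1

    def read(self, topic: str, partition: int, offset: int) -> bytes:
        """Fetch one message by topic name, partition number, and offset."""
        log = self._partitions.get((topic, partition), [])
        if not 0 <= offset < len(log):
            raise IndexError("no message at that offset")
        return log[offset]


log = ToyLog()
first = log.append("orders", 0, b"order created")
second = log.append("orders", 0, b"order paid")
other = log.append("orders", 1, b"order shipped")

print(first, second, other)       # prints: 0 1 0
print(log.read("orders", 0, 1))   # prints: b'order paid'
```

Note that partition 1 starts again at offset 0: offsets count messages per partition, not per topic.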

Consumer Group

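A consumer group is a group of Consumers that consume messages from the Apache Kafka server together. The Consumers in a group share the work: the partitions of a topic are divided among them, so each message is handled by only one Consumer in the group.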
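How a group shares the work can be pictured as dividing partitions among its members. The sketch below hands out partitions in round-robin order. The function assign_partitions and the consumer names are hypothetical, and this is a simplified model: in a real cluster the assignment is coordinated by Kafka itself, and the strategy is configurable.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer, in round-robin order."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions([0, 1, 2, 3, 4], ["consumer-a", "consumer-b"])
print(assignment)
# prints: {'consumer-a': [0, 2, 4], 'consumer-b': [1, 3]}
```

Every partition has exactly one owner, so two Consumers in the same group never process the same message. If there are more Consumers than partitions, the extra Consumers in this model simply receive an empty list.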
