How to Install Kafka | Kafka Quick Start Guide for Linux


Handling real-time data streams is vital for our automated processes at IOFLOOD. While evaluating tools for the job, the Kafka CLI stood out as a specialized command-line interface for managing Apache Kafka and real-time data. We have translated our practices into this tutorial for our bare metal cloud server customers looking for help during the Apache Kafka getting started phase.

In this tutorial, we will guide you through the Kafka install steps for Linux systems. We will cover both APT and YUM-based distributions, delve into compiling the Kafka CLI from source, show how to install a specific version, and finally explain how to use the Kafka commands and verify that everything is installed correctly.

So, let’s dive in and begin our Linux Kafka quick start process!

TL;DR: How To Install Kafka CLI on Linux?

You can install Kafka by first downloading the Kafka binaries from the Apache website with wget https://downloads.apache.org/kafka/3.7.1/kafka_2.13-3.7.1.tgz. Then, extract the tar file with tar xzf kafka_2.13-3.7.1.tgz, configure, and start the Kafka server.

Here’s a basic example:

# Download the Kafka 3.7.1 binaries from the Apache mirrors
wget https://downloads.apache.org/kafka/3.7.1/kafka_2.13-3.7.1.tgz

# Extract the archive and enter the directory
tar xzf kafka_2.13-3.7.1.tgz
cd kafka_2.13-3.7.1

# Start ZooKeeper, then the Kafka broker, in the background
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
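
Note that recent Kafka releases (3.x and later) can also run without ZooKeeper using KRaft mode. Here is a minimal sketch, assuming the same kafka_2.13-3.7.1 directory and the bundled KRaft configuration file:

# Generate a cluster ID and format the storage directories for KRaft mode
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Start a combined broker/controller with no ZooKeeper required
bin/kafka-server-start.sh config/kraft/server.properties &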

This is just a basic way to install Kafka on Linux, but there’s much more to learn about installing and using Kafka. Continue reading for more detailed information and advanced usage scenarios.

The Basics: Install Kafka on Linux

Kafka is an open-source distributed streaming platform by Apache. It’s used to build real-time data pipelines and streaming apps. It’s horizontally scalable, fault-tolerant, and incredibly fast. This makes Kafka a go-to solution for a large variety of real-time data streaming needs.

Now, let’s dive into how to install Kafka on your Linux system.

Kafka Install with APT

For Debian-based distributions, we use the APT package manager. Here’s how you can install Kafka using APT:

sudo apt-get update
sudo apt-get install kafka

# Output:
# 'kafka is already the newest version (2.13-2.8.0).'

These commands update your package lists and then install Kafka. Note that Kafka is not included in the default repositories of every Debian-based distribution, so you may need to add a third-party repository that provides a kafka package first. The sample output above indicates that the package is already installed at its latest available version.
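
If you are unsure whether your configured repositories provide a kafka package at all, or which versions they offer, you can check before installing (the package name kafka is an assumption and may differ between repositories):

# Show the installed and candidate versions of the kafka package
apt-cache policy kafka

# List every version APT can see, along with the repository it comes from
apt-cache madison kafka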

Kafka Install with YUM

For Red Hat-based distributions, we use the YUM package manager. Here’s how you can install Kafka using YUM:

sudo yum update
sudo yum install kafka

# Output:
# 'Package kafka-2.13_2.8.0.noarch already installed and latest version'

As with APT, these commands update your package metadata and then install Kafka. Availability of a kafka package depends on the repositories you have enabled (for example, a vendor or third-party repo); the sample output indicates the package is already installed at its latest available version.
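
To see what your enabled YUM repositories provide before installing (again assuming the package is named kafka), you can query the package metadata:

# Show summary information for the kafka package, if a repository provides it
yum info kafka

# List all available versions across the enabled repositories
yum --showduplicates list kafka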

Remember, these are basic installations. For more advanced scenarios, such as installing from source or installing different versions, continue reading to the Advanced Use section.

Source Code Kafka Quick Start

Sometimes, you may need to install Kafka from the source code. This could be due to the need for a specific version or to customize the installation.

Here’s how you can install Kafka from the source code:

git clone https://github.com/apache/kafka.git

cd kafka

# Build the Kafka jars using the Gradle wrapper (older branches may require
# running 'gradle' once first to bootstrap the wrapper)
./gradlew jar

# Output:
# 'BUILD SUCCESSFUL in 2m 36s'

These commands clone the Kafka repository, change into the kafka directory, and then build Kafka with Gradle. The BUILD SUCCESSFUL message confirms that Kafka compiled without errors.
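
If you want a binary distribution similar to the tarball on the download page rather than individual jars, the Kafka build also provides a release task; a minimal sketch (on recent branches the resulting archive is placed under core/build/distributions/):

# Build a binary release tarball of Kafka
./gradlew releaseTarGz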

Different Versions of Kafka CLI

Kafka has numerous versions, and you might need a specific one for compatibility or to use a particular feature. Here’s how you can install a specific version of Kafka.

Installing from Source

To install a specific version from source, check out the corresponding release tag before building with Gradle:

git clone https://github.com/apache/kafka.git
cd kafka
git checkout 2.3.0
# Older branches such as 2.3 may need a local Gradle install to bootstrap
# the wrapper first (run 'gradle' once), then build with the wrapper
./gradlew jar

# Output:
# 'BUILD SUCCESSFUL in 2m 36s'
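
Release versions correspond to tags in the Kafka git repository, so you can list what is available before checking one out:

# List the available release tags (for example 2.3.0, 3.7.1, ...)
git tag -l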

Installing with APT

To install a specific version with APT, you need to specify the version during installation:

sudo apt-get update
sudo apt-get install kafka=2.3.0

# Output:
# 'kafka is already the newest version (2.3.0).'
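
If you pin a specific version this way, you may also want to stop APT from upgrading it automatically during routine updates; this assumes the package is named kafka:

# Hold the kafka package so 'apt-get upgrade' leaves it at the current version
sudo apt-mark hold kafka

# Allow upgrades again later
sudo apt-mark unhold kafka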

Installing with YUM

To install a specific version with YUM, you need to specify the version during installation:

sudo yum update
sudo yum install kafka-2.3.0

# Output:
# 'Package kafka-2.3.0.noarch already installed and latest version'

Version | Key Changes                                    | Compatibility
2.3.0   | Added support for Java 11                      | Compatible with all previous versions
2.4.0   | Introduced Incremental Cooperative Rebalancing | Compatible with all previous versions
2.5.0   | Added support for non-key joining in KTable    | Not backwards compatible with versions below 2.3.0

Use and Verify New Kafka Install

Once Kafka is installed, you can use the kafka-topics command (or bin/kafka-topics.sh if you are running from the extracted tarball) to create a topic and then list all topics to verify that Kafka is working correctly.

Here’s an example:

kafka-topics --create --topic test --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092
kafka-topics --list --bootstrap-server localhost:9092

# Output:
# 'test'

This command creates a topic named ‘test’ and then lists all topics. The output shows the ‘test’ topic, confirming that Kafka is installed and working correctly.
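
To confirm that messages actually flow end to end, you can send a few records with the console producer and read them back with the console consumer (use the bin/*.sh variants if you are working from the extracted tarball):

# Send messages to the 'test' topic (type a few lines, then press Ctrl+C)
kafka-console-producer --topic test --bootstrap-server localhost:9092

# Read everything in the topic from the beginning (Ctrl+C to exit)
kafka-console-consumer --topic test --from-beginning --bootstrap-server localhost:9092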

How to Install Kafka in Docker

Docker is a popular platform used to develop, ship, and run applications. By installing Kafka using Docker, you can ensure that the software will run the same, regardless of its environment.

Here’s how to install Kafka using Docker:

docker pull confluentinc/cp-kafka

docker run -d --name=kafka -p 9092:9092 -e KAFKA_ZOOKEEPER_CONNECT=your.zookeeper.connect:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://your.kafka.host.name:9092 -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 confluentinc/cp-kafka

# Output:
# The ID of the newly created container (a long hexadecimal string)

These commands pull the Confluent cp-kafka image and then start a Kafka container in detached mode; on success, docker run -d prints the new container's ID, and docker ps will show the container running. Note that your.zookeeper.connect and your.kafka.host.name are placeholders you must replace with your actual ZooKeeper address and advertised hostname.
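
Once the container is up, you can use the Kafka CLI tools bundled in the cp-kafka image to confirm the broker is reachable; a minimal sketch, assuming the container was named kafka as above:

# Confirm the container is running
docker ps --filter name=kafka

# List topics using the CLI shipped inside the image
docker exec -it kafka kafka-topics --list --bootstrap-server localhost:9092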

Install Kafka with Kubernetes

Kubernetes is an open-source platform designed to automate deploying, scaling, and operating application containers. Here’s how to install Kafka using Kubernetes:

helm repo add confluentinc https://confluentinc.github.io/cp-helm-charts/
helm repo update
helm install my-kafka confluentinc/cp-helm-charts

# Output:
# 'NAME: my-kafka'
# 'STATUS: deployed'

This command adds the Confluent Helm chart repository, updates the repo, and then installs Kafka using Helm. The output message confirms that Kafka has been successfully deployed on Kubernetes.
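
After the Helm release is deployed, you can check that the Kafka and ZooKeeper pods created by the chart have started; a minimal sketch, assuming the my-kafka release above and the default namespace:

# List the pods in the current namespace and look for the my-kafka release
kubectl get pods

# Inspect the services the chart exposes
kubectl get svc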

Method     | Advantages                                                                         | Disadvantages
Docker     | Simplifies configuration, Isolation from other applications, Easy version control | Requires Docker knowledge, More resource usage
Kubernetes | Scalability, High Availability, Automated rollouts and rollbacks                   | More complex, Requires Kubernetes knowledge

While these alternative methods might seem more complex, they offer greater control and scalability, especially in a microservices environment.

Solving Kafka Install Issues

While installing Kafka on Linux, you may encounter a few common issues. Let’s discuss these problems and their solutions to ensure a smooth Kafka installation process.

Issue: Kafka Service Not Starting

Sometimes, after installation, the Kafka service might not start. This can be due to various reasons such as incorrect configuration or port conflicts.

To check if Kafka is running, you can use the following command:

systemctl status kafka

# Output:
# 'kafka.service - Apache Kafka
# Loaded: loaded (/usr/lib/systemd/system/kafka.service; disabled; vendor preset: disabled)
# Active: inactive (dead)'

If Kafka is not running, you can start it with the following command (root privileges are usually required, and the kafka unit only exists if your package installed one):

sudo systemctl start kafka

# Output:
# No output on success
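
If the service still fails to start, the systemd journal and Kafka's own server logs usually explain why; a minimal sketch, assuming the unit is named kafka:

# Show the most recent log entries for the kafka unit
sudo journalctl -u kafka --no-pager -n 50

# Kafka's server log (the path may differ depending on how Kafka was installed)
tail -n 50 /var/log/kafka/server.log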

Issue: Kafka Version Conflict

If you have installed Kafka from the source and there is a version already installed through the package manager, it can lead to a version conflict. To resolve this, you need to uninstall the existing version before installing a new one.

Here’s how to uninstall Kafka:

sudo apt-get remove kafka

# Output:
# 'The following packages will be REMOVED:
#   kafka'
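
To confirm which installation your shell will actually use after the cleanup, check where the Kafka tools on your PATH come from and what version they report:

# Show which kafka-topics binary is first on your PATH
which kafka-topics

# Report the version of that installation
kafka-topics --version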

Issue: Kafka Port Conflict

Kafka uses port 9092 by default. If another service is using this port, Kafka will not start. You can check the port usage using the following command:

sudo lsof -i:9092

# Output:
# 'COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
# java    2034 kafka  152u  IPv6  21560      0t0  TCP *:9092 (LISTEN)'

If the port is in use, you need to stop the service using the port or change the Kafka port.
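
To change the port Kafka listens on, edit the listeners setting in the broker configuration (config/server.properties in a tarball install; packaged installs often keep it under /etc/kafka/) and restart the broker. For example, to move the broker to port 9093:

# In server.properties, set an explicit listener on a free port:
#   listeners=PLAINTEXT://:9093

# Then restart the broker so the change takes effect
sudo systemctl restart kafka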

Remember, troubleshooting is a normal part of any installation process. Don’t get discouraged if you encounter issues. With these tips, you should be able to resolve common problems and get your Kafka service running on Linux.

Understanding What is Kafka

Before delving into the installation process of Kafka on Linux, it’s essential to understand what Kafka is and its underlying architecture. This will not only make the installation process more meaningful but also enhance your ability to troubleshoot any potential issues.

What is Kafka?

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

# Kafka version check
kafka-topics --version

# Output:
# '2.8.0'

The above command checks the installed Kafka version; the output shows which version of Kafka is installed on your system.

Kafka Architecture

The architecture of Kafka is made up of a few key components: Producers, Consumers, Brokers, Topics, Partitions, and Clusters.

  • Producers are processes that publish data (push messages) into Kafka topics within the broker.
  • Consumers read data from brokers.
  • Brokers are the servers that run the Kafka software; each broker holds topic log partitions.
  • A topic is a particular stream of data. Topics in Kafka are always multi-subscriber.
  • Partitions allow you to parallelize a topic by splitting its data across multiple brokers (see the example after this list).
  • A cluster is a group of brokers running on one or more machines. The Kafka cluster stores streams of records in categories called topics.
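
To see a few of these pieces from the command line, you can create a topic with several partitions and then describe it to see how those partitions are laid out; this assumes a broker on localhost:9092 as in the earlier examples and uses a hypothetical topic name, orders:

# Create a topic with 3 partitions on a single-broker cluster
kafka-topics --create --topic orders --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092

# Show each partition, its leader broker, and its replicas
kafka-topics --describe --topic orders --bootstrap-server localhost:9092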

The Role of Kafka in Data Pipelines

Kafka plays a significant role in real-time data pipelines. It’s designed to handle real-time data feeds with low latency and high throughput. It also integrates well with Big Data infrastructure and data processing frameworks, making it a popular choice for real-time analytics.

The Importance of Real-Time Data Processing

Real-time data processing allows businesses to react without delay. It can help businesses to make crucial decisions promptly, offer better customer service, and stay ahead of the competition.

Understanding these fundamentals will make the Kafka installation process on Linux more comprehensible and manageable.

Practical Uses of Kafka

Kafka has become an integral part of big data applications due to its ability to handle real-time data feeds. It’s often used in data architectures and big data workflows, especially in scenarios where real-time processing is required.

A common use case is log or event data aggregation where Kafka consolidates data from different sources and pushes them into a real-time analytics system or a big data store.

Kafka in Microservices Architecture

In a microservices architecture, Kafka serves as a backbone for enabling communication between different services. It provides an asynchronous, decoupled communication mechanism that is ideal for microservices.

With Kafka, microservices can publish events to Kafka topics, and other microservices can consume from these topics. This allows for a loosely coupled architecture where services can evolve independently.
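
As a small command-line illustration of this pattern, one service can publish events to a topic while several others read the same stream independently by using different consumer groups; the topic and group names below are hypothetical:

# An orders service publishes events
kafka-console-producer --topic orders --bootstrap-server localhost:9092

# A billing service and a shipping service each consume the full stream
# independently, because each uses its own consumer group
kafka-console-consumer --topic orders --group billing-service --bootstrap-server localhost:9092
kafka-console-consumer --topic orders --group shipping-service --bootstrap-server localhost:9092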

Exploring Further Topics

Kafka Streams is a client library for building applications and microservices, where the input and output data are stored in Kafka clusters. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka’s server-side cluster technology.

Kafka Connect, on the other hand, is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka.
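
To experiment with Kafka Connect without writing any code, the Kafka distribution ships a standalone worker along with sample file connector configurations; a minimal sketch, assuming you are in the extracted kafka_2.13-3.7.1 directory (file names can vary slightly between releases):

# Start a standalone Connect worker with the bundled file source connector,
# which streams lines from a local file into a Kafka topic
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties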

Further Resources for Kafka Proficiency

To further your understanding of Kafka and its applications, here are a few resources that you might find helpful:

  1. Apache Kafka CLI: A Streaming System – A comprehensive guide to understanding Kafka and its architecture.

  2. Kafka Quick Start: The Definitive Guide – from the official Apache Kafka website, providing a detailed introduction.

  3. Apache Kafka Getting Started Tutorial – A practical guide to getting started with Kafka.

Recap: Install Kafka CLI in Linux

In this comprehensive guide, we’ve ventured into the world of Kafka, a powerful open-source distributed streaming platform. We’ve explored how to install Kafka on Linux, starting from basic methods and moving on to more advanced techniques.

We started off with the basics, understanding how to install Kafka using package managers like APT and YUM. We then delved into more advanced territory, learning how to install Kafka from the source code, how to install specific versions, and how to use and verify the Kafka installation.

Along the way, we tackled common challenges you might face when installing Kafka on Linux, such as service not starting, version conflicts, and port conflicts, providing you with solutions for each issue.

We also looked at alternative approaches to Kafka installation like Docker and Kubernetes, weighing their advantages and disadvantages. Here’s a quick comparison of these methods:

Method     | Advantages                                                                         | Disadvantages
Docker     | Simplifies configuration, Isolation from other applications, Easy version control | Requires Docker knowledge, More resource usage
Kubernetes | Scalability, High Availability, Automated rollouts and rollbacks                   | More complex, Requires Kubernetes knowledge

Whether you’re just starting out with Kafka or you’re looking to level up your skills, we hope this guide has given you a deeper understanding of Kafka installation on Linux.

With its ability to handle real-time data feeds with low latency and high throughput, Kafka is a powerful tool for data streaming. Now, you’re well equipped to install and use Kafka on your Linux system. Happy coding!