Build Your Own High Pub/Sub Cluster on CoreOS

An Automation Friendly Guide to Deploying Apache Kafka on CoreOS with Packet Bare Metal

In this article I’ll walk you through a quick and easy way of setting up a Kafka cluster on a single Packet bare-metal machine. The setup will use the default Zookeeper instance that ships with Kafka, plus three message brokers. I'll show you how to do a dry run, as well as the process for publishing some messages with the Producer and receiving them via the Consumer.

If your application is collecting, streaming or measuring a large volume of events, then you’ve probably heard of Kafka.  Now an Apache project, Kafka was originally created at LinkedIn to meet their need for a highly scalable, low-latency message brokering service (although you could also call it a pub/sub system or a really good queuing solution).

Written in Scala, the application has evolved significantly since graduating to a top-level Apache project in late 2012.  It’s now a popular alternative to other message brokers like ActiveMQ, mainly due to its speed and reliability, even at huge volumes.

At Packet, we love learning about new technologies and helping to dig up solutions to challenges that our clients face.  The best part of my job is that I get paid to touch all kinds of deployed technologies and help automate them on super fast bare metal machines (like our Type 3 server and its NVMe flash cards!).  It’s a bit like putting a geeky kid in an “ops focused” candy store!

Recently I stumbled across a LinkedIn article where they benchmarked Kafka's performance, getting up to 2 million writes per second on three machines.  Crazy!  I had to check Kafka out further and see how we could make it super easy to deploy on Packet.

Getting Started

So let’s get started... first manually, and then with some nice cloud-init YAML for automation.

I’ve chosen to deploy a bare metal Type 1 machine, which includes a quad core processor, 16GB of RAM and 2 x 120GB enterprise SSD drives for $0.40/hr.  I’ll use CoreOS Stable, as it is really awesome for cluster automation.

Setting up a Single Broker

Step 1

First, we need to download Kafka and untar it in the root directory.  Hop onto your Packet server by SSHing in as root with your private key.  Download the latest stable Kafka release from the Apache Kafka downloads page, then extract it:

# tar -xzf kafka_2.10-0.8.2.0.tgz
# cd kafka_2.10-0.8.2.0/

We also need to make sure that we have a Java runtime environment installed so that we can run Kafka and Zookeeper. We can use the default JRE that is available from the local repository:

# apt-get install default-jre
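To double-check, you can verify the runtime is actually on the PATH before starting anything (a quick sanity check, not Kafka-specific):

```shell
# confirm a Java runtime is available before starting Zookeeper or Kafka
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "no JRE found; install one first (e.g. apt-get install default-jre)"
fi
```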

Step 2

Next, let’s set up Zookeeper. Why Zookeeper? Imagine herding sheep without a shepherd. Zookeeper plays a vital part in coordinating the nodes in a Kafka cluster, which is why it's described as "a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services".

The good thing is, I'm demonstrating a quick and easy way to deploy a Kafka cluster on a single bare-metal machine, so we can just use the default Zookeeper config that ships with Kafka and get right on to running some Kafka tests.  Head into your Kafka directory (/root/kafka_2.10-0.8.2.0/) and run:

# bin/zookeeper-server-start.sh config/zookeeper.properties

Keep in mind that Zookeeper runs on TCP port 2181 by default.  Now that we have Zookeeper running, we can start the Kafka server by invoking this command inside the kafka_2.10-0.8.2.0/ directory:

# bin/kafka-server-start.sh config/server.properties
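Before moving on, it's worth confirming that both daemons are actually listening on their default ports (a quick check using ss, which ships with most modern Linux distributions):

```shell
# Zookeeper listens on TCP 2181 and the Kafka broker on TCP 9092 by default;
# list listening TCP sockets and look for both ports
ss -ltn | grep -E ':(2181|9092)' || echo "ports not listening yet"
```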

Step 3

Once both Zookeeper and Kafka are up and running, we need to create a sample Topic so we can run some tests.  Topics are basically a grouping of messages in Kafka’s data model.  I’ll set up a Topic called “TestRun” by running the following command:

# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TestRun

To check the topic that we just created, you can run a list, as follows:

# bin/kafka-topics.sh --list --zookeeper localhost:2181

Step 4

Next, let’s try sending some messages by starting the Producer:

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TestRun

Hello World

Hello World 123

Step 5

Lastly, let's run a Consumer so we can see the messages published by the Producer:

# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic TestRun --from-beginning

The output we should see after running the command above are the messages that we published when we ran the Producer (Hello World and Hello World 123).
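For scripted runs, the same messages can also be piped into the Producer instead of typed interactively. This is a sketch: it assumes you are inside the kafka_2.10-0.8.2.0/ directory, and the guard just avoids a confusing error if you are not:

```shell
# pipe the test messages into the console producer non-interactively
if [ -x bin/kafka-console-producer.sh ]; then
  printf 'Hello World\nHello World 123\n' | \
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TestRun
else
  echo "run this from the Kafka directory"
fi
```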

Part 2 - Running Two or More Brokers in a Cluster

Now that Kafka and Zookeeper are running, let's set up three brokers. Three brokers are better than one, right? And hey, it's not really a cluster if we're running a single instance. Plus, it's more fun! So let's begin.

First, we need to create a config file for each broker by copying the default one:

# cp config/server.properties config/server-1.properties
# cp config/server.properties config/server-2.properties

Then we need to edit each newly created config file and update the following:

config/server-1.properties:

    broker.id=1

    port=9093

    log.dir=/tmp/kafka-logs-1

 

config/server-2.properties:

    broker.id=2

    port=9094

    log.dir=/tmp/kafka-logs-2

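The copy-and-edit steps above can also be scripted. Here's a sketch using cp and sed; it creates a stand-in config/server.properties when run outside a Kafka checkout, so the file contents there are assumptions matching the values in the steps above:

```shell
# generate the two extra broker configs from the stock one;
# file names match the Kafka 0.8.2.0 layout used in the steps above
mkdir -p config
# stand-in for the stock config if you try this outside a Kafka checkout
[ -f config/server.properties ] || \
  printf 'broker.id=0\nport=9092\nlog.dir=/tmp/kafka-logs\n' > config/server.properties
for i in 1 2; do
  cp config/server.properties "config/server-$i.properties"
  sed -i -E \
    -e "s/^broker\.id=.*/broker.id=$i/" \
    -e "s/^port=.*/port=$((9092 + i))/" \
    -e "s|^log\.dirs?=.*|log.dir=/tmp/kafka-logs-$i|" \
    "config/server-$i.properties"
done
```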
Since we have Zookeeper and our first Kafka instance already running, we can kick-start these two extra nodes to join the mix:

# bin/kafka-server-start.sh config/server-1.properties

# bin/kafka-server-start.sh config/server-2.properties

Create a Topic

Let's create a topic with a replication factor of three:

# bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic replicated-topics

We can use "describe topics" command to describe how these brokers are structured. Remember we are now in cluster, and not a single broker instance.

# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic replicated-topics

Topic:replicated-topics    PartitionCount:1    ReplicationFactor:3    Configs:

    Topic: replicated-topics    Partition: 0    Leader: 1    Replicas: 1,2,0    Isr: 1,2,0

To learn more about the output for each partition, you can visit: http://kafka.apache.org/documentation.html#quickstart

Sending Messages

So let’s move forward and try to send some messages to our Topic:

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic replicated-topics

hello

hello 123

Then let's consume these messages:

# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic replicated-topics

hello

hello 123

Testing the Configuration

For the final leg of this quick tutorial, let's see if our setup is fault tolerant by killing broker 1 (server-1), which is currently the leader.

# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic replicated-topics | grep Leader

    Topic: replicated-topics    Partition: 0    Leader: 1    Replicas: 1,2,0    Isr: 1,2,0

# ps aux | grep server-1.properties

root      5054  1.7  1.5 4836168 248592 pts/4  Sl+  05:12   0:19 java -Xmx1G -Xms1G -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC.........

# kill -9 5054
# bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic replicated-topics | grep Leader
    Topic: replicated-topics    Partition: 0    Leader: 2    Replicas: 1,2,0    Isr: 2,0

You'll notice the leader switched to broker 2 (server-2).  The messages should still be available for consumption even though we shut down server-1.

# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic replicated-topics
hello
hello 123

Reference: http://kafka.apache.org/documentation.html

Now for Automation

We can deploy a CoreOS instance with a cloud-init script to automate some of the tasks from our earlier (Ubuntu) steps.

First, deploy a CoreOS instance and create a cloud-init script like the sample file below to start the essential services we need (i.e. docker, etcd2, and fleet). At the same time, this will also create the unit files, which in this case define our Kafka and Zookeeper instances.

(make sure to generate a new discovery token)
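A minimal cloud-config along those lines might look like the sketch below. The discovery URL is a placeholder you must regenerate yourself, and only the base services are enabled here; your Zookeeper and Kafka unit files (e.g. pktzoo-01.service) would be submitted to fleet afterwards.

```yaml
#cloud-config
coreos:
  etcd2:
    # placeholder: generate a fresh token at https://discovery.etcd.io/new?size=1
    discovery: https://discovery.etcd.io/<your-new-token>
    advertise-client-urls: http://$private_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    public-ip: $private_ipv4
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
    - name: docker.service
      command: start
```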

After the installation, log in to the machine (as the 'core' user), and ensure that docker, etcd2, and fleet are set to start every time the machine reboots:

- systemctl enable docker
- systemctl enable etcd2
- systemctl enable fleet

Reboot the machine. Upon start-up, log in again and check the status of these services:

- systemctl status docker
- systemctl status etcd2
- systemctl status fleet

All three services should be up and running.

Now we need to bring up both the Zookeeper and Kafka nodes. Since we already set up the unit files, we can simply invoke these commands:

- fleetctl submit pktzoo-01.service
- fleetctl start pktzoo-01.service
- fleetctl status pktzoo-01.service

You can repeat the same procedure to start your Kafka node.
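For reference, a fleet unit file for the Zookeeper node (pktzoo-01.service) might be sketched like this. The Docker image name is an assumption; substitute whichever Zookeeper image you actually use:

```ini
# pktzoo-01.service -- sketch of a fleet unit for the Zookeeper node;
# "jplock/zookeeper" is an assumed community image, not part of this guide
[Unit]
Description=Zookeeper node for the Kafka cluster
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill pktzoo-01
ExecStartPre=-/usr/bin/docker rm pktzoo-01
ExecStart=/usr/bin/docker run --name pktzoo-01 -p 2181:2181 jplock/zookeeper
ExecStop=/usr/bin/docker stop pktzoo-01
```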