You can do unit testing, integration testing, and schema compatibility testing
of your application in a CI/CD pipeline. This can be done locally or in a test
environment in Confluent Cloud. The Testing Your Streaming Application blog post
describes how to use tools to simulate parts of the Kafka services.
Once your application is up and running on Confluent Cloud, verify that all the
functional pieces of the architecture work and check the data flows from end to
end. After you complete functional validation, you can benchmark and optimize
your applications based on your service goals to tune performance.
Benchmarking
Benchmark testing is important because there is no one-size-fits-all
recommendation for the configuration parameters of Kafka applications that
connect to Confluent Cloud. Proper configuration always depends on the use case,
other features you have enabled, the data profile, and more. You should run
benchmark tests if you plan to tune Kafka clients beyond the defaults. Regardless
of your service goals, you should understand the performance profile of
your application. This is especially important when you want to optimize for
throughput or latency. Your benchmark tests can also feed into the calculations
for determining the correct number of partitions and the number of producer and
consumer processes.
First, measure your bandwidth using the Kafka tools kafka-producer-perf-test
and kafka-consumer-perf-test. This provides a performance baseline for your
Confluent Cloud instance, taking application logic out of the equation.
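As a rough sketch of the first step, the helper below assembles a kafka-producer-perf-test invocation. The topic name, record counts, and the client.properties file name are placeholders, and the flag names are the ones the tool is generally documented to accept; check the tool's --help output for your version before relying on them.

```python
import shlex


def producer_perf_test_command(topic, num_records, record_size, config_file):
    """Build the argument list for a kafka-producer-perf-test run.

    All argument values are supplied by the caller; nothing here is a
    recommended setting.
    """
    return [
        "kafka-producer-perf-test",
        "--topic", topic,
        "--num-records", str(num_records),
        "--record-size", str(record_size),   # bytes per record
        "--throughput", "-1",                # -1 means no throttling
        "--producer.config", config_file,    # connection and security settings
    ]


# Placeholder values for illustration only.
cmd = producer_perf_test_command("perf-test-topic", 1_000_000, 1000, "client.properties")
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run once the tool is installed and the properties file points at your cluster.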
Then test your application, starting with the default Kafka configuration
parameters, and familiarize yourself with the default values.
Determine the baseline input performance profile for a given producer by
removing dependencies on anything upstream from the producer. Rather than
receiving data from upstream sources, modify your producer to generate its own
mock data at high output rates, such that the data generation is not a
bottleneck. Ensure the mock data reflects the type of data used in production to
produce results that more accurately reflect performance in production. Or,
instead of using mock data, consider using copies of production data or cleansed
production data in your benchmarking.
If you test with compression, be aware of how the mock data is
generated. Sometimes mock data is unrealistic, containing repeated substrings or
being padded with zeros, which can result in better compression
than you would see in production.
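The effect is easy to see with a small experiment. The sketch below compares a zero-padded record with a record whose field values vary. It uses Python's zlib as a stand-in for the codec your producer would use, and the record layout and helper names are made up for illustration.

```python
import json
import random
import zlib


def compression_ratio(payload: bytes) -> float:
    """Uncompressed size divided by compressed size (higher = more compressible)."""
    return len(payload) / len(zlib.compress(payload))


def zero_padded_record(size: int) -> bytes:
    """Unrealistic mock record: a tiny body padded with zero bytes."""
    return b"id=1".ljust(size, b"\x00")


def varied_record(rng: random.Random, size: int) -> bytes:
    """Mock record with varied field values, truncated to exactly `size` bytes."""
    record = {
        "user_id": rng.randrange(10**9),
        "amount": round(rng.uniform(0, 5000), 2),
        "note": "".join(rng.choices("abcdefghijklmnopqrstuvwxyz ", k=size)),
    }
    return json.dumps(record).encode("utf-8")[:size]


rng = random.Random(42)  # fixed seed so repeated runs generate the same data
print("zero-padded ratio:", compression_ratio(zero_padded_record(1000)))
print("varied ratio:     ", compression_ratio(varied_record(rng, 1000)))
```

Random letters are not realistic production data either. The point is only that the padded record compresses far better, so a benchmark built on it would overstate the benefit of compression.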
- Run a single producer client on a single server and measure the resulting
throughput using the available JMX metrics for the Kafka producer. Repeat the
producer benchmarking test, increasing the number of producer processes on
the server in each iteration to determine the number of producer processes
per server to achieve the highest throughput.
- Determine the baseline output performance profile for a given
consumer in a similar way. Run a single consumer client on a single server
and repeat this test, increasing the number of consumer processes on the
server in each iteration to determine the number of consumer processes per
server to achieve the highest throughput.
- Run benchmark tests for different permutations of configuration parameters
that reflect your service goals. Focus on a subset of configuration
parameters, and avoid the temptation to discover and change other parameters
from their default values without understanding exactly how they impact the
entire system.
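Once you have throughput measurements for each process count, choosing the count can be sketched as below. The measurement numbers are invented placeholders rather than benchmark results, and the tolerance parameter is an assumption added here so that a count is not chosen for a gain smaller than measurement noise.

```python
from typing import Dict


def best_process_count(throughput_by_count: Dict[int, float],
                       tolerance: float = 0.02) -> int:
    """Return the smallest process count whose throughput is within
    `tolerance` (as a fraction) of the best observed throughput."""
    peak = max(throughput_by_count.values())
    candidates = [
        count
        for count, throughput in throughput_by_count.items()
        if throughput >= peak * (1 - tolerance)
    ]
    return min(candidates)


# Placeholder data: aggregate MB/s observed with 1..6 processes on one server.
measurements = {1: 42.0, 2: 80.5, 3: 110.2, 4: 118.9, 5: 119.3, 6: 117.0}
print(best_process_count(measurements))
```

The same function applies to consumer measurements, since the procedure is the same.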
Tune the settings on each iteration, run a test, observe the results, tune
again, and so on, until you identify settings that work for your throughput and
latency requirements.
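The tune, run, and observe loop can be sketched as a small search over a deliberately limited parameter grid. The run_benchmark callback is hypothetical: in practice it would start your producer with the given settings and return the measured throughput and p99 latency. The fake_benchmark function below exists only to make the example runnable and does not model real Kafka behavior.

```python
import itertools


def config_permutations(grid):
    """Yield one dict per combination of the values in `grid`."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def best_config(grid, run_benchmark, max_p99_latency_ms):
    """Return (config, throughput, p99_ms) with the highest throughput among
    the configs that meet the latency bound, or None if none qualifies."""
    best = None
    for config in config_permutations(grid):
        throughput, p99_ms = run_benchmark(config)
        if p99_ms > max_p99_latency_ms:
            continue  # violates the latency requirement
        if best is None or throughput > best[1]:
            best = (config, throughput, p99_ms)
    return best


def fake_benchmark(config):
    """Stand-in for a real test run; the formulas are arbitrary."""
    throughput = config["batch.size"] / 1024 + 2 * config["linger.ms"]
    p99_ms = 5 + config["linger.ms"]
    return throughput, p99_ms


# A small subset of producer parameters; the values are illustrative.
grid = {"batch.size": [16384, 65536], "linger.ms": [0, 5, 20]}
print(best_config(grid, fake_benchmark, max_p99_latency_ms=15))
```

Keeping the grid to two or three parameters keeps the number of runs manageable and makes it easier to attribute a change in results to a specific setting.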
Refer to this blog post
when considering partition count in your benchmark tests.
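One commonly cited rule of thumb for a lower bound on partition count, stated here as an assumption rather than as the method of the referenced post, is max(t/p, t/c): t is the target throughput, p is the measured producer throughput on a single partition, and c is the measured consumer throughput on a single partition. A minimal sketch:

```python
import math


def min_partitions(target_mbps: float,
                   producer_mbps_per_partition: float,
                   consumer_mbps_per_partition: float) -> int:
    """Lower bound on partitions from the max(t/p, t/c) rule of thumb."""
    needed_for_producers = target_mbps / producer_mbps_per_partition
    needed_for_consumers = target_mbps / consumer_mbps_per_partition
    return math.ceil(max(needed_for_producers, needed_for_consumers))


# Placeholder numbers; substitute the values from your own benchmark runs.
print(min_partitions(target_mbps=100,
                     producer_mbps_per_partition=10,
                     consumer_mbps_per_partition=20))
```

The per-partition figures are exactly what the producer and consumer benchmarks above are meant to produce.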
Determining your Service Goals
Though it may take only a few seconds to get your Kafka client application up and
running on Confluent Cloud, you should tune your application before going into
production. Different use cases have different sets of requirements that
drive different service goals, so you must decide what your service goals are.
You should consider the following criteria when identifying your service goals:
- The use cases your Kafka applications serve.
- Your application and business requirements, that is, the elements that can’t fail for the
use case to be satisfied.
- How Kafka fits into the pipeline of your business.
While it may be hard to answer the question of which metrics to optimize, it is
important that you discuss the original business use cases and main goals with
your team for the following two reasons:
- You are unable to maximize all goals at the same time.
There are occasionally trade-offs between throughput, latency, durability,
and availability. You may be familiar with the common trade-off in
performance between throughput and latency and perhaps between durability and
availability as well. As you consider the whole system, you may find that you
can’t consider any of them in isolation, which is why this paper looks
at all four service goals together. This doesn’t mean that optimizing one of
these goals results in completely losing out on the others. It just means
that they are all interconnected, and thus you can’t maximize all of them at
the same time.
- You must identify the service goals you want to optimize so you
can tune your Kafka configuration parameters to achieve them, and you must
understand what your users expect from the system to ensure you are
optimizing Kafka to meet their needs. You should take time to answer the
following questions:
- Do you want to optimize for high throughput, which is the rate that
data is moved from producers to brokers or brokers to consumers?
Use case: An application with millions of writes per second. Because of
Kafka’s design, writing large volumes of data into it isn’t a hard thing to
do. It’s faster than trying to push volumes of data through a traditional
database or key-value store, and it can be done with modest hardware.
- Do you want to optimize for low latency, which is the time elapsed
moving messages end to end (from producers to brokers to consumers)?
Use case: A chat application, where the recipient of a message needs to
get the message with as little latency as possible. Other examples include
interactive websites where users follow posts from friends in their
network, or real-time stream processing for the Internet of Things (IoT).
- Do you want to optimize for high durability, which guarantees that
committed messages will not be lost?
Use case: An event streaming microservices pipeline using Kafka as the
event store. Another is integration between an event streaming source
and some permanent storage (for example, Amazon S3) for mission-critical
business content.
- Do you want to optimize for high availability, which minimizes
downtime in case of unexpected failures?
Use case: An application that must be always on. Kafka should be
optimized to recover from failures as quickly as possible.
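To make the four goals concrete, the sketch below maps each one to a few client settings that are commonly adjusted for it. The parameter names are standard Kafka client configuration keys, but every value is an illustrative assumption to be validated by your own benchmarks, not a recommendation. The availability entry is a consumer setting, while the others are producer settings.

```python
# Illustrative starting points only; every value is an assumption.
GOAL_OVERRIDES = {
    # Larger batches, a short wait to fill them, and compression.
    "throughput": {"batch.size": 200000, "linger.ms": 50,
                   "compression.type": "lz4", "acks": "1"},
    # Send immediately and skip compression work.
    "latency": {"linger.ms": 0, "compression.type": "none", "acks": "1"},
    # Wait for all in-sync replicas and avoid duplicates on retry.
    "durability": {"acks": "all", "enable.idempotence": "true"},
    # Consumer setting: detect a failed consumer sooner.
    "availability": {"session.timeout.ms": 10000},
}


def client_config(base, goal):
    """Return a copy of `base` with the overrides for `goal` applied."""
    if goal not in GOAL_OVERRIDES:
        raise ValueError(f"unknown service goal: {goal!r}")
    merged = dict(base)
    merged.update(GOAL_OVERRIDES[goal])
    return merged


# The bootstrap address is a placeholder.
base = {"bootstrap.servers": "<your-cluster-endpoint>:9092"}
print(client_config(base, "durability"))
```

Because the goals trade off against each other, several keys receive conflicting values across goals (acks and linger.ms, for example), which is why you need to pick the goal you care about most before tuning.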