Confluent Control Center is a web-based tool for managing and monitoring Apache Kafka®. Control Center provides a user interface that allows developers and operators to get a quick overview of cluster health, to observe and control messages, topics, and Schema Registry, and to develop and run ksqlDB queries.
Control Center includes pages where you can drill down to view data and configure features in your Kafka environment.
Architecture
Control Center consists of these parts:
- Metrics interceptors that collect metric data on clients (producers and consumers).
- Kafka to move metric data.
- The Control Center application server for analyzing stream metrics.
A common Kafka environment uses Kafka to transport messages from a set of producers to a set of consumers in different data centers, and uses Replicator to copy data from one cluster to another.
Confluent Control Center helps you detect issues when moving data, including late, duplicate, or lost messages. By adding lightweight code (the metrics interceptors) to clients, stream monitoring can count every message sent and received in a streaming application. Because the metrics are sent through Kafka, they reach the Control Center application quickly and reliably.
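As a simplified illustration of the counting idea, the sketch below compares per-partition message counts from the producing side and the consuming side. It is not Control Center's implementation: the functions `count_messages` and `compare` are hypothetical, and the sketch ignores time windows and checksums. A shortfall on the consuming side can mean lost messages or messages that simply have not been consumed yet, so it is reported as "missing" rather than "lost".

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical key: (topic, partition)
Key = Tuple[str, int]


def count_messages(events: Iterable[Key]) -> Counter:
    """Count messages per (topic, partition), as a client-side hook might."""
    return Counter(events)


def compare(produced: Counter, consumed: Counter) -> Dict[Key, str]:
    """Classify each (topic, partition) by comparing produced and consumed counts."""
    report: Dict[Key, str] = {}
    for key in set(produced) | set(consumed):
        p, c = produced[key], consumed[key]  # Counter returns 0 for absent keys
        if c < p:
            report[key] = f"missing {p - c} (lost or not yet consumed)"
        elif c > p:
            report[key] = f"extra {c - p} (possible duplicates)"
        else:
            report[key] = "ok"
    return report


produced = count_messages([("orders", 0)] * 5 + [("orders", 1)] * 3)
consumed = count_messages([("orders", 0)] * 4 + [("orders", 1)] * 4)
report = compare(produced, consumed)
print(report[("orders", 0)])  # missing 1 (lost or not yet consumed)
print(report[("orders", 1)])  # extra 1 (possible duplicates)
```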
Time windows and metrics
Stream monitoring is designed to efficiently audit the set of messages that are sent and received. To do this, Control Center uses a set of techniques to measure and verify delivery.
The interceptors collect metrics on the messages produced or consumed by each client and send these metrics to Control Center for analysis and reporting. Interceptors use Kafka message timestamps to group messages: they collect metrics in one-minute time windows based on each message's timestamp. You can calculate the start of a message's window with a function like floor(messageTimestamp / 60) * 60, where messageTimestamp is expressed in seconds. Metrics are collected for each combination of producer, consumer group, consumer, topic, and partition. Currently, metrics include a message count and a cumulative checksum for producers and consumers, and latency information from consumers.
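The sketch below shows how consumer-side metrics might be bucketed into one-minute windows and keyed by consumer group, consumer, topic, and partition. Kafka message timestamps are milliseconds since the epoch, so the window arithmetic uses 60,000 ms instead of 60 s. The names `window_start_ms`, `record`, and `WindowStats` are hypothetical, and because the checksum algorithm is not specified here, the sketch assumes a sum of CRC32 values modulo 2^32, which gives the same result regardless of the order in which messages are counted.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Tuple

WINDOW_MS = 60_000  # one-minute window; Kafka timestamps are in milliseconds


def window_start_ms(timestamp_ms: int) -> int:
    """Millisecond equivalent of floor(messageTimestamp / 60) * 60."""
    return (timestamp_ms // WINDOW_MS) * WINDOW_MS


@dataclass
class WindowStats:
    count: int = 0
    checksum: int = 0  # assumption: sum of CRC32 values modulo 2**32


# Hypothetical key: (consumer_group, consumer, topic, partition, window_start_ms)
Key = Tuple[str, str, str, int, int]


def record(stats: Dict[Key, WindowStats], group: str, consumer: str,
           topic: str, partition: int, timestamp_ms: int, value: bytes) -> None:
    """Add one consumed message to the statistics for its window."""
    key = (group, consumer, topic, partition, window_start_ms(timestamp_ms))
    s = stats.setdefault(key, WindowStats())
    s.count += 1
    s.checksum = (s.checksum + zlib.crc32(value)) % 2**32


stats: Dict[Key, WindowStats] = {}
record(stats, "billing", "consumer-1", "orders", 0, 5_000, b"a")   # window 0
record(stats, "billing", "consumer-1", "orders", 0, 59_000, b"b")  # window 0
record(stats, "billing", "consumer-1", "orders", 0, 61_000, b"c")  # window 60_000
print(stats[("billing", "consumer-1", "orders", 0, 0)].count)  # 2
```

A duplicated or missing message changes both the count and the checksum for its window, so comparing producer-side and consumer-side values per window exposes the discrepancy.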
Latency and system clock implications
Latency is measured as the difference between the system clock time on the consumer and the timestamp in the message. In a distributed environment, it can be difficult to keep clocks synchronized. If the consumer's clock runs faster than the producer's clock, Control Center might show latency values that are higher than the true values. If the consumer's clock runs slower than the producer's clock, Control Center might show latency values that are lower than the true values (in the worst case, negative values).
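To see how clock skew distorts this measurement, the sketch below models the consumer's clock as a fixed number of milliseconds ahead of (positive offset) or behind (negative offset) the producer's clock. The constant-offset model and the function name `measured_latency_ms` are assumptions for illustration; real clocks also drift over time.

```python
def measured_latency_ms(produce_time_ms: int, true_latency_ms: int,
                        consumer_clock_offset_ms: int) -> int:
    """Latency as reported: consumer clock time minus message timestamp."""
    # The message timestamp is taken from the producer's clock.
    message_timestamp_ms = produce_time_ms
    # The consumer reads its own clock, which is off by a constant offset.
    consumer_now_ms = produce_time_ms + true_latency_ms + consumer_clock_offset_ms
    return consumer_now_ms - message_timestamp_ms


print(measured_latency_ms(1_000, 50, 0))    # 50: clocks in sync
print(measured_latency_ms(1_000, 50, 30))   # 80: consumer clock ahead, latency overstated
print(measured_latency_ms(1_000, 50, -80))  # -30: consumer clock behind, negative latency
```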
If your clocks are out of sync, you might notice unexpected results in Confluent Control Center. Confluent recommends using a mechanism like NTP to synchronize time between production machines; NTP can help keep clocks synchronized to within 20 ms over the public internet and to within 1 ms for servers on the same local network.
Tip
NTP practical example:
In an environment where messages take one second or more to be produced and consumed, and NTP is used to synchronize clocks between machines, the latency information should be accurate to within 2%.
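The 2% figure follows from dividing the clock error by the latency being measured. A minimal sketch, assuming the NTP figures above bound the offset between the producer's and consumer's clocks (the function name is hypothetical):

```python
def worst_case_relative_error(latency_ms: float, clock_offset_ms: float) -> float:
    """Largest fraction by which a clock offset can distort a latency reading."""
    return abs(clock_offset_ms) / latency_ms


print(f"{worst_case_relative_error(1_000, 20):.1%}")  # 2.0%: NTP over the public internet
print(f"{worst_case_relative_error(1_000, 1):.1%}")   # 0.1%: servers on the same local network
```

Because the offset is fixed while latency grows, longer end-to-end latencies give proportionally smaller errors, which is why the estimate applies to messages that take one second or more.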