Apache Kafka® brokers and clients report many internal metrics. JMX is the default reporter, though you can add any pluggable reporter.
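JMX identifies each metric by an object name of the form `domain:key=value,key=value,...`. When shipping these metrics to an external monitoring system (for example through a JMX-to-HTTP bridge such as Jolokia), it is often useful to split an object name into its domain and key properties. A minimal sketch in Python; the helper is illustrative, not a Confluent or Kafka API, and it does not handle quoted property values:

```python
def parse_mbean(name: str) -> tuple[str, dict[str, str]]:
    """Split a JMX object name such as
    'kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions'
    into its domain and a dict of key properties.
    Note: does not handle quoted values containing commas."""
    domain, _, props = name.partition(":")
    properties = dict(pair.split("=", 1) for pair in props.split(","))
    return domain, properties

domain, props = parse_mbean(
    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
)
print(domain)         # kafka.server
print(props["name"])  # UnderReplicatedPartitions
```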
You can deploy Confluent Control Center for out-of-the-box Kafka cluster monitoring, so you don’t have to build your own monitoring system. Control Center is a web-based application that makes it easy to manage the entire Confluent Platform: you can manage your cluster, monitor Kafka system health in predefined dashboards, and alert on triggers. Control Center also reports end-to-end stream monitoring to verify that every message is delivered from producer to consumer, measures how long messages take to be delivered, and helps you determine the source of any issues in your cluster.
Tip: For an example that shows this in action, see the Confluent Platform demo. Refer to the demo’s docker-compose.yml file for a configuration reference.
Confluent Control Center and Confluent Cloud monitor the following important operational broker metrics, aggregated across the cluster and broken down per broker or per topic where applicable. Control Center provides built-in dashboards for viewing these metrics, and we recommend that you set alerts on at least the first three.
kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
kafka.server:type=ReplicaManager,name=ReassigningPartitions
kafka.cluster:type=Partition,topic={topic},name=UnderMinIsr,partition={partition}
Partitions whose in-sync replica count has fallen below the configured minimum; producers that use acks=all cannot write to these partitions.
kafka.controller:type=KafkaController,name=OfflinePartitionsCount
kafka.controller:type=KafkaController,name=ActiveControllerCount
kafka.controller:type=KafkaController,name=GlobalPartitionCount
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec
kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}
kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec
kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec
kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec
kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec
kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
kafka.server:type=ReplicaManager,name=PartitionCount
kafka.server:type=ReplicaManager,name=LeaderCount
The number of leader replicas on the broker, which should be roughly even across brokers; when auto.leader.rebalance.enable is set to true, the controller periodically moves leadership back to the preferred replicas to keep it balanced.
kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica
The maximum lag in messages between the follower and leader replicas; on broker versions that still support it, this should stay below replica.lag.max.messages.
kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
The average fraction of time the request handler threads are idle; values range between 0 (all handlers busy) and 1 (all handlers idle).
kafka.server:type=Produce,name=DelayQueueSize
kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent
kafka.network:type=RequestChannel,name=RequestQueueSize
kafka.network:type=RequestChannel,name=ResponseQueueSize
kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-count
kafka.server:type=socket-server-metrics,listener={listener_name},networkProcessor={#},name=connection-creation-rate
kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}
kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request={Produce|FetchConsumer|FetchFollower}
kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce|FetchConsumer|FetchFollower}
kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}
kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request={Produce|FetchConsumer|FetchFollower}
kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce|FetchConsumer|FetchFollower}
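The per-request time metrics above decompose the total: TotalTimeMs is approximately the sum of the request-queue, local, remote, response-queue, and response-send phases. A small illustrative sketch that finds the dominant phase from sampled averages (the sample values are invented, not recommendations):

```python
# Sampled average times (ms) for Produce requests; values are illustrative.
produce_times = {
    "RequestQueueTimeMs": 0.4,
    "LocalTimeMs": 2.1,
    "RemoteTimeMs": 11.5,   # e.g. time waiting on followers with acks=all
    "ResponseQueueTimeMs": 0.2,
    "ResponseSendTimeMs": 0.3,
}

def dominant_phase(times: dict[str, float]) -> tuple[str, float]:
    """Return the phase contributing the largest share of total request time."""
    total = sum(times.values())
    phase = max(times, key=times.get)
    return phase, times[phase] / total

phase, share = dominant_phase(produce_times)
print(f"{phase} accounts for {share:.0%} of request time")
```

A high remote share usually points at slow follower replication, while a high request-queue share points at exhausted request handler threads.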
Here are other available metrics you may optionally observe on a Kafka broker.
kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs
kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize
kafka.server:type=DelayedOperationPurgatory,delayedOperation=Fetch,name=PurgatorySize
Fetch requests can sit in the purgatory for up to fetch.wait.max.ms while waiting for enough data to accumulate, so a non-zero size here is normal.
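Because parked fetch requests wait a bounded time, the expected fetch purgatory occupancy can be roughly estimated with Little's law (L = λ·W): arrival rate times average wait, where the wait is capped by fetch.wait.max.ms. A back-of-the-envelope sketch with made-up numbers:

```python
def expected_purgatory_size(fetch_rate_per_sec: float, avg_wait_ms: float) -> float:
    """Little's law: L = lambda * W.
    avg_wait_ms is bounded above by fetch.wait.max.ms."""
    return fetch_rate_per_sec * (avg_wait_ms / 1000.0)

# e.g. 200 fetch requests/s, each waiting ~500 ms on average
print(expected_purgatory_size(200, 500))  # -> 100.0
```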
Confluent Control Center monitors the following important operational broker metrics relating to ZooKeeper. Counts are exposed for ZooKeeper state transitions, which can help you spot problems, for example with broker sessions to ZooKeeper. The metrics show the rate of transitions per second for each of the possible states. Here is the list of the exposed counters, one for each possible ZooKeeper client state.
kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec
kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec
The ZooKeeper session has expired. When a session expires, there can be leader changes and even a new controller. It is important to keep an eye on the number of such events across a Kafka cluster, and if the overall number is high, consider tuning zookeeper.session.timeout.ms and investigating long garbage collection pauses on the brokers, a common cause of session expirations.
Here are other available ZooKeeper metrics you may optionally observe on a Kafka broker.
kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec
kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec
kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec
kafka.server:type=SessionExpireListener,name=ZooKeeperSaslAuthenticationsPerSec
kafka.server:type=SessionExpireListener,name=SessionState
The broker's current ZooKeeper session state, which in a healthy cluster should be CONNECTED.
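Putting these signals together, a monitoring job might flag any broker whose session state is not CONNECTED or whose expiration rate is non-zero. A minimal sketch; the broker names and sampled values are invented:

```python
def zk_session_alerts(brokers: dict[str, dict]) -> list[str]:
    """Return alert messages for brokers with unhealthy ZooKeeper sessions."""
    alerts = []
    for broker, m in brokers.items():
        if m["SessionState"] != "CONNECTED":
            alerts.append(f"{broker}: ZooKeeper session state is {m['SessionState']}")
        if m["ZooKeeperExpiresPerSec"] > 0:
            alerts.append(f"{broker}: ZooKeeper sessions are expiring")
    return alerts

sample = {
    "broker-1": {"SessionState": "CONNECTED", "ZooKeeperExpiresPerSec": 0.0},
    "broker-2": {"SessionState": "EXPIRED", "ZooKeeperExpiresPerSec": 0.2},
}
for alert in zk_session_alerts(sample):
    print(alert)
```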
Starting with Kafka 0.8.2, the new producer exposes the following metrics:
MBean: kafka.producer:type=producer-metrics,client-id=([-.\w]+)
request-latency-avg
request-latency-max
request-rate
response-rate
incoming-byte-rate
outgoing-byte-rate
connection-count
connection-creation-rate
connection-close-rate
io-ratio
io-time-ns-avg
io-wait-ratio
select-rate
io-wait-time-ns-avg
MBean: kafka.producer:type=producer-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
In addition to the global request metrics, the following metrics are also available per broker:
request-size-max
request-size-avg
MBean: kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+)
In addition to the global request metrics, the following metrics are also available per topic:
byte-rate
record-send-rate
compression-rate
record-retry-rate
record-error-rate
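As a usage sketch, an operator might watch record-error-rate and request-latency-avg from the lists above and alert when either crosses a threshold. The threshold and sampled values below are illustrative assumptions, not recommendations:

```python
def producer_alerts(metrics: dict[str, float],
                    max_latency_ms: float = 50.0) -> list[str]:
    """Flag unhealthy producer metrics sampled from the MBeans above."""
    alerts = []
    if metrics.get("record-error-rate", 0.0) > 0:
        alerts.append("records are failing to send (record-error-rate > 0)")
    if metrics.get("request-latency-avg", 0.0) > max_latency_ms:
        alerts.append("average request latency exceeds threshold")
    return alerts

sample = {"record-error-rate": 0.0, "request-latency-avg": 72.3}
print(producer_alerts(sample))  # latency alert only
```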
The following metrics relate to audit logging, authorization, and the Confluent metadata service:
confluent-audit-metrics:name=audit-log-rate-per-minute
confluent-audit-metrics:name=fallback-rate-per-minute
confluent-authorizer-metrics:name=authorization-request-rate-per-minute
confluent-authorizer-metrics:name=authorization-allowed-rate-per-minute
confluent-authorizer-metrics:name=authorization-denied-rate-per-minute
confluent.metadata:type=LdapGroupManager,name=failure-start-seconds-ago
confluent.metadata:type=KafkaAuthStore,name=writer-failure-start-seconds-ago
confluent.metadata:type=KafkaAuthStore,name=reader-failure-start-seconds-ago
confluent.metadata:type=KafkaAuthStore,name=remote-failure-start-seconds-ago
confluent.metadata:type=KafkaAuthStore,name=active-writer-count
confluent.metadata:type=KafkaAuthStore,name=metadata-status,topic=([-.\w]+),partition=([0-9]+)
confluent.metadata:type=KafkaAuthStore,name=record-send-rate,topic=([-.\w]+),partition=([0-9]+)
confluent.metadata:type=KafkaAuthStore,name=record-error-rate,topic=([-.\w]+),partition=([0-9]+)
confluent-auth-store-metrics:name=rbac-role-bindings-count
confluent-auth-store-metrics:name=rbac-access-rules-count
confluent-auth-store-metrics:name=acl-access-rules-count
Starting with Kafka 0.9.0.0, the new consumer exposes the following metrics:
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+)
records-lag-max
fetch-size-avg
fetch-size-max
bytes-consumed-rate
records-per-request-avg
records-consumed-rate
fetch-rate
fetch-latency-avg
fetch-latency-max
fetch-throttle-time-avg
fetch-throttle-time-max
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+)
MBean: kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
assigned-partitions
commit-latency-avg
commit-latency-max
commit-rate
join-rate
join-time-avg
join-time-max
sync-rate
sync-time-avg
sync-time-max
heartbeat-rate
heartbeat.interval.ms
heartbeat-response-time-max
last-heartbeat-seconds-ago
MBean: kafka.consumer:type=consumer-metrics,client-id=([-.\w]+)
MBean: kafka.consumer:type=consumer-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)
kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)
kafka.consumer:type=ConsumerFetcherManager,name=MinFetchRate,clientId=([-.\w]+)
kafka.consumer:type=ConsumerTopicMetrics,name=MessagesPerSec,clientId=([-.\w]+)
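Among the consumer metrics above, records-lag-max is the single most useful health signal: lag that grows without bound means the consumer is falling behind the producers. A hedged sketch that checks successive samples for monotonic growth; the client IDs and values are invented:

```python
def lag_is_growing(samples: list[float], min_points: int = 3) -> bool:
    """True if records-lag-max strictly increased across the last samples."""
    if len(samples) < min_points:
        return False
    tail = samples[-min_points:]
    return all(a < b for a, b in zip(tail, tail[1:]))

history = {"consumer-1": [120, 80, 95], "consumer-2": [100, 450, 900]}
behind = [c for c, s in history.items() if lag_is_growing(s)]
print(behind)  # ['consumer-2']
```

Checking the trend rather than a fixed threshold avoids paging on brief spikes that the consumer quickly works off.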
The following metrics are available only on the high-level consumer:
kafka.consumer:type=ZookeeperConsumerConnector,name=KafkaCommitsPerSec,clientId=([-.\w]+)
The rate of offset commits to Kafka; relevant only when offsets.storage=kafka.
kafka.consumer:type=ZookeeperConsumerConnector,name=ZooKeeperCommitsPerSec,clientId=([-.\w]+)
The rate of offset commits to ZooKeeper; relevant only when offsets.storage=zookeeper.
kafka.consumer:type=ZookeeperConsumerConnector,name=RebalanceRateAndTime,clientId=([-.\w]+)
kafka.consumer:type=ZookeeperConsumerConnector,name=OwnedPartitionsCount,clientId=([-.\w]+),groupId=([-.\w]+)