Several performance stress tests were run to determine efficiency, time, and resource usage of Self-Balancing for various use cases. The tests profile the Kafka controller node because that is the node on which Self-Balancing code will be running.
High-level results of performance testing for the following common use cases are provided below:
The goal is to test a large number of replicas/partitions on a 5-broker cluster and then expand the cluster to verify that Self-Balancing automatically adds the brokers to the cluster and redistributes the partitions across all nodes.
The test checks the impact of generating a rebalancing plan for “add broker” operations that incrementally expand a 5-broker cluster to 8 brokers. Once the plan is computed and submitted, Self-Balancing throttles and reassigns partitions in batches. The test validates that the reassignment completes and that it results in a balanced cluster with an evenly distributed workload on all 8 brokers.
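The "evenly distributed" check described above can be illustrated with a minimal sketch. It assumes that balance is judged by replica count per broker alone, which is a simplification: Self-Balancing may weigh other resource metrics as well. The function names, the `tolerance` parameter, and the sample topic `t` are hypothetical and are not part of any Confluent API.

```python
from collections import Counter

def replicas_per_broker(assignment):
    """Count replicas hosted on each broker.

    assignment: dict mapping (topic, partition) -> list of broker ids.
    """
    counts = Counter()
    for replicas in assignment.values():
        counts.update(replicas)
    return counts

def is_evenly_distributed(assignment, brokers, tolerance=1):
    """Return True if replica counts across `brokers` differ by at most `tolerance`.

    Hypothetical criterion for illustration only; brokers with no replicas count as 0.
    """
    counts = replicas_per_broker(assignment)
    per_broker = [counts.get(b, 0) for b in brokers]
    return max(per_broker) - min(per_broker) <= tolerance

# Illustrative data: 16 partitions with replication factor 3.
brokers = list(range(8))
# Before expansion, replicas live only on brokers 0-4.
before = {("t", p): [(p + i) % 5 for i in range(3)] for p in range(16)}
# After rebalancing, replicas are spread round-robin over all 8 brokers.
after = {("t", p): [(p + i) % 8 for i in range(3)] for p in range(16)}

print(is_evenly_distributed(before, brokers))
print(is_evenly_distributed(after, brokers))
```

In this sample, the pre-expansion layout leaves brokers 5 through 7 empty, so the check fails for the 8-broker cluster, while the round-robin layout places 6 replicas on every broker.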
The goal is to test scalability of Self-Balancing on a 40-broker cluster (high end) with up to 1,000 replicas per broker by expanding and then shrinking the cluster.
The test assesses the impact of generating a rebalancing plan for “add broker” operations to expand the cluster from 39 brokers to 48 brokers with about 21,600 replicas on them.
The test then increases replicas to 2,250 per broker (108,000 on 48 brokers) and checks the impact of generating a plan for “remove broker” operations to shrink the cluster.
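The replica figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text and assumes, for illustration, that the 21,600 replicas are spread evenly over the 48 brokers. The variable names are hypothetical.

```python
BROKERS_AFTER_EXPANSION = 48

# Expansion phase: about 21,600 replicas in total on the expanded cluster.
total_replicas_expand = 21_600
per_broker_expand = total_replicas_expand // BROKERS_AFTER_EXPANSION

# Shrink phase: replicas are increased to 2,250 per broker.
per_broker_shrink = 2_250
total_replicas_shrink = per_broker_shrink * BROKERS_AFTER_EXPANSION

print(per_broker_expand)      # 450
print(total_replicas_shrink)  # 108000
```

Under that even-spread assumption, the expansion phase runs at roughly 450 replicas per broker, and the shrink phase carries five times that load per broker.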
metric.sampling.interval.ms
The goal of this test is to verify that when a controller is lost, Kafka executes failover efficiently and assigns a new controller, with no disruption from Self-Balancing to controller startup.
The test removes the Kafka controller several times to determine the impact of Self-Balancing failover on controller startup.
Self-Balancing runs on the same node as the Kafka controller, so controller failover also triggers Self-Balancing failover, which runs Self-Balancing startup code at the same time as controller startup code.