Confluent Replicator
Confluent Replicator allows you to easily and reliably replicate
topics from one Apache Kafka® cluster to another. In addition to copying the
messages, this connector creates topics as needed, preserving the
topic configuration from the source cluster. This includes preserving
the number of partitions, the replication factor, and any
configuration overrides specified for individual topics.
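As a rough illustration of how this behavior is expressed in a connector configuration, the following sketch writes a minimal properties file. The broker addresses, topic name, connector name, and file name are placeholders chosen for this example, not values from this page, and a real deployment will likely need additional settings (for example, security and licensing properties).

```shell
# Write a minimal, illustrative Replicator configuration.
# Hostnames, the topic name, and the file name are placeholders (assumptions).
cat > replicator-example.properties <<'EOF'
name=replicator-example
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
tasks.max=1
# Replicator copies raw bytes, so byte-array converters are used
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
src.kafka.bootstrap.servers=source-broker:9092
dest.kafka.bootstrap.servers=destination-broker:9092
topic.whitelist=example-topic
# Create missing destination topics and keep their settings aligned with the source
topic.auto.create=true
topic.preserve.partitions=true
topic.config.sync=true
EOF
```

The three `topic.*` settings at the end correspond to the topic creation and configuration preservation described above. They are spelled out here only to make the behavior visible.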
The Replicator Connector supports replication in the context of several use cases, including:
Features
Replicator supports the following features:
Install Replicator Connector
Important
This connector is bundled natively with Confluent Platform. If you have Confluent Platform installed and running, there are no additional
steps required to install.
If you are running Confluent Platform with only Confluent Community components, you can install the connector using the Confluent Hub Client (recommended) or manually download the ZIP file.
Install the connector using Confluent Hub
- Prerequisite: Confluent Hub Client must be installed. This is installed by default with Confluent Enterprise.
Navigate to your Confluent Platform installation directory and run the following command to install the latest connector version. The connector must be installed on every machine where Connect will run.
confluent-hub install confluentinc/kafka-connect-replicator:latest
You can install a specific version by replacing latest with a version number. For example:
confluent-hub install confluentinc/kafka-connect-replicator:6.0.0
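Because the connector has to be present on every Connect worker, it can be useful to confirm that each worker actually sees the plugin. The sketch below assumes the worker exposes the Kafka Connect REST API at `http://localhost:8083` (an assumption; adjust the host and port for your deployment). The helper function `has_replicator` is a hypothetical name introduced for this example.

```shell
# Class name of the Replicator source connector.
REPLICATOR_CLASS="io.confluent.connect.replicator.ReplicatorSourceConnector"

# Succeeds if the plugin listing read from stdin mentions the Replicator class.
# -F treats the class name as a fixed string rather than a regular expression.
has_replicator() {
  grep -qF "$REPLICATOR_CLASS"
}

# Usage against a running worker (URL is an assumption):
#   curl -s http://localhost:8083/connector-plugins | has_replicator && echo "Replicator plugin found"
```

Run the check against each worker in turn. A worker typically needs a restart after installation before a newly installed plugin appears in the listing.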
Quick Start
See Tutorial: Replicating Data Between Clusters in Multi-DC Deployment Architectures.
Note
If you deploy Confluent Platform on AWS VMs and run Replicator as a connector,
be aware that VMs with burstable CPU types (T2, T3, T3a, and T4g) do not support high-throughput streaming workloads. Replicator worker nodes running on these VMs
experience throughput degradation when their CPU credits expire. This makes these VMs unsuitable for Confluent Platform nodes that are expected to run at elevated CPU levels for a sustained period of time
and to support workloads above their baseline resource rates.