Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka® and other data
systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka.
Kafka Connect can ingest entire databases or collect metrics from all your application servers into
Kafka topics, making the data available for stream processing with low latency. An export connector
can deliver data from Kafka topics into secondary indexes like Elasticsearch or into batch systems
such as Hadoop for offline analysis.
What is Kafka Connect?
Kafka Connect is a free, open-source component of Apache Kafka® that works as a centralized data hub for
simple data integration between databases, key-value stores, search indexes, and file systems. The
information provided here is specific to Kafka Connect for Confluent Platform. For information about
Confluent Cloud connectors, see Connect External Systems to Confluent Cloud.
Tip
Confluent Cloud offers pre-built, fully managed Kafka
connectors that make it easy to instantly connect to popular data sources and
sinks. With simple GUI-based configuration and elastic scaling with no
infrastructure to manage, Confluent Cloud connectors make moving data in and out of
Kafka an effortless task, giving you more time to focus on application
development.
The benefits of Kafka Connect include:
- Data Centric Pipeline – Connect uses meaningful data abstractions to
pull or push data to Kafka.
- Flexibility and Scalability – Connect runs with streaming and
batch-oriented systems on a single node (standalone) or scaled to an
organization-wide service (distributed).
- Reusability and Extensibility – Connect lets you leverage existing connectors
or extend them to fit your needs, reducing time to production.
Kafka Connect is focused on streaming data to and from Kafka, making it simpler
for you to write high-quality, reliable, and high-performance connector plugins.
Kafka Connect also makes guarantees that are difficult to achieve with other
frameworks. When combined with Kafka and a stream processing framework, Kafka
Connect forms an integral part of an ETL pipeline.
How Kafka Connect Works
You can deploy Kafka Connect as a standalone process that runs jobs on a
single machine (for example, log collection), or as a distributed, scalable,
fault-tolerant service supporting an entire organization. Kafka Connect
provides a low barrier to entry and low operational overhead. You can start
small with a standalone environment for development and testing, and then scale
up to a full production environment to support a large organization’s data
pipeline.
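As a sketch of that low barrier to entry, a standalone worker needs only a small properties file before it can run connectors (the hostname and file paths below are illustrative):

```properties
# connect-standalone.properties -- minimal standalone worker config (illustrative values)
bootstrap.servers=localhost:9092

# Converters control how connector data is serialized to and from Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# In standalone mode, source connector offsets are stored in a local file
offset.storage.file.filename=/tmp/connect.offsets
```

A worker configured this way is typically started with `bin/connect-standalone.sh`, passing this file followed by one or more connector properties files.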
Kafka Connect includes two types of connectors:
- Source connector – Ingests entire databases and streams table
updates to Kafka topics. A source connector can also collect metrics from all
your application servers and store them in Kafka topics, making the data
available for stream processing with low latency.
- Sink connector – Delivers data from Kafka topics into secondary indexes such
as Elasticsearch, or batch systems such as Hadoop for offline analysis.
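For illustration, the FileStreamSource connector that ships with Kafka can be configured with just a few properties (the connector name, file path, and topic below are examples):

```properties
# file-source.properties -- example source connector config
name=local-file-source
connector.class=FileStreamSource
tasks.max=1

# Read lines from this file and publish them to the given topic
file=/tmp/test.txt
topic=connect-test
```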
To deploy Kafka Connect in your environment, see Getting Started with Kafka Connect.
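In distributed mode, connectors are created by submitting JSON to the workers' REST API (port 8083 by default) rather than by passing properties files at startup. A minimal payload for a FileStreamSource connector might look like the following (names and paths are illustrative):

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "/tmp/test.txt",
    "topic": "connect-test"
  }
}
```

Sending this payload with an HTTP POST to `http://localhost:8083/connectors` creates the connector, and the cluster then distributes its tasks across the available workers.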