Kudu Connector (Source and Sink) for Confluent Platform
You can use the Kafka Connect Kudu source connector to import data from Apache Kudu, a columnar data store, into Apache Kafka®
topics through the Impala JDBC driver. You can use the Kudu sink connector to export data from Kafka topics to Kudu, also through the Impala JDBC driver.
Install the Kudu Connector
You can install this connector by using the Confluent Hub client instructions below, or you can
manually download the ZIP file.
If you are running a multi-node Connect cluster, the Kudu connector and Impala JDBC driver JARs must be installed on every
Connect worker in the cluster. See below for details.
Install the connector using Confluent Hub
- Prerequisite: Confluent Hub Client must be installed. It is installed by default with Confluent Enterprise.
Navigate to your Confluent Platform installation directory and run the following command to install the most recent connector version, identified by the latest tag.
The connector must be installed on every machine where Connect will run.
confluent-hub install confluentinc/kafka-connect-kudu:latest
You can install a specific version by replacing latest with a version number. For example:
confluent-hub install confluentinc/kafka-connect-kudu:1.0.0-preview
Install the Impala JDBC Driver
The Kudu source and sink connectors use the Java Database Connectivity (JDBC) API.
For this to work, the connectors query Kudu through Impala, so the Impala JDBC driver must be installed.
The basic installation steps are:
- Download the Impala JDBC Connector and unzip it to get the JAR files.
- Place these JAR files into the share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib
directory of your Confluent Platform installation on each of the Connect worker nodes.
- Restart all of the Connect worker nodes.
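The second step above is easy to repeat incorrectly across many workers, so it may be worth scripting. The following Java sketch copies every .jar file from an unzipped driver directory into the Kudu connector's lib directory and refuses to run if that directory does not exist (that is, if the connector is not installed yet). The class and method names are hypothetical. The sketch assumes the JARs sit at the top level of the unzipped directory, and it neither downloads the driver nor restarts the worker.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class InstallImpalaDriverJars {
    // Connector lib directory, relative to the Confluent Platform installation root.
    static final String KUDU_LIB =
            "share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib";

    /** Copies every top-level *.jar in driverDir into the Kudu connector lib directory under confluentHome. */
    static List<Path> installDriverJars(Path driverDir, Path confluentHome) throws IOException {
        Path libDir = confluentHome.resolve(KUDU_LIB);
        if (!Files.isDirectory(libDir)) {
            // The Kudu connector itself must be installed before the driver is added.
            throw new IOException("Kudu connector lib directory not found: " + libDir);
        }
        List<Path> copied = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(driverDir, "*.jar")) {
            for (Path jar : jars) {
                Path target = libDir.resolve(jar.getFileName());
                // Overwrite an older copy so a driver upgrade replaces the previous JAR.
                Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(target);
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: unzipped Impala JDBC driver directory; args[1]: Confluent Platform root (hypothetical usage)
        for (Path jar : installDriverJars(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("Installed " + jar);
        }
        System.out.println("Restart this Connect worker so it loads the driver.");
    }
}
```

Run it once on each worker node, then restart that worker. Files that are not JARs, such as release notes shipped in the ZIP file, are left behind.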
General Guidelines
The following are additional guidelines to consider:
- Use the most recent version of the Impala JDBC driver available.
- Use the correct JAR file for the Java version used to run Connect workers.
If you install the Impala JDBC driver JAR file for the wrong version of Java,
starting any Kudu source or sink connector will likely fail with UnsupportedClassVersionError.
If this happens, remove the Impala JDBC driver JAR file you installed and repeat the driver installation process
with the correct JAR file.
- The share/confluent-hub-components/confluentinc-kafka-connect-kudu/lib
directory mentioned above is for Confluent Platform.
If you are using a different installation, find the directory that contains the Confluent Kudu source and sink connector
JAR files, and place the Impala JDBC driver JAR file(s) in that same directory.
- If the Impala JDBC driver is not installed correctly, the
Kudu source or sink connector will fail on startup. Typically, the system throws the error
No suitable driver found. If this happens, install the Impala JDBC driver again.
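Both failure modes above can be checked before you restart a worker. The following Java sketch does two things:

- It reads the class-file major version of the classes in a driver JAR. Major version 52 is Java 8 and 55 is Java 11.
- It asks java.sql.DriverManager whether any registered driver accepts a given JDBC URL. DriverManager.getDriver throws an SQLException when none does, which is the same condition that makes DriverManager.getConnection report No suitable driver found.

The class name is hypothetical, and the jdbc:impala:// URL you pass in is an assumption about your driver's URL format. Run the sketch with the same Java runtime as the Connect workers and with the connector lib directory on the classpath. Kafka Connect loads plugins through isolated class loaders, so this only approximates what a worker sees. Also, if your driver version does not register itself through the JDBC service-loader mechanism, DriverManager will not find it unless its driver class is loaded explicitly.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ImpalaDriverDiagnostics {

    /** Reads major_version from a class-file header: u4 magic (0xCAFEBABE), u2 minor, u2 major. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort(); // minor_version, not needed here
        return data.readUnsignedShort();
    }

    /** Class-file major version 52 is Java 8, 55 is Java 11, and so on. */
    static int javaReleaseFor(int major) {
        return major - 44;
    }

    /** Highest Java release required by any .class entry outside META-INF/, or -1 if there are none. */
    static int requiredJavaRelease(String jarPath) throws IOException {
        int highest = -1;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                // Skip non-class entries and multi-release overrides under META-INF/versions/.
                if (!name.endsWith(".class") || name.startsWith("META-INF/")) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(jar.getJarEntry(name))) {
                    highest = Math.max(highest, javaReleaseFor(majorVersion(in)));
                }
            }
        }
        return highest;
    }

    /** Feature release of this JVM; the property is "1.8" on Java 8 and "11" on Java 11. */
    static int runningJavaRelease() {
        String spec = System.getProperty("java.specification.version");
        return Integer.parseInt(spec.startsWith("1.") ? spec.substring(2) : spec);
    }

    /** The driver DriverManager would use for url, or null where it would report "No suitable driver". */
    static Driver findDriver(String url) {
        try {
            return DriverManager.getDriver(url);
        } catch (SQLException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a driver JAR; args[1]: a JDBC URL for your Impala daemon (hypothetical inputs)
        int required = requiredJavaRelease(args[0]);
        int running = runningJavaRelease();
        if (required < 0) {
            System.out.println(args[0] + " contains no class files");
        } else {
            System.out.printf("%s needs Java %d; this JVM is Java %d%n", args[0], required, running);
            if (required > running) {
                System.out.println("Loading it here would fail with UnsupportedClassVersionError.");
            }
        }
        Driver driver = findDriver(args[1]);
        System.out.println(driver == null
                ? "No suitable driver found for " + args[1]
                : args[1] + " is handled by " + driver.getClass().getName());
    }
}
```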
Limitations
- Kudu does not support DATE and TIME types. Connect Date, Time, and Timestamp types are all mapped to the Impala TIMESTAMP
type, which corresponds to the Kudu unixtime_micros type.
- Impala does not support the BINARY type, so the connectors do not accept binary data either.
- Complex data types such as Array, Map, and Struct are not supported.
- For the Decimal type, both Impala and Kudu allow a maximum precision of 38, and the connectors observe this cap.
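The limitations above can be read as a per-field check. The following Java sketch encodes only the rules in this list and is not the connectors' actual implementation. The class name is hypothetical, and the ConnectType enum is a local stand-in for Kafka Connect's Schema.Type so the example needs no dependencies. The sketch also assumes the caller already knows a decimal field's precision and scale.

```java
public class KuduTypeLimits {
    // Local stand-in for Kafka Connect's Schema.Type, so the sketch has no dependencies.
    enum ConnectType { INT8, INT16, INT32, INT64, FLOAT32, FLOAT64, BOOLEAN, STRING, BYTES, ARRAY, MAP, STRUCT }

    // LOGICAL_NAME values of Kafka Connect's Date, Time, Timestamp, and Decimal logical types.
    static final String DATE = "org.apache.kafka.connect.data.Date";
    static final String TIME = "org.apache.kafka.connect.data.Time";
    static final String TIMESTAMP = "org.apache.kafka.connect.data.Timestamp";
    static final String DECIMAL = "org.apache.kafka.connect.data.Decimal";

    static final int MAX_DECIMAL_PRECISION = 38; // cap shared by Impala and Kudu

    /**
     * Applies the limitations above to one field. Returns the Impala column type for
     * temporal and decimal fields, returns null for other primitives (no rule listed here),
     * and throws IllegalArgumentException for types the connectors reject.
     */
    static String applyLimits(ConnectType type, String logicalName, int precision, int scale) {
        // Logical types are checked first because Date/Time/Decimal ride on INT32/INT64/BYTES.
        if (DATE.equals(logicalName) || TIME.equals(logicalName) || TIMESTAMP.equals(logicalName)) {
            return "TIMESTAMP"; // stored by Kudu as unixtime_micros
        }
        if (DECIMAL.equals(logicalName)) {
            if (precision < 1 || precision > MAX_DECIMAL_PRECISION) {
                throw new IllegalArgumentException(
                        "Decimal precision must be between 1 and " + MAX_DECIMAL_PRECISION + ", got " + precision);
            }
            if (scale < 0 || scale > precision) {
                throw new IllegalArgumentException("Decimal scale must be between 0 and the precision, got " + scale);
            }
            return "DECIMAL(" + precision + "," + scale + ")";
        }
        switch (type) {
            case BYTES:
                throw new IllegalArgumentException("Binary data is not supported: Impala has no BINARY type");
            case ARRAY:
            case MAP:
            case STRUCT:
                throw new IllegalArgumentException(type + " is a complex type and is not supported");
            default:
                return null; // other primitives are not covered by this list
        }
    }
}
```

For example, applyLimits(ConnectType.INT32, DATE, 0, 0) returns "TIMESTAMP". A MAP field, a plain BYTES field, or a decimal with precision 39 throws.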