Snowflake Sink Connector for Confluent Cloud
The Kafka Connect Snowflake Sink connector for Confluent Cloud maps and persists
events from Apache Kafka® topics directly to a Snowflake database. The connector
supports Avro, JSON Schema, Protobuf, or JSON (schemaless) data from Apache Kafka®
topics. It ingests events from Kafka topics directly into a Snowflake
database, exposing the data to services for querying, enrichment, and analytics.
Important
If you are still on Confluent Cloud Enterprise, please contact your Confluent Account
Executive for more information about using this connector.
Features
The Snowflake sink connector provides the following features:
- Database authentication: Uses private key authentication.
- Input data formats: The connector supports Avro, JSON Schema, Protobuf, or JSON (schemaless) input data formats. Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).
- Select configuration properties: The following properties determine what metadata is included in the RECORD_METADATA column in the Snowflake database table (an example configuration fragment follows this list).
  - snowflake.metadata.createtime: If this value is set to false, the CreateTime property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
  - snowflake.metadata.topic: If this value is set to false, the topic property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
  - snowflake.metadata.offset.and.partition: If this value is set to false, the Offset and Partition property values are omitted from the metadata in the RECORD_METADATA column. The default value is true.
  - snowflake.metadata.all: If this value is set to false, the metadata in the RECORD_METADATA column is completely empty. The default value is true.
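For example, a connector configuration that omits the CreateTime value but keeps the rest of the metadata could include the following fragment. This is a minimal sketch showing only the metadata-related properties; the remaining required properties are covered later in this page.
"snowflake.metadata.createtime": "false",
"snowflake.metadata.topic": "true",
"snowflake.metadata.offset.and.partition": "true",
"snowflake.metadata.all": "true"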
You can manage your full-service connector using the Confluent Cloud API. For details, see the Confluent Cloud API documentation.
Configuration properties that are not shown in the Confluent Cloud UI use the default
values. For more information, see the Snowflake Sink Connector Configuration Properties.
For more information, see the Confluent Cloud connector limitations.
Target table naming guidelines
Note the following table naming guidelines and limitations:
- Confluent Cloud and the Confluent Cloud managed Snowflake Sink connector allow you to configure topic:table name mapping. This feature is also supported by the self-managed Snowflake Sink connector.
- Snowflake itself has limitations on object (table) naming conventions. See Identifier Requirements for details.
- Kafka is much more permissive with topic naming conventions. You are allowed to use Kafka topic names that break the table name mapping in the Confluent Cloud Snowflake Sink connector.
- When a Kafka topic name does not conform to Snowflake’s table naming limitations (for example, my-topic-name), the connector converts the topic name to a safe table name with an appended hash (for example, my_topic_name_021342). A conforming topic name (for example, my_topic_name) sends results to the expected table named my_topic_name.
- If the connector needs to adjust the name of the table created for a Kafka topic, there is the potential for identical table names. For example, if you are reading data from Kafka topics numbers+x and numbers-x, the tables created for these topics will both be named NUMBERS_X. To avoid table name duplication, the connector appends a suffix to the table name. The suffix is an underscore followed by a generated hash.
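If you are unsure which table the connector created for a renamed topic, you can list candidate tables in Snowflake. The following query is a sketch that assumes the PRODUCTION database and PUBLIC schema used later in this page; adjust the pattern to match your topic name:
SHOW TABLES LIKE 'MY_TOPIC_NAME%' IN SCHEMA PRODUCTION.PUBLIC;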
Generate a Snowflake key pair
Before the connector can sink data to Snowflake, you need to generate a key pair. Snowflake authentication requires 2048-bit (minimum) RSA. You add the public key to a Snowflake user account. You add the private key to the connector configuration (when completing the Quick Start instructions).
Note
This procedure generates an unencrypted private key. You can generate and use an encrypted key. If you generate an encrypted key, you add the passphrase to your connector configuration in addition to the private key. For information about generating an encrypted key, see Using Key Pair Authentication in the Snowflake documentation.
Creating the key pair
Complete the following steps to generate a key pair.
Generate a private key using OpenSSL.
openssl genrsa -out snowflake_key.pem 2048
Generate the public key referencing the private key.
openssl rsa -in snowflake_key.pem -pubout -out snowflake_key.pub
List the generated Snowflake key files.
ls -l snowflake_key*
-rw-r--r-- 1 1679 Jun 8 17:04 snowflake_key.pem
-rw-r--r-- 1 451 Jun 8 17:05 snowflake_key.pub
Show the contents of the public key file.
cat snowflake_key.pub
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA2zIuUb62JmrUAMoME+SX
vsz9KUCp/cC+Y+kTGfYB3jRDQ06O0UT+yUKMO/KWuc0dUxZ8s9koW5l/n+TBfxIQ
... omitted
1tD+Ktd/CTXPoVEI2tgCC9Avf/6/9HU3IpV0gL8SZ8U0N5ot4Uw+CSYB3JjMagEG
bBWZ8Qc26pFk7Fd17+ykH6rEdLeQ9OElc0ZruVwSsa4AxaZOT+rqCCP7FQPzKTtA
JQIDAQAB
-----END PUBLIC KEY-----
Copy the key. You will add it to a new user in Snowflake. Copy only the part of the key between --BEGIN PUBLIC KEY-- and --END PUBLIC KEY--. You can do this manually or you can use the following command:
grep -v "BEGIN PUBLIC" snowflake_key.pub | grep -v "END PUBLIC" | tr -d '\r\n'
In the following section you create a user and add the public key.
Creating a user and adding the public key
Open your Snowflake project. Complete the following steps to create a user account and add the public key to this account.
Go to the Worksheets panel and switch to the SECURITYADMIN role.
Important
Make sure you set the SECURITYADMIN role in the Worksheets panel, not through the user account drop-down menu. For additional information, see User Management.
Run the following query in Worksheets to create a user, add the public key copied earlier, and grant the SYSADMIN role to the user.
CREATE USER admin RSA_PUBLIC_KEY='<public-key>';
Make sure to add the public key as a single line in the statement.
Tip
If you did not set the role to SECURITYADMIN, or if you set the role using the user account drop-down menu, an SQL access control error is displayed.
SQL access control error: Insufficient privileges to operate on account '<account-name>'
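To confirm that the public key was attached to the user, you can describe the user and check that the RSA_PUBLIC_KEY_FP (fingerprint) property is populated. This sketch assumes the admin user created above:
DESC USER admin;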
Configuring user privileges
Complete the following steps to set the correct privileges for the user added.
For example: Suppose you want to send Apache Kafka® records to a database named PRODUCTION using the schema PUBLIC. The following shows the required queries to configure the necessary user privileges.
-- Use a role that can create and manage roles and privileges:
use role securityadmin;
-- Create a Snowflake role with the privileges to work with the connector:
create role kafka_connector_role;
-- Grant privileges on the database:
grant usage on database PRODUCTION to role kafka_connector_role;
-- Grant privileges on the schema:
grant usage on schema PRODUCTION.PUBLIC to role kafka_connector_role;
grant create table on schema PRODUCTION.PUBLIC to role kafka_connector_role;
grant create stage on schema PRODUCTION.PUBLIC to role kafka_connector_role;
grant create pipe on schema PRODUCTION.PUBLIC to role kafka_connector_role;
-- Grant the custom role to an existing user:
grant role kafka_connector_role to user admin;
-- Make the new role the default role:
alter user admin set default_role=kafka_connector_role;
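Before launching the connector, you can verify the grants and the role assignment. These verification queries are a sketch based on the role and user names used above:
show grants to role kafka_connector_role;
show grants to user admin;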
Quick Start
Use this quick start to get up and running with the Confluent Cloud Snowflake sink
connector. The quick start provides the basics of selecting the connector and
configuring it to consume data from Kafka and persist the data to a Snowflake
database.
- Prerequisites
- Authorized access to a Confluent Cloud cluster on Amazon Web Services (AWS), Microsoft Azure (Azure), or Google Cloud Platform (GCP).
- The Confluent Cloud CLI installed and configured for the cluster. See Install and Configure the Confluent Cloud CLI.
- Schema Registry must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).
- A Snowflake account and key pair to use for connector authentication with the Snowflake database.
- The user created must be granted privileges in Snowflake to modify the database and schema. For more information, see Access Control Privileges.
- The Snowflake database and the Kafka cluster should be in the same region.
- Kafka cluster credentials. You can use one of the following ways to get credentials:
- Create a Confluent Cloud API key and secret. To create a key and secret, go to Kafka API keys in your cluster, or autogenerate the API key and secret directly in the UI when setting up the connector.
- Create a Confluent Cloud service account for the connector.
Using the Confluent Cloud GUI
Step 2: Add a connector.
Click Connectors. If you already have connectors in your cluster, click Add connector.
Step 3: Select your connector.
Click the Snowflake Sink connector icon.
Step 4: Set up the connection.
Complete the following and click Continue.
Note
- Make sure you have all your prerequisites completed.
- An asterisk ( * ) designates a required entry.
Select one or more topics.
Enter a connector name.
Enter your Kafka Cluster credentials. The credentials are either the API key and secret or the service account API key and secret.
Select an Input message format (data coming from the Kafka topic): AVRO, JSON_SR (JSON Schema), PROTOBUF, or JSON (schemaless). A valid schema must be available in Schema Registry to use a schema-based message format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).
Enter the Snowflake connection details:
- Connection URL: Enter the URL for accessing your Snowflake account. Use the format https://<account_name>.<region_id>.snowflakecomputing.com:443. The https:// prefix and 443 port number are optional. Do not use the region ID if your account is in the AWS US West region and you are using AWS PrivateLink.
- Connection user name: Enter the user name created earlier.
- Private key: Enter the private key created earlier as a single line. Enter only the part of the key between --BEGIN RSA PRIVATE KEY-- and --END RSA PRIVATE KEY--.
- Database name: Enter the database name containing the table to insert rows into.
Enter the Snowflake Schema name that contains the table to insert rows into.
(Optional) Enter the private key passphrase. This is required if you created an encrypted key when generating the key pair.
(Optional) Select whether or not to include the following metadata in the RECORD_METADATA column in the database table.
- createtime: If this value is set to false, the CreateTime property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
- topic: If this value is set to false, the topic property value is omitted from the metadata in the RECORD_METADATA column. The default value is true.
- offset and partition: If this value is set to false, the Offset and Partition property values are omitted from the metadata in the RECORD_METADATA column. The default value is true.
- all metadata: If this value is set to false, the metadata in the RECORD_METADATA column is completely empty. The default value is true.
For details about metadata, see Schema of Topics in the Snowflake documentation.
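With all metadata enabled, the RECORD_METADATA column for a row contains a JSON object similar to the following. This is an illustrative sketch; the exact fields depend on your records, the connector version, and the metadata properties you selected.
{
  "CreateTime": 1623187200000,
  "topic": "my_topic_name",
  "partition": 0,
  "offset": 42,
  "key": "example-key"
}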
Enter the number of tasks for the connector. Refer to Confluent Cloud connector limitations for additional information.
Step 5: Launch the connector.
Verify the connection details and click Launch.
Step 6: Check the connector status.
The status for the connector should go from Provisioning to Running. It may take a few minutes.
Step 7: Check Snowflake
After the connector is running, verify that messages are populating your Snowflake database table.
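For example, you can run a query similar to the following in a Snowflake worksheet. This sketch assumes the PRODUCTION database and PUBLIC schema from the privileges example and a hypothetical table named MY_TOPIC_NAME; the connector writes each record's metadata and payload into the RECORD_METADATA and RECORD_CONTENT columns.
SELECT RECORD_METADATA, RECORD_CONTENT FROM PRODUCTION.PUBLIC.MY_TOPIC_NAME LIMIT 10;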
You can manage your full-service connector using the Confluent Cloud API. For details, see the Confluent Cloud API documentation.
Tip
When you launch a connector, a Dead Letter Queue topic is automatically created. See Dead Letter Queue for details.
For Snowflake troubleshooting, see Troubleshooting Issues in the Snowflake documentation.
Note
- The Snowflake Sink connector does not remove Snowflake pipes when a connector is deleted. For instructions to manually clean up Snowflake pipes, see Dropping Pipes.
- Snowflake Snowpipe failure can prevent messages from showing up in the target table despite being successfully written by the Snowflake Sink connector. If this happens, check the Snowflake COPY_HISTORY view, internal stage, or table stage to find the message and associated error (see the example query below). For more on the workflow of the Snowflake Sink connector, see Workflow for the Kafka Connector.
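The following query is one way to check recent copy history for a table; it is a sketch that assumes a hypothetical table named MY_TOPIC_NAME and looks back one hour:
select *
from table(information_schema.copy_history(table_name=>'MY_TOPIC_NAME', start_time=>dateadd(hours, -1, current_timestamp())));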
For additional information about this connector, see the Snowflake Connector for Kafka documentation. Note that not all connector features are provided in the Confluent Cloud connector.
See also
For an example that shows fully-managed Confluent Cloud connectors in action with Confluent Cloud ksqlDB, see the Cloud ETL Demo. This example also shows how to use Confluent Cloud CLI to manage your resources in Confluent Cloud.
Using the Confluent Cloud CLI
Complete the following steps to set up and run the connector using the Confluent Cloud CLI.
Step 1: List the available connectors.
Enter the following command to list available connectors:
ccloud connector-catalog list
Step 2: Show the required connector configuration properties.
Enter the following command to show the required connector properties:
ccloud connector-catalog describe <connector-catalog-name>
For example:
ccloud connector-catalog describe SnowflakeSink
Example output:
Following are the required configs:
connector.class: SnowflakeSink
name
kafka.api.key
kafka.api.secret
input.data.format
snowflake.url.name
snowflake.user.name
snowflake.private.key
snowflake.schema.name
tasks.max
topics
Step 3: Create the connector configuration file.
Create a JSON file that contains the connector configuration properties. The following example shows the required connector properties.
{
"connector.class": "SnowflakeSink",
"name": "<connector-name>",
"kafka.api.key": "<my-kafka-api-key>",
"kafka.api.secret": "<my-kafka-api-secret>",
"topics": "<topic1>, <topic2>",
"input.data.format": "JSON",
"snowflake.url.name": "https://wm83168.us-central1.gcp.snowflakecomputing.com:443",
"snowflake.user.name": "<login-username>",
"snowflake.private.key": "<private-key>",
"snowflake.database.name": "<database-name>",
"snowflake.schema.name": "<schema-name>",
"tasks.max": "1"
}
Note the following required property definitions:
- "connector.class": Identifies the connector plugin name.
- "name": Enter a name for your connector.
- "topics": Enter one topic or multiple comma-separated topics.
- "input.data.format": Sets the input message format (data coming from the Kafka topic). Valid entries are AVRO, JSON_SR, PROTOBUF, or JSON. You must have Confluent Cloud Schema Registry configured if using a schema-based message format (for example, Avro, JSON_SR (JSON Schema), or Protobuf).
- "snowflake.url.name": Enter the URL for accessing your Snowflake account. Use the format https://<account_name>.<region_id>.snowflakecomputing.com:443. The https:// prefix and 443 port number are optional. Do not use the region ID if your account is in the AWS US West region and you are using AWS PrivateLink.
- "snowflake.user.name": Enter the user name created earlier.
- "snowflake.private.key": Enter the private key created earlier as a single line. Enter only the part of the key between --BEGIN RSA PRIVATE KEY-- and --END RSA PRIVATE KEY--.
- "snowflake.database.name": Enter the database name containing the table to insert rows into.
- "snowflake.schema.name": Enter the Snowflake Schema name that contains the table to insert rows into.
- "tasks.max": Enter the number of tasks for the connector. Refer to Confluent Cloud connector limitations for additional information.
The following are optional properties to include in the configuration. These properties affect what metadata is included in the RECORD_METADATA column in the Snowflake database table.
- "snowflake.metadata.createtime": If this value is set to "false", the CreateTime property value is omitted from the metadata in the RECORD_METADATA column. The default value is "true".
- "snowflake.metadata.topic": If this value is set to "false", the topic property value is omitted from the metadata in the RECORD_METADATA column. The default value is "true".
- "snowflake.metadata.offset.and.partition": If this value is set to "false", the Offset and Partition property values are omitted from the metadata in the RECORD_METADATA column. The default value is "true".
- "snowflake.metadata.all": If this value is set to "false", the metadata in the RECORD_METADATA column is completely empty. The default value is "true".
Step 4: Load the properties file and create the connector.
Enter the following command to load the configuration and start the connector:
ccloud connector create --config <file-name>.json
For example:
ccloud connector create --config snowflake-sink.json
Example output:
Created connector confluent-snowflake lcc-ix4dl
Step 5: Check the connector status.
Enter the following command to check the connector status:
ccloud connector list
Example output:
ID | Name | Status | Type
+-----------+-------------------------+---------+------+
lcc-ix4dl | confluent-snowflake | RUNNING | sink
Step 6: Check Snowflake
After the connector is running, verify that records are populating your Snowflake database.
You can manage your full-service connector using the Confluent Cloud API. For details, see the Confluent Cloud API documentation.
Tip
When you launch a connector, a Dead Letter Queue topic is automatically created. See Dead Letter Queue for details.
For Snowflake troubleshooting, see Troubleshooting Issues in the Snowflake documentation.
Note
- The Snowflake Sink connector does not remove Snowflake pipes when a connector is deleted. For instructions to manually clean up Snowflake pipes, see Dropping Pipes.
- Snowflake Snowpipe failure can prevent messages from showing up in the target table despite being successfully written by the Snowflake Sink connector. If this happens, please check the Snowflake COPY_HISTORY view, internal stage, or table stage to find the message and associated error. For more on the workflow of Snowflake Sink connector, see Workflow for the Kafka Connector.
For additional information about this connector, see the Snowflake Connector for Kafka documentation. Note that not all connector features are provided in the Confluent Cloud connector.
Troubleshooting
For Snowflake troubleshooting, see Troubleshooting Issues in the Snowflake documentation.
Tip
When you launch a connector, a Dead Letter Queue topic is automatically created. See Dead Letter Queue for details.
Next Steps
See also
For an example that shows fully-managed Confluent Cloud connectors in action with Confluent Cloud ksqlDB, see the Cloud ETL Demo. This example also shows how to use Confluent Cloud CLI to manage your resources in Confluent Cloud.