Frequently Asked Questions
What are the benefits of ksqlDB?
ksqlDB allows you to query, read, write, and process data in Apache Kafka®
in real time and at scale using an intuitive, SQL-like syntax. ksqlDB does not
require proficiency with a programming language such as Java or Scala,
and you don’t have to install a separate processing cluster.
What are the technical requirements of ksqlDB?
ksqlDB only requires:
- A Java runtime environment
- Access to an Apache Kafka cluster for reading and writing data in
real time. The cluster can be on-premises or in the cloud. ksqlDB works
with clusters running vanilla Apache Kafka as well as with clusters
running the Kafka versions included in Confluent Platform.
We recommend Confluent Platform or Confluent Cloud for running Apache Kafka.
Is ksqlDB owned by the Apache Software Foundation?
No. ksqlDB is owned and maintained by Confluent Inc. as part of its
Confluent Platform product, and it is licensed under the Confluent Community License.
How does ksqlDB compare to Apache Kafka’s Streams API?
ksqlDB is complementary to the Kafka Streams API. In fact, ksqlDB executes its queries as Kafka Streams applications. The two share similarities, such as flexible deployment models, so you can integrate either one into your existing technical and organizational processes and tooling, whether you use containers, VMs, bare-metal machines, cloud services, or on-premises environments.
One of the key benefits of ksqlDB is that it does not require the user to develop any code in Java or Scala. This enables users to leverage a SQL-like interface alone to construct streaming ETL pipelines, to respond to real-time, continuous business requests, to spot anomalies, and more. ksqlDB is a great fit when your processing logic can be naturally expressed through SQL.
For full-fledged stream processing applications, Kafka Streams remains the more appropriate choice. For example, implementing a finite state machine that is driven by streams of data is easier in a programming language such as Java or Scala than in SQL. In Kafka Streams you can also choose between the DSL (a functional programming API) and the Processor API (an imperative programming API), and you can even combine the two.
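To make the finite-state-machine example concrete, here is a minimal sketch in plain Python of per-key state driven by a stream of events. It does not use the Kafka Streams API. The order states and event names are hypothetical. In a general-purpose language, the table of allowed transitions and the per-key state are ordinary data structures. Encoding the same rules in SQL is considerably more awkward.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical order lifecycle: (current_state, event) -> next_state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("NEW", "pay"): "PAID",
    ("PAID", "ship"): "SHIPPED",
    ("SHIPPED", "deliver"): "DELIVERED",
    ("NEW", "cancel"): "CANCELLED",
    ("PAID", "cancel"): "CANCELLED",
}


def run_state_machine(
    events: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Apply (key, event) pairs in arrival order.

    Returns the final state per key and the list of rejected
    (key, state, event) transitions.
    """
    state: Dict[str, str] = {}
    rejected: List[Tuple[str, str, str]] = []
    for key, event in events:
        current = state.get(key, "NEW")  # unseen keys start in NEW
        nxt = TRANSITIONS.get((current, event))
        if nxt is None:
            # Invalid transition: record it and leave the state unchanged.
            rejected.append((key, current, event))
        else:
            state[key] = nxt
    return state, rejected


if __name__ == "__main__":
    stream = [("o1", "pay"), ("o2", "ship"), ("o1", "ship"), ("o2", "cancel")]
    final, bad = run_state_machine(stream)
    print(final)  # {'o1': 'SHIPPED', 'o2': 'CANCELLED'}
    print(bad)    # [('o2', 'NEW', 'ship')]
```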
As with many technologies, each has its sweet spot, depending on technical requirements, mission-criticality, and user skill set.
Does ksqlDB support Kafka’s exactly-once processing semantics?
Yes, ksqlDB supports exactly-once processing, which means it will compute
correct results even in the face of failures such as machine crashes.
Is ksqlDB fully compliant with ANSI SQL?
ksqlDB uses a SQL dialect inspired by ANSI SQL. It differs in some respects
because it is geared toward processing streaming data. For example, ANSI SQL
has no notion of “windowing”, which streaming use cases commonly require,
such as performing aggregations on data grouped into 5-minute windows.
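In ksqlDB, this kind of aggregation is written with a windowing clause such as WINDOW TUMBLING (SIZE 5 MINUTES). The minimal sketch below, in plain Python rather than ksqlDB, shows what such a window computes. It counts events per key in non-overlapping 5-minute windows. It assumes event timestamps in epoch milliseconds and ignores late or out-of-order data, which a real stream processor must handle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling windows, in milliseconds


def tumbling_counts(
    events: Iterable[Tuple[str, int]], window_ms: int = WINDOW_MS
) -> Dict[Tuple[str, int], int]:
    """Count (key, timestamp_ms) events per key and per window.

    Each event falls into exactly one window, identified by its start:
    window_start = timestamp - (timestamp % window_ms).
    """
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, ts in events:
        window_start = ts - (ts % window_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [("alice", 0), ("alice", 299_999), ("alice", 300_000), ("bob", 60_000)]
    print(tumbling_counts(events))
    # {('alice', 0): 2, ('alice', 300000): 1, ('bob', 0): 1}
```

Because tumbling windows do not overlap, every event contributes to exactly one count. Hopping or session windows would need different bucketing logic.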
How do I shut down a ksqlDB environment?
Exit the ksqlDB CLI:
exit
If you’re running with Confluent CLI, use the confluent stop command.
If you’re running ksqlDB in Docker containers, stop the cp-ksqldb-server container:
docker stop <cp-ksqldb-server-container-name>
If you’re running ksqlDB as a system service, use the systemctl stop command:
sudo systemctl stop confluent-ksql
For more information on shutting down Confluent Platform, see Install and Upgrade Confluent Platform.
How do I add ksqlDB servers to an existing ksqlDB cluster?
You can add or remove ksqlDB servers during live operations. ksqlDB servers that are configured to use the same
Kafka cluster (bootstrap.servers) and the same ksqlDB service ID (ksql.service.id) form a given ksqlDB cluster.
To add a ksqlDB server to an existing ksqlDB cluster, configure the server with the same bootstrap.servers and
ksql.service.id settings as the ksqlDB cluster it should join. For more information, see Configure ksqlDB Server
and Scaling ksqlDB.
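The membership rule can be sketched in Python. Servers whose configurations share the same bootstrap.servers and ksql.service.id values belong to the same ksqlDB cluster. The server names, broker address, and service IDs below are hypothetical. The minimal parser handles only simple key=value lines, not the full Java .properties syntax, and it compares values as exact strings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def group_into_clusters(configs: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Map (bootstrap.servers, ksql.service.id) to the servers that share them."""
    clusters: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for server, text in configs.items():
        props = parse_properties(text)
        key = (props["bootstrap.servers"], props["ksql.service.id"])
        clusters[key].append(server)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical server configurations.
    configs = {
        "server-a": "bootstrap.servers=broker:9092\nksql.service.id=orders_",
        "server-b": "bootstrap.servers=broker:9092\nksql.service.id=orders_",
        "server-c": "bootstrap.servers=broker:9092\nksql.service.id=audit_",
    }
    for (brokers, service_id), members in group_into_clusters(configs).items():
        print(brokers, service_id, members)
    # broker:9092 orders_ ['server-a', 'server-b']
    # broker:9092 audit_ ['server-c']
```

In this sketch, server-c talks to the same Kafka cluster but forms a separate ksqlDB cluster, because its service ID differs.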
How can I lock-down ksqlDB servers for production and prevent interactive client access?
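You can configure your servers to run a set of predefined queries by using the ksql.queries.file setting
or the --queries-file command line flag. In this mode, often called headless mode, a server runs only
those queries and does not accept interactive statements from clients. For more information, see Configure ksqlDB Server.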
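As a minimal sketch, assuming you generate deployment files from Python, the following code writes two files. One is a SQL file of predefined statements. The other is a server properties file whose ksql.queries.file setting points to that SQL file. The file names, stream names, topic name, broker address, and service ID are all hypothetical. You would then start the server with the generated properties file.

```python
import tempfile
from pathlib import Path

# Hypothetical predefined statements that the locked-down server will run.
QUERIES = """\
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
CREATE STREAM pageviews_user1 AS
  SELECT * FROM pageviews WHERE userid = 'User_1';
"""


def write_headless_config(directory: Path) -> Path:
    """Write queries.sql and ksql-server.properties into directory.

    Returns the path of the properties file. Its ksql.queries.file entry
    points at the SQL file, so the server runs only those statements.
    """
    sql_path = directory / "queries.sql"
    sql_path.write_text(QUERIES)
    props_path = directory / "ksql-server.properties"
    props_path.write_text(
        "bootstrap.servers=localhost:9092\n"       # hypothetical broker address
        "ksql.service.id=pageviews_headless_\n"    # hypothetical service ID
        f"ksql.queries.file={sql_path.resolve()}\n"
    )
    return props_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        props = write_headless_config(Path(tmp))
        print(props.read_text())
```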
How do I use Avro data and integrate with Confluent Schema Registry?
Configure the ksql.schema.registry.url property in the ksqlDB server configuration to point to Schema Registry
(see Configure ksqlDB for Avro).
Important
- To use Avro data with ksqlDB you must have Schema Registry installed. This is included by default with Confluent Platform.
- Avro message values are supported. Avro keys are not yet supported.
How can I scale out ksqlDB?
The maximum parallelism of a query is bounded by the number of partitions in its input topics.
- To scale out: start additional ksqlDB servers with the same configuration. This can be done during live operations.
See How do I add ksqlDB servers to an existing ksqlDB cluster?
- To scale in: stop the desired running ksqlDB servers, but keep at least one server running. This can be done during live
operations. The remaining servers should have sufficient capacity to take over work from stopped servers.
Tip
Idle servers consume a small amount of resources. For example, if you have 10 ksqlDB servers and run a query
against a two-partition input topic, only two servers perform the actual work, and the other eight run an
“idle” query.
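To see why the partition count caps parallelism, here is a minimal sketch. It spreads a topic’s partitions across servers round-robin and reports which servers are left idle. Round-robin is a simplifying assumption for illustration only, because Kafka’s consumer group protocol uses its own assignment strategies. Under any strategy, though, each partition is processed by only one server at a time, so no more servers than partitions can do work for a query. The server names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(servers: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Round-robin partitions across servers; servers beyond the partition count get none."""
    if not servers:
        raise ValueError("at least one server is required")
    assignment: Dict[str, List[int]] = {s: [] for s in servers}
    for p in range(num_partitions):
        assignment[servers[p % len(servers)]].append(p)
    return assignment


def idle_servers(assignment: Dict[str, List[int]]) -> List[str]:
    """Servers that received no partitions and therefore only run an idle query."""
    return [s for s, parts in assignment.items() if not parts]


if __name__ == "__main__":
    servers = [f"ksql-{i}" for i in range(1, 11)]  # 10 hypothetical servers
    plan = assign_partitions(servers, num_partitions=2)
    print(len(servers) - len(idle_servers(plan)))  # 2 servers do the work
    print(len(idle_servers(plan)))                 # 8 servers are idle
```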
Can ksqlDB connect to an Apache Kafka cluster over SSL?
Yes. Internally, ksqlDB uses standard Kafka consumers and producers.
The procedure to securely connect ksqlDB to Kafka is the same as connecting any app to Kafka. For more information,
see Configure ksqlDB for Secured Apache Kafka clusters.
Can ksqlDB connect to an Apache Kafka cluster over SSL and authenticate using SASL?
Yes. Internally, ksqlDB uses standard Kafka consumers and producers.
The procedure to securely connect ksqlDB to Kafka is the same as connecting any app to Kafka.
For more information, see Configure Kafka Authentication.
Which ksqlDB queries read or write data to Kafka?
SHOW STREAMS and EXPLAIN <query> statements run against the ksqlDB server that
the ksqlDB client is connected to. They don’t communicate directly with Kafka.
CREATE STREAM WITH <topic> and CREATE TABLE WITH <topic> write metadata to the
ksqlDB command topic.
Persistent queries based on CREATE STREAM AS SELECT and CREATE TABLE AS SELECT
read from and write to Kafka topics.
Non-persistent queries based on SELECT that are stateless only read from Kafka
topics, for example SELECT … FROM foo WHERE ….
Non-persistent queries that are stateful, for example queries that use COUNT or
JOIN, read from and write to Kafka. The data they write to Kafka is deleted
automatically when you terminate the query with CTRL-C.
How do I check the health of a ksqlDB server?
Use the ps command to check whether the ksqlDB server process is running.
Your output should resemble:
jim 2540 5.2 2.3 8923244 387388 tty2 Sl 07:48 0:33 /usr/lib/jvm/java-8-oracle/bin/java -cp /home/jim/confluent-5.0.0/share/java/monitoring-interceptors/* ...
If the process status of the JVM isn’t Sl or Ssl, the ksqlDB server may be down.
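If you want to script this check, here is a minimal Python sketch. It parses one line of ps output in the format shown above and tests whether the process state is Sl or Ssl. In that format the columns are USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, and so on, so STAT is the eighth field. The column position is an assumption tied to that output format, and other ps options produce different layouts.

```python
from typing import Optional

HEALTHY_STATES = {"Sl", "Ssl"}  # process states this FAQ treats as healthy


def process_state(ps_line: str) -> Optional[str]:
    """Return the STAT column (8th field) of a USER PID %CPU ... style ps line."""
    # Split at most 10 times so a COMMAND containing spaces stays one field.
    fields = ps_line.split(None, 10)
    return fields[7] if len(fields) > 7 else None


def looks_healthy(ps_line: str) -> bool:
    return process_state(ps_line) in HEALTHY_STATES


if __name__ == "__main__":
    sample = (
        "jim 2540 5.2 2.3 8923244 387388 tty2 Sl 07:48 0:33 "
        "/usr/lib/jvm/java-8-oracle/bin/java -cp "
        "/home/jim/confluent-5.0.0/share/java/monitoring-interceptors/* ..."
    )
    print(process_state(sample), looks_healthy(sample))  # Sl True
```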
If you’re running ksqlDB server in a Docker container, run the docker ps or docker-compose ps
command, and check that the status of the ksql-server container is Up. Check the health of the
process in the container by running docker logs <ksql-server-container-id>.
- Check runtime stats for the ksqlDB server that you’re connected to:
  - Run SHOW STREAMS or SHOW TABLES, then run DESCRIBE EXTENDED <stream|table>.
  - Run SHOW QUERIES, then run EXPLAIN <query>.
The ksqlDB REST API supports a “server info” request (for example, http://<ksql-server-url>/info),
which returns information such as the ksqlDB version. For more information, see
REST API Index.
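Here is a minimal sketch of the server info request, using only the Python standard library. It makes two assumptions. First, the server listens at http://localhost:8088, a common default listener, so adjust the URL for your deployment. Second, the JSON response nests the version under a KsqlServerInfo object. Check the REST API reference for your ksqlDB version before relying on that shape.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_server_info(base_url: str = "http://localhost:8088", timeout: float = 5.0) -> Dict[str, Any]:
    """GET <base_url>/info and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/info", timeout=timeout) as resp:
        return json.load(resp)


def server_version(info: Dict[str, Any]) -> str:
    """Extract the version, assuming a {"KsqlServerInfo": {"version": ...}} layout."""
    return info["KsqlServerInfo"]["version"]


if __name__ == "__main__":
    try:
        print(server_version(fetch_server_info()))
    except OSError as exc:  # URLError, HTTPError, and timeouts are all OSError subclasses
        print(f"ksqlDB server not reachable: {exc}")
```

Treating any OSError as “not reachable” keeps the sketch usable as a simple liveness probe. A production health check would probably distinguish HTTP error statuses from connection failures.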
What if automatic topic creation is turned off?
If automatic topic creation is disabled, ksqlDB and Kafka Streams applications
continue to work. They create the topics they need explicitly through the Kafka
Admin Client instead of relying on the broker’s automatic topic creation.