Pull queries preview with Confluent Cloud ksqlDB¶

Pull queries are a relatively new but integral feature offered by ksqlDB. In contrast to push queries, which perpetually stream incremental query results to clients, pull queries follow a traditional request/response model, which means that a pull query retrieves a finite result from the ksqlDB server and terminate on completion, similar to the way queries work with traditional databases.

Pull queries are available as a preview feature to all Confluent Cloud ksqlDB users. While a feature is in preview, we recommend against using it in production workloads. The remainder of this document provides details about using pull queries in Confluent Cloud and describes the limitations of pull queries while in preview status, and beyond.

Limitations¶

Preview pull queries in Confluent Cloud ksqlDB have the following limitations:

Throughput: The request throughput for a pull query is rate-limited at approximately 41 queries per second, per CSU. For example, a 4 CSU ksqlDB application can process approximately 167 pull queries per second. This rate limit may change when pull queries are generally available.
Consistency: Pull query results use an eventually consistent consistency model, which means that some writes to the table being queried may not yet appear in a pull query’s result, even if these writes occurred before the pull query request. In practice, this window of “inconsistency” is very narrow, but you should account for this in your applications.
Performance impact: Because pull queries are processed by the ksqlDB application, pull queries do have the potential to reduce the throughput of push queries that are being run by the same application. Any potential performance impact depends strongly on the underlying workload, but in general we recommend that you perform realistic benchmarks to understand how a pull query workload will affect the performance of your specific ksqlDB workload.
SLA: While in preview, pull queries are not covered by the Confluent Cloud ksqlDB SLA.

Pricing¶

While in preview, pull queries are free of charge. When pull queries are generally available, they will have associated consumption-based cost, because they consume bandwidth.

CLI configuration¶

You can issue pull queries directly from the Confluent Cloud ksqlDB web interface, with no additional configuration.

If you want to use pull queries from a CLI session, or via HTTP requests, you must create a ksqlDB API key to enable your client to connect with your application.

Note

ksqlDB API keys are different from Kafka API keys and must be created for your specific ksqlDB application.

Log in to Confluent Cloud by using the ccloud CLI:
```
ccloud login
```

Use the following command to list your ksqlDB applications:

ccloud ksql app list

Your output should resemble:

       Id      |    Name     | Topic Prefix |   Kafka   | Storage |                       Endpoint                        | Status
+--------------+-------------+--------------+-----------+---------+-------------------------------------------------------+--------+
  lksqlc-ok1yo | ksqldb-app1 | pksqlc-e8816 | lkc-2ry82 |     500 | https://pksqlc-e8816.us-west2.gcp.confluent.cloud:443 | UP

Note the Id, Kafka, and Endpoint values, which in this example are lksqlc-ok1yo, lkc-2ry82, and https://pksqlc-e8816.us-west2.gcp.confluent.cloud:443.

The application ID is used to create an application-specific API key and secret that clients use to connect to the application.
Kafka specifies the ID of your Kafka cluster.
The application Endpoint is used to connect to your application.

Run the following command to specify the Kafka cluster that your CLI session should use:
```
ccloud kafka cluster use <kafka-cluster-id>
```
Obtain your Confluent Cloud service account ID to use for security configuration. You can find it in the Id column of the following command’s output:
```
ccloud service-account list
```
After you get your service account ID, run the following command to grant the required permissions for the topic:
```
ccloud kafka acl create --allow --service-account <service-account-id> --operation READ --operation WRITE --operation CREATE --topic 'pq_' --prefix
```
The --topic 'pq_' --prefix option applies the ACLs to all topics that have a name that is prefixed with pq_.
Run the following command to create an application-specific API key and secret by passing the application ID as the --resource.
```
ccloud api-key create --resource <ksqldb-application-id>
```
Use the key and secret returned by the previous command to connect to your application by using the ksqlDB CLI. Pass the key as the username, and the secret as the password.
```
$CONFLUENT_HOME/bin/ksql -u <api-key> -p <api-key-secret> <endpoint>
```

Example workload¶

In this section, you build an example workload that enables experimenting with pull queries in Confluent Cloud.

In the ksqlDB CLI, run the following statement to create an input stream.

CREATE STREAM pq_pageviews (user_id INTEGER KEY, url STRING, status INTEGER) WITH
 (kafka_topic='pq_pageviews', value_format='json', partitions=1);

Create a materialized view over this input stream. The materialized view aggregates events from the pageviews stream, grouped on the events’ url field:
```
CREATE TABLE pq_pageviews_metrics AS
 SELECT url, COUNT(*) AS num_views
  FROM pq_pageviews
  GROUP BY url
  EMIT CHANGES;
```

Populate the materialized view by writing events to the pageviews input stream.

INSERT INTO pq_pageviews (user_id, url, status) VALUES (0, 'https://confluent.io', 200);
INSERT INTO pq_pageviews (user_id, url, status) VALUES (1, 'https://confluent.io/blog', 200);

Now that the materialized view contains some data, issue a pull query against it to retrieve the latest count for a given URL.

SELECT * FROM pq_pageviews_metrics WHERE url = 'https://confluent.io';

Your output should resemble:

+-------------------------------------------------+-------------------------------------------------+
|URL                                              |NUM_VIEWS                                        |
+-------------------------------------------------+-------------------------------------------------+
|https://confluent.io                             |1                                                |

This pull query should return precisely one row. Each time the pull query runs, it will return the latest value for the targeted row.