Confluent Cloud Metrics API¶

The Confluent Cloud Metrics API provides actionable operational metrics about your Confluent Cloud deployment. It is a queryable HTTP API: you POST a query written in JSON and get back a time series of the metrics that the query specifies.

This page is meant to be instructional and to help you get started with the Confluent Cloud Metrics API. For more information, see the API Reference.

Metrics API Quick Start¶

Prerequisites

Access to Confluent Cloud.
Internet connectivity.

The following examples use HTTPie rather than cURL. You can install HTTPie with most common package managers by following the HTTPie documentation.

Create a Cloud API key to authenticate to the Metrics API. For example:

ccloud login
ccloud kafka cluster use lkc-XXXXX
ccloud api-key create --resource cloud

Note

You must use a Cloud API Key to communicate with the Metrics API. Using the Cluster API Key that is used to communicate with Kafka will result in an authentication error.
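
If you call the API from a script instead of HTTPie, the same key pair can be sent as a standard HTTP Basic authorization header, which is what the `--auth` option does by default. The following is a minimal sketch; the helper name `basic_auth_header` and the placeholder credentials are assumptions for illustration and are not part of the Metrics API.

```python
import base64


def basic_auth_header(api_key: str, secret: str) -> dict:
    """Build an HTTP Basic Authorization header from a Cloud API key pair."""
    # HTTP Basic auth is "Basic " followed by base64("<key>:<secret>").
    raw = f"{api_key}:{secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder values only; substitute your own Cloud API key and secret.
headers = basic_auth_header("MY_API_KEY", "MY_SECRET")
print(sorted(headers))
```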

List the available metrics¶

Get a description of the available metrics by sending a GET request to the descriptors endpoint of the API:

http -v https://api.telemetry.confluent.cloud/v1/metrics/cloud/descriptors --auth '<API_KEY>:<SECRET>'

This returns a JSON blob with details on the available metrics to query. The current list of available metrics is:

Name	Labels
`io.confluent.kafka.server/received_bytes`	cluster, topic, partition
`io.confluent.kafka.server/sent_bytes`	cluster, topic, partition
`io.confluent.kafka.server/received_records`	cluster, topic, partition
`io.confluent.kafka.server/sent_records`	cluster, topic, partition
`io.confluent.kafka.server/retained_bytes`	cluster, topic, partition
`io.confluent.kafka.server/active_connection_count`	cluster
`io.confluent.kafka.server/request_count`	cluster, type
`io.confluent.kafka.server/partition_count`	cluster
`io.confluent.kafka.server/successful_authentication_count`	cluster

List the available topics for a given metric in a specified interval¶

Create a file named attributes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:

{
    "filter": {
        "field": "metric.label.cluster_id",
        "op": "EQ",
        "value": "lkc-XXXXX"
    },
    "group_by": [
        "metric.label.topic"
    ],
    "intervals": [
        "2020-01-13T10:30:00-05:00/2020-01-13T11:00:00-05:00"
    ],
    "limit": 25,
    "metric": "io.confluent.kafka.server/sent_bytes"
}

Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.

http -v https://api.telemetry.confluent.cloud/v1/metrics/cloud/attributes --auth '<API_KEY>:<SECRET>' < attributes_query.json

Your output should resemble:

Note

Be aware that topics without timeseries values during the specified interval will not be returned.

{
    "data": [
        {},
        {
            "metric.label.topic": "test-topic"
        }
    ],
    "meta": {
        "pagination": {
            "page_size": 25
        }
    }
}
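
The sample response above includes an empty object alongside the topic entry, so a client that wants only topic names needs to skip entries without a topic label. The following sketch shows one way to do that against the sample output; the function name `topic_names` is a hypothetical helper, and the response shape is assumed to match the example shown here.

```python
import json


def topic_names(response: dict) -> list:
    """Return topic names from an attributes response, skipping empty entries."""
    names = []
    for entry in response.get("data", []):
        topic = entry.get("metric.label.topic")
        if topic is not None:
            names.append(topic)
    return names


# The sample response from this section.
sample = json.loads("""
{
    "data": [
        {},
        {"metric.label.topic": "test-topic"}
    ],
    "meta": {"pagination": {"page_size": 25}}
}
""")

print(topic_names(sample))
```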

Query for bytes sent to consumers per minute grouped by topic¶

Create a file named sent_bytes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:

{
    "aggregations": [
        {
            "agg": "SUM",
            "metric": "io.confluent.kafka.server/sent_bytes"
        }
    ],
    "filter": {
        "filters": [
            {
                "field": "metric.label.cluster_id",
                "op": "EQ",
                "value": "lkc-XXXXX"
            }
        ],
        "op": "AND"
    },
    "granularity": "PT1M",
    "group_by": [
        "metric.label.topic"
    ],
    "intervals": [
        "2019-12-19T11:00:00-05:00/2019-12-19T11:05:00-05:00"
    ],
    "limit": 25
}

Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.

http -v https://api.telemetry.confluent.cloud/v1/metrics/cloud/query --auth '<API_KEY>:<SECRET>' < sent_bytes_query.json

Your output should resemble:

Note

Be aware that if no data was consumed during the time window, the dataset will be empty for a given topic.

{
     "data": [
         {
             "timestamp": "2019-12-19T16:01:00Z",
             "metric.label.topic": "test-topic",
             "value": 0.0
         },
         {
             "timestamp": "2019-12-19T16:02:00Z",
             "metric.label.topic": "test-topic",
             "value": 157.0
         },
         {
             "timestamp": "2019-12-19T16:03:00Z",
             "metric.label.topic": "test-topic",
             "value": 371.0
         },
         {
             "timestamp": "2019-12-19T16:04:00Z",
             "metric.label.topic": "test-topic",
             "value": 0.0
         }
     ]
 }
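
Because the response is a flat list of per-minute data points, a client usually has to aggregate it further. The following sketch sums the values per topic for the sample output above; `total_by_topic` is a hypothetical helper, and the field names are taken from the example response.

```python
from collections import defaultdict


def total_by_topic(response: dict) -> dict:
    """Sum the 'value' of every data point, keyed by topic name."""
    totals = defaultdict(float)
    for point in response.get("data", []):
        totals[point["metric.label.topic"]] += point["value"]
    return dict(totals)


# The sample response from this section.
sample = {
    "data": [
        {"timestamp": "2019-12-19T16:01:00Z", "metric.label.topic": "test-topic", "value": 0.0},
        {"timestamp": "2019-12-19T16:02:00Z", "metric.label.topic": "test-topic", "value": 157.0},
        {"timestamp": "2019-12-19T16:03:00Z", "metric.label.topic": "test-topic", "value": 371.0},
        {"timestamp": "2019-12-19T16:04:00Z", "metric.label.topic": "test-topic", "value": 0.0},
    ]
}

print(total_by_topic(sample))  # {'test-topic': 528.0}
```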

Query for bytes sent by producers per minute grouped by topic¶

Create a file named received_bytes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:

{
    "aggregations": [
        {
            "agg": "SUM",
            "metric": "io.confluent.kafka.server/received_bytes"
        }
    ],
    "filter": {
        "filters": [
            {
                "field": "metric.label.cluster_id",
                "op": "EQ",
                "value": "lkc-XXXXX"
            }
        ],
        "op": "AND"
    },
    "granularity": "PT1M",
    "group_by": [
        "metric.label.topic"
    ],
    "intervals": [
        "2019-12-19T11:00:00-05:00/2019-12-19T11:05:00-05:00"
    ],
    "limit": 25
}

Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.

http -v https://api.telemetry.confluent.cloud/v1/metrics/cloud/query --auth '<API_KEY>:<SECRET>' < received_bytes_query.json

Your output should resemble:

Note

Be aware that if you have not produced data during the time window, the dataset will be empty for a given topic.

{
    "data": [
        {
            "timestamp": "2019-12-19T16:00:00Z",
            "metric.label.topic": "test-topic",
            "value": 72.0
        },
        {
            "timestamp": "2019-12-19T16:01:00Z",
            "metric.label.topic": "test-topic",
            "value": 139.0
        },
        {
            "timestamp": "2019-12-19T16:02:00Z",
            "metric.label.topic": "test-topic",
            "value": 232.0
        },
        {
            "timestamp": "2019-12-19T16:03:00Z",
            "metric.label.topic": "test-topic",
            "value": 0.0
        },
        {
            "timestamp": "2019-12-19T16:04:00Z",
            "metric.label.topic": "test-topic",
            "value": 0.0
        }
    ]
}
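
The sent_bytes and received_bytes queries differ only in the metric name, so it can be convenient to generate the query body instead of editing JSON files by hand. The following is a sketch under that assumption; `build_query` is a hypothetical helper that reproduces the template fields shown above and nothing else.

```python
import json
from datetime import datetime, timedelta, timezone


def build_query(metric, cluster_id, start, end, granularity="PT1M", limit=25):
    """Build a query body shaped like the templates in this section."""
    return {
        "aggregations": [{"agg": "SUM", "metric": metric}],
        "filter": {
            "filters": [
                {"field": "metric.label.cluster_id", "op": "EQ", "value": cluster_id}
            ],
            "op": "AND",
        },
        "granularity": granularity,
        "group_by": ["metric.label.topic"],
        # Timezone-aware datetimes render as ISO-8601 with a UTC offset.
        "intervals": [f"{start.isoformat()}/{end.isoformat()}"],
        "limit": limit,
    }


eastern = timezone(timedelta(hours=-5))  # fixed UTC-05:00 offset, as in the examples
start = datetime(2019, 12, 19, 11, 0, tzinfo=eastern)
query = build_query(
    "io.confluent.kafka.server/received_bytes",
    "lkc-XXXXX",
    start,
    start + timedelta(minutes=5),
)
print(json.dumps(query, indent=4))
```

You could write the printed JSON to received_bytes_query.json and submit it with the HTTPie command shown above.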

Query for max retained bytes per hour over 2 hours for topic named test-topic¶

Create a file named retained_bytes_query.json using the following template. Change lkc-XXXXX and the timestamp values to match your needs:

{
    "aggregations": [
        {
            "agg": "SUM",
            "metric": "io.confluent.kafka.server/retained_bytes"
        }
    ],
    "filter": {
        "filters": [
            {
                 "field": "metric.label.topic",
                 "op": "EQ",
                 "value": "test-topic"
            },
            {
                "field": "metric.label.cluster_id",
                "op": "EQ",
                "value": "lkc-XXXXX"
            }
        ],
        "op": "AND"
    },
    "granularity": "PT1H",
    "group_by": [
        "metric.label.topic"
    ],
    "intervals": [
        "2019-12-19T11:00:00-05:00/P0Y0M0DT2H0M0S"
    ],
    "limit": 25
}

Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.

http -v https://api.telemetry.confluent.cloud/v1/metrics/cloud/query --auth '<API_KEY>:<SECRET>' < retained_bytes_query.json

Your output should resemble:

{
    "data": [
        {
            "timestamp": "2019-12-19T16:00:00Z",
            "metric.label.topic": "test-topic",
            "value": 406561.0
        },
        {
            "timestamp": "2019-12-19T17:00:00Z",
            "metric.label.topic": "test-topic",
            "value": 406561.0
        }
    ]
}
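
This query writes its interval as a start time followed by a duration rather than as a start and end pair. The following sketch formats both forms for the same two-hour window so you can compare them; the helper names are hypothetical, and the duration string simply mirrors the long form used in the template above.

```python
from datetime import datetime, timedelta, timezone


def interval_from_duration(start: datetime, hours: int = 0, minutes: int = 0) -> str:
    """Format an interval as '<start>/<duration>' in the long form used above."""
    duration = f"P0Y0M0DT{hours}H{minutes}M0S"
    return f"{start.isoformat()}/{duration}"


def interval_from_end(start: datetime, hours: int = 0, minutes: int = 0) -> str:
    """Format the same window as an explicit '<start>/<end>' pair."""
    end = start + timedelta(hours=hours, minutes=minutes)
    return f"{start.isoformat()}/{end.isoformat()}"


eastern = timezone(timedelta(hours=-5))  # fixed UTC-05:00 offset, as in the examples
start = datetime(2019, 12, 19, 11, 0, tzinfo=eastern)
print(interval_from_duration(start, hours=2))
print(interval_from_end(start, hours=2))
```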

Query for max retained bytes per hour over 2 hours for a cluster lkc-XXXXX¶

Create a file named cluster_retained_bytes_query.json using the following template. Be sure to change lkc-XXXXX and the timestamp values to match your needs:

{
    "aggregations": [
        {
            "agg": "SUM",
            "metric": "io.confluent.kafka.server/retained_bytes"
        }
    ],
    "filter": {
        "filters": [
            {
                "field": "metric.label.cluster_id",
                "op": "EQ",
                "value": "lkc-XXXXX"
            }
        ],
        "op": "AND"
    },
    "granularity": "PT1H",
    "group_by": [
        "metric.label.cluster_id"
    ],
    "intervals": [
        "2019-12-19T11:00:00-05:00/P0Y0M0DT2H0M0S"
    ],
    "limit": 5
}

Submit the query as a POST using the following command. Be sure to change API_KEY and SECRET to match your environment.

http -v https://api.telemetry.confluent.cloud/v1/metrics/cloud/query --auth '<API_KEY>:<SECRET>' < cluster_retained_bytes_query.json

Your output should resemble:

{
    "data": [
        {
            "timestamp": "2019-12-19T16:00:00Z",
            "value": 507350.0
        },
        {
            "timestamp": "2019-12-19T17:00:00Z",
            "value": 507350.0
        }
    ]
}

FAQ¶

Can the Metrics API be used to reconcile my bill?¶

No, the Metrics API is intended to provide information for the purposes of monitoring, troubleshooting, and capacity planning. It is not intended as an audit system for reconciling bills as the metrics do not include request overhead for the Kafka protocol at this time. For more details, see the billing documentation.

Why am I seeing empty data sets for topics that exist on queries other than for `retained_bytes`?¶

If there are only values of 0.0 in the time range queried, then the API will return an empty set. When there is non-zero data within the time range, time slices with values of 0.0 are returned.
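
If your client prefers explicit zeros over an empty result, it can fill them in itself. The following is a sketch of that idea for a response covering a single topic; `fill_zeros` and the list of expected timestamps are assumptions made for illustration, not behavior provided by the API.

```python
def fill_zeros(response: dict, expected_timestamps: list) -> list:
    """Pair each expected timestamp with its value, using 0.0 when it is absent."""
    seen = {point["timestamp"]: point["value"] for point in response.get("data", [])}
    return [(ts, seen.get(ts, 0.0)) for ts in expected_timestamps]


expected = ["2019-12-19T16:00:00Z", "2019-12-19T16:01:00Z"]

# An empty data set is read as "all zeros" for the queried range.
print(fill_zeros({"data": []}, expected))
```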

Why didn’t `retained_bytes` decrease after I changed the retention policy for my topic?¶

The value of retained_bytes is calculated as the maximum over the interval for each data point returned. If data has been deleted during the current interval, you will not see the effect until the next time range window begins. For example, if you produced 4GB of data per day over the last 30 days and queried for retained_bytes over the last 3 days with a 1 day interval, the query would return values of 112GB, 116GB, 120GB as a time series. If you then deleted all data in the topic and stopped producing data, the query would return the same values until the next day. When queried at the start of the next day, the same query would return 116GB, 120GB, 0GB.
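
The arithmetic in this example can be checked with a small simulation. The sketch below models each day as a list of retained-size samples in GB and takes the maximum per day; the two-samples-per-day model is a simplification assumed here for illustration, not a description of how the service samples data.

```python
def max_per_window(windows: list) -> list:
    """Return the maximum sample in each window, one value per data point."""
    return [max(samples) for samples in windows]


# Retained GB at the start and end of each day, producing 4 GB per day.
day28 = [108, 112]
day29 = [112, 116]
day30 = [116, 120]
day30_after_delete = day30 + [0]  # all data deleted late on day 30
day31 = [0, 0]                    # nothing produced afterwards

print(max_per_window([day28, day29, day30]))               # [112, 116, 120]
print(max_per_window([day28, day29, day30_after_delete]))  # [112, 116, 120]
print(max_per_window([day29, day30_after_delete, day31]))  # [116, 120, 0]
```

The second line shows why the deletion is invisible on the same day: the day's maximum was reached before the data was removed.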

What are the supported granularity levels?¶

The following table shows the granularity levels in the Granularity enumeration.

Reporting interval	Symbol
1 minute	`PT1M`
5 minutes	`PT5M`
15 minutes	`PT15M`
30 minutes	`PT30M`
1 hour	`PT1H`
4 hours	`PT4H`
All granularities	`ALL`

Why don’t I see consumer lag in the Metrics API?¶

In Kafka, consumer lag is not tracked as a metric on the server side. This is because it is a cluster-level construct, and today Kafka’s metrics are derived from instrumentation at a lower level of abstraction. Consumer lag may be added to the Metrics API at a later date. At this time, there are several other ways to monitor consumer lag, including the client metrics, UI, CLI, and Admin API. These methods are all available when using Confluent Cloud.

What is the retention time of metrics in the Metrics API?¶

Metrics are retained for seven days.

What is the granularity of data in the Metrics API?¶

Data is available at a granularity of one minute. However, the allowed granularity for a query is restricted by the size of the query’s interval.

Granularity	Maximum Interval
1 minute	6 hours
5 minutes	1 day
15 minutes	4 days
30 minutes	7 days
1 hour	Unlimited
4 hours	Unlimited
All	Unlimited
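
A client can use this table to pick the finest granularity that a given interval allows. The following sketch encodes the table and returns the matching symbol from the Granularity enumeration; `finest_granularity` is a hypothetical helper, and treating each maximum as inclusive is an assumption.

```python
from datetime import timedelta

# (symbol, maximum interval) pairs from the table, finest first.
# None means the table lists the maximum interval as unlimited.
MAX_INTERVAL = [
    ("PT1M", timedelta(hours=6)),
    ("PT5M", timedelta(days=1)),
    ("PT15M", timedelta(days=4)),
    ("PT30M", timedelta(days=7)),
    ("PT1H", None),
]


def finest_granularity(span: timedelta) -> str:
    """Return the finest granularity symbol allowed for an interval of this length."""
    for symbol, limit in MAX_INTERVAL:
        # Assumption: an interval exactly equal to the maximum is still allowed.
        if limit is None or span <= limit:
            return symbol
    return "PT1H"  # not reached, because the last entry has no limit


print(finest_granularity(timedelta(hours=2)))
print(finest_granularity(timedelta(days=3)))
```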

How do I know if a given metric is in preview or generally available (GA)?¶

We are always looking to add new metrics, but when we add a new metric, we need to take some time to stabilize how we expose it, to ensure that it’s suitable for most use cases. Each metric’s lifecycle stage (preview, generally available, etc.) is included in the response from the /descriptors endpoint. While a metric is in preview we may make breaking changes to its attributes without an API version change, as we iterate to provide the best possible experience.

What should I do if a query to Metrics API returns a timeout response (HTTP error code 504)?¶

If queries exceed the timeout (the maximum query time is 60 seconds), take one of two approaches:

Break up the query on the client side to return fewer data points. For example, you can query for specific topics instead of all topics at once, or you can reduce the time interval.
Reduce the granularity of data returned.

These approaches are especially important when querying for partition-level data over days-long intervals.
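
Reducing the time interval on the client side can be done by splitting one long interval into several shorter ones and issuing one query per piece. The following is a sketch of the splitting step only; `split_interval` is a hypothetical helper, and it assumes that adjacent intervals sharing a boundary do not double-count data.

```python
from datetime import datetime, timedelta, timezone


def split_interval(start: datetime, end: datetime, chunk: timedelta) -> list:
    """Split [start, end) into consecutive '<start>/<end>' strings of at most `chunk`."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive duration")
    intervals = []
    cursor = start
    while cursor < end:
        upper = min(cursor + chunk, end)  # the last piece may be shorter
        intervals.append(f"{cursor.isoformat()}/{upper.isoformat()}")
        cursor = upper
    return intervals


start = datetime(2020, 1, 13, 0, 0, tzinfo=timezone.utc)
end = datetime(2020, 1, 13, 5, 0, tzinfo=timezone.utc)
for piece in split_interval(start, end, timedelta(hours=2)):
    print(piece)
```

Each returned string can be placed in the `intervals` field of a separate query.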

What should I do if a query returns a 5xx response code?¶

We recommend retrying these types of responses. Usually, a 5xx response indicates a transient server-side issue. You should design the client implementations that query the Metrics API to be resilient to this type of response for minutes-long periods.
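
One common way to build this resilience is to retry with exponential backoff. The following sketch is transport-agnostic: `send` is a hypothetical callable that you supply, assumed to perform one request and return a `(status_code, body)` tuple. The attempt count and delays are illustrative defaults, not values recommended by the API.

```python
import time


def query_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-5xx status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** attempt)
    # Every attempt returned 5xx; hand back the last response.
    return status, body


# Usage with a stand-in for a real request function.
fake_responses = iter([(503, ""), (200, '{"data": []}')])
print(query_with_retry(lambda: next(fake_responses), sleep=lambda seconds: None))
```

With `base_delay=1.0` and `max_attempts=8`, the waits add up to 127 seconds, which is in the minutes-long range described above.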

Suggested Resources¶

Podcast: Multi-Cloud Monitoring and Observability with the Metrics API ft. Dustin Cote
To learn how to architect, monitor, and optimize your Kafka applications on Confluent Cloud, refer to Developing Client Applications on Confluent Cloud.
