ExtractTopic

The following provides usage information for the Confluent SMT io.confluent.connect.transforms.ExtractTopic.

Description

Extract data from a message and use it as the topic name. You can either use the entire key/value (which should be a string), or use a field from a map or struct. Use the concrete transformation type designed for the record key (io.confluent.connect.transforms.ExtractTopic$Key) or value (io.confluent.connect.transforms.ExtractTopic$Value). You can also extract the entire value from a message header value (string) by using the concrete type (io.confluent.connect.transforms.ExtractTopic$Header).

Installation

This transformation is developed by Confluent and does not ship by default with Apache Kafka® or Confluent Platform. You can install this transformation via the Confluent Hub Client:

confluent-hub install confluentinc/connect-transforms:latest

Examples

The configuration snippet below shows how to use and configure the ExtractTopic SMT.

transforms=KeyExample, ValueFieldExample, KeyFieldExample, FieldJsonPathExample, HeaderExample

Use the key of the message as the topic name.

transforms.KeyExample.type=io.confluent.connect.transforms.ExtractTopic$Key

Extract a required field named f2 from the value, and use it as the topic name.

transforms.ValueFieldExample.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.ValueFieldExample.field=f2

Extract a field named f3 from the key, and use it as the topic name. If the field is null or missing, leave the topic name as-is.

transforms.KeyFieldExample.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.KeyFieldExample.field=f3
transforms.KeyFieldExample.skip.missing.or.null=true

Extract the value of a field named f3 in the f1 field in the key, and use it as the topic name. Here the format of the field is defined with JSON Path (e.g., ["f1"]["f3"]). If the field is null or missing, leave the topic name as-is.

transforms.FieldJsonPathExample.type=io.confluent.connect.transforms.ExtractTopic$Value
transforms.FieldJsonPathExample.field=$["f1"]["f3"]
transforms.FieldJsonPathExample.field.format=JSON_PATH
transforms.FieldJsonPathExample.skip.missing.or.null=true

Extract the value of a message header (as a string) with key h1 (required) and use it as the topic name.

transforms.HeaderExample.type=io.confluent.connect.transforms.ExtractTopic$Header
transforms.HeaderExample.field=h1
transforms.FieldJsonPathExample.skip.missing.or.null=true

Properties

Name Description Type Default Valid Values Importance
field Field name to use as the topic name. If left blank, the entire key or value is used (and assumed to be a string). string “”   medium
field.format Specify field path format. Currently two formats are supported: JSON_PATH and PLAIN. If set to JSON_PATH, the transformer will interpret the field with JSON path intepreter, which supports nested field extraction. If left blank or set to PLAIN, the transformer will evaluate the field config as a non-nested field name. When using ExtractTopic$Header, only the default PLAIN format can be used, which will extract the header value as a string. string “PLAIN” “JSON_PATH”, “PLAIN” medium
skip.missing.or.null How to handle missing fields and null fields, keys, and values. By default, this transformation will throw an exception if a field specified via the field configuration is missing or null, or if no field is specified but the message’s key or value is null. If this configuration is set to true, the transformation will instead silently ignore these conditions and allow the record to pass through unaltered. boolean false   low

Predicates

Transformations can be configured with predicates so that the transformation is applied only to records which satisfy a condition. You can use predicates in a transformation chain and, when combined with the Apache Kafka® Filter, predicates can conditionally filter out specific records.

Predicates are specified in the connector configuration. The following properties are used:

  • predicates: A set of aliases for predicates applied to one or more transformations.
  • predicates.$alias.type: Fully qualified class name for the predicate.
  • predicates.$alias.$predicateSpecificConfig: Configuration properties for the predicate.

All transformations have the implicit config properties predicate and negate. A predicular predicate is associated with a transformation by setting the transformation’s predicate configuration to the predicate’s alias. The predicate’s value can be reversed using the negate configuration property.

Kafka Connect includes the following predicates:

  • org.apache.kafka.connect.predicates.TopicNameMatches: Matches records in a topic with a name matching a particular Java regular expression.
  • org.apache.kafka.connect.predicates.HasHeaderKey: Matches records which have a header with the given key.
  • org.apache.kafka.connect.predicates.RecordIsTombstone: Matches tombstone records (that is, records with a null value).

Predicate Examples

Example 1:

You have a source connector that produces records to many different topics and you want to do the following:

  • Filter out the records in the foo topic entirely.
  • Apply the ExtractField transformation with the field name other_field to records in all topics, except the topic bar.

To do this, you need to first filter out the records destined for the topic foo. The Filter transformation removes records from further processing.

Next, you use the TopicNameMatches predicate to apply the transformation only to records in topics which match a certain regular expression. The only configuration property for TopicNameMatches is a Java regular expression used as a pattern for matching against the topic name. The following example shows this configuration:

transforms=Filter
transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
transforms.Filter.predicate=IsFoo

predicates=IsFoo
predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatches
predicates.IsFoo.pattern=foo

Using this configuration, ExtractField is then applied only when the topic name of the record is not bar. The reason you can’t use TopicNameMatches directly is because it would apply the transformation to matching topic names, not topic names which do not match. The transformation’s implicit negate configuration properties inverts the set of records which a predicate matches. This configuration addition is shown below:

transforms=Filter,Extract
transforms.Filter.type=org.apache.kafka.connect.transforms.Filter
transforms.Filter.predicate=IsFoo

transforms.Extract.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.Extract.field=other_field
transforms.Extract.predicate=IsBar
transforms.Extract.negate=true

predicates=IsFoo,IsBar
predicates.IsFoo.type=org.apache.kafka.connect.predicates.TopicNameMatches
predicates.IsFoo.pattern=foo

predicates.IsBar.type=org.apache.kafka.connect.predicates.TopicNameMatches
predicates.IsBar.pattern=bar

Example 2:

The following configuration shows how to use a predicate in a transformation chain with the ExtractField transformation and the negate=true configuration property:

transforms=t2
transforms.t2.predicate=has-my-prefix
transforms.t2.negate=true
transforms.t2.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.t2.field=c1
predicates=has-my-prefix
predicates.has-my-prefix.type=org.apache.kafka.connect.predicates.TopicNameMatch
predicates.has-my-prefix.pattern=my-prefix-.*

The transform t2 is only applied when the predicate has-my-prefix is false (using the negate=true parameter). The predicate is configured by the keys with prefix predicates.has-my-prefix. The predicate class is org.apache.kafka.connect.predicates.TopicNameMatch and it’s pattern parameter has the value my-prefix-.* . With this configuration, the transformation is applied only to records where the topic name does not start with my-prefix-.

Tip

The benefit of defining the predicate separately from the transform is it makes it easier to apply the same predicate to multiple transforms. For example, you can have one set of transforms use one predicate and another set of transforms use the same predicate for negation.

Predicate Properties

Name Description Type Default Valid Values Importance
TopicNameMatches A predicate which is true for records with a topic name that matches the configured regular expression. string   non-empty string, valid regex medium
HasHeaderKey A predicate which is true for records with at least one header with the configured name. string   non-empty string medium
RecordIsTombstone A predicate which is true for records which are tombstones (that is, records with a null value).       medium