Storage options
By default, Operator manages storage using dynamic storage provisioning that
Kubernetes provides.
If you must rely on statically provisioned storage volumes, you can manually
provision and attach storage to your Kubernetes worker nodes, expose those to
the platform as PersistentVolumes, and then use Confluent Operator to deploy Confluent Platform
clusters so that the broker instances mount those PersistentVolumes.
Depending on how you are specifying storage requirements to Confluent Operator, you have
the following options:
- Create a StorageClass and specify the class name for Confluent Operator to use
- Use the default Kubernetes StorageClass
- Specify storage provisioner and other details for Confluent Operator Helm charts to create a StorageClass
Confluent Operator does not support migration from one storage class to another.
Use pre-defined StorageClass
Starting in Confluent Operator 5.5, you can instruct Operator to use a specific StorageClass
for all PersistentVolumes it creates.
You can provide a storage class to use for the entire Confluent Platform, or you can specify
different storage classes for different components such as ZooKeeper, Kafka, ksqlDB,
and Control Center.
Create a StorageClass, or choose an existing one, in your Kubernetes cluster.
You must ensure that volumeBindingMode is set to WaitForFirstConsumer on your storage class for use with Confluent Operator. For more information, see Volume Binding Mode for Kubernetes Storage Class.
If you intend to create a new StorageClass rather than use a pre-existing one, you must have sufficient permissions to create and modify StorageClasses in your Kubernetes cluster.
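As a sketch, a GCE PD StorageClass with the required binding mode could look like the following (the name, provisioner, and disk type are illustrative; use ones appropriate for your environment):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-storage-class                   # illustrative name; use your own
provisioner: kubernetes.io/gce-pd          # example provisioner for GCE persistent disks
parameters:
  type: pd-ssd                             # example disk type
volumeBindingMode: WaitForFirstConsumer    # required for use with Confluent Operator
```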
In the configuration file ($VALUES_FILE), specify the name of the StorageClass to use for deploying Confluent Platform.
To specify a StorageClass for all component deployments, specify the storage class name in global.storageClassName:

global:
  storageClassName:
To specify a StorageClass for a particular component, specify the storage class name in <component>.storageClassName. For example, for Kafka:
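A minimal sketch (the class name is a placeholder):

```yaml
kafka:
  storageClassName: my-storage-class   # placeholder; use the name of your StorageClass
```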
Use the Kubernetes default StorageClass
Starting in Confluent Operator 5.5, you can configure Confluent Operator to use the Kubernetes
default storage class.
To use this option, you must ensure that volumeBindingMode is set to WaitForFirstConsumer. For more information, see Volume Binding Mode for Kubernetes Storage Class.
The process for using statically provisioned storage is the same as above.
Ensure that the storageClassName specified in your PersistentVolume definitions matches the name of your Kubernetes cluster's default StorageClass.
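For reference, a statically provisioned PersistentVolume that binds under the default class might look like the following sketch (the name, capacity, and GCE disk reference are placeholders, and `standard` stands in for your cluster's default StorageClass name):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: kafka-pv-0              # placeholder name
spec:
  capacity:
    storage: 100Gi              # placeholder size
  accessModes:
    - ReadWriteOnce
  storageClassName: standard    # must match your cluster's default StorageClass
  gcePersistentDisk:            # example backend; use your own volume source
    pdName: kafka-disk-0
    fsType: ext4
```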
To use the Kubernetes default storage class, in the configuration file ($VALUES_FILE):
- Do not specify the global-level storageClassName value, or set it to an empty string ("").
- Do not specify the component-level storageClassName value, or set it to an empty string ("").
- Do not specify the global.provider.storage object.
The associated volumes will use the default StorageClass of your Kubernetes cluster. Support for default StorageClasses is enabled by default in Kubernetes 1.11 and later.
Use the StorageClass created by Confluent Operator Helm charts
To have Confluent Operator Helm charts create a storage class for your Confluent Platform cluster, set the following in the configuration file ($VALUES_FILE):
global:
  storageClassName: ""                ----- [1]
  provider:
    kubernetes:
      deployment:
        zones:
          - us-central1-a
    storage:                          ----- [2]
      provisioner: kubernetes.io/gce-pd
      allowVolumeExpansion: false     ----- [3]
      parameters:
        type: pd-ssd
[1] Set storageClassName: "", or do not specify storageClassName, either at the global level or at the component level.
[2] If you are configuring a multi-zone cluster, Confluent Operator creates a storage class for each zone specified in global.provider.kubernetes.deployment.zones.
See Storage Class Provisioners for configuration examples. This example uses GCE persistent disk storage (gce-pd) and solid-state drives (pd-ssd) as described in GCE PD.
[3] Set allowVolumeExpansion: false to disable automatic storage expansion. By default, Confluent Operator enables automatic storage expansion for the storage classes created by the Confluent Operator Helm charts.
When creating Confluent Platform clusters, Confluent Operator creates multiple StorageClasses on the fly, using the data under global.provider.storage as the spec for each StorageClass. The StorageClasses are named according to the following pattern:
{cp-component-helm-chart-name}-standard-ssd-{zone}
For example, the Kafka StorageClass for zone us-central1-a would be named kafka-standard-ssd-us-central1-a, assuming the Kafka Helm chart is named kafka.
With this option, the process for using statically provisioned storage is the same as above.
Precedence of the storage options
Precedence rules for the possible storage configuration options are as follows:
- If storageClassName is specified both at the global level (in the global object) and at the component level (in the component objects), the component-level storageClassName is used.
- If storageClassName is specified at the component level and the global.provider.storage object is specified, the component-level storageClassName is used.
- If both the global-level storageClassName and global.provider.storage are specified, Operator returns an error.
Tiered Storage
Starting in Confluent Platform 6.0, you can use Operator to enable Tiered Storage with AWS S3 or Google GCS.
Enable Tiered Storage
When you enable Tiered Storage, you need to configure Kafka with the following:
- The type of blob storage to use. Currently AWS S3 and GCS are supported.
- The name of the storage bucket to use. You must have created this bucket in advance; Operator does not create the bucket on your behalf.
You also need to ensure that the Kafka brokers have appropriate access to the
storage bucket. You can use one of the following options:
Use a Service Account
You can map cloud IAM permissions to the Kubernetes ServiceAccount associated
with your Kafka broker pods.
AWS provides the ability to natively associate AWS IAM permissions with
ServiceAccounts in EKS.
Similarly, GCP provides the ability to map IAM permissions with ServiceAccounts
in GKE.
You can map the appropriate bucket permissions to the default ServiceAccount in the Kubernetes namespace where you plan to deploy Kafka, or you can map them to a separate ServiceAccount and use Operator to ensure that the Kafka broker pods are associated with that ServiceAccount. The primary benefit of this approach is that you do not need to manage sensitive credentials for bucket access when deploying Confluent Platform via Operator.
For more on associating AWS IAM roles for service accounts on EKS, see IAM
roles for service accounts.
For more on associating GCP IAM roles for service accounts on GKE, see
Workload Identity.
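As an illustration, on EKS the IAM role is attached through an annotation on the ServiceAccount (the name, namespace, and role ARN below are placeholders); GKE Workload Identity uses the analogous iam.gke.io/gcp-service-account annotation:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kafka-sa          # placeholder; the ServiceAccount used by the Kafka broker pods
  namespace: operator     # placeholder namespace
  annotations:
    # EKS IAM Roles for Service Accounts (IRSA); the role ARN is a placeholder
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/tiered-storage-s3-access
```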
For more information on configuring which Kubernetes Service Account to
associate with Confluent Platform components managed by Operator, see Provide Service Account.
Use a Secret object
You can put your AWS or GCP credentials in a Secret object and configure Kafka to use the credentials in that object when deploying Kafka via Operator.
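For example, on AWS the credentials could be stored in a Secret such as the following sketch (the Secret name, key, and values are placeholders; the exact key Kafka expects depends on how you configure the deployment):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: tiered-storage-credentials   # placeholder name
type: Opaque
stringData:
  # AWS shared-credentials file format; values are placeholders
  credentials: |
    [default]
    aws_access_key_id = <your-access-key-id>
    aws_secret_access_key = <your-secret-access-key>
```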
In addition to the above required settings, you can configure other Tiered Storage settings using configOverrides in the kafka section. For the available settings, see Tiered Storage.
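As a sketch, assuming AWS S3, the values file might carry overrides like the following (the bucket name and region are placeholders; verify the property names against the Tiered Storage reference):

```yaml
kafka:
  configOverrides:
    server:
      # Enable the Tiered Storage feature and activate tiering
      - confluent.tier.feature=true
      - confluent.tier.enable=true
      # Use AWS S3 as the backend
      - confluent.tier.backend=S3
      - confluent.tier.s3.bucket=my-tiered-storage-bucket   # placeholder; bucket must already exist
      - confluent.tier.s3.region=us-west-2                  # placeholder region
```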
When a Kafka cluster is deleted, Operator does not perform a garbage collection of
the Tiered Storage bucket contents. You can either wait for the set interval or
manually delete the objects in the Tiered Storage bucket. For more information,
see Time Interval for Topic Deletes.