How to run data on Kubernetes: 6 starting principles

Kubernetes is fast becoming an industry standard, with up to 94% of organizations deploying their services and applications on the container orchestration platform, according to a survey. One of the main reasons enterprises implement Kubernetes is standardization, which allows advanced users to see up to twofold productivity gains.

Standardizing on Kubernetes gives organizations the ability to deploy any workload, anywhere. But there was a missing piece: The technology assumed that workloads were ephemeral, meaning only stateless workloads could be safely deployed on Kubernetes. However, the community has since changed the paradigm and introduced features like StatefulSets and storage classes, which make it possible to consume data on Kubernetes.

While it is possible to run stateful workloads on Kubernetes, it is still challenging. In this article, I describe ways to make it happen and explain why it’s worth it.

Do it progressively

Kubernetes is about to become as popular as Linux and the de facto way to run any application, anywhere, in a distributed manner. Using Kubernetes involves learning a lot of technical concepts and vocabulary. For example, newcomers may struggle with the many Kubernetes logical units such as containers, pods, nodes, and clusters.

If you’re not yet running Kubernetes in production, don’t go directly to data workloads. Instead, start by moving your stateless applications, so that you don’t lose data when things go sideways.

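When you do move on to data workloads, look for a Kubernetes operator, which is software that automates the deployment and management of an application on the cluster. If you can’t find an operator that meets your needs, don’t worry, because most of them are open source.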

Understand the limits and specificities

Once you’re familiar with the general concepts of Kubernetes, dive into the specifics of stateful concepts. For example, because applications may have different storage needs, such as performance or capacity requirements, it is necessary to provide the correct underlying storage system.

What the industry generally calls storage “profiles” are called storage classes in Kubernetes. They provide a way to describe the different types of storage that a Kubernetes cluster can access. Storage classes can map to different levels of quality of service (such as IOPS per GiB), to backup policies, or to arbitrary policies set by the cluster administrator. They also carry settings such as the volume binding mode and the allowed topologies.
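To make the idea of a storage class more concrete, here is a minimal sketch in Python that builds a StorageClass manifest as a plain dictionary and prints it as JSON, a format Kubernetes accepts alongside YAML. The class name `fast-ssd`, the provisioner `csi.example.com`, the `type` parameter, and the zone names are all hypothetical placeholders. Real values depend on the storage driver installed in your cluster.

```python
import json


def storage_class(name, provisioner, parameters, zones):
    """Build a StorageClass manifest (storage.k8s.io/v1) as a plain dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # The provisioner and its parameters are driver-specific.
        "provisioner": provisioner,
        "parameters": parameters,
        # Delay volume binding until a pod that uses the claim is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
        # Restrict provisioning to nodes in the listed zones.
        "allowedTopologies": [
            {
                "matchLabelExpressions": [
                    {"key": "topology.kubernetes.io/zone", "values": zones}
                ]
            }
        ],
    }


# Hypothetical values, for illustration only.
fast = storage_class(
    name="fast-ssd",
    provisioner="csi.example.com",
    parameters={"type": "ssd"},
    zones=["zone-a", "zone-b"],
)

print(json.dumps(fast, indent=2))
```

With a real provisioner filled in, the printed JSON could be piped to `kubectl apply -f -` to create the class.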

Another critical component to understand is the StatefulSet. It is the Kubernetes API object used to manage stateful applications and offers key features such as:

  • Stable and unique network identifiers that allow you to track volumes and detach and reattach them at your convenience;
  • Stable and persistent storage so your data is safe;
  • Neat and orderly deployment and scaling, needed for many day two operations.
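The features listed above can be illustrated with a small sketch. The Python below builds a StatefulSet manifest as a dictionary and derives the stable pod names Kubernetes assigns, which follow the pattern `<statefulset name>-<ordinal>`. The application name `db`, the image `example.com/db:1.0`, the mount path, and the storage class `fast-ssd` are hypothetical. The sketch also assumes a headless Service named `db` is created separately.

```python
import json


def stateful_set(name, image, replicas, storage_class_name, size):
    """Build a StatefulSet manifest (apps/v1) as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that provides the network identities;
            # assumed to exist already under the same name.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/lib/data"}
                            ],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "storageClassName": storage_class_name,
                        "resources": {"requests": {"storage": size}},
                    },
                }
            ],
        },
    }


def pod_names(manifest):
    """Return the stable pod names: <statefulset name>-<ordinal>."""
    name = manifest["metadata"]["name"]
    return [f"{name}-{i}" for i in range(manifest["spec"]["replicas"])]


# Hypothetical values, for illustration only.
db = stateful_set("db", "example.com/db:1.0", 3, "fast-ssd", "10Gi")
print(pod_names(db))
print(json.dumps(db, indent=2))
```

Pods are created in ordinal order by default, and each one keeps its name and its claim across rescheduling, which is what makes the identity and the storage stable.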

While StatefulSet was a successful replacement for the infamous (now deprecated) PetSet, it is still imperfect and has limitations. For example, the StatefulSet controller lacks built-in support for resizing the volumes behind its PersistentVolumeClaims (PVCs), which is a major challenge if the application’s dataset is about to outgrow the currently allocated storage capacity. There are workarounds, but these limitations need to be understood well in advance so that the engineering team knows how to deal with them.
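One common workaround is to resize each PVC directly. It is sketched below as an illustration, not as a recommendation for every setup. PVCs created by a StatefulSet are named `<claim template>-<statefulset name>-<ordinal>`, so the Python below generates one `kubectl patch` command per claim. The names `db` and `data` and the size `20Gi` are hypothetical. The approach assumes the storage class has `allowVolumeExpansion: true` and that the storage driver supports expansion.

```python
import json


def pvc_names(statefulset, claim_template, replicas):
    """Return PVC names: <claim template>-<statefulset name>-<ordinal>."""
    return [f"{claim_template}-{statefulset}-{i}" for i in range(replicas)]


def resize_commands(statefulset, claim_template, replicas, new_size):
    """Return one `kubectl patch pvc` command per claim of the StatefulSet."""
    # Only the requested storage changes; everything else is left untouched.
    patch = json.dumps(
        {"spec": {"resources": {"requests": {"storage": new_size}}}}
    )
    return [
        f"kubectl patch pvc {name} --patch '{patch}'"
        for name in pvc_names(statefulset, claim_template, replicas)
    ]


# Hypothetical values, for illustration only.
commands = resize_commands("db", "data", 3, "20Gi")
for command in commands:
    print(command)
```

Note that the StatefulSet’s own claim template cannot be edited in place, so replicas added later would still request the original size unless the StatefulSet is recreated with an updated template.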
