This is the third tutorial of the Kubernetes Tutorial Series. The first two were:

Storage is vital to any organization, and in the early days of containers it was not easy to manage. This is where Kubernetes shines. In this blog post we will look at what Persistent Volumes are and why they matter if you want to work with Kubernetes.

As we know by now, Pods are ephemeral: when a Pod is terminated or recreated, the data written inside its containers is lost. This is where Volumes come in. A Volume lets us store the data a Pod uses on disks that can persist even when the Pod terminates or gets recreated. Kubernetes Volumes decouple storage from Pods. You can also share volumes between multiple Pods, and we will look at the features they provide in this article.

Kubernetes supports various types of Volumes. We will not cover them all in this article, but some of the important ones are:

  • AWS EBS
  • Azure Disk
  • GCE Persistent Disk
  • EmptyDir
  • Portworx Volume
  • And many more, you can find the complete list here.
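To make the idea of a volume concrete, here is a minimal sketch of a Pod that declares an emptyDir volume and mounts it into two containers. The Pod name, container names, images and paths are assumptions chosen for illustration. Note that emptyDir is the simplest volume type and only lives as long as the Pod itself, so it demonstrates sharing rather than persistence.

```yaml
# Hypothetical Pod: two containers share one emptyDir volume.
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo        # assumed name
spec:
  containers:
  - name: writer
    image: busybox
    # Appends a timestamp to a file on the shared volume every 5 seconds.
    command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox
    # Follows the same file through its own mount of the volume.
    command: ["sh", "-c", "touch /data/out.log; tail -f /data/out.log"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}                  # removed together with the Pod
```

The other volume types in the list follow the same pattern: an entry under volumes names the backing storage, and each container mounts it through volumeMounts.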

Container Storage Interface (CSI)

You can skip this section if you want, but I recommend reading it to get an overview of how CSI made life simpler for the Kubernetes community.

First of all, CSI is not Kubernetes specific, but without it you will not be able to use storage from a third-party provider such as AWS or Portworx in Kubernetes. So you should know a little about it.

Before CSI, volume plugins served the storage needs of container workloads in Kubernetes, and their code lived in the main Kubernetes tree ("in-tree"). Having third-party storage provider code inside Kubernetes had several drawbacks. Volume plugin development was tightly coupled to, and dependent on, Kubernetes releases. Bugs in a volume plugin could also crash critical Kubernetes components instead of just the plugin.

CSI is an out-of-tree open standard. Out-of-tree means that all of the code that hooks external storage systems into Kubernetes is moved out of the main Kubernetes codebase and fully decoupled from it. This has several advantages. First, storage providers can release updates on their own schedule without depending on Kubernetes releases, which makes development faster and more independent. Second, Kubernetes maintainers no longer have to take care of in-tree third-party code. One of the goals of CSI is to have a single plugin per external storage provider (AWS, Portworx, etc.) that works with any Container Orchestration System (CO) such as Kubernetes, Mesos or Docker Swarm.

The diagram below gives an overview of how everything fits together. On the left is the external storage: storage vendors write their own CSI plugins and plug them into Kubernetes through the Container Storage Interface. The PV subsystem on the right, which we will talk about in a moment, is a purely Kubernetes concept that uses the external storage exposed through CSI to provide storage on Kubernetes.

Now let’s dive into the Kubernetes side. We will look at Persistent Volumes, Persistent Volume Claims and Storage Classes.

Persistent Volume and Persistent Volume Claims

Persistent Volume

A PersistentVolume (PV) is a piece of storage in the cluster. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

The diagram below is Google Cloud specific, but the flow is general. The storage provider has storage that is made available to Kubernetes through a plugin, in this case GCEPersistentDiskPlugin. On the Kubernetes side we have the PersistentVolume, which exists in the cluster as an independent resource just like Pods.

Below is the YAML file for a PV:

  • You have the usual apiVersion, kind and metadata.
  • The spec section defines accessModes. There are three types of access mode: ReadWriteOnce, ReadWriteMany and ReadOnlyMany. We will discuss these separately in the next section.
  • The spec section also defines the capacity and the volume details such as volumeID and fsType. In this case my storage provider is AWS, so I use awsElasticBlockStore. On Google Cloud I would use gcePersistentDisk.
  • One more very important field in the spec section is the reclaim policy. We will also discuss this in the next section.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: sample-pv
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: ext4
    volumeID: aws://ap-south-1a/vol-0d155bd867daee865
  capacity:
    storage: 3Gi
  persistentVolumeReclaimPolicy: Delete
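Tying this back to the CSI section, the same EBS volume could also be described through a CSI driver instead of the in-tree awsElasticBlockStore field. The following is only a sketch: it assumes the AWS EBS CSI driver (registered under the driver name ebs.csi.aws.com) is installed in the cluster, and the PV name is made up for illustration. Depending on the cluster, additional fields such as node affinity for the availability zone may also be needed.

```yaml
# Sketch: a statically provisioned PV backed by a CSI driver.
# Assumes the AWS EBS CSI driver is installed in the cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sample-pv-csi             # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 3Gi
  persistentVolumeReclaimPolicy: Retain   # keep the EBS volume after release
  csi:
    driver: ebs.csi.aws.com       # name under which the CSI driver registers
    fsType: ext4
    volumeHandle: vol-0d155bd867daee865   # EBS volume ID from the example above
```

The shape of the object is unchanged: only the volume source under spec differs, which is the decoupling that CSI is meant to provide.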

AccessModes

The access modes are:

  • ReadWriteOnce – the volume can be mounted as read-write by a single node
  • ReadOnlyMany – the volume can be mounted read-only by many nodes
  • ReadWriteMany – the volume can be mounted as read-write by many nodes

A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.

Reclaim Policy

The reclaim policy for a PersistentVolume tells the cluster what to do with the volume after it has been released of its claim. Currently, volumes can either be Retained, Recycled or Deleted.

The Retain reclaim policy allows for manual reclamation of the resource. When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered “released”.

The Delete reclaim policy removes both the PersistentVolume object from Kubernetes and the associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, or Azure Disk volume.

The Recycle reclaim policy performs a basic scrub (rm -rf /thevolume/*) on the volume and makes it available again for a new claim. Note that Recycle is deprecated in current Kubernetes releases in favor of dynamic provisioning.
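As an illustration of how access modes and the reclaim policy combine, here is a sketch of a PV backed by NFS, which is a storage type that can be mounted read-write by many nodes. The PV name, server address and export path are placeholders, not values from this tutorial.

```yaml
# Sketch: an NFS-backed PV that many nodes can mount read-write
# and that is kept (not deleted) when its claim is removed.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv             # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany                 # read-write from many nodes
  persistentVolumeReclaimPolicy: Retain   # PV becomes "released", data stays
  nfs:
    server: nfs.example.internal  # placeholder NFS server
    path: /exports/shared         # placeholder export path
```

With Retain, an administrator has to clean up or re-create the PV by hand before it can be claimed again.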

Persistent Volume Claim

A PersistentVolumeClaim (PVC) is another Kubernetes object that is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and memory). Claims can request a specific size and access modes (e.g., can be mounted once read/write or many times read-only). You can think of a PVC as a ticket for a PV. Your apps can use this ticket to access persistent volumes.

Below is the YAML file for a PVC:

  • It has a kind, apiVersion and metadata as usual.
  • In the spec section we have storageClassName, which is used to dynamically provision a persistent volume if a matching one is not already available. We will look into storage classes in the next section.
  • The spec section also contains a resources section, which defines the amount of storage needed.
  • accessModes are also defined in the spec section. We have already discussed the various access modes in the previous section.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: task-pv-claim
spec:
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
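The claim above relies on a storage class. For comparison, here is a sketch of a claim that binds to the statically created sample-pv from the earlier example. It assumes that PV exists and has no storage class set. The claim name is hypothetical. Setting storageClassName to an empty string asks Kubernetes not to provision a volume dynamically, and volumeName pins the claim to one specific PV.

```yaml
# Sketch: a PVC that binds to an existing, statically provisioned PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sample-pv-claim           # hypothetical name
spec:
  storageClassName: ""            # empty string: no dynamic provisioning
  volumeName: sample-pv           # bind to the PV defined earlier
  accessModes:
  - ReadWriteOnce                 # must be offered by the PV
  resources:
    requests:
      storage: 3Gi                # must not exceed the PV capacity
```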

Storage Classes

Storage Classes, like everything else, are Kubernetes objects. They help in dynamically provisioning Persistent Volumes. You can’t scale much if you first have to statically provision each persistent volume and then bind it to a persistent volume claim. This is where StorageClasses are of great help, because they let us provision persistent volumes dynamically.

Overall, this is how it works in a real environment. You will have a StorageClass defined for an external storage provider. In most clusters this comes by default. If you have set up your cluster on AWS you will have a StorageClass for EBS, and in GKE you will have a StorageClass for GCEPersistentDisk. You reference this StorageClass in your PersistentVolumeClaim, which automatically creates a PersistentVolume for you. Finally, you reference the PVC in your Pod or Deployment file.
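For reference, a StorageClass for EBS might look roughly like the sketch below. It is an assumption about what a default gp2 class could contain, not a copy of the one in my cluster. It uses the legacy in-tree provisioner name kubernetes.io/aws-ebs. Clusters that run the EBS CSI driver would use ebs.csi.aws.com as the provisioner instead.

```yaml
# Sketch of a StorageClass for AWS EBS gp2 volumes (values are assumptions).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2                       # the name PVCs use in storageClassName
provisioner: kubernetes.io/aws-ebs   # legacy in-tree EBS provisioner
parameters:
  type: gp2                       # EBS volume type
  fsType: ext4
reclaimPolicy: Delete             # inherited by dynamically created PVs
volumeBindingMode: Immediate      # provision as soon as the PVC is created
```

You can list the classes that actually exist in your cluster with kubectl get storageclass.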

Let’s see what we have discussed in action in the demo below.

I will not show how to implement all of this in an application, because I already have an application that uses all these features. You can look into it and see how things fall into place. Here is the repository link. In the demo below I will just show dynamic provisioning using Storage Classes.

I have a Kubernetes Cluster which is set up in AWS using kops. I already have a StorageClass defined. We will use the below YAML file to create a PVC:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: task-pv-claim
spec:
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

We are requesting a 3 GiB EBS volume with the ReadWriteOnce access mode. One very important point: for dynamically provisioned volumes the default reclaim policy is Delete, so deleting the PVC also deletes the PV and the underlying EBS volume.

In the screenshot below we can see that I did not have any PV or PVC initially. Then I applied the PVC YAML file, which created the PV and the PVC. If you want to use this in an application, just reference this PVC name in your application deployment file.

Note that the PVC status is Bound. Earlier it was Pending, because in the background it was searching for a PV that was not present. The StorageClass then created a PV dynamically, and after that the status became Bound.

We can also see the PV which was created dynamically.

And when we go to the AWS Management Console we can see the volume which was created.
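To use the claim from this demo in an application, a Pod references it by name under volumes. The sketch below is a minimal example. The Pod name, container name, image and mount path are assumptions for illustration, and only claimName comes from the PVC created above.

```yaml
# Sketch: a Pod that mounts the dynamically provisioned volume via its PVC.
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod               # hypothetical name
spec:
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: task-pv-storage
      mountPath: /usr/share/nginx/html   # files here live on the EBS volume
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: task-pv-claim    # the PVC created in the demo
```

The same volumes and volumeMounts entries go into the Pod template of a Deployment. Data written under the mount path then survives the Pod being deleted and recreated.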

Now you have a pretty solid understanding of how to persist data on a Kubernetes Cluster. Hope you gained some knowledge from this article. Feel free to leave any comments if you face any issues or have any questions and I will be glad to help.

Next, we will look at how to specify the amount of resources like CPU or memory that a container inside a Pod can use based on our requirement.

Kubernetes Tutorial Series: Resource Allocation for Containers.
