Horizontal Scale Up/Down the Pods Based on CPU Utilization in Kubernetes

In this tutorial, we will look at 3 methods to horizontally scale pods up or down based on CPU utilization in Kubernetes. Scaling is used extensively in Kubernetes: resources can be increased or decreased depending on the current workload. There are two basic types of scaling, horizontal and vertical. Here we will concentrate on horizontal scaling; a later article will look at vertical scaling.

Imagine you are running a 3-node Kubernetes cluster with 20 different applications running in 20 different pods. You may have sized the cluster for the maximum workload you expected it to handle, but suppose the workload suddenly grows beyond what the cluster can handle. This could result in a server crash, application downtime, or even a production loss. Horizontal scaling helps avoid this situation. More details can be found in the official Kubernetes documentation.

What is Horizontal Scaling

The process of adding more resources (pods or nodes) to an existing cluster to share the workload is known as horizontal scaling. It is usually preferred over vertical scaling.

Method 1: Horizontal Scale Up/Down the Pods Based on CPU Utilization Using YAML File

In the first method, we will use a YAML file to horizontally scale the pods. This is also the recommended way, since the minimum replicas, the maximum replicas, and the target CPU utilization percentage that drives scaling are all specified in a single YAML file.

[root@localhost ~]# vi autoscale.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-scaler
spec:
  scaleTargetRef:
    # API group/version of the target; ReplicaSet lives in apps/v1
    apiVersion: apps/v1
    kind: ReplicaSet
    name: web-app
  minReplicas: 2
  maxReplicas: 8
  targetCPUUtilizationPercentage: 60

Now create the HPA with the kubectl apply -f autoscale.yaml command as shown below. This creates the web-app-scaler HPA.

[root@localhost ~]# kubectl apply -f autoscale.yaml
horizontalpodautoscaler.autoscaling/web-app-scaler created

To list all the created HPAs, use the kubectl get hpa command as shown below. As you can see, we have only one HPA as of now. A TARGETS value of <unknown> means the current CPU utilization is not (yet) available to the HPA.

[root@localhost ~]# kubectl get hpa
NAME             REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
web-app-scaler   ReplicaSet/web-app   <unknown>/60%   2         8         0          76s
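
To make the 60% target more concrete, the sketch below shows how a replica count can be derived from it. It follows the formula given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with the default 10% tolerance and the min/max bounds from the HPA. The function name and parameters are hypothetical, and this is a simplification: the real controller also accounts for missing metrics, pods that are not ready, and stabilization behavior.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified sketch of the HPA replica calculation (hypothetical helper)."""
    # Ratio of observed average CPU utilization to the configured target.
    ratio = current_cpu_percent / target_cpu_percent

    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # Scale proportionally and round up.
        desired = math.ceil(current_replicas * ratio)

    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# With the values used in this tutorial: target 60%, min 2, max 8.
print(desired_replicas(2, 90, 60, 2, 8))   # 3 (scale up)
print(desired_replicas(4, 30, 60, 2, 8))   # 2 (scale down)
print(desired_replicas(4, 200, 60, 2, 8))  # 8 (capped at maxReplicas)
```

For example, with 2 replicas averaging 90% CPU against a 60% target, the ratio is 1.5, so the result is 3 replicas.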

Once you are done with an HPA, you can delete it with the kubectl delete hpa <hpa_name> command. Here we delete the web-app-scaler HPA as shown below.

[root@localhost ~]# kubectl delete hpa web-app-scaler
horizontalpodautoscaler.autoscaling "web-app-scaler" deleted

Once it is deleted, you can verify the HPA list with the kubectl get hpa command.

[root@localhost ~]# kubectl get hpa
No resources found in cyberithub namespace.

NOTE:

Please note that I am using the root user to run all the commands in this tutorial. You can use any user with sudo access to run them. For more information, please check Step by Step: How to Add User to Sudoers to provide sudo access to the User.

Method 2: Horizontal Scale Up/Down the Pods Based on CPU Utilization Using JSON File

The second method is popular with JSON users: kubectl apply also accepts a .json manifest file. Sometimes you are working in a JSON environment and need a .json file instead of a .yaml file. In this example we simply rename autoscale.yaml to autoscale.json with the mv autoscale.yaml autoscale.json command as shown below. Note that renaming changes only the extension; the file content is still written in YAML syntax, which kubectl accepts here, as the apply output further below shows.

[root@localhost ~]# mv autoscale.yaml autoscale.json

You can verify the contents of autoscale.json by opening the file in the vi editor or with the cat autoscale.json command. The contents are unchanged.

[root@localhost ~]# vi autoscale.json
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-scaler
spec:
  scaleTargetRef:
    # API group/version of the target; ReplicaSet lives in apps/v1
    apiVersion: apps/v1
    kind: ReplicaSet
    name: web-app
  minReplicas: 2
  maxReplicas: 8
  targetCPUUtilizationPercentage: 60
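
If you need a manifest that is written in actual JSON syntax rather than a renamed YAML file, one option is to generate it. The following is a minimal sketch, assuming Python 3 is available; the script name make_hpa.py and the helper render_manifest are hypothetical. It carries the same fields as the manifest above, plus an apiVersion under scaleTargetRef, which is an addition here to identify the API group of the ReplicaSet.

```python
import json

# Same HPA definition as in autoscale.yaml, expressed as a Python dict.
hpa_manifest = {
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "web-app-scaler"},
    "spec": {
        "scaleTargetRef": {
            # Assumption: added to identify the ReplicaSet's API group.
            "apiVersion": "apps/v1",
            "kind": "ReplicaSet",
            "name": "web-app",
        },
        "minReplicas": 2,
        "maxReplicas": 8,
        "targetCPUUtilizationPercentage": 60,
    },
}


def render_manifest(manifest):
    """Serialize the manifest dict as indented JSON text."""
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    # Print the JSON so it can be redirected into a file.
    print(render_manifest(hpa_manifest))
```

Saving this as make_hpa.py and running python3 make_hpa.py > autoscale.json would produce a JSON file that can be passed to kubectl apply -f autoscale.json in the same way.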

Then use the same kubectl command as in the previous method, changing only the file name from the YAML file to the JSON file.

[root@localhost ~]# kubectl apply -f autoscale.json
horizontalpodautoscaler.autoscaling/web-app-scaler created

You will then see that the web-app-scaler HPA was created successfully with the target CPU utilization set to 60%, as shown below.

[root@localhost ~]# kubectl get hpa
NAME             REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
web-app-scaler   ReplicaSet/web-app   <unknown>/60%   2         8         0          10s

Once you are done with it, you can delete the HPA just as in the previous method with the kubectl delete hpa web-app-scaler command.

[root@localhost ~]# kubectl delete hpa web-app-scaler
horizontalpodautoscaler.autoscaling "web-app-scaler" deleted

NOTE:

Please note that we are creating all the resources in the current namespace, so none of the commands use a namespace option. To create resources in another namespace, add the -n <namespace> option to the commands.

Method 3: Horizontal Scale Up/Down the Pods Based on CPU Utilization Using kubectl command

The third method is to use the kubectl command directly from the CLI. You can create an HPA with a single command, specifying the maximum and minimum number of pods with the --max and --min options and the target CPU utilization with the --cpu-percent option, as shown below.

[root@localhost ~]# kubectl autoscale rs web-app --max=8 --min=2 --cpu-percent=60
horizontalpodautoscaler.autoscaling/web-app autoscaled

Once the Horizontal Pod Autoscaler is created, you can verify it with the kubectl get hpa command. This shows the list of HPAs currently available along with the options set for them.

[root@localhost ~]# kubectl get hpa
NAME      REFERENCE            TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
web-app   ReplicaSet/web-app   <unknown>/60%   2         8         2          35m

As with the methods above, you can delete this autoscaler with the kubectl delete hpa web-app command.

[root@localhost ~]# kubectl delete hpa web-app
horizontalpodautoscaler.autoscaling "web-app" deleted

This deletes the autoscaler, as you can confirm from the output below.

[root@localhost ~]# kubectl get hpa
No resources found in cyberithub namespace.

Conclusion

In this tutorial, we learned what scaling means and the different types of scaling. We covered horizontal scaling and three methods for horizontally scaling pods based on CPU utilization. We also went through the kubectl commands used to create, list, and delete an HPA. Hopefully this tutorial was helpful.
