Kubernetes Horizontal Pod Autoscaling (HPA)

 

In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more Pods. 

Horizontal pod autoscaling does not apply to objects that can't be scaled (for example: a DaemonSet.)

From the most basic perspective, the HorizontalPodAutoscaler controller operates on the ratio between desired metric value and current metric value:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

For example, if the current metric value is 200m and the desired value is 100m, the number of replicas is doubled, since 200.0 / 100.0 == 2.0.

If the current value is instead 50m, you'll halve the number of replicas, since 50.0 / 100.0 == 0.5. 

The control plane skips any scaling action if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, 0.1 by default).
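The formula and the tolerance check can be tried out with a small Python sketch. The function name `desired_replicas` and its parameters are illustrative, not part of any Kubernetes API, and the real controller also handles details this ignores (unready pods, missing metrics, min/max bounds).

```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric, tolerance=0.1):
    """Apply the HPA ratio formula, skipping changes within the tolerance."""
    ratio = current_metric / desired_metric
    # No scaling action if the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(2, 200, 100))  # ratio 2.0: replicas double to 4
print(desired_replicas(4, 50, 100))   # ratio 0.5: replicas halve to 2
print(desired_replicas(4, 105, 100))  # ratio 1.05 is within tolerance: stays at 4
```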

The HorizontalPodAutoscaler controller acts on collected metrics, so make sure Metrics Server is installed in the cluster first.

Once Metrics Server is in place, create a Deployment with a Service attached to it.

root@master-node:/kubernetes# kubectl apply -f php.yml
deployment.apps/php-apache created
service/php-apache created
root@master-node:/kubernetes#

root@master-node:/kubernetes# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
php-apache   ClusterIP   10.104.10.167   <none>        80/TCP    18s
root@master-node:/kubernetes#

Create an HPA for the Deployment:

root@master-node:/kubernetes# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=5
horizontalpodautoscaler.autoscaling/php-apache autoscaled
root@master-node:/kubernetes# kubectl get hpa
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/50%   1         5         0          6s
root@master-node:/kubernetes#

YAML for reference:

root@master-node:/kubernetes# cat hpa.yml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: null
  name: php-apache
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  targetCPUUtilizationPercentage: 50
root@master-node:/kubernetes#

Load testing:

root@master-node:/kubernetes# kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

root@master-node:/kubernetes# kubectl get pods --all-namespaces | grep -i metrics
kube-system    metrics-server-847d45fd4f-j2fpc       0/1     Running   0               93s
root@master-node:/kubernetes#

root@master-node:/kubernetes# kubectl top pod
NAME                          CPU(cores)   MEMORY(bytes)
php-apache-6766b988d9-wjbb9   1m           9Mi
root@master-node:/kubernetes#

root@master-node:/kubernetes# kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         5         1          27m
root@master-node:/kubernetes#

With the load generator running, watch the HPA:

root@master-node:~# kubectl get hpa --watch
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          10m
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          13m
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          17m
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          18m
php-apache   Deployment/php-apache   0%/50%          1         5         1          26m
php-apache   Deployment/php-apache   127%/50%        1         5         1          28m
php-apache   Deployment/php-apache   213%/50%        1         5         3          28m
php-apache   Deployment/php-apache   126%/50%        1         5         5          28m
php-apache   Deployment/php-apache   98%/50%         1         5         5          28m

The HPA has now scaled the Deployment up to its maximum of 5 replicas.
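The replica counts in the watch output are consistent with the ratio formula once the result is clamped to the configured minimum and maximum. The Python sketch below reproduces that arithmetic. `next_replicas` is a hypothetical helper, and it assumes the utilization printed by `kubectl get hpa` is the value the controller acted on, which may not be exactly true between samples.

```python
import math

def next_replicas(current_replicas, current_util, target_util, min_replicas, max_replicas):
    """Ratio formula followed by clamping to the HPA's min/max bounds."""
    raw = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, raw))

# 1 replica at 127% against a 50% target: ceil(2.54) = 3
print(next_replicas(1, 127, 50, 1, 5))
# 3 replicas at 213%: ceil(12.78) = 13, capped at --max=5
print(next_replicas(3, 213, 50, 1, 5))
```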

HPA scaling behavior:

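The behavior field controls how quickly the HPA scales up or down once the metrics call for a change. It is part of the autoscaling/v2 API, so the autoscaling/v1 manifest shown earlier does not support it.

The following example shows scale-down policies: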

behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60

The scale-down behavior above has two policies:

1) Remove at most 4 pods in 60 seconds.
2) Remove at most 10% of the current replicas in 60 seconds.

When more than one policy is defined, the HPA by default picks the policy that allows the largest change, that is, the one that removes the most pods (selectPolicy: Max).
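To see which of the two policies wins at different replica counts, here is a rough Python sketch. `max_scale_down` is a hypothetical helper, and rounding the percentage up to a whole pod is an assumption made for this illustration. The real controller also tracks the replica count at the start of each period, which this ignores.

```python
def max_scale_down(current_replicas, pods_limit=4, percent_limit=10):
    """Largest number of pods that may be removed in one period (default Max selection)."""
    by_pods = pods_limit
    # Ceiling division: 10% of 72 is 7.2, treated here as 8 pods (assumed rounding).
    by_percent = -((-current_replicas * percent_limit) // 100)
    return max(by_pods, by_percent)

print(max_scale_down(80))  # the percent policy wins: 8 pods
print(max_scale_down(30))  # the pods policy wins: 4 pods
```

With these values the Percent policy dominates for large Deployments, while the Pods policy takes over at roughly 40 replicas and below.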

Stabilization Window:

The stabilization window prevents the replica count from flapping when the metrics fluctuate.

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 5
      periodSeconds: 120

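The scale-down behavior above sets stabilizationWindowSeconds to 300: the controller looks back over the last 5 minutes of recommendations and acts on the highest one, so a brief dip in load does not remove pods right away. Once scale-down proceeds, it removes at most 5 pods every 2 minutes.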
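The effect of the window can be illustrated with a small Python sketch. It keeps a history of (timestamp, recommended replicas) pairs and, for scale-down, uses the highest recommendation still inside the window. The function name and the history format are made up for this example and do not mirror the controller's internal data structures.

```python
def stabilized_recommendation(history, now, window_seconds=300):
    """Pick the highest replica recommendation made within the last window_seconds."""
    recent = [replicas for ts, replicas in history
              if now - window_seconds <= ts <= now]
    if not recent:
        raise ValueError("no recommendations inside the stabilization window")
    return max(recent)

# (seconds, recommended replicas): load drops off over time
history = [(0, 5), (100, 3), (250, 2)]

print(stabilized_recommendation(history, now=290))  # 5 is still in the window
print(stabilized_recommendation(history, now=390))  # 5 has aged out, highest is 3
print(stabilized_recommendation(history, now=450))  # only the recommendation of 2 remains
```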


