Kubernetes Horizontal Pod Autoscaling (HPA)
In Kubernetes, a HorizontalPodAutoscaler (HPA) automatically updates a workload resource (such as a Deployment or StatefulSet) so that the workload scales to match demand.
Horizontal scaling means that the response to increased load is to deploy more Pods, rather than giving the existing Pods more CPU or memory.
Horizontal pod autoscaling does not apply to objects that can't be scaled, such as a DaemonSet.
From the most basic perspective, the HorizontalPodAutoscaler controller operates on the ratio between the desired metric value and the current metric value:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
For example, if the current metric value is 200m and the desired value is 100m, the number of replicas will be doubled, since 200.0 / 100.0 == 2.0.
If the current value is instead 50m, the number of replicas will be halved, since 50.0 / 100.0 == 0.5.
The control plane skips any scaling action if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, 0.1 by default).
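A minimal sketch of the formula above, including the default 0.1 tolerance, written as a POSIX shell function. It assumes both metric values are positive integers in the same unit (for example millicores), so the ceiling can be taken with integer arithmetic. The function name desired_replicas is hypothetical, and the real controller also handles missing metrics and Pods that are not yet ready, which this ignores.

```shell
#!/bin/sh
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC DESIRED_METRIC
# Hypothetical helper implementing
#   desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
# with the default 0.1 tolerance. Metric values are positive integers in the
# same unit (for example millicores), so only integer arithmetic is needed.
desired_replicas() {
  current=$1 metric=$2 target=$3
  # Absolute difference between the current and desired metric values.
  diff=$((metric - target))
  [ "$diff" -lt 0 ] && diff=$((-diff))
  # |metric/target - 1| <= 0.1  is the same as  10 * |metric - target| <= target
  if [ $((10 * diff)) -le "$target" ]; then
    echo "$current"   # ratio close enough to 1.0: no scaling
    return
  fi
  # Integer ceiling of current * metric / target.
  echo $(( (current * metric + target - 1) / target ))
}

desired_replicas 4 200 100   # ratio 2.0  -> 8
desired_replicas 4 50 100    # ratio 0.5  -> 2
desired_replicas 4 105 100   # ratio 1.05 -> 4 (within tolerance)
```

Multiplying through by the desired value keeps the tolerance check exact without floating point, which POSIX shell arithmetic does not provide.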
The HorizontalPodAutoscaler controller works from collected resource metrics, so make sure Metrics Server is installed in the cluster.
Once Metrics Server is in place, create a Deployment and a Service that exposes it.
root@master-node:/kubernetes# kubectl apply -f php.yml
deployment.apps/php-apache created
service/php-apache created
root@master-node:/kubernetes#
root@master-node:/kubernetes# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
php-apache   ClusterIP   10.104.10.167   <none>        80/TCP    18s
root@master-node:/kubernetes#
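The post does not show php.yml. A plausible version is sketched below as a shell heredoc, assuming it follows the php-apache example from the upstream Kubernetes HPA walkthrough (image registry.k8s.io/hpa-example, a Deployment plus a ClusterIP Service on port 80). The CPU request is the important part: --cpu-percent is measured against each container's CPU request, and the HPA cannot compute CPU utilization for Pods without one.

```shell
#!/bin/sh
# Assumption: php.yml mirrors the upstream php-apache example; the post's own
# file is not shown. Writes the Deployment and Service manifests to php.yml.
cat > php.yml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          # --cpu-percent targets are a percentage of this request
          requests:
            cpu: 200m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
EOF
# Then apply it with: kubectl apply -f php.yml
```

The Service has no type field, so it defaults to ClusterIP, which matches the kubectl get svc output above.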
Create an HPA for the Deployment that targets 50% average CPU utilization with between 1 and 5 replicas:
root@master-node:/kubernetes# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=5
horizontalpodautoscaler.autoscaling/php-apache autoscaled
root@master-node:/kubernetes# kubectl get hpa
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/50%   1         5         0          6s
root@master-node:/kubernetes#
YAML for reference:
root@master-node:/kubernetes# cat hpa.yml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: null
  name: php-apache
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  targetCPUUtilizationPercentage: 50
root@master-node:/kubernetes#
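autoscaling/v1 only supports a CPU utilization target, and the behavior field discussed later in this post is not available in autoscaling/v1; it needs autoscaling/v2 (served since Kubernetes 1.23). A sketch of the same HPA written for autoscaling/v2 follows; the file name hpa-v2.yml is hypothetical.

```shell
#!/bin/sh
# Writes the same HPA (1-5 replicas, 50% average CPU) using autoscaling/v2.
# hpa-v2.yml is a hypothetical file name.
cat > hpa-v2.yml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF
# Then apply it with: kubectl apply -f hpa-v2.yml
```

In v2 the single targetCPUUtilizationPercentage field becomes an entry in the metrics list, which is also where memory or custom metrics would go.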
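Load testing. The command below starts a busybox Pod that sends requests to the php-apache Service in a loop; it keeps running until you press Ctrl+C (the --rm flag then deletes the Pod), so use a second terminal for monitoring: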
root@master-node:/kubernetes# kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
root@master-node:/kubernetes# kubectl get pods --all-namespaces | grep -i metrics
kube-system metrics-server-847d45fd4f-j2fpc 0/1 Running 0 93s
root@master-node:/kubernetes#
root@master-node:/kubernetes# kubectl top pod
NAME                          CPU(cores)   MEMORY(bytes)
php-apache-6766b988d9-wjbb9   1m           9Mi
root@master-node:/kubernetes#
root@master-node:/kubernetes# kubectl get hpa
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%          1         5         1          27m
root@master-node:/kubernetes#
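HPA CPU targets are percentages of the Pods' CPU requests, not of a whole core. Assuming a request of 200m (the value in the upstream php-apache example; the post does not show php.yml), the 1m reported by kubectl top works out to 0.5%, which is consistent with the 0% shown by kubectl get hpa. The hypothetical helper below sketches that arithmetic for a single Pod; rounding down to an integer percentage is an assumption about how the value is displayed.

```shell
#!/bin/sh
# cpu_utilization USAGE_MILLICORES REQUEST_MILLICORES
# Hypothetical helper: CPU usage as an integer percentage of the CPU request,
# rounded down (assumed request values, single Pod).
cpu_utilization() {
  echo $(( $1 * 100 / $2 ))
}

cpu_utilization 1 200     # 1m used of a 200m request -> 0
cpu_utilization 254 200   # 254m used of a 200m request -> 127
```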
With the load generator running, watch the HPA:
root@master-node:~# kubectl get hpa --watch
NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          10m
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          13m
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          17m
php-apache   Deployment/php-apache   <unknown>/50%   1         5         1          18m
php-apache   Deployment/php-apache   0%/50%          1         5         1          26m
php-apache   Deployment/php-apache   127%/50%        1         5         1          28m
php-apache   Deployment/php-apache   213%/50%        1         5         3          28m
php-apache   Deployment/php-apache   126%/50%        1         5         5          28m
php-apache   Deployment/php-apache   98%/50%         1         5         5          28m
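The HPA has scaled the Deployment out to its maximum of 5 replicas. CPU utilization (98%) is still above the 50% target, but maxReplicas prevents it from adding more Pods.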
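To see why the count stops at 5, the sketch below applies the replica formula to readings from the watch output above and clamps the result to minReplicas and maxReplicas. It is a simplification: the real controller also applies the tolerance, its default scale-up rate limits and Pod readiness rules, and the TARGETS value printed by kubectl may not be the exact sample the controller used. The function name clamped_replicas is hypothetical.

```shell
#!/bin/sh
# clamped_replicas CURRENT UTIL_PERCENT TARGET_PERCENT MIN MAX
# Hypothetical helper: ceil(current * util / target), clamped to [MIN, MAX].
# Ignores tolerance, scale-up rate limits and Pod readiness.
clamped_replicas() {
  current=$1 util=$2 target=$3 min=$4 max=$5
  desired=$(( (current * util + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

# Readings from the watch output (target 50%, min 1, max 5).
clamped_replicas 1 127 50 1 5   # ceil(2.54) = 3
clamped_replicas 3 213 50 1 5   # ceil(12.78) = 13, capped at 5
clamped_replicas 5 98 50 1 5    # ceil(9.8) = 10, capped at 5
```

The first line matches the jump from 1 to 3 replicas in the log, and every later reading asks for more than 5 replicas, so the HPA stays pinned at maxReplicas until the load drops.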
HPA scaling behavior:
The behavior field, available in the autoscaling/v2 API, controls how quickly the HPA scales up or down once it has computed a desired replica count from the collected metrics.
The following example configures scale-down behavior:
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60
The scale-down behavior above has two policies:
1) Remove at most 4 Pods per 60 seconds.
2) Remove at most 10% of the current replicas per 60 seconds.
When several policies are specified, the HPA by default (selectPolicy: Max) uses the policy that allows the largest change, that is, the one that removes the most Pods.
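A rough sketch of how the two policies above combine under the default selectPolicy (Max). It assumes no other scale events happened earlier in the 60-second period (the real controller measures from the replica count at the start of each period), and it does not apply minReplicas or the desired replica count, both of which still bound the result. The function name scale_down_floor is hypothetical.

```shell
#!/bin/sh
# scale_down_floor CURRENT_REPLICAS
# Hypothetical helper for the two policies above, assuming no earlier scale
# events in the current 60-second period:
#   Pods policy:    remove at most 4 Pods -> floor = current - 4
#   Percent policy: remove at most 10%    -> floor = current * 90 / 100, rounded down
# selectPolicy Max (the default) picks the policy allowing the larger change,
# i.e. the lower of the two floors.
scale_down_floor() {
  current=$1
  by_pods=$((current - 4))
  by_percent=$((current * (100 - 10) / 100))
  if [ "$by_pods" -lt "$by_percent" ]; then
    echo "$by_pods"
  else
    echo "$by_percent"
  fi
}

scale_down_floor 20    # Pods policy wins: can drop to 16
scale_down_floor 100   # Percent policy wins: can drop to 90
```

With these values the two policies break even at 40 replicas: below that the fixed 4-Pod step removes more, above it the 10% step does.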
Stabilization Window:
The stabilization window prevents the replica count from flapping, that is, scaling down and back up repeatedly while the metrics fluctuate.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 5
      periodSeconds: 120
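With stabilizationWindowSeconds: 300, the HPA looks at every desired replica count it computed during the past 300 seconds (5 minutes) and, when scaling down, uses the highest one, so Pods are removed only after the lower recommendation has held for the whole window. The policy then limits removal to at most 5 Pods in any 120-second (2-minute) period.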
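The effect of the window can be sketched as: keep the desired replica counts computed during the last stabilizationWindowSeconds and, for scale-down, act on the highest of them. The hypothetical helper below takes those recommendations as arguments; collecting them over time is omitted.

```shell
#!/bin/sh
# stabilized_scale_down RECOMMENDATION...
# Hypothetical helper: the arguments are the desired replica counts computed
# during the last stabilizationWindowSeconds (300 s above). For scale-down the
# HPA acts on the highest of them.
stabilized_scale_down() {
  best=$1
  shift
  for r in "$@"; do
    [ "$r" -gt "$best" ] && best=$r
  done
  echo "$best"
}

stabilized_scale_down 8 6 2 7   # a brief dip to 2 is ignored -> 8
```

Taking the maximum is what makes the window conservative: a single low reading cannot remove Pods while any higher recommendation from the last 5 minutes is still in the window.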