Kubernetes Pod Priority and Preemption


Pod priority indicates the importance of a pod relative to other pods. The scheduler orders pending pods in its queue based on that priority.

Pod preemption allows the cluster to evict, or preempt, lower-priority pods so that higher-priority pods can be scheduled when there is no available space on a suitable node. Pod priority also affects the scheduling order of pods and the out-of-resource eviction ordering on the node.

Priority classes let you steer the Kubernetes scheduler so that it favors higher-priority pods over lower-priority pods.

The scheduler can even preempt (remove) running lower-priority pods so that pending higher-priority pods can be scheduled.

By setting pod priority, you can help prevent lower-priority workloads from impacting critical workloads in your cluster, especially when the cluster starts to reach its resource capacity.

For example, the kube-scheduler pod itself runs with a system priority class:

root@masterk8s:~# kubectl describe pod kube-scheduler-masterk8s -n kube-system | grep -i priority
Priority:             2000001000
Priority Class Name:  system-node-critical
root@masterk8s:~#


By default, Kubernetes (or OKD) has two reserved priority classes that give critical system pods guaranteed scheduling.

* system-node-critical:

This priority class has a value of 2000001000 and is used for all pods that should never be evicted from a node.

* system-cluster-critical:

This priority class has a value of 2000000000 (two billion) and is used for pods that are important for the cluster.

Pods with this priority class can be evicted from a node in certain circumstances.

For example, pods configured with the system-node-critical priority class can take priority over them. However, the system-cluster-critical class still ensures guaranteed scheduling.

A user-defined priority class can take any 32-bit integer value smaller than or equal to 1000000000 (one billion).

Larger values are reserved for the built-in classes above, which are meant for critical system pods that should not normally be preempted or evicted.

How to use priority and preemption?

You apply pod priority and preemption by creating priority class objects and associating pods with a priority class through the "priorityClassName" field in the pod specification.
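As a minimal sketch of that association, a pod manifest could reference a priority class as follows. The pod name priority-demo is hypothetical, and the manifest assumes a priority class named high-priority already exists, such as the one created in the example later in this post. If the named class does not exist, the pod is rejected at creation time.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: priority-demo              # hypothetical name, for illustration only
spec:
  priorityClassName: high-priority # assumed to match an existing PriorityClass
  containers:
    - name: nginx
      image: nginx
```

In a Deployment, the same field goes under spec.template.spec, because that is where the pod specification lives.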

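globalDefault: This optional field is false by default. When it is set to true, the value of that priority class is used for pods that do not specify a "priorityClassName". Only one priority class in the cluster can have globalDefault set to true.

Adding a priority class with "globalDefault: true" affects only pods created after the priority class is added and does not change the priorities of existing pods.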
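For illustration, a cluster-wide default could be declared roughly like this. The name default-priority and the value 10 are assumptions chosen for the example, not values taken from this cluster.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority   # hypothetical name
value: 10                  # assumed value; any integer up to one billion is allowed
globalDefault: true        # pods without a priorityClassName get this value
description: "Default priority for pods that do not name a priority class"
```

Pods that were already running before this class was created keep the priority they were admitted with.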

root@masterk8s:/kube# kubectl get priorityclass
NAME                      VALUE        GLOBAL-DEFAULT   AGE
system-cluster-critical   2000000000   false            23d
system-node-critical      2000001000   false            23d
root@masterk8s:/kube#

Note: If you delete a PriorityClass, existing Pods that use the name of the deleted PriorityClass remain unchanged, but you cannot create more Pods that use the name of the deleted PriorityClass.

Example:

Create two priority classes, high-priority and low-priority.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 50
globalDefault: false
description: "Low-priority Pods"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100
globalDefault: false
description: "High-priority Pods"

root@masterk8s:/kube# kubectl get priorityclass
NAME                      VALUE        GLOBAL-DEFAULT   AGE
high-priority             100          false            3s
low-priority              50           false            2m47s
system-cluster-critical   2000000000   false            23d
system-node-critical      2000001000   false            23d
root@masterk8s:/kube#

Now let's create a deployment that uses the low-priority class:

root@masterk8s:/kube# cat low_prio_deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-deployment
  name: nginx-deployment
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx-deployment
  template:
    metadata:
      labels:
        app: nginx-deployment
    spec:
      priorityClassName: "low-priority"
      containers:
        - image: nginx
          name: nginx-deployment
          resources:
            limits:
              memory: 100Mi
root@masterk8s:/kube#


root@masterk8s:/kube# kubectl apply -f low_prio_deployment.yml
deployment.apps/nginx-deployment created
root@masterk8s:/kube# kubectl get deployment nginx-deployment --watch
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   0/10    10           0           3s
nginx-deployment   1/10    10           1           4s
nginx-deployment   2/10    10           2           6s
nginx-deployment   3/10    10           3           8s
nginx-deployment   4/10    10           4           12s
nginx-deployment   5/10    10           5           14s
nginx-deployment   6/10    10           6           18s
nginx-deployment   7/10    10           7           22s
nginx-deployment   8/10    10           8           24s
nginx-deployment   9/10    10           9           27s
nginx-deployment   10/10   10           10          29s


root@masterk8s:/kube# cat high_prio_deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: high-nginx-deployment
  name: high-nginx-deployment
spec:
  replicas: 10
  selector:
    matchLabels:
      app: high-nginx-deployment
  template:
    metadata:
      labels:
        app: high-nginx-deployment
    spec:
      priorityClassName: "high-priority"
      containers:
        - image: nginx
          name: nginx-deployment
          resources:
            limits:
              memory: 100Mi
root@masterk8s:/kube#

Let's apply the high-priority deployment.

root@masterk8s:/kube# kubectl apply -f high_prio_deployment.yml
deployment.apps/high-nginx-deployment created
root@masterk8s:/kube#

root@masterk8s:/kube# kubectl get deployment --watch
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
high-nginx-deployment   10/10   10           10          2m3s
nginx-deployment        5/10    10           5           5m8s


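When the higher-priority deployment was created, the scheduler started preempting lower-priority pods on the nodes to make room for it. The low-priority deployment recreates the evicted pods, but they stay Pending until capacity becomes available again.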

root@masterk8s:/kube# kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
high-nginx-deployment-768b657896-25pfd   1/1     Running   0          169m
high-nginx-deployment-768b657896-2nnvc   1/1     Running   0          169m
high-nginx-deployment-768b657896-2vzwj   1/1     Running   0          169m
high-nginx-deployment-768b657896-55s5p   1/1     Running   0          169m
high-nginx-deployment-768b657896-5jhfp   1/1     Running   0          169m
high-nginx-deployment-768b657896-5vk7x   1/1     Running   0          169m
high-nginx-deployment-768b657896-d6lq9   1/1     Running   0          169m
high-nginx-deployment-768b657896-gnvn9   1/1     Running   0          169m
high-nginx-deployment-768b657896-mbfm7   1/1     Running   0          169m
high-nginx-deployment-768b657896-rgpkr   1/1     Running   0          169m
nginx-deployment-54f6864c7b-2mnkk        0/1     Pending   0          169m
nginx-deployment-54f6864c7b-g6r98        1/1     Running   0          172m
nginx-deployment-54f6864c7b-gqlzp        1/1     Running   0          172m
nginx-deployment-54f6864c7b-j6lvc        1/1     Running   0          172m
nginx-deployment-54f6864c7b-lfghw        0/1     Pending   0          169m
nginx-deployment-54f6864c7b-mhbhc        0/1     Pending   0          169m
nginx-deployment-54f6864c7b-ngqcp        0/1     Pending   0          169m
nginx-deployment-54f6864c7b-p6289        1/1     Running   0          172m
nginx-deployment-54f6864c7b-ssqdn        1/1     Running   0          172m
nginx-deployment-54f6864c7b-tmlcp        0/1     Pending   0          169m
root@masterk8s:/kube#

Let's delete the high-priority deployment and watch the pending low-priority pods get scheduled again.

root@masterk8s:/kube# kubectl delete deployment high-nginx-deployment
deployment.apps "high-nginx-deployment" deleted
root@masterk8s:/kube# kubectl get deployment --watch
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   5/10    10           5           173m
nginx-deployment   6/10    10           6           173m
nginx-deployment   7/10    10           7           173m
nginx-deployment   8/10    10           8           173m
nginx-deployment   9/10    10           9           173m
nginx-deployment   10/10   10           10          173m


root@masterk8s:/kube# kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
nginx-deployment-54f6864c7b-2mnkk   1/1     Running   0          171m
nginx-deployment-54f6864c7b-g6r98   1/1     Running   0          174m
nginx-deployment-54f6864c7b-gqlzp   1/1     Running   0          174m
nginx-deployment-54f6864c7b-j6lvc   1/1     Running   0          174m
nginx-deployment-54f6864c7b-lfghw   1/1     Running   0          171m
nginx-deployment-54f6864c7b-mhbhc   1/1     Running   0          171m
nginx-deployment-54f6864c7b-ngqcp   1/1     Running   0          171m
nginx-deployment-54f6864c7b-p6289   1/1     Running   0          174m
nginx-deployment-54f6864c7b-ssqdn   1/1     Running   0          174m
nginx-deployment-54f6864c7b-tmlcp   1/1     Running   0          171m
root@masterk8s:/kube#


Pods without a PriorityClass have a priority of 0, unless a PriorityClass with "globalDefault: true" exists, in which case new pods get the value of that class.
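The walkthrough above relies on the default behavior, where a pending higher-priority pod may preempt running lower-priority pods. If you want pods to move ahead in the scheduling queue without evicting anything, a priority class can set the preemptionPolicy field to Never. The sketch below is only an illustration: the class name high-priority-nonpreempting is hypothetical, and the value simply mirrors the high-priority class used earlier.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting   # hypothetical name
value: 100                            # assumed; same value as high-priority above
preemptionPolicy: Never               # default is PreemptLowerPriority
globalDefault: false
description: "High-priority Pods that wait for free capacity instead of preempting"
```

Pods using such a class are still placed ahead of lower-priority pods in the scheduling queue, but they wait until enough resources are free rather than removing running pods.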

