Kubernetes - ETCD

etcd is a consistent and highly-available key value store used as Kubernetes' backing store for all cluster data.

If your Kubernetes cluster uses etcd as its backing store, make sure you have a backup plan for that data.

All Kubernetes objects are stored in etcd.

Periodically backing up the etcd cluster data is important to recover Kubernetes clusters under disaster scenarios, such as losing all control plane nodes. 

The snapshot file contains all the Kubernetes state and critical information.

To keep sensitive Kubernetes data safe, encrypt the snapshot files.
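As one hedged option (illustrative paths, and assuming OpenSSL 1.1.1+ for the -pbkdf2 flag), a snapshot file such as /tmp/etcd.db, the path used later in this post, can be encrypted symmetrically:

```shell
# Sketch: encrypt an etcd snapshot with OpenSSL (AES-256-CBC, PBKDF2 key derivation).
# Paths are illustrative; generate the passphrase once and store it securely.
openssl rand -base64 32 > /tmp/etcd-backup.pass

openssl enc -aes-256-cbc -salt -pbkdf2 \
  -in /tmp/etcd.db -out /tmp/etcd.db.enc \
  -pass file:/tmp/etcd-backup.pass

# Decrypt before restoring:
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in /tmp/etcd.db.enc -out /tmp/etcd.db.dec \
  -pass file:/tmp/etcd-backup.pass
```

Keep the passphrase file separate from the snapshot, ideally in a secrets manager.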

Backing up an etcd cluster can be accomplished in two ways: etcd built-in snapshot and volume snapshot.

Here are a few things you should know about etcd from a Kubernetes perspective.

It is a consistent, distributed, and secure key-value store.

It uses the Raft consensus protocol.

It supports a highly available architecture with stacked etcd.

It stores Kubernetes cluster configurations, all API objects, object states, and service discovery details.

Here is what you should know about etcd backup.

etcd has a built-in snapshot mechanism.

etcdctl is the command line utility that interacts with etcd for snapshots.

If you don’t have etcdctl on your control plane node, install it using the following command.

root@masterk8s:~# apt-get install etcd-client

We need to pass the following pieces of information to etcdctl to take an etcd snapshot.

etcd endpoint (--endpoints)

ca certificate (--cacert)

server certificate (--cert)

server key (--key)

You can get the above details in two ways.

You can find them in the etcd static pod manifest file located at /etc/kubernetes/manifests/etcd.yaml, or by describing the etcd pod.

root@masterk8s:~# kubectl get pods -A | grep -i etcd

kube-system    etcd-masterk8s                      1/1     Running   1          44d

root@masterk8s:~#

root@masterk8s:~# kubectl describe pod etcd-masterk8s -n kube-system

Command:

      etcd

      --advertise-client-urls=https://192.168.163.128:2379

      --cert-file=/etc/kubernetes/pki/etcd/server.crt

      --client-cert-auth=true

      --data-dir=/var/lib/etcd

      --initial-advertise-peer-urls=https://192.168.163.128:2380

      --initial-cluster=masterk8s=https://192.168.163.128:2380

      --key-file=/etc/kubernetes/pki/etcd/server.key

      --listen-client-urls=https://127.0.0.1:2379,https://192.168.163.128:2379

      --listen-metrics-urls=http://127.0.0.1:2381

      --listen-peer-urls=https://192.168.163.128:2380

      --name=masterk8s

      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt

      --peer-client-cert-auth=true

      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key

      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

      --snapshot-count=10000

      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Take an etcd snapshot backup using the following command.

root@masterk8s:~# ETCDCTL_API=3 etcdctl --endpoints=https://192.168.163.128:2379 \

> --cacert=/etc/kubernetes/pki/etcd/ca.crt \

> --cert=/etc/kubernetes/pki/etcd/server.crt \

> --key=/etc/kubernetes/pki/etcd/server.key \

> snapshot save /tmp/etcd.db

Snapshot saved at /tmp/etcd.db

root@masterk8s:~#

root@masterk8s:~# file /tmp/etcd.db

/tmp/etcd.db: data

root@masterk8s:~#

Verify the snapshot using the following command.

root@masterk8s:~# ETCDCTL_API=3 etcdctl --write-out=table snapshot status /tmp/etcd.db

+----------+----------+------------+------------+

|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |

+----------+----------+------------+------------+

| 74f532b6 |    12678 |       1003 |     3.3 MB |

+----------+----------+------------+------------+

root@masterk8s:~#
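To make the backup periodic, one option is a cron entry. This is a sketch with assumed paths: the certificates are as found on a kubeadm control plane node, the destination directory /var/backups/etcd must be created beforehand, and note that % must be escaped in crontab files:

```shell
# /etc/cron.d/etcd-backup (illustrative): take a dated snapshot daily at 01:00.
0 1 * * * root ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save /var/backups/etcd/etcd-$(date +\%F).db
```

Rotating or pruning old snapshots is left to your backup tooling.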

Kubernetes etcd Restore Using Snapshot Backup:

1) Take a backup of etcd. I have the backup at /tmp/etcd.db.

2) Stop all the API services.

3) Restore from the backup.

4) Start all the API services.

Stop all the API services by moving the files under /etc/kubernetes/manifests to a temporary location; the kubelet stops the static pods once their manifests are removed.

root@masterk8s:/etc/kubernetes/manifests# ls -lrt
total 16
-rw------- 1 root root 3834 Nov 26 15:43 kube-apiserver.yaml
-rw------- 1 root root 3496 Nov 26 15:43 kube-controller-manager.yaml
-rw------- 1 root root 1385 Nov 26 15:43 kube-scheduler.yaml
-rw------- 1 root root 2115 Nov 26 15:43 etcd.yaml
root@masterk8s:/etc/kubernetes/manifests# mv *.yaml /var/backuppods/
root@masterk8s:/etc/kubernetes/manifests#

root@masterk8s:/etc/kubernetes/manifests# kubectl get pods -A
The connection to the server 192.168.163.128:6443 was refused - did you specify the right host or port?
root@masterk8s:/etc/kubernetes/manifests#

Restore from the backup.

root@masterk8s:/etc/kubernetes/manifests# export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
root@masterk8s:/etc/kubernetes/manifests# export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
root@masterk8s:/etc/kubernetes/manifests# export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
root@masterk8s:/etc/kubernetes/manifests# export ETCDCTL_API=3

Switch to the etcd data directory.

root@masterk8s:/etc/kubernetes/manifests# cd /var/lib/etcd/
root@masterk8s:/var/lib/etcd# ls -l
total 4
drwx------ 4 root root 4096 Dec  2 17:40 member
root@masterk8s:/var/lib/etcd#

root@masterk8s:/var/lib/etcd# etcdctl snapshot restore /tmp/etcd.db
2023-01-10 14:28:12.862479 I | mvcc: restore compact to 2306
2023-01-10 14:28:12.962115 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
root@masterk8s:/var/lib/etcd# ls -lrt
total 8
drwx------ 4 root root 4096 Dec  2 17:40 member
drwx------ 3 root root 4096 Jan 10 14:28 default.etcd
root@masterk8s:/var/lib/etcd#

The restore creates a folder called "default.etcd", and we need to move the "member" folder from "default.etcd" to /var/lib/etcd.

root@masterk8s:/var/lib/etcd# mv member member.old
root@masterk8s:/var/lib/etcd# mv default.etcd/member/ .
root@masterk8s:/var/lib/etcd# ls -lrt
total 12
drwx------ 4 root root 4096 Dec  2 17:40 member.old
drwx------ 4 root root 4096 Jan 10 14:28 member
drwx------ 2 root root 4096 Jan 10 14:29 default.etcd
root@masterk8s:/var/lib/etcd#

Restart the necessary Kubernetes components by moving the manifests back.

root@masterk8s:/var/lib/etcd# cd /etc/kubernetes/manifests/
root@masterk8s:/etc/kubernetes/manifests# mv /var/backuppods/* .
root@masterk8s:/etc/kubernetes/manifests# ls -lrt
total 16
-rw------- 1 root root 3834 Nov 26 15:43 kube-apiserver.yaml
-rw------- 1 root root 3496 Nov 26 15:43 kube-controller-manager.yaml
-rw------- 1 root root 1385 Nov 26 15:43 kube-scheduler.yaml
-rw------- 1 root root 2115 Nov 26 15:43 etcd.yaml
root@masterk8s:/etc/kubernetes/manifests#

Restart the kubelet if you get API errors.

root@masterk8s:~/.kube# systemctl restart kubelet

root@masterk8s:~/.kube# kubectl get pods -A
NAMESPACE      NAME                                READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-zl528               1/1     Running   0          38d
kube-system    coredns-f9fd979d6-27767             1/1     Running   0          44d
kube-system    coredns-f9fd979d6-qlttz             1/1     Running   0          44d
kube-system    etcd-masterk8s                      1/1     Running   1          44d
kube-system    kube-apiserver-masterk8s            1/1     Running   1          44d
kube-system    kube-controller-manager-masterk8s   1/1     Running   1          44d
kube-system    kube-proxy-t7dfv                    1/1     Running   1          44d
kube-system    kube-scheduler-masterk8s            1/1     Running   2          44d
root@masterk8s:~/.kube#



