K8s - ETCD
etcd is a "strongly consistent, distributed key-value store".
Why etcd?
1. Consistency:
Since the API server is the central coordination point of the entire cluster,
strong consistency is essential. It would be a disaster if, say, two nodes
tried to attach the same persistent volume over iSCSI because the API server
told them both that it was available.
2. Availability:
API downtime means that the entire Kubernetes control plane comes to a halt,
which is undesirable for production clusters. The CAP theorem says
that 100% availability is impossible with strong consistency, but minimizing
downtime is still a critical goal.
3. Consistent Performance:
The API server for a busy Kubernetes cluster receives a substantial amount of read and write traffic, so etcd must deliver predictable latency even under heavy load.
The secret behind etcd's balance of strong consistency and high availability is the Raft consensus algorithm. Raft solves a particular problem: how can multiple independent processes agree on a single value for something? Raft works by electing a leader among a set of nodes and forcing all write requests to go through the leader.
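You can see which member currently holds the Raft leadership with etcdctl. A sketch, assuming a kubeadm cluster with the certificates in their default locations:

```shell
# The IS LEADER column shows which endpoint won the Raft election.
ETCDCTL_API=3 etcdctl endpoint status --cluster -w table \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key
```

On a single-control-plane cluster the table has one row, and that member is always the leader.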
In Kubernetes, etcd serves as the primary datastore for the cluster's state and configuration. It's a distributed, highly available key-value store that holds all cluster data, including metadata about objects like pods, services, and deployments, as well as the current and desired state of the cluster. Kubernetes uses etcd to coordinate cluster activities, ensure consistency, and manage the reconciliation of the actual and desired states.
Kubernetes uses etcd's "watch" functionality to monitor changes to the cluster's state and automatically take actions to reconcile any discrepancies between the actual and desired states.
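The same watch mechanism is visible from etcdctl. As a sketch (Kubernetes stores its objects under the /registry prefix; the certificate paths assume a kubeadm cluster):

```shell
# Stream every change to pod objects as the API server writes them to etcd.
ETCDCTL_API=3 etcdctl watch /registry/pods --prefix \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key
```

Create or delete a pod in another terminal and the change appears in the watch stream immediately.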
etcd acts as a central coordination point for the Kubernetes control plane, ensuring consistency and reliability in the data it manages.
There are two critical folders inside /var/lib/etcd: snap (Snapshot) and wal (Write-Ahead Log). They are essential for ensuring that this data is persistent, fault-tolerant, and can be recovered in case of failures.
Snapshots are full backups of the etcd database, capturing the state of the Kubernetes cluster at a specific point in time. They are used for disaster recovery, allowing you to restore the cluster to a known good state if the database becomes corrupted or lost. The snapshot data is stored in the snap directory within the etcd data directory.
The WAL is a log of all changes made to the etcd database, including updates and deletes. It's designed to ensure that data is not lost even if the etcd server crashes or loses power before the changes can be written to the snapshot.
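On a kubeadm cluster both folders actually sit under the member subdirectory of the data directory. A quick look (paths are the kubeadm defaults):

```shell
# The etcd data directory layout on a kubeadm control-plane node.
ls /var/lib/etcd/member
# snap  wal
```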
I am going to create a namespace and a deployment with 3 replicas in it.
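As a sketch (the names demo and nginx here are just example names, not anything required by etcd):

```shell
# Create a namespace and a 3-replica deployment inside it.
kubectl create namespace demo
kubectl create deployment nginx --image=nginx --replicas=3 -n demo

# Confirm the replicas are up; all of this state now lives in etcd.
kubectl get pods -n demo
```

After the restore later in this post, these objects should reappear exactly as they were at backup time.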
Let's start by taking an etcd backup.
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key
The backup is saved under the name snapshot.db. The --cacert, --cert, and --key arguments supply the TLS certificates that etcdctl needs to authenticate with the etcd server, which only accepts connections from clients presenting certificates signed by its CA.
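Before restoring, you can sanity-check the snapshot file (a sketch; the table shows the snapshot's hash, revision, total keys, and total size):

```shell
# Verify that the snapshot file is readable and see what it contains.
ETCDCTL_API=3 etcdctl snapshot status snapshot.db -w table
```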
Next, restore the snapshot into a new data directory. Note that snapshot restore is a purely local operation that unpacks the file on disk, so it does not need --endpoints or TLS flags (and etcd listens on port 2379, not the API server's 6443, in any case):

ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir=/var/lib/etcd-restore
Finally, edit the etcd static pod manifest so that etcd starts from the restored data directory:

vi /etc/kubernetes/manifests/etcd.yaml

Change the data directory from /var/lib/etcd to /var/lib/etcd-restore; the kubelet watches this manifest and will restart etcd with the restored data.
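The parts of etcd.yaml that need to change look roughly like this on a kubeadm cluster (a sketch; volume names and flag order may differ slightly on your node):

```yaml
# /etc/kubernetes/manifests/etcd.yaml (excerpt)
spec:
  containers:
  - command:
    - etcd
    - --data-dir=/var/lib/etcd-restore   # was /var/lib/etcd
  volumes:
  - hostPath:
      path: /var/lib/etcd-restore        # was /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
```

Once the etcd pod comes back up, the namespace and deployment created earlier should be present again.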