etcd is a critical component of a Kubernetes cluster: it stores all of the cluster's network configuration and the state of its objects. Two services in a Kubernetes system use etcd for coordination and configuration storage:
- the network plugin flannel (other network plugins may likewise store their network configuration in etcd)
- Kubernetes itself, for the state and metadata of all its objects
Setting the environment variable
When operating on Kubernetes data, set ETCDCTL_API=3:
export ETCDCTL_API=3
# Or add the variable to `/etc/profile`:
vi /etc/profile
...
export ETCDCTL_API=3
...
source /etc/profile
# Or prefix the individual command with ETCDCTL_API=3:
ETCDCTL_API=3 etcdctl --endpoints=$ENDPOINTS member list
# Check the version
etcdctl version
etcdctl version: 3.4.9
API version: 3.
Set an alias so the certificates don't have to be passed on every command
vim /etc/profile
alias etcdctl="ETCDCTL_API=3 /data/local/etcd/bin/etcdctl --cacert=/data/local/etcd/ssl/ca.pem --cert=/data/local/etcd/ssl/server.pem --key=/data/local/etcd/ssl/server-key.pem --endpoints=https://192.168.253.227:2379,https://192.168.253.228:2379,https://192.168.253.229:2379"
# For backups, point the alias at a single node instead:
#alias etcdctl="ETCDCTL_API=3 /data/local/etcd/bin/etcdctl --cacert=/data/local/etcd/ssl/ca.pem --cert=/data/local/etcd/ssl/server.pem --key=/data/local/etcd/ssl/server-key.pem --endpoints=https://192.168.253.227:2379"
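One caveat with the alias approach: aliases from /etc/profile only apply to interactive shells, so cron jobs and scripts will not see them. A sketch of a hypothetical wrapper script (the name etcdctl3 and the install path are assumptions, the certs and endpoints mirror the alias above) that works everywhere:

```shell
# Hypothetical wrapper: same certs/endpoints as the alias above, but usable
# from scripts and cron, where /etc/profile aliases are not expanded.
cat > /usr/local/bin/etcdctl3 <<'EOF'
#!/bin/sh
ETCDCTL_API=3 exec /data/local/etcd/bin/etcdctl \
  --cacert=/data/local/etcd/ssl/ca.pem \
  --cert=/data/local/etcd/ssl/server.pem \
  --key=/data/local/etcd/ssl/server-key.pem \
  --endpoints=https://192.168.253.227:2379,https://192.168.253.228:2379,https://192.168.253.229:2379 \
  "$@"
EOF
chmod +x /usr/local/bin/etcdctl3
```

After this, `etcdctl3 member list -w table` works from any script without sourcing /etc/profile.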
etcdctl member list -w table
+------------------+---------+--------+------------------------------+------------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+--------+------------------------------+------------------------------+------------+
| c0142b9a19898be | started | etcd-1 | https://192.168.253.227:2380 | https://192.168.253.227:2379 | false |
| 78d50c713efbd616 | started | etcd-2 | https://192.168.253.228:2380 | https://192.168.253.228:2379 | false |
| bcb81ad99e84629a | started | etcd-3 | https://192.168.253.229:2380 | https://192.168.253.229:2379 | false |
+------------------+---------+--------+------------------------------+------------------------------+------------+
View pods
etcdctl get /registry/pods --prefix --keys-only
- --keys-only prints only the keys; leave it off to print the values as well.
- --prefix treats the argument as a key prefix and returns every key under it.
View services
etcdctl get /registry/services --prefix --keys-only
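Since every object lives under a predictable key path (/registry/&lt;type&gt;/&lt;namespace&gt;/&lt;name&gt;), the key list is easy to post-process. A small sketch with sample keys inlined (the keys shown are illustrative; real output depends on the cluster, and in practice the list would be piped from the etcdctl command above):

```shell
# Count objects per resource type from /registry key paths.
# Sample keys are inlined so the pipeline is self-contained; in practice:
#   etcdctl get /registry --prefix --keys-only | awk -F/ '{print $3}' | sort | uniq -c
printf '%s\n' \
  /registry/pods/default/nginx \
  /registry/pods/kube-system/coredns-558b4d5db-abcde \
  /registry/services/specs/default/kubernetes \
| awk -F/ '{print $3}' | sort | uniq -c
```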
Backup
- The snapshot command must be pointed at a single etcd node; it cannot take multiple endpoints.
- So in the /etc/profile alias, set endpoints="https://192.168.253.227:2379", i.e. a single etcd node.
etcdctl snapshot save /data/backup/etcd/snapshot.db
{"level":"info","ts":1637575884.2848496,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/data/backup/etcd/snapshot.db.part"}
{"level":"info","ts":"2021-11-22T18:11:24.297+0800","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1637575884.2974415,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://192.168.253.227:2379"}
{"level":"info","ts":"2021-11-22T18:11:24.340+0800","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
{"level":"info","ts":1637575884.3451302,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://192.168.253.227:2379","size":"3.2 MB","took":0.060068191}
{"level":"info","ts":1637575884.34525,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/data/backup/etcd/snapshot.db"}
Snapshot saved at /data/backup/etcd/snapshot.db
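Beyond a one-off snapshot, what usually ends up in cron is a timestamped backup with simple retention. A hedged sketch, assuming the certificate alias/wrapper configured earlier; backup_etcd and ETCDCTL_BIN are hypothetical names, not part of etcdctl:

```shell
# Hypothetical helper: timestamped snapshots plus retention of the newest N.
# ETCDCTL_BIN defaults to etcdctl (i.e. the alias/wrapper configured above).
backup_etcd() {
  dir=$1; keep=${2:-7}
  mkdir -p "$dir"
  # Each snapshot gets a unique, sortable name.
  "${ETCDCTL_BIN:-etcdctl}" snapshot save "$dir/snapshot-$(date +%Y%m%d-%H%M%S).db"
  # Delete everything but the $keep most recent snapshots.
  ls -1t "$dir"/snapshot-*.db 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm -f
}
```

For example, `backup_etcd /data/backup/etcd 7` from a daily cron entry keeps a rolling week of snapshots.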
Delete a pod (so the restore can be verified later)
kubectl delete pod nginx
pod "nginx" deleted
Preparing for the restore
Stop the kube-apiserver service on every master
systemctl stop kube-apiserver
# Confirm that kube-apiserver has stopped
ps -ef | grep kube-apiserver
Stop the etcd service on every node in the cluster
systemctl stop etcd
Move the data out of every etcd data directory (rename it rather than delete it, so the old data doubles as a backup)
mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak
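Moving the data directory while etcd or kube-apiserver is still writing can leave things inconsistent, so a guard is worthwhile. A small defensive sketch; check_stopped is a hypothetical helper, the process names are taken from the services above, and pgrep is assumed available:

```shell
# Hypothetical guard: succeed only if none of the given processes is running,
# so it is safe to move the etcd data directory aside.
check_stopped() {
  for svc in "$@"; do
    # pgrep -x matches the exact process name.
    pgrep -x "$svc" >/dev/null && { echo "$svc still running, stop it first" >&2; return 1; }
  done
  echo "safe to move the data directory"
}
```

Usage on each node: `check_stopped kube-apiserver etcd && mv /var/lib/etcd/default.etcd /var/lib/etcd/default.etcd.bak`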
Single-etcd restore
The /etc/profile alias is already configured on this cluster, so the endpoint does not need to be specified here
etcdctl snapshot restore /data/backup/etcd/snapshot.db --data-dir=/var/lib/etcd/default.etcd
Multi-etcd restore
Notes:
- 2380 is the peer port used for communication between cluster members; 2379 serves the client HTTP API. Don't mix them up.
- The backup procedure is the same: even for a cluster, the snapshot is taken on a single etcd node, and that one file is used to restore all of them.
Copy the etcd backup snapshot to the other nodes
scp /data/backup/etcd/snapshot.db 192.168.253.228:/data/backup/etcd/
scp /data/backup/etcd/snapshot.db 192.168.253.229:/data/backup/etcd/
On test-k8s-master-227:
etcdctl snapshot restore /data/backup/etcd/snapshot.db \
--name etcd-1 \
--initial-cluster "etcd-1=https://192.168.253.227:2380,etcd-2=https://192.168.253.228:2380,etcd-3=https://192.168.253.229:2380" \
--initial-cluster-token etcd-cluster \
--initial-advertise-peer-urls https://192.168.253.227:2380 \
--data-dir=/var/lib/etcd/default.etcd
On test-k8s-node-228:
etcdctl snapshot restore /data/backup/etcd/snapshot.db \
--name etcd-2 \
--initial-cluster "etcd-1=https://192.168.253.227:2380,etcd-2=https://192.168.253.228:2380,etcd-3=https://192.168.253.229:2380" \
--initial-cluster-token etcd-cluster \
--initial-advertise-peer-urls https://192.168.253.228:2380 \
--data-dir=/var/lib/etcd/default.etcd
On test-k8s-node-229:
etcdctl snapshot restore /data/backup/etcd/snapshot.db \
--name etcd-3 \
--initial-cluster "etcd-1=https://192.168.253.227:2380,etcd-2=https://192.168.253.228:2380,etcd-3=https://192.168.253.229:2380" \
--initial-cluster-token etcd-cluster \
--initial-advertise-peer-urls https://192.168.253.229:2380 \
--data-dir=/var/lib/etcd/default.etcd
Flag reference:
name: this node's name
data-dir: this node's data directory
initial-cluster: the peer URLs of every node in the cluster
initial-advertise-peer-urls: the peer address this node advertises to the other members
initial-cluster-token: the cluster bootstrap token; it must be unique per cluster
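The three invocations above differ only in --name and --initial-advertise-peer-urls. A print-only sketch that generates each node's command for review before running it by hand (print_restore_cmds is a hypothetical name; the member names and IPs come from the cluster above):

```shell
# Print (not run) the restore command for each member; only the name and the
# advertised peer URL vary, everything else is identical across nodes.
print_restore_cmds() {
  CLUSTER="etcd-1=https://192.168.253.227:2380,etcd-2=https://192.168.253.228:2380,etcd-3=https://192.168.253.229:2380"
  for node in etcd-1=192.168.253.227 etcd-2=192.168.253.228 etcd-3=192.168.253.229; do
    name=${node%%=*}; ip=${node#*=}
    echo "[$ip] etcdctl snapshot restore /data/backup/etcd/snapshot.db --name $name --initial-cluster $CLUSTER --initial-cluster-token etcd-cluster --initial-advertise-peer-urls https://$ip:2380 --data-dir=/var/lib/etcd/default.etcd"
  done
}
print_restore_cmds
```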
Once all three etcd nodes have been restored, log in to each machine in turn and start etcd
systemctl start etcd
With all three etcd instances up, check the cluster health
ETCDCTL_API=3 etcdctl --cacert=/data/local/etcd/ssl/ca.pem --cert=/data/local/etcd/ssl/server.pem --key=/data/local/etcd/ssl/server-key.pem --endpoints=https://192.168.253.227:2379,https://192.168.253.228:2379,https://192.168.253.229:2379 endpoint health
# List the etcd members
etcdctl member list -w table
+------------------+---------+--------+------------------------------+------------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+--------+------------------------------+------------------------------+------------+
| c0142b9a19898be | started | etcd-1 | https://192.168.253.227:2380 | https://192.168.253.227:2379 | false |
| 78d50c713efbd616 | started | etcd-2 | https://192.168.253.228:2380 | https://192.168.253.228:2379 | false |
| bcb81ad99e84629a | started | etcd-3 | https://192.168.253.229:2380 | https://192.168.253.229:2379 | false |
+------------------+---------+--------+------------------------------+------------------------------+------------+
Start kube-apiserver on each master in turn
systemctl start kube-apiserver
kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
Check that the pod has been restored
kubectl get po
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 18h
Notes
- flannel talks to etcd through the v2 API, while Kubernetes uses the v3 API.
- If v2 data exists when backing up with v3, the restore is unaffected.
- If v3 data exists when backing up with v2, the restore fails.
- Don't rely on a single backup node: if that node's backup is ever missed, there is nothing to restore from. Back up on at least two nodes, and put monitoring on the backups.
- When backing up an etcd cluster, backing up one etcd node is enough; restore every node from that same snapshot.
- Restore order: stop kube-apiserver --> stop etcd --> restore the data --> start etcd --> start kube-apiserver
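The advice about monitoring the backups can be sketched as a freshness check suitable for a cron/alerting hook. check_backup_fresh and the 26-hour threshold are hypothetical, chosen for a daily backup schedule plus slack:

```shell
# Hypothetical freshness check: succeed only if a snapshot newer than
# max_age_minutes exists under the given directory.
check_backup_fresh() {
  dir=$1; max_age_minutes=$2
  if find "$dir" -name 'snapshot*.db' -mmin -"$max_age_minutes" 2>/dev/null | grep -q .; then
    echo "backup OK"
  else
    echo "backup MISSING or stale in $dir" >&2
    return 1
  fi
}
```

Wired into monitoring, e.g.: `check_backup_fresh /data/backup/etcd 1560 || send-alert` (send-alert stands in for whatever alerting hook is in use).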