Troubleshooting Kubelet Startup Failures


General Troubleshooting Method

Issue 1

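A kubelet that fails to start is debugged the same way as any other systemd-managed service. Start with the service status and its logs.

  • Check the status of the kubelet service:

systemctl status kubelet

Output: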

● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Tue 2021-06-01 23:05:15 CST; 2s ago
Docs: https://kubernetes.io/docs/home/
Process: 15941 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 15941 (code=exited, status=1/FAILURE)

6月 01 23:05:15 jetson systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
6月 01 23:05:15 jetson systemd[1]: kubelet.service: Failed with result 'exit-code'.

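  • Check the journal logs:

journalctl -u kubelet --no-pager

The -u option of journalctl filters the journal by service unit, which hides unrelated entries. The --no-pager option prints the whole log at once. If you are only reading the log interactively you can leave it out, but the pager then cuts long lines at the screen width and you have to scroll with the arrow keys.

Output: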

6月 01 23:15:22 jetson systemd[1]: Started kubelet: The Kubernetes Node Agent.
6月 01 23:15:22 jetson kubelet[24037]: Flag --network-plugin has been deprecated, will be removed along with dockershim.
6月 01 23:15:22 jetson kubelet[24037]: Flag --network-plugin has been deprecated, will be removed along with dockershim.
6月 01 23:15:22 jetson kubelet[24037]: I0601 23:15:22.825636 24037 server.go:440] "Kubelet version" kubeletVersion="v1.21.1"
6月 01 23:15:22 jetson kubelet[24037]: I0601 23:15:22.826407 24037 server.go:851] "Client rotation is on, will bootstrap in background"
6月 01 23:15:22 jetson kubelet[24037]: I0601 23:15:22.867597 24037 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
6月 01 23:15:22 jetson kubelet[24037]: I0601 23:15:22.870159 24037 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
6月 01 23:15:23 jetson kubelet[24037]: W0601 23:15:23.037531 24037 sysinfo.go:203] Nodes topology is not available, providing CPU topology
6月 01 23:15:23 jetson kubelet[24037]: I0601 23:15:23.090497 24037 server.go:660] "--cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /"
6月 01 23:15:23 jetson kubelet[24037]: E0601 23:15:23.090909 24037 server.go:292] "Failed to run kubelet" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fail-swap-on flag to false. /proc/swaps contained: [Filename\t\t\t\tType\t\tSize\tUsed\tPriority /dev/zram0 partition\t507408\t0\t5 /dev/zram1 partition\t507408\t0\t5 /dev/zram2 partition\t507408\t0\t5 /dev/zram3 partition\t507408\t0\t5]"
6月 01 23:15:23 jetson systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
6月 01 23:15:23 jetson systemd[1]: kubelet.service: Failed with result 'exit-code'.
6月 01 23:15:33 jetson systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
6月 01 23:15:33 jetson systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 2191.
6月 01 23:15:33 jetson systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
6月 01 23:15:33 jetson systemd[1]: Started kubelet: The Kubernetes Node Agent.
6月 01 23:15:33 jetson kubelet[24176]: Flag --network-plugin has been deprecated, will be removed along with dockershim.
6月 01 23:15:33 jetson kubelet[24176]: Flag --network-plugin has been deprecated, will be removed along with dockershim.
6月 01 23:15:33 jetson kubelet[24176]: I0601 23:15:33.595247 24176 server.go:440] "Kubelet version" kubeletVersion="v1.21.1"
6月 01 23:15:33 jetson kubelet[24176]: I0601 23:15:33.596482 24176 server.go:851] "Client rotation is on, will bootstrap in background"
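Only one line in this journal explains the failure: the E0601 entry from server.go. In a long journal you can filter for klog error lines instead of reading everything. The sketch below assumes kubelet uses the default klog header, in which each message begins with a severity letter (I, W, E or F) followed by the month and day. The function name kubelet_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only klog error (E) and fatal (F) lines from
# kubelet journal output read on stdin. klog prefixes each message with the
# severity letter plus MMDD, e.g. "E0601"; systemd's own lines are dropped.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}

# Usage on the node (current boot only):
#   journalctl -u kubelet -b --no-pager | kubelet_errors
```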

The cause is that kubelet, by default, refuses to start while swap is enabled. Either turn swap off, or start kubelet with --fail-swap-on=false.

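# Disable swap
swapoff -a # temporary, until the next reboot
sed -ri 's/.*swap.*/#&/' /etc/fstab # permanent: comment out swap entries in /etc/fstab
# zram swap devices (like /dev/zram0 in the log above) are usually created by a
# service rather than /etc/fstab; disable that service too so swap stays off after reboot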
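Before restarting kubelet, you can confirm that swap is really off by checking /proc/swaps, the same file quoted in the error message. The sketch below uses a hypothetical helper, swap_is_off, and assumes the usual layout of one header line followed by one line per active swap device.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (0) only if the swaps file lists no active devices.
# /proc/swaps always starts with a header line, so any further non-empty line
# means a swap device is still in use.
swap_is_off() {
  local swaps_file="${1:-/proc/swaps}"
  [ -r "$swaps_file" ] || return 2      # cannot tell; treat as "not off"
  [ "$(tail -n +2 "$swaps_file" | grep -c .)" -eq 0 ]
}

# Usage on the node:
#   swap_is_off || { echo "swap still active:" >&2; cat /proc/swaps >&2; }
```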

Issue 2

Symptom

kubelet does not start. Running journalctl -u kubelet shows the error: Failed to start ContainerManager failed to initialize top level QOS containers: failed to update top level Burstable QOS cgroup : failed to set supported cgroup subsystems for cgroup [kubepods burstable]: failed to find subsystem mount for required subsystem: pids

Analysis
The useful part of the error is failed to find subsystem mount for required subsystem: pids. Running ls -l /sys/fs/cgroup/systemd/kubepods/burstable/ shows that the directory contains no pids entry.
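You can also ask the kernel directly whether it provides a pids controller. The sketch below uses a hypothetical helper, has_pids_controller, and assumes the standard /proc/cgroups layout (subsys_name, hierarchy, num_cgroups, enabled).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel lists an enabled "pids" controller.
# /proc/cgroups columns: subsys_name hierarchy num_cgroups enabled
has_pids_controller() {
  local cgroups_file="${1:-/proc/cgroups}"
  [ -r "$cgroups_file" ] || return 2
  awk '$1 == "pids" && $4 == 1 { found = 1 } END { exit !found }' "$cgroups_file"
}

# Usage on the node:
#   has_pids_controller || echo "kernel has no pids cgroup controller" >&2
#   ls -d /sys/fs/cgroup/pids   # on cgroup v1 the controller must also be mounted
```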

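SupportPodPidsLimit is enabled by default in Kubernetes 1.14+, and SupportNodePidsLimit in 1.15+. With these feature gates on, kubelet requires the pids cgroup controller, so it fails on kernels that do not provide one.

Related issue: https://github.com/kubernetes/kubernetes/issues/79046

Solution

  • Method 1: edit the kubelet configuration, add the --feature-gates=SupportPodPidsLimit=false,SupportNodePidsLimit=false flag, then restart the kubelet service. This only works while the gates can still be turned off; both graduated to GA in Kubernetes 1.20 and are locked on from then.
  • Method 2: upgrade the kernel (for example to a 5.x version) so that it provides the pids cgroup controller, which first appeared in Linux 4.3.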
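For method 1 on a kubeadm or package-based install, the flag can go into KUBELET_EXTRA_ARGS, which the ExecStart line from Issue 1 passes to /usr/bin/kubelet. The sketch below assumes the environment file is /etc/default/kubelet, as on Debian/Ubuntu packages; RHEL-family packages use /etc/sysconfig/kubelet instead. The helper set_kubelet_extra_args is hypothetical and deliberately refuses to merge into an existing non-empty value.

```shell
#!/usr/bin/env bash
# Feature gates from method 1.
GATES='--feature-gates=SupportPodPidsLimit=false,SupportNodePidsLimit=false'

# Hypothetical helper: write ARGS into KUBELET_EXTRA_ARGS in ENV_FILE.
set_kubelet_extra_args() {
  local env_file="$1" args="$2"
  if [ ! -e "$env_file" ] || ! grep -q '^KUBELET_EXTRA_ARGS=' "$env_file"; then
    # No KUBELET_EXTRA_ARGS line yet: append one.
    echo "KUBELET_EXTRA_ARGS=${args}" >> "$env_file"
  elif grep -qx 'KUBELET_EXTRA_ARGS=' "$env_file"; then
    # Line exists but is empty: fill it in.
    sed -i "s|^KUBELET_EXTRA_ARGS=\$|KUBELET_EXTRA_ARGS=${args}|" "$env_file"
  else
    # Non-empty value: merge by hand rather than guess at its quoting.
    echo "KUBELET_EXTRA_ARGS already set in ${env_file}; add ${args} manually" >&2
    return 1
  fi
}

# Usage (as root):
#   set_kubelet_extra_args /etc/default/kubelet "$GATES" && systemctl restart kubelet
```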

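Issue 3

Error

The kubelet log reports network plugin is not ready: cni config uninitialized

Solution

The network plugin (flannel or calico) is not installed, or its installation failed. Install the plugin, or fix the failed installation, so that it writes its CNI configuration on the node.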
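You can confirm this on the node by checking whether any CNI configuration exists. The sketch below uses a hypothetical helper, has_cni_config, and assumes the kubelet's default CNI config directory /etc/cni/net.d and the usual .conf, .conflist and .json file extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR holds at least one CNI network config.
# Unmatched globs stay literal, so the -f test filters them out.
has_cni_config() {
  local dir="${1:-/etc/cni/net.d}"
  local f
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    [ -f "$f" ] && return 0
  done
  return 1
}

# Usage on the node:
#   has_cni_config || echo "no CNI config: install or repair flannel/calico" >&2
#   kubectl -n kube-system get pods -o wide   # then inspect the plugin's pods
```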

Issue 4

Symptom
The kubelet log reports Failed to connect to apiserver: the server has asked for the client to provide credentials

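Analysis
This message means the API server is rejecting the kubelet's credentials. In this case the kubelet client certificate had expired, which left the node in the NotReady state.

You can check a certificate's expiry date with openssl x509 -noout -enddate -in {cert path}.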
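The openssl check can be turned into a pass/fail test with -checkend, which exits non-zero if the certificate expires within the given number of seconds. The sketch below uses a hypothetical helper, cert_expires_within. The example path is the kubelet client certificate that appears in the Issue 1 log, which is where kubeadm-installed nodes keep it. Binary deployments may use a different location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if CERT expires within SECONDS (0 = already expired).
# openssl's -checkend exits 0 when the certificate is still valid SECONDS from now.
cert_expires_within() {
  local cert="$1" seconds="${2:-0}"
  [ -r "$cert" ] || return 2
  ! openssl x509 -noout -checkend "$seconds" -in "$cert" >/dev/null
}

# Usage on a kubeadm node:
#   cert=/var/lib/kubelet/pki/kubelet-client-current.pem
#   openssl x509 -noout -enddate -in "$cert"
#   cert_expires_within "$cert" 0 && echo "kubelet client certificate has expired"
#   cert_expires_within "$cert" $((30*24*3600)) && echo "expires within 30 days"
```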

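Solution
For Kubernetes deployed with kubeadm
On Kubernetes 1.15+, renew the certificate directly with kubeadm alpha certs renew <cert_name>. From 1.20 the command graduated to kubeadm certs renew <cert_name>, and the alpha form was deprecated.

On Kubernetes versions older than 1.15, you can renew certificates with the https://github.com/yuyicai/update-kube-cert project.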
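The subcommand name depends on the kubeadm version: it is kubeadm alpha certs from 1.15 and kubeadm certs from 1.20. A small helper can pick the right one. The sketch below uses a hypothetical function, kubeadm_certs_cmd, and assumes a kubeadm 1.x version string such as v1.21.1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kubeadm certs command prefix for a version string.
kubeadm_certs_cmd() {
  local minor
  minor=$(echo "${1#v}" | cut -d. -f2)   # "v1.21.1" -> "21"
  if [ "$minor" -lt 15 ]; then
    return 1                             # older than 1.15: use update-kube-cert instead
  elif [ "$minor" -lt 20 ]; then
    echo "kubeadm alpha certs"
  else
    echo "kubeadm certs"
  fi
}

# Usage on a control-plane node ($certs is left unquoted so it splits into words):
#   certs=$(kubeadm_certs_cmd "$(kubeadm version -o short)") || exit 1
#   $certs check-expiration
#   $certs renew all
```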

For Kubernetes deployed from binaries

# Delete the old kubelet certificate files
$ rm -f /opt/kubernetes/ssl/kubelet*

# Delete the kubelet kubeconfig file
$ rm -f /opt/kubernetes/cfg/kubelet.kubeconfig

# Restart kubelet so that the master issues a new client certificate
$ systemctl restart kubelet
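Deleting these files cannot be undone. A more cautious variant moves them into a backup directory first. The sketch below uses a hypothetical helper, backup_kubelet_creds, with the /opt/kubernetes paths above; the default backup location is an assumption. It also assumes TLS bootstrapping is configured. If the cluster does not approve kubelet CSRs automatically, approve the new request on the master with kubectl get csr and kubectl certificate approve <name>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move kubelet certs and kubeconfig into a backup directory
# instead of deleting them, then print that directory's path.
backup_kubelet_creds() {
  local ssl_dir="${1:-/opt/kubernetes/ssl}" cfg_dir="${2:-/opt/kubernetes/cfg}"
  local backup_dir="${3:-/root/kubelet-cert-backup-$(date +%Y%m%d%H%M%S)}"  # assumed location
  local f
  mkdir -p "$backup_dir"
  for f in "$ssl_dir"/kubelet* "$cfg_dir"/kubelet.kubeconfig; do
    [ -e "$f" ] && mv "$f" "$backup_dir"/
  done
  echo "$backup_dir"
}

# Usage (as root), followed by the restart above:
#   backup_kubelet_creds && systemctl restart kubelet
```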
