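- An email alert from zabbix reported that the login check on the business web page was not returning 200; it later recovered on its own
That meant the service had gone down at least once. After logging in to the machines, I found one node in the cluster in the nodelost state
On that node, all the related services were down
Further digging showed the root partition was full, and the cause was k8s logs piling up under /var/log/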
    [root@cloudos02 ~]# df -h
    Filesystem               Size  Used Avail Use% Mounted on
    /dev/mapper/centos-root  219G  216G     0 100% /
    devtmpfs                  63G     0   63G   0% /dev
    tmpfs                     63G   12K   63G   1% /dev/shm
    tmpfs                     63G  226M   63G   1% /run
    tmpfs                     63G     0   63G   0% /sys/fs/cgroup
    /dev/sda3                197M  136M   61M  70% /boot
    /dev/sda2                200M     0  200M   0% /boot/efi
    tmpfs                     13G     0   13G   0% /run/user/0
    [root@cloudos02 /var/log/]# du -shx /var/log/* | grep -P '^\S+?G'
    31G     /var/log/heat
    1.1G    /var/log/keystone
    163G    /var/log/kubernetes
    3.7G    /var/log/nova
    1.7G    /var/log/openstack-compute
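The `du | grep` filter above only shows entries whose size is printed in G. A minimal sketch of an alternative that ranks entries by size, so nothing is missed when the culprit is still measured in M. The function name `largest_entries` is hypothetical and not part of the original write-up.

```shell
# largest_entries DIR [COUNT]  (hypothetical helper)
# Print the COUNT largest entries directly under DIR, biggest first.
# Sizes are in KiB (-k) so that sort -n can compare them numerically;
# -x keeps du on a single filesystem, as in the original command.
largest_entries() {
    du -skx -- "${1:-/var/log}"/* 2>/dev/null | sort -rn | head -n "${2:-5}"
}
```

For example, `largest_entries /var/log 5` would list the five largest entries under /var/log.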
- The log file names follow a pattern, so I simply deleted the log files older than 20 days
    [root@cloudos02 kubernetes]# find -mtime +20 -name 'kube*.cloudos02*' -exec rm -f {} \;
    [root@cloudos02 kubernetes]# df -h
    Filesystem               Size  Used Avail Use% Mounted on
    /dev/mapper/centos-root  219G  117G   91G  57% /
    devtmpfs                  63G     0   63G   0% /dev
    tmpfs                     63G   12K   63G   1% /dev/shm
    tmpfs                     63G  226M   63G   1% /run
    tmpfs                     63G     0   63G   0% /sys/fs/cgroup
    /dev/sda3                197M  136M   61M  70% /boot
    /dev/sda2                200M     0  200M   0% /boot/efi
    tmpfs                     13G     0   13G   0% /run/user/0
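Since the deletion is irreversible, it can help to print the matching files first and delete only after reviewing the list. The sketch below splits the `find` command above into a preview step and a purge step that share the same predicate. The helper names `old_logs` and `purge_old_logs` are hypothetical, and `-delete` assumes GNU or BSD find.

```shell
# old_logs DIR PATTERN DAYS  (hypothetical helper)
# List regular files under DIR whose name matches PATTERN and whose
# modification time is more than DAYS days in the past.
old_logs() {
    find "$1" -type f -name "$2" -mtime "+$3" -print
}

# purge_old_logs DIR PATTERN DAYS  (hypothetical helper)
# Delete exactly the files that old_logs would list.
purge_old_logs() {
    find "$1" -type f -name "$2" -mtime "+$3" -delete
}
```

A possible sequence is `old_logs /var/log/kubernetes 'kube*.cloudos02*' 20` to review, followed by `purge_old_logs` with the same arguments.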
- etcd is the core of k8s, and sure enough etcd was the problem, which is why all the k8s services on this node were down
    [root@cloudos02 ~]# /opt/bin/etcdctl cluster-health
    member 9bd4565552fd93c is healthy: got healthy result from http://10.12.0.21:2379
    failed to check the health of member 658a31702f200e95 on http://10.12.0.22:2379: Get http://10.12.0.22:2379/health: dial tcp 10.12.0.22:2379: getsockopt: connection refused
    member 658a31702f200e95 is unreachable: [http://10.12.0.22:2379] are all unreachable
    member d1a9f9229366f9b8 is healthy: got healthy result from http://10.12.0.23:2379
    cluster is healthy
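Note that the last line still reads "cluster is healthy" even though one member is unreachable, so the summary line alone is not enough. A small sketch that pulls the IDs of unreachable members out of the output; it assumes the line format shown above, and the name `unreachable_members` is hypothetical.

```shell
# unreachable_members  (hypothetical helper)
# Read `etcdctl cluster-health` output on stdin and print the ID of every
# member reported as "member <id> is unreachable: ...".
unreachable_members() {
    awk '$1 == "member" && $3 == "is" && $4 ~ /^unreachable/ { print $2 }'
}
```

Usage would look like `/opt/bin/etcdctl cluster-health | unreachable_members`; the printed ID is the one later passed to `etcdctl member remove`.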
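- Check the logs
    [root@cloudos02 ~]# journalctl -xe -u etcd2
    (a large amount of output mentioning snap.broken)
The logs confirm that etcd's files are corrupted, most likely because the root partition was full and the data replicated from the other members could not be written
First find where etcd's data directory is. The fix is to delete the data directory on this node and let it sync over again from the cluster
Since etcd runs here as a service on the host itself, go straight to the systemd unit file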
    [root@cloudos02 ~]# cat /usr/lib/systemd/system/etcd2.service
    [Unit]
    Description=Etcd2 Server

    [Service]
    Type=notify
    EnvironmentFile=-/etc/sysconfig/kube-etcd-cluster
    ExecStart=/opt/bin/etcd --name=${ETCD_NAME}
    ...... (omitted)
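/usr/lib/systemd/system/etcd2.service does not set a data directory, so etcd falls back to its default, ${name}.etcd, relative to the working directory of the process
The name in etcd2.service is read from a variable in /etc/sysconfig/kube-etcd-cluster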
    [root@cloudos02 ~]# cat /etc/sysconfig/kube-etcd-cluster
    ETCD_NAME="NODE2"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.12.0.22:2380"
    ETCD_LISTEN_PEER_URLS="http://10.12.0.22:2380"
    ETCD_LISTEN_CLIENT_URLS="http://10.12.0.22:2379,http://127.0.0.1:2379"
    ETCD_ADVERTISE_CLIENT_URLS="http://10.12.0.22:2379"
    ETCD_INITIAL_CLUSTER_TOKEN="my-etcd-cluster"
    ETCD_INITIAL_CLUSTER="NODE1=http://10.12.0.21:2380,NODE2=http://10.12.0.22:2380,NODE3=http://10.12.0.23:2380"
    ETCD_INITIAL_CLUSTER_STATE="new"
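Putting the two files together: with no data directory configured, the path is the member name plus `.etcd` under the working directory of the service. The sketch below derives that path from the environment file. It assumes the file is plain `KEY="value"` lines that a shell can source, and that the working directory is `/` (the systemd default for a system service with no `WorkingDirectory=` set). The name `default_etcd_data_dir` is hypothetical.

```shell
# default_etcd_data_dir ENVFILE [WORKDIR]  (hypothetical helper)
# Print WORKDIR/<ETCD_NAME>.etcd, the location etcd uses when no data
# directory is configured. ENVFILE should be given as a path containing a
# slash, otherwise `.` searches PATH for it.
# The body runs in a subshell ( ... ) so the sourced variables do not
# leak into the calling shell.
default_etcd_data_dir() (
    envfile="$1"
    workdir="${2:-/}"
    . "$envfile"
    printf '%s/%s.etcd\n' "${workdir%/}" "$ETCD_NAME"
)
```

With the file shown above, `default_etcd_data_dir /etc/sysconfig/kube-etcd-cluster` would print `/NODE2.etcd`.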
The root directory does indeed contain NODE2.etcd, so delete the data directory
    [root@cloudos02 ~]# ll /
    drwx------ 3 root root 4096 Jun 29 11:47 NODE2.etcd
    [root@cloudos02 ~]# rm -rf /NODE2.etcd
On one of the healthy nodes, remove this member from the cluster and then add it back
    [root@cloudos01 ~]# /opt/bin/etcdctl member remove 658a31702f200e95
    Removed member 658a31702f200e95 from cluster
    [root@cloudos01 ~]# /opt/bin/etcdctl member add NODE2 http://10.12.0.22:2380
Then, on the broken node, edit the configuration file /etc/sysconfig/kube-etcd-cluster
Change ETCD_INITIAL_CLUSTER_STATE=new to ETCD_INITIAL_CLUSTER_STATE=existing and start etcd
    [root@cloudos02 ~]# sed -ri '/ETCD_INITIAL_CLUSTER_STATE/s#new#existing#' /etc/sysconfig/kube-etcd-cluster
    [root@cloudos02 ~]# systemctl start etcd2
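The `sed` command above swaps the first `new` on the matching line, which works for this file. A sketch of a variant that sets the value explicitly, so it gives the same result no matter what the current value is and can be run more than once. The name `set_cluster_state` is hypothetical, and `sed -i -E` assumes GNU sed.

```shell
# set_cluster_state FILE STATE  (hypothetical helper)
# Rewrite the ETCD_INITIAL_CLUSTER_STATE line in FILE so that its value is
# "STATE". Only a line that starts with the key is touched; every other
# line is left unchanged.
set_cluster_state() {
    sed -i -E "s|^(ETCD_INITIAL_CLUSTER_STATE=).*|\\1\"$2\"|" "$1"
}
```

For this step the call would be `set_cluster_state /etc/sysconfig/kube-etcd-cluster existing`, followed by `systemctl start etcd2` as in the original.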
Check the status of the cluster members
    [root@cloudos02 ~]# /opt/bin/etcdctl cluster-health
    member 9bd4565552fd93c is healthy: got healthy result from http://10.12.0.21:2379
    member d1a9f9229366f9b8 is healthy: got healthy result from http://10.12.0.23:2379
    member f95341f81eb9322c is healthy: got healthy result from http://10.12.0.22:2379
    cluster is healthy
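Then, on the recovered node, edit /etc/sysconfig/kube-etcd-cluster once more
Change ETCD_INITIAL_CLUSTER_STATE=existing back to new to restore the original configuration (etcd only uses this setting when a member bootstraps with an empty data directory)
    [root@cloudos02 ~]# sed -ri '/ETCD_INITIAL_CLUSTER_STATE/s#existing#new#' /etc/sysconfig/kube-etcd-cluster
After that I started the related services, and the node was completely back to normal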