Troubleshooting a kube-controller-manager bug that broke scheduling in production

Our production k8s clusters generally run fairly old versions. Today a colleague happened to log onto a new node and found its root partition full. After cleaning it up, we noticed that one pod had never been created. We brought the k8s processes that had died because of the full filesystem back up, and then the kubelet log kept complaining with something along the lines of cannot remove /var/lib/kubelet/xxxxconfig/key: resource busy. Describing the corresponding rc showed no Events at all:

$ kubectl describe rc rabbit3rc 
Name:		rabbit3rc
Namespace:	default
Selector:	app=rabbitmq-cluster,node=rabbit3
Labels:		app=rabbitmq-cluster
		node=rabbit3
Annotations:	<none>
Replicas:	1 current / 1 desired
Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:	app=rabbitmq-cluster
		node=rabbit3
  Containers:
   rabbit3:
    Image:	cloud-base/rabbitmq-3.6.5:E3103-PUB-20181015-RC1
    Ports:	4369/TCP, 5672/TCP, 15672/TCP, 25672/TCP
    Limits:
      cpu:	16
      memory:	8Gi
    Requests:
      cpu:	400m
      memory:	500Mi
    Liveness:	exec [health_check.sh] delay=600s timeout=10s period=15s #success=1 #failure=3
    Environment:
      RABBITMQ_DEFAULT_USER:	xxx
      RABBITMQ_DEFAULT_PASS:	xxx
      RABBITMQ_ERLANG_COOKIE:	xxx
    Mounts:
      /etc/localtime from time (rw)
      /var/lib/rabbitmq from rabbitmqvar (rw)
  Volumes:
   time:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/localtime
   rabbitmqvar:
    Type:	HostPath (bare host directory volume)
    Path:	/opt/cloud/rabbitmq
Events:		<none>

On the corresponding node there was no rabbitmq container, only pause containers, and those were in the Dead state, lots of them, and they could not be removed. After restarting docker the Dead containers were gone, but once everything came back up there was still neither the rabbitmq container nor its pause container, so I began to suspect a scheduling problem.

The HA scheme on this low-version cluster has always been a bit of a mystery. Going by which machine had a live process, I identified the leader node and checked the kube-controller-manager on it with systemctl, which showed the following:

Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again

The controllers on the other nodes considered this one the leader, yet it kept complaining that it could not obtain the election lock. I copied the systemd startup arguments, started it manually with the log level raised to 8, and saw the following:

I0322 20:26:36.966931   34218 round_trippers.go:395] PUT https://100.68.24.2:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager
I0322 20:26:36.966938   34218 round_trippers.go:402] Request Headers:
I0322 20:26:36.966944   34218 round_trippers.go:405]     Accept: application/vnd.kubernetes.protobuf, */*
I0322 20:26:36.966951   34218 round_trippers.go:405]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.966956   34218 round_trippers.go:405]     User-Agent: kube-controller-manager/v1.6.7+095136c3078cc (linux/amd64) kubernetes/095136c/leader-election
I0322 20:26:36.967726   34218 round_trippers.go:420] Response Status: 409 Conflict in 0 milliseconds
I0322 20:26:36.967738   34218 round_trippers.go:423] Response Headers:
I0322 20:26:36.967744   34218 round_trippers.go:426]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.967749   34218 round_trippers.go:426]     Content-Length: 259
I0322 20:26:36.967754   34218 round_trippers.go:426]     Date: Fri, 22 Mar 2019 12:26:36 GMT
I0322 20:26:36.967888   34218 request.go:988] Response Body:
00000000  6b 38 73 00 0a 0c 0a 02  76 31 12 06 53 74 61 74  |k8s.....v1..Stat|
00000010  75 73 12 ea 01 0a 04 0a  00 12 00 12 07 46 61 69  |us...........Fai|
00000020  6c 75 72 65 1a a1 01 4f  70 65 72 61 74 69 6f 6e  |lure...Operation|
00000030  20 63 61 6e 6e 6f 74 20  62 65 20 66 75 6c 66 69  | cannot be fulfi|
00000040  6c 6c 65 64 20 6f 6e 20  65 6e 64 70 6f 69 6e 74  |lled on endpoint|
00000050  73 20 22 6b 75 62 65 2d  63 6f 6e 74 72 6f 6c 6c  |s "kube-controll|
00000060  65 72 2d 6d 61 6e 61 67  65 72 22 3a 20 74 68 65  |er-manager": the|
00000070  20 6f 62 6a 65 63 74 20  68 61 73 20 62 65 65 6e  | object has been|
00000080  20 6d 6f 64 69 66 69 65  64 3b 20 70 6c 65 61 73  | modified; pleas|
00000090  65 20 61 70 70 6c 79 20  79 6f 75 72 20 63 68 61  |e apply your cha|
000000a0  6e 67 65 73 20 74 6f 20  74 68 65 20 6c 61 74 65  |nges to the late|
000000b0  73 74 20 76 65 72 73 69  6f 6e 20 61 6e 64 20 74  |st version and t|
000000c0  72 79 20 61 67 61 69 6e  22 08 43 6f 6e 66 6c 69  |ry again".Confli|
000000d0  63 74 2a 28 0a 17 6b 75  62 65 2d 63 6f 6e 74 72  |ct*(..kube-contr|
000000e0  6f 6c 6c 65 72 2d 6d 61  6e 61 67 65 72 12 00 1a  |oller-manager...|
000000f0  09 65 6e 64 70 6f 69 6e  74 73 28 00 30 99 03 1a  |.endpoints(.0...|
00000100  00 22 00                                          |.".|
E0322 20:26:36.967960   34218 leaderelection.go:263] Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
I0322 20:26:36.967971   34218 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
^C
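
The 409 Conflict here is the apiserver's optimistic-concurrency check: every object carries a resourceVersion, and an update based on a resourceVersion that no longer matches the stored object is rejected with exactly this "the object has been modified" message. A minimal in-memory sketch of that check, in Go (just the idea, not the apiserver's actual code; the store type and names are made up for illustration):

package main

import (
	"errors"
	"fmt"
	"strconv"
)

// object keeps only the two fields that matter for the conflict check.
type object struct {
	ResourceVersion string
	Data            string
}

// store mimics the apiserver's compare-and-swap on resourceVersion.
type store struct {
	current object
	counter int
}

var errConflict = errors.New("Operation cannot be fulfilled: the object has been modified; " +
	"please apply your changes to the latest version and try again")

// update succeeds only if the caller based its change on the latest stored
// version; otherwise it reports a conflict (HTTP 409 at the API layer).
func (s *store) update(o object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	s.counter++
	o.ResourceVersion = strconv.Itoa(s.counter)
	s.current = o
	return nil
}

func main() {
	s := &store{counter: 1, current: object{ResourceVersion: "1", Data: "leader=A"}}

	stale := s.current              // writer A reads the object
	fresh := s.current              // writer B reads the same version
	fresh.Data = "leader=B"
	fmt.Println(s.update(fresh))    // <nil>: B wins, resourceVersion moves to "2"

	stale.Data = "leader=A-renewed" // A still holds resourceVersion "1"
	fmt.Println(s.update(stale))    // conflict, the same class of error as above
}

On its own a 409 is normal and retryable; what made this case odd is that every single attempt failed, and kept failing for days, against the same object.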

I went to the upstream code repository intending to search the code and see how the election logic acquires the lock, but I could not make sense of it, so instead I looked for someone who had already studied the election logic. A keyword search turned up this post: http://gogosatellite.blogspot.com/2017/07/how-to-setup-high-availability.html , and one log line in it pointed me in the right direction:

I0607 11:04:32.485502   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:32.485506   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
I0607 11:04:36.263032   17291 round_trippers.go:417] GET http://172.16.155.165:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 1 milliseconds
I0607 11:04:36.263122   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:36.263125   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager

So the guess was that every kube-controller-manager talks to the apiserver and competes for this ep to obtain the lock. I queried the ep with kubectl and got the output below, which never changed no matter how many times I fetched it; the holderIdentity field inside the annotations is the node of the current leader.

$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"xxxxx{nodename}xxxxxx\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
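
Note the timestamps in this record: renewTime is 2019-03-17, while the Date header in the verbose trace above says 22 Mar 2019, so with a leaseDurationSeconds of 15 the lease had expired days earlier and any healthy candidate should have been able to take it over, yet every PUT against this object kept failing. The lock itself is nothing more than this JSON blob in the control-plane.alpha.kubernetes.io/leader annotation. A small Go sketch that parses it and checks expiry (the struct is written out by hand to mirror the annotation fields, not imported from client-go):

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// leaderRecord mirrors the JSON stored in the
// control-plane.alpha.kubernetes.io/leader annotation.
type leaderRecord struct {
	HolderIdentity       string    `json:"holderIdentity"`
	LeaseDurationSeconds int       `json:"leaseDurationSeconds"`
	AcquireTime          time.Time `json:"acquireTime"`
	RenewTime            time.Time `json:"renewTime"`
	LeaderTransitions    int       `json:"leaderTransitions"`
}

func main() {
	// The annotation value from the stuck production endpoint above.
	raw := `{"holderIdentity":"Xxxxxxx-S02","leaseDurationSeconds":15,` +
		`"acquireTime":"2019-02-24T23:25:54Z","renewTime":"2019-03-17T11:20:08Z","leaderTransitions":0}`

	var rec leaderRecord
	if err := json.Unmarshal([]byte(raw), &rec); err != nil {
		panic(err)
	}

	// "now" as reported by the apiserver's Date header in the trace above.
	now, _ := time.Parse(time.RFC1123, "Fri, 22 Mar 2019 12:26:36 GMT")

	expiry := rec.RenewTime.Add(time.Duration(rec.LeaseDurationSeconds) * time.Second)
	fmt.Printf("holder=%s expired=%v (stale for %s)\n",
		rec.HolderIdentity, now.After(expiry), now.Sub(expiry).Round(time.Second))
}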

On a new cluster I had built myself, the resourceVersion changed from moment to moment, whereas on the production cluster above it never changed at all:

[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52752",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52772",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
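
The two kubectl calls above are just a manual poll. When it is unclear whether a lock is being renewed, the same check can be scripted; here is a small sketch that shells out to the kubectl command already used above and prints the resourceVersion plus the leader annotation every few seconds (assumes a configured kubectl in PATH; jq is not needed since the JSON is parsed in Go):

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// epMeta extracts only the fields we care about from the Endpoints object.
type epMeta struct {
	Metadata struct {
		ResourceVersion string            `json:"resourceVersion"`
		Annotations     map[string]string `json:"annotations"`
	} `json:"metadata"`
}

func main() {
	const path = "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager"

	for i := 0; i < 5; i++ { // a handful of samples is enough to see whether it moves
		out, err := exec.Command("kubectl", "get", "--raw", path).Output()
		if err != nil {
			panic(err)
		}
		var ep epMeta
		if err := json.Unmarshal(out, &ep); err != nil {
			panic(err)
		}
		fmt.Printf("resourceVersion=%s leader=%s\n",
			ep.Metadata.ResourceVersion,
			ep.Metadata.Annotations["control-plane.alpha.kubernetes.io/leader"])
		time.Sleep(5 * time.Second)
	}
}

On a healthy cluster the resourceVersion keeps climbing between successive reads as the leader renews its lease; on the broken cluster it stayed pinned at 36217274.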

To confirm that the holderIdentity field really identifies the leader, I tried stopping kube-controller-manager to see whether the field would change. It did change, and the ep showed up again with fresh leader information (note the creationTimestamp: the object itself was not actually recreated):

[root@k8s-m1 Kubernetes-ansible]# systemctl stop kube-controller-manager.service 
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}

That pretty much confirmed that, for some reason (possibly a consequence of the full filesystem, possibly something else), the leader record on the production cluster could no longer be modified, so I deleted the ep to kick it out. The same node still ended up acquiring leadership, but the resourceVersion started refreshing again and the controller logs stopped reporting errors:

$ kubectl -n kube-system delete ep kube-controller-manager
endpoints "kube-controller-manager" deleted
$  kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542637",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:53Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542785",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:59Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []

Describing the rc again showed that the corresponding pod was up:

$ kubectl describe rc rabbit3rc
Name:		rabbit3rc
Namespace:	default
Selector:	app=rabbitmq-cluster,node=rabbit3
Labels:		app=rabbitmq-cluster
		node=rabbit3
Annotations:	<none>
Replicas:	1 current / 1 desired
Pods Status:	1 Running / 0 Waiting / 0 Succeeded / 0 Failed
...
