一次kube-controller-manager的bug导致的线上无法调度处理过程

栏目: Redis · 发布时间: 5年前

内容简介:线上的k8s环境普遍版本很低,今天同事心血来潮去一个新节点上去看了下发现根分区满了,清理了后发有一个pod没有创建出来。清理后把因为文件系统满了而down掉的k8s相关进程起来后发现kubelet的log一直报错cannot remove /var/lib/kubelet/xxxxconfig/key resource busy now 啥的。查看了下describe对应rc还没有任何Event在对应的node上看了下发现没有rabbitmq的容器,只有一个pause的容器,但是状态是Dead并且很多这样的

线上的k8s环境普遍版本很低,今天同事心血来潮去一个新节点上去看了下发现根分区满了,清理了后发有一个pod没有创建出来。清理后把因为文件系统满了而down掉的k8s相关进程起来后发现kubelet的log一直报错cannot remove /var/lib/kubelet/xxxxconfig/key resource busy now 啥的。查看了下describe对应rc还没有任何Event

$ kubectl describe rc rabbit3rc 
Name:		rabbit3rc
Namespace:	default
Selector:	app=rabbitmq-cluster,node=rabbit3
Labels:		app=rabbitmq-cluster
		node=rabbit3
Annotations:	<none>
Replicas:	1 current / 1 desired
Pods Status:	0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:	app=rabbitmq-cluster
		node=rabbit3
  Containers:
   rabbit3:
    Image:	cloud-base/rabbitmq-3.6.5:E3103-PUB-20181015-RC1
    Ports:	4369/TCP, 5672/TCP, 15672/TCP, 25672/TCP
    Limits:
      cpu:	16
      memory:	8Gi
    Requests:
      cpu:	400m
      memory:	500Mi
    Liveness:	exec [health_check.sh] delay=600s timeout=10s period=15s #success=1 #failure=3
    Environment:
      RABBITMQ_DEFAULT_USER:	xxx
      RABBITMQ_DEFAULT_PASS:	xxx
      RABBITMQ_ERLANG_COOKIE:	xxx
    Mounts:
      /etc/localtime from time (rw)
      /var/lib/rabbitmq from rabbitmqvar (rw)
  Volumes:
   time:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/localtime
   rabbitmqvar:
    Type:	HostPath (bare host directory volume)
    Path:	/opt/cloud/rabbitmq
Events:		<none>

在对应的node上看了下发现没有rabbitmq的容器,只有一个pause的容器,但是状态是Dead并且很多这样的而且删不掉。然后重启了下 docker 后虽然Dead的没了但是等被拉起来后对应的rabiitmq和它的pause容器都没有。怀疑调度有问题。

这个低版本集群的HA这块儿方案一直很迷,从进程存活来判断leader的机器后用systemctl查看上面的kube-controller-manager发现如下信息

Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again

其他节点的controller认为它是leader但是它一直报错无法获取选举锁,然后复制systemd启动参数手动启动调整loglevel为8发现信息为如下

I0322 20:26:36.966931   34218 round_trippers.go:395] PUT https://100.68.24.2:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager
I0322 20:26:36.966938   34218 round_trippers.go:402] Request Headers:
I0322 20:26:36.966944   34218 round_trippers.go:405]     Accept: application/vnd.kubernetes.protobuf, */*
I0322 20:26:36.966951   34218 round_trippers.go:405]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.966956   34218 round_trippers.go:405]     User-Agent: kube-controller-manager/v1.6.7+095136c3078cc (linux/amd64) kubernetes/095136c/leader-election
I0322 20:26:36.967726   34218 round_trippers.go:420] Response Status: 409 Conflict in 0 milliseconds
I0322 20:26:36.967738   34218 round_trippers.go:423] Response Headers:
I0322 20:26:36.967744   34218 round_trippers.go:426]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.967749   34218 round_trippers.go:426]     Content-Length: 259
I0322 20:26:36.967754   34218 round_trippers.go:426]     Date: Fri, 22 Mar 2019 12:26:36 GMT
I0322 20:26:36.967888   34218 request.go:988] Response Body:
00000000  6b 38 73 00 0a 0c 0a 02  76 31 12 06 53 74 61 74  |k8s.....v1..Stat|
00000010  75 73 12 ea 01 0a 04 0a  00 12 00 12 07 46 61 69  |us...........Fai|
00000020  6c 75 72 65 1a a1 01 4f  70 65 72 61 74 69 6f 6e  |lure...Operation|
00000030  20 63 61 6e 6e 6f 74 20  62 65 20 66 75 6c 66 69  | cannot be fulfi|
00000040  6c 6c 65 64 20 6f 6e 20  65 6e 64 70 6f 69 6e 74  |lled on endpoint|
00000050  73 20 22 6b 75 62 65 2d  63 6f 6e 74 72 6f 6c 6c  |s "kube-controll|
00000060  65 72 2d 6d 61 6e 61 67  65 72 22 3a 20 74 68 65  |er-manager": the|
00000070  20 6f 62 6a 65 63 74 20  68 61 73 20 62 65 65 6e  | object has been|
00000080  20 6d 6f 64 69 66 69 65  64 3b 20 70 6c 65 61 73  | modified; pleas|
00000090  65 20 61 70 70 6c 79 20  79 6f 75 72 20 63 68 61  |e apply your cha|
000000a0  6e 67 65 73 20 74 6f 20  74 68 65 20 6c 61 74 65  |nges to the late|
000000b0  73 74 20 76 65 72 73 69  6f 6e 20 61 6e 64 20 74  |st version and t|
000000c0  72 79 20 61 67 61 69 6e  22 08 43 6f 6e 66 6c 69  |ry again".Confli|
000000d0  63 74 2a 28 0a 17 6b 75  62 65 2d 63 6f 6e 74 72  |ct*(..kube-contr|
000000e0  6f 6c 6c 65 72 2d 6d 61  6e 61 67 65 72 12 00 1a  |oller-manager...|
000000f0  09 65 6e 64 70 6f 69 6e  74 73 28 00 30 99 03 1a  |.endpoints(.0...|
00000100  00 22 00                                          |.".|
E0322 20:26:36.967960   34218 leaderelection.go:263] Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
I0322 20:26:36.967971   34218 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
^C

去官方代码仓库准备搜下代码看看选举逻辑是如何获得锁的,结果根本看不懂,于是想着看看有没有人研究过选举逻辑啥的结果。按照关键词搜到了一个国外文章 http://gogosatellite.blogspot.com/2017/07/how-to-setup-high-availability.html ,里面一行输出日志给了我排查方向

I0607 11:04:32.485502   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:32.485506   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
I0607 11:04:36.263032   17291 round_trippers.go:417] GET http://172.16.155.165:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 1 milliseconds
I0607 11:04:36.263122   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:36.263125   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager

猜测到是所有kube-controller-manager请求apiserver然后竞争这个ep来获得锁,用kubectl请求了下这个ep发现如下信息并且一直不会改变, annotations 的里的 holderIdentity 字段是当前leader的node

$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"xxxxx{nodename}xxxxxx\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}

在我自己搭建的新集群上查看了下发现每时每刻 resourceVersion 一直在改变,而对比上面线上的却没有任何变化

[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52752",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52772",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}

为了确定字段 holderIdentity 是leader,便尝试通过关闭 kube-controller-manager 看看字段会不会变,发现真的改变并且会有新的ep又出现了(注意看 creationTimestamp 实际上不是生成)

[root@k8s-m1 Kubernetes-ansible]# systemctl stop kube-controller-manager.service 
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}

基本就确定了某些原因(可能文件系统慢导致的也可能其他),于是生产环境上删除ep来踢出无法修改的leader信息,发现虽然还是他自己获取到leader,但是 resourceVersion 开始刷新了,而controller的日志里也没报错了

$ kubectl -n kube-system delete ep kube-controller-manager
endpoints "kube-controller-manager" deleted
$  kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542637",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:53Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542785",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:59Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []

查看查看rc描述和对应的pod都起来了

kubectl describe rc rabbit3rc
Name:		rabbit3rc
Namespace:	default
Selector:	app=rabbitmq-cluster,node=rabbit3
Labels:		app=rabbitmq-cluster
		node=rabbit3
Annotations:	<none>
Replicas:	1 current / 1 desired
Pods Status:	1 Running / 0 Waiting / 0 Succeeded / 0 Failed
...

以上所述就是小编给大家介绍的《一次kube-controller-manager的bug导致的线上无法调度处理过程》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

网易一千零一夜

网易一千零一夜

网易杭研项目管理部 / 电子工业出版社 / 2016-9-1 / 46

本书是网易杭州研究院项目管理部多年来丰富的项目管理实践总结与干货分享。字字句句凝结了网易项目经理的甘与苦、汗与泪。 全书围绕项目管理体系,从敏捷实践、项目立项、需求管理、沟通管理,到计划进度管理、风险管理,真实反映了网易面向互联网产品项目管理实战经验与心路历程。 不论你是项目管理新手,还是资深项目经理,都可以从本书中获得启发与借鉴。一起来看看 《网易一千零一夜》 这本书的介绍吧!

图片转BASE64编码
图片转BASE64编码

在线图片转Base64编码工具

RGB HSV 转换
RGB HSV 转换

RGB HSV 互转工具

RGB CMYK 转换工具
RGB CMYK 转换工具

RGB CMYK 互转工具