Kubernetes源码分析——controller mananger

栏目: 编程工具 · 发布时间: 5年前

内容简介:建议先看下前文来自入口The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes,

简介

    • Controller 与 apiserver 的交互方式
    • 一次调协(Reconcile)

建议先看下前文 Kubernetes源码分析——apiserver

来自入口 cmd/kube-controller-manager/controller-manager.go 的概括

The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.

那么在分析之初,便会有几个问题

  1. current state 和 desired state 从哪来
  2. 如何加载已有的各种controller
  3. 如何加载自定义controller
  4. 每个controller的存在形态是什么
  5. control loop 的存在形态是什么
  6. 自定义controller 与官方的controller 在实现上有哪些共通点

背景知识

Controller 与 apiserver 的交互方式

Kubernetes源码分析——apiserver 提到Kubernetes CRD的实现,关于Custom Resource Controller的实现有一个很重要的点:Controller 与 apiserver 的交互方式——controller 与 apiserver 交互的部分已经被定好了,只需实现control loop 部分即可。

Kubernetes源码分析——controller mananger

Kubernetes副本管理

参见Kubernetes副本管理

本文以Deployment Controller 为例来描述 Controller Manager的实现原理,因此要预先了解下 Deployment Controller 的实现原理。

以扩展pod 实例数为例, Deployment Controller 的逻辑便是找到 关联的ReplicaSet 并更改其Replicas 的值

Kubernetes object 控制器逻辑 备注
Deployment 控制 ReplicaSet 的数目,以及每个 ReplicaSet 的属性 Deployment 实际上是一个两层控制器
ReplicaSet 保证系统中 Pod 的个数永远等于指定的个数(比如,3 个) 一个应用的版本,对应的正是一个 ReplicaSet

启动

cmd/kube-controller-manager/controller-manager.go

以启动DeploymentController为例

Kubernetes源码分析——controller mananger

可以看到 启动一个goroutine 运行 Run 方法,Run begins watching and syncing.

control loop

Kubernetes找感觉 提到控制器的基本逻辑

for {
  实际状态 := 获取集群中对象 X 的实际状态(Actual State)
  期望状态 := 获取集群中对象 X 的期望状态(Desired State)
  if 实际状态 == 期望状态{
    什么都不做
  } else {
    执行编排动作,将实际状态调整为期望状态
  }
}

那么实际在代码中长什么样子呢?我们先看下run 方法

外围——循环及数据获取

// Run begins watching and syncing.
func (dc *DeploymentController) Run(workers int, stopCh <-chan struct{}) {
	defer utilruntime.HandleCrash()
	defer dc.queue.ShutDown()
	klog.Infof("Starting deployment controller")
	defer klog.Infof("Shutting down deployment controller")
	if !controller.WaitForCacheSync("deployment", stopCh, dc.dListerSynced, dc.rsListerSynced, dc.podListerSynced) {
		return
	}
	for i := 0; i < workers; i++ {
		go wait.Until(dc.worker, time.Second, stopCh)
	}
	<-stopCh
}

重点就是 go wait.Until(dc.worker, time.Second, stopCh) 。for 循环隐藏在 k8s.io/apimachinery/pkg/util/wait/wait.go 工具方法中, func Until(f func(), period time.Duration, stopCh <-chan struct{}) {...} 方法的作用是 Until loops until stop channel is closed, running f every period. 即在stopCh 标记停止之前,每隔 period 执行 一个func,对应到DeploymentController 就是 worker 方法

// worker runs a worker thread that just dequeues items, processes them, and marks them done.
// It enforces that the syncHandler is never invoked concurrently with the same key.
func (dc *DeploymentController) worker() {
	for dc.processNextWorkItem() {
	}
}

func (dc *DeploymentController) processNextWorkItem() bool {
	// 取元素
	key, quit := dc.queue.Get()
	if quit {
		return false
	}
	// 结束前标记元素被处理过
	defer dc.queue.Done(key)
	// 处理元素
	err := dc.syncHandler(key.(string))
	dc.handleErr(err, key)
	return true
}

dc.syncHandler 实际为 DeploymentController 的syncDeployment方法

一次调协(Reconcile)

syncDeployment 包含 扩容、rollback、rolloutRecreate、rolloutRolling 我们裁剪部分代码,以最简单的 扩容为例

// syncDeployment will sync the deployment with the given key.
func (dc *DeploymentController) syncDeployment(key string) error {
	namespace, name, err := cache.SplitMetaNamespaceKey(key)
	deployment, err := dc.dLister.Deployments(namespace).Get(name)
	// List ReplicaSets owned by this Deployment, while reconciling ControllerRef
	// through adoption/orphaning.
	rsList, err := dc.getReplicaSetsForDeployment(d)
	scalingEvent, err := dc.isScalingEvent(d, rsList)
	if scalingEvent {
		return dc.sync(d, rsList)
	}
	...
}

// sync is responsible for reconciling deployments on scaling events or when they
// are paused.
func (dc *DeploymentController) sync(d *apps.Deployment, rsList []*apps.ReplicaSet) error {
	newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(d, rsList, false)
	...
	dc.scale(d, newRS, oldRSs);
	...
	allRSs := append(oldRSs, newRS)
	return dc.syncDeploymentStatus(allRSs, newRS, d)
}

scale要处理 扩容或 RollingUpdate 各种情况,此处只保留扩容逻辑。

func (dc *DeploymentController) scale(deployment *apps.Deployment, newRS *apps.ReplicaSet, oldRSs []*apps.ReplicaSet) error {
	// If there is only one active replica set then we should scale that up to the full count of the
	// deployment. If there is no active replica set, then we should scale up the newest replica set.
	if activeOrLatest := deploymentutil.FindActiveOrLatest(newRS, oldRSs); activeOrLatest != nil {
		if *(activeOrLatest.Spec.Replicas) == *(deployment.Spec.Replicas) {
			return nil
		}
		_, _, err := dc.scaleReplicaSetAndRecordEvent(activeOrLatest, *(deployment.Spec.Replicas), deployment)
		return err
	}
	...
}

func (dc *DeploymentController) scaleReplicaSetAndRecordEvent(rs *apps.ReplicaSet, newScale int32, deployment *apps.Deployment) (bool, *apps.ReplicaSet, error) {
	// No need to scale
	if *(rs.Spec.Replicas) == newScale {
		return false, rs, nil
	}
	var scalingOperation string
	if *(rs.Spec.Replicas) < newScale {
		scalingOperation = "up"
	} else {
		scalingOperation = "down"
	}
	scaled, newRS, err := dc.scaleReplicaSet(rs, newScale, deployment, scalingOperation)
	return scaled, newRS, err
}

func (dc *DeploymentController) scaleReplicaSet(rs *apps.ReplicaSet, newScale int32, deployment *apps.Deployment, scalingOperation string) (bool, *apps.ReplicaSet, error) {
	sizeNeedsUpdate := *(rs.Spec.Replicas) != newScale
	annotationsNeedUpdate := ...
	scaled := false
	var err error
	if sizeNeedsUpdate || annotationsNeedUpdate {
		rsCopy := rs.DeepCopy()
		*(rsCopy.Spec.Replicas) = newScale
		deploymentutil.SetReplicasAnnotations...
		// 调用api 接口更新 对应ReplicaSet 的数据
		rs, err = dc.client.AppsV1().ReplicaSets(rsCopy.Namespace).Update(rsCopy)
		...
	}
	return scaled, rs, err
}

调用api 接口更新Deployment 对象本身的数据

// syncDeploymentStatus checks if the status is up-to-date and sync it if necessary
func (dc *DeploymentController) syncDeploymentStatus(allRSs []*apps.ReplicaSet, newRS *apps.ReplicaSet, d *apps.Deployment) error {
	newStatus := calculateStatus(allRSs, newRS, d)
	if reflect.DeepEqual(d.Status, newStatus) {
		return nil
	}
	newDeployment := d
	newDeployment.Status = newStatus
	_, err := dc.client.AppsV1().Deployments(newDeployment.Namespace).UpdateStatus(newDeployment)
	return err
}

以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

硅谷之火

硅谷之火

保罗·弗赖伯格、迈克尔·斯韦因 / 张华伟 编译 / 中国华侨出版社 / 2014-11-1 / CNY 39.80

《硅谷之火:人与计算机的未来》以生动的故事,介绍了计算机爱好者以怎样的创新精神和不懈的努力,将计算机技术的力量包装在一个小巧玲珑的机壳里,实现了个人拥有计算机的梦想。同时以独特的视角讲述了苹果、微软、太阳微系统、网景、莲花以及甲骨文等公司的创业者们在实现个人计算机梦想的过程中创业的艰辛、守业的艰难、失败的痛苦,在激烈竞争的环境中奋斗的精神以及在技术上不断前进的历程。一起来看看 《硅谷之火》 这本书的介绍吧!

在线进制转换器
在线进制转换器

各进制数互转换器

RGB HSV 转换
RGB HSV 转换

RGB HSV 互转工具

HEX CMYK 转换工具
HEX CMYK 转换工具

HEX CMYK 互转工具