Istio source code: pilot-agent source analysis (original)

Category: Backend · Published: 5 years ago

This analysis is based on the Istio 1.1 source code, but the logs and process listings were captured on Istio 1.0.5.

Overall architecture

The pilot code lives in the pilot repo and currently implements three main commands:

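  1. pilot-agent acts as the bridge between the API server and the proxy on each proxy node; it generates the initial Envoy configuration file and manages the Envoy lifecycle;
  2. pilot-discovery provides cluster address service discovery to the proxies;
  3. sidecar-injector provides the webhook service used for automatic sidecar injection.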

The diagram above shows the structure of an older version; newer versions may differ, but the overall architecture is similar.

In the istio-proxy container, pilot-agent is started with the following command-line arguments:

$ ps -ef -www
istio-p+     1     0  0 Jan14 ?        00:01:00 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster helloworld --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:15007 --discoveryRefreshDelay 1s --zipkinAddress zipkin.istio-system:9411 --connectTimeout 10s --proxyAdminPort 15000 --controlPlaneAuthPolicy NONE

istio-p+   608     1  0 Jan15 ?        03:31:24 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev13.json --restart-epoch 13 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster helloworld --service-node sidecar~10.128.5.4~helloworld-v1-8f8dd85-cfz42.default~default.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only

The envoy process is a child of pilot-agent, which generates its configuration file and starts and supervises it.
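The --service-node value in the listing above packs four fields separated by "~" (proxy type, IP address, ID and DNS domain), matching the Proxy role fields that pilot-agent logs at start-up. As a minimal sketch, assuming exactly four fields, it could be split like this; the nodeID type and parseServiceNode function are illustrative names, not Istio's own API.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeID holds the four "~"-separated fields of Envoy's --service-node flag.
// The type and its field names are hypothetical, chosen for this sketch.
type nodeID struct {
	Type, IP, ID, Domain string
}

// parseServiceNode splits a value such as
// "sidecar~10.128.5.4~helloworld-v1-8f8dd85-cfz42.default~default.svc.cluster.local".
func parseServiceNode(s string) (nodeID, error) {
	parts := strings.Split(s, "~")
	if len(parts) != 4 {
		return nodeID{}, fmt.Errorf("expected 4 fields, got %d in %q", len(parts), s)
	}
	return nodeID{Type: parts[0], IP: parts[1], ID: parts[2], Domain: parts[3]}, nil
}

func main() {
	n, err := parseServiceNode("sidecar~10.128.5.4~helloworld-v1-8f8dd85-cfz42.default~default.svc.cluster.local")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", n)
}
```

The ID field is the pod name joined with its namespace, which is the default described for the --id flag in the proxy help output.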

pilot-agent command line

# /usr/local/bin/pilot-agent --help
Istio Pilot agent runs in the side car or gateway container and bootstraps envoy.

Usage:
  pilot-agent [command]

Available Commands:
  help        Help about any command
  proxy       Envoy proxy agent
  request     Makes an HTTP request to the Envoy admin API
  version     Prints out build version information

Flags:
  -h, --help                          help for pilot-agent
      --log_as_json                   Whether to format output as JSON or in plain console-friendly format
      --log_caller string             Comma-separated list of scopes for which to include caller information, scopes can be any of [default, model]
      --log_output_level string       Comma-separated minimum per-scope logging level of messages to output, in the form of <scope>:<level>,<scope>:<level>,... where scope can be one of [default, model] and level can be one of [debug, info, warn, error, none] (default "default:info")
      --log_rotate string             The path for the optional rotating log file
      --log_rotate_max_age int        The maximum age in days of a log file beyond which the file is rotated (0 indicates no limit) (default 30)
      --log_rotate_max_backups int    The maximum number of log file backups to keep before older files are deleted (0 indicates no limit) (default 1000)
      --log_rotate_max_size int       The maximum size in megabytes of a log file beyond which the file is rotated (default 104857600)
      --log_stacktrace_level string   Comma-separated minimum per-scope logging level at which stack traces are captured, in the form of <scope>:<level>,<scope:level>,... where scope can be one of [default, model] and level can be one of [debug, info, warn, error, none] (default "default:none")
      --log_target stringArray        The set of paths where to output the log. This can be any path as well as the special values stdout and stderr (default [stdout])

Use "pilot-agent [command] --help" for more information about a command.

The help output for the proxy subcommand is as follows (common flags omitted):

$ /usr/local/bin/pilot-agent proxy --help
Envoy proxy agent

Usage:
  pilot-agent proxy [flags]

Flags:
      --applicationPorts stringSlice      Ports exposed by the application. Used to determine that Envoy is configured and ready to receive traffic.
      --availabilityZone string           Availability zone
      --binaryPath string                 Path to the proxy binary (default "/usr/local/bin/envoy")
      --bootstrapv2                       Use bootstrap v2 - DEPRECATED (default true)
      --concurrency int                   number of worker threads to run
      --configPath string                 Path to the generated configuration file directory (default "/etc/istio/proxy")
      --connectTimeout duration           Connection timeout used by Envoy for supporting services (default 1s)
      --controlPlaneAuthPolicy string     Control Plane Authentication Policy (default "NONE")
      --customConfigFile string           Path to the custom configuration file
      --disableInternalTelemetry          Disable internal telemetry
      --discoveryAddress string           Address of the discovery service exposing xDS (e.g. istio-pilot:8080) (default "istio-pilot:15007")
      --discoveryRefreshDelay duration    Polling interval for service discovery (used by EDS, CDS, LDS, but not RDS) (default 1s)
      --domain string                     DNS domain suffix. If not provided uses ${POD_NAMESPACE}.svc.cluster.local
      --drainDuration duration            The time in seconds that Envoy will drain connections during a hot restart (default 2s)
  -h, --help                              help for proxy
      --id string                         Proxy unique ID. If not provided uses ${POD_NAME}.${POD_NAMESPACE} from environment variables
      --ip string                         Proxy IP address. If not provided uses ${INSTANCE_IP} environment variable.
      --parentShutdownDuration duration   The time in seconds that Envoy will wait before shutting down the parent process during a hot restart (default 3s)
      --proxyAdminPort uint16             Port on which Envoy should listen for administrative commands (default 15000)
      --proxyLogLevel string              The log level used to start the Envoy proxy (choose from {trace, debug, info, warn, err, critical, off}) (default "warn")
      --serviceCluster string             Service cluster (default "istio-proxy")
      --serviceregistry string            Select the platform for service registry, options are {Kubernetes, Consul, CloudFoundry, Mock, Config} (default "Kubernetes")
      --statsdUdpAddress string           IP Address and Port of a statsd UDP listener (e.g. 10.75.241.127:9125)
      --statusPort uint16                 HTTP Port on which to serve pilot agent status. If zero, agent status will not be provided.
      --templateFile string               Go template bootstrap config
      --zipkinAddress string              Address of the Zipkin service (e.g. zipkin:9411)

When started as /usr/local/bin/pilot-agent proxy, one of the following proxy types can be given:

  • pilot-agent proxy sidecar
  • pilot-agent proxy router, used in istio-gateway mode

    # ingressgateway or egressgateway
    $ ps -ef -www
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 Jan15 ?        00:00:52 /usr/local/bin/pilot-agent proxy router -v 2 --discoveryRefreshDelay 1s --drainDuration 45s --parentShutdownDuration 1m0s --connectTimeout 10s --serviceCluster istio-ingressgateway --zipkinAddress zipkin:9411 --proxyAdminPort 15000 --controlPlaneAuthPolicy NONE --discoveryAddress istio-pilot:8080
    root        64     1  0 Jan15 ?        03:37:59 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev1.json --restart-epoch 1 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~10.128.45.4~istio-ingressgateway-78c6d8b8d7-sxvpx.istio-system~istio-system.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only
  • pilot-agent proxy ingress, used only in ingress mode

pilot-agent code flow analysis

pilot-agent needs to watch the relevant certificates.

  • sidecar mode: watches the three files cert-chain.pem, key.pem and root-cert.pem under /etc/certs/;
  • ingress mode: watches the two files tls.crt and tls.key under /etc/istio/ingress-certs/.

The first log lines printed by pilot-agent at start-up show the effective configuration and the monitored certificates:

$ kubectl logs helloworld-v1-8f8dd85-cfz42 -c istio-proxy | more
2019-01-14T06:35:21.726068Z info    Version root@6f6ea1061f2b-docker.io/istio-1.0.5-c1707e45e71c75d74bf3a5dec8c7086f32f32fad-Clean
2019-01-14T06:35:21.726179Z info    Proxy role: model.Proxy{ClusterID:"", Type:"sidecar", IPAddress:"10.128.5.4", ID:"helloworld-v1-8f8dd85-cfz42.default", Domain:"default.svc.cluster.local", Metadata:map[string]string(nil)}
2019-01-14T06:35:21.726865Z info    Effective config: binaryPath: /usr/local/bin/envoy
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istio-pilot.istio-system:15007
discoveryRefreshDelay: 1s
drainDuration: 45s
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: helloworld
zipkinAddress: zipkin.istio-system:9411

2019-01-14T06:35:21.726902Z info    Monitored certs: []envoy.CertSource{envoy.CertSource{Directory:"/etc/certs/", Files:[]string{"cert-chain.pem", "key.pem", "root-cert.pem"}}}
2019-01-14T06:35:21.727115Z info    Starting proxy agent
2019-01-14T06:35:21.728269Z info    Received new config, resetting budget
2019-01-14T06:35:21.728587Z info    Reconciling configuration (budget 10)
2019-01-14T06:35:21.728626Z info    Epoch 0 starting
2019-01-14T06:35:21.729334Z info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster helloworld --service-node sidecar~10.128.5.4~helloworld-v1-8f8dd85-cfz42.default~default.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only]

This article focuses on the main logic of the proxy sidecar mode.

Envoy hot restart

Main function entry point:

istio.io/istio/pilot/cmd/pilot-agent/main.go

func main() {
    if err := rootCmd.Execute(); err != nil {
        log.Errora(err)
        os.Exit(-1)
    }
}

Because the command line is pilot-agent proxy sidecar, execution ends up in the entry point of the proxy subcommand:

proxyCmd = &cobra.Command{
        Use:   "proxy",
        Short: "Envoy proxy agent",
        RunE: func(c *cobra.Command, args []string) error {
            // ...
            // Start from the default proxy configuration parameters
            proxyConfig := model.DefaultProxyConfig()

            // set all flags
            proxyConfig.CustomConfigFile = customConfigFile
            proxyConfig.ConfigPath = configPath
            proxyConfig.BinaryPath = binaryPath
            proxyConfig.ServiceCluster = serviceCluster
            proxyConfig.DrainDuration = types.DurationProto(drainDuration)
            proxyConfig.ParentShutdownDuration = types.DurationProto(parentShutdownDuration)
            proxyConfig.DiscoveryAddress = discoveryAddress
            proxyConfig.ConnectTimeout = types.DurationProto(connectTimeout)
            proxyConfig.StatsdUdpAddress = statsdUDPAddress
            proxyConfig.ProxyAdminPort = int32(proxyAdminPort)
            proxyConfig.Concurrency = int32(concurrency)


            // ...
            // 1. Start the status server
            // If a status port was provided, start handling status probes.
            if statusPort > 0 {
                parsedPorts, err := parseApplicationPorts()
                if err != nil {
                    return err
                }

                statusServer := status.NewServer(status.Config{
                    AdminPort:        proxyAdminPort,
                    StatusPort:       statusPort,
                    ApplicationPorts: parsedPorts,
                })
                go statusServer.Run(ctx)
            }

            // Initialize the envoyProxy object
            envoyProxy := envoy.NewProxy(proxyConfig, role.ServiceNode(), proxyLogLevel, pilotSAN, role.IPAddresses)

            agent := proxy.NewAgent(envoyProxy, proxy.DefaultRetry)
            watcher := envoy.NewWatcher(certs, agent.ConfigCh())

            // 2. Start the agent
            go agent.Run(ctx)

            // 3. Start the watcher
            go watcher.Run(ctx)

            // 4. The main goroutine waits for a termination signal
            stop := make(chan struct{})
            cmd.WaitSignal(stop)
            <-stop
            return nil
        }
    }

status server

If statusPort is set, the statusServer is started.

  • Readiness check: the path is /healthz/ready. Together with the ports configured in applicationPorts, the check queries Envoy's admin port to decide whether Envoy is ready to accept traffic on those ports.
 --statusPort uint16 HTTP Port on which to serve pilot agent status. If zero, agent status will not be provided.
 --applicationPorts stringSlice Ports exposed by the application. Used to determine that Envoy is configured and ready to receive traffic. 

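The check works by querying the local admin port, e.g. http://127.0.0.1:15000/listeners, for all ports Envoy is currently listening on, and then looking up each configured applicationPorts entry among them to decide whether Envoy is ready.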

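  • Application port check

    The check path is /url. The istio-app-probe-port header carries the target port and the URL path of the request is reused for the probe, so the request finally issued is http://127.0.0.1:istio-app-probe-port/url. All request headers are also forwarded to the probed service port.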
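The readiness check boils down to a set lookup: every application port must appear among Envoy's listener ports. The sketch below illustrates only that lookup. It assumes the listener addresses have already been extracted from the admin endpoint's response as "host:port" strings, and missingPorts is a hypothetical helper name, not the status server's actual code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// missingPorts returns the application ports that do not appear among the
// listener addresses. listeners is assumed to hold "host:port" strings that
// were already parsed out of the admin /listeners response.
func missingPorts(listeners []string, appPorts []uint16) []uint16 {
	listening := make(map[uint16]bool)
	for _, l := range listeners {
		_, portStr, err := net.SplitHostPort(l)
		if err != nil {
			continue // skip entries that are not host:port
		}
		p, err := strconv.ParseUint(portStr, 10, 16)
		if err != nil {
			continue // skip non-numeric or out-of-range ports
		}
		listening[uint16(p)] = true
	}
	var missing []uint16
	for _, p := range appPorts {
		if !listening[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	listeners := []string{"0.0.0.0:15001", "10.128.5.4:5000"}
	// An empty result means every application port has a listener (ready).
	fmt.Println(missingPorts(listeners, []uint16{5000}))
	// A non-empty result lists the ports Envoy is not yet listening on.
	fmt.Println(missingPorts(listeners, []uint16{5000, 8080}))
}
```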

agent

The agent entry point is in istio.io/istio/pilot/pkg/proxy/agent.go.

Interface definition:

type Agent interface {
    // ConfigCh returns the config channel used to send configuration updates.
    // Agent compares the current active configuration to the desired state and
    // initiates a restart if necessary. If the restart fails, the agent attempts
    // to retry with an exponential back-off.
    ConfigCh() chan<- interface{}

    // Run starts the agent control loop and awaits for a signal on the input
    // channel to exit the loop.
    Run(ctx context.Context)
}

The agent struct is defined as:

type agent struct {
    // proxy commands
    proxy Proxy

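    // Tracks the Envoy restart parameters: restart time, budget, MaxRetries and InitialInterval.
    // A process that exits abnormally is normally retried up to 10 times with exponential
    // back-off; if it still fails after 10 attempts, the proxy container exits and a new one is started.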
    // retry configuration
    retry Retry

    // Desired configuration; currently the SHA-256 digest of the watched certificates
    // desired configuration state
    desiredConfig interface{}

    // Certificate SHA-256 digest recorded for every active epoch
    // active epochs and their configurations
    epochs map[int]interface{}

    // Configuration currently in use, also a certificate SHA-256 digest
    // current configuration is the highest epoch configuration
    currentConfig interface{}

    // Channel on which the watcher posts certificate changes
    // channel for posting desired configurations
    configCh chan interface{}

    // Channel used to supervise the Envoy processes
    // channel for proxy exit notifications
    statusCh chan exitStatus

    // Abort channel for each epoch; each channel is buffered with maxAborts entries (see reconcile)
    // channel for aborting running instances
    abortCh map[int]chan error
}

External control of the agent struct is exercised mainly through channels:

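  • configCh receives notifications that the configuration may have changed. At present a watcher goroutine watches the certificates; when they change (notifications are throttled by a minimum delay, currently 10s), configCh receives the SHA-256 digest sent by the watcher;
  • statusCh carries the exit status of each Envoy process that was started and is used to supervise the Envoy processes;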

  • the proxy object does most of the actual work of managing Envoy; it is initialized in the proxyCmd function:

    envoyProxy := envoy.NewProxy(proxyConfig, role.ServiceNode(), proxyLogLevel, pilotSAN, role.IPAddresses)

Proxy is an interface, defined as follows:

// Proxy defines command interface for a proxy
type Proxy interface {
    // Run command for a config, epoch, and abort channel
    Run(interface{}, int, <-chan error) error

    // Cleanup command for an epoch
    Cleanup(int)

    // Panic command is invoked with the desired config when all retries to
    // start the proxy fail just before the agent terminating
    Panic(interface{})
}

The main entry function of the agent is:

istio.io/istio/pilot/pkg/proxy/agent.go

func (a *agent) Run(ctx context.Context) {
    log.Info("Starting proxy agent")

    // Throttle processing up to smoothed 1 qps with bursts up to 10 qps.
    // High QPS is needed to process messages on all channels.
    rateLimiter := rate.NewLimiter(1, 10)

    var reconcileTimer *time.Timer
    for {
        err := rateLimiter.Wait(ctx)
        if err != nil {
            a.terminate()
            return
        }

        // maximum duration or duration till next restart
        var delay time.Duration = 1<<63 - 1
        if a.retry.restart != nil {
            // If a restart is scheduled, wait only until that time
            delay = time.Until(*a.retry.restart)
        }

        // Stop the previous reconcileTimer and arm a new one with the current delay
        if reconcileTimer != nil {
            reconcileTimer.Stop()
        }
        reconcileTimer = time.NewTimer(delay)

        select {
        // 1. A config arrived: act only if it differs from the desired config, otherwise ignore it
        case config := <-a.configCh:
            if !reflect.DeepEqual(a.desiredConfig, config) {
                log.Infof("Received new config, resetting budget")
                a.desiredConfig = config

                // reset retry budget if and only if the desired config changes
                // The config changed, so the retry budget is reset to its maximum
                a.retry.budget = a.retry.MaxRetries
                a.reconcile()
            }

        // The default retry policy is:
        /*
        DefaultRetry = Retry{
            MaxRetries:      10,
            InitialInterval: 200 * time.Millisecond,
        }*/

        // 2. The status of an Envoy proxy process changed (it exited)
        case status := <-a.statusCh:
            // delete epoch record and update current config
            // avoid self-aborting on non-abort error
            delete(a.epochs, status.epoch)
            delete(a.abortCh, status.epoch)
            a.currentConfig = a.epochs[a.latestEpoch()]

            // errAbort marks a deliberate cancellation, e.g. a.terminate() called after <-ctx.Done()
            if status.err == errAbort {
                log.Infof("Epoch %d aborted", status.epoch)
            } else if status.err != nil { // abnormal exit
                log.Warnf("Epoch %d terminated with an error: %v", status.epoch, status.err)

                // NOTE: due to Envoy hot restart race conditions, an error from the
                // process requires aggressive non-graceful restarts by killing all
                // existing proxy instances
                a.abortAll()
            } else {
                // normal exit
                log.Infof("Epoch %d exited normally", status.epoch)
            }

            // cleanup for the epoch
            // Remove the config file that belongs to this epoch
            a.proxy.Cleanup(status.epoch)

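            // Schedule a retry after an error. The proxy may already have been aborted,
            // so the current config may be stale. The current config changes on abort,
            // so retrying before the abort completes would not make progress.
            // If no restart has been scheduled yet, schedule one.
            if status.err != nil {
                // skip retrying twice by checking retry restart delay
                // a nil a.retry.restart means no restart has been scheduled yet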
                if a.retry.restart == nil {
                    if a.retry.budget > 0 {
                        delayDuration := a.retry.InitialInterval * (1 << uint(a.retry.MaxRetries-a.retry.budget))
                        restart := time.Now().Add(delayDuration)
                        a.retry.restart = &restart
                        a.retry.budget = a.retry.budget - 1
                        log.Infof("Epoch %d: set retry delay to %v, budget to %d", status.epoch, delayDuration, a.retry.budget)
                    } else {
                        // All retries are used up and Envoy still cannot start: exit the container
                        log.Error("Permanent error: budget exhausted trying to fulfill the desired configuration")
                        a.proxy.Panic(status.epoch)
                        return
                    }
                } else { // a restart has already been scheduled
                    log.Debugf("Epoch %d: restart already scheduled", status.epoch)
                }
            }

        // 3. reconcileTimer fired
        case <-reconcileTimer.C:
            a.reconcile()

        case _, more := <-ctx.Done():
            if !more { // the channel was closed
                a.terminate()
                return
            }
        }
    }
}
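The retry delay in the statusCh branch doubles with every failed attempt, because the shift amount MaxRetries-budget grows by one each time the budget is decremented. The standalone sketch below reuses that formula with the DefaultRetry values quoted in the comment above (MaxRetries 10, InitialInterval 200ms); backoffSchedule is a hypothetical helper written for illustration and is not part of the agent.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule lists the delay before each retry using the same formula as
// the agent loop: InitialInterval * 2^(MaxRetries-budget), where budget counts
// down from MaxRetries to 1.
func backoffSchedule(maxRetries int, initial time.Duration) []time.Duration {
	delays := make([]time.Duration, 0, maxRetries)
	for budget := maxRetries; budget > 0; budget-- {
		delays = append(delays, initial*(1<<uint(maxRetries-budget)))
	}
	return delays
}

func main() {
	for i, d := range backoffSchedule(10, 200*time.Millisecond) {
		fmt.Printf("retry %d after %v\n", i+1, d)
	}
}
```

With these values the first retry waits 200ms and the tenth waits 102.4s; once the budget reaches zero the agent calls Panic and returns instead of scheduling another retry.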

Simplified, the loop looks like this:

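for {
    // Arm reconcileTimer according to the current retry schedule
    select {

    // If the received config differs from the desired config, call a.reconcile(); otherwise ignore it
    case config := <-a.configCh:
        a.reconcile()

    // Inspect the exit status: tell the deliberate errAbort exit apart from an abnormal exit
    // and schedule the next restart according to the retry policy; the next loop iteration
    // arms reconcileTimer with that time
    case status := <-a.statusCh:
        // handle unexpected errors
        // if the retry budget is exhausted, leave the loop
        // otherwise, within the retry budget, set the time of the next restart

    // The scheduled restart time has arrived
    case <-reconcileTimer.C:
        a.reconcile()

    // On cancellation, terminate everything
    case _, more := <-ctx.Done():
        if !more {
            a.terminate()
            return
        }
    }
}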

Whether the configuration changed or a restart was scheduled after an abnormal exit, the function that is eventually called is reconcile:

func (a *agent) reconcile() {
    // cancel any scheduled restart
    a.retry.restart = nil

    log.Infof("Reconciling retry (budget %d)", a.retry.budget)

    // check that the config is current
    if reflect.DeepEqual(a.desiredConfig, a.currentConfig) {
        log.Infof("Desired configuration is already applied")
        return
    }

    // discover and increment the latest running epoch
    epoch := a.latestEpoch() + 1
    // buffer aborts to prevent blocking on failing proxy
    abortCh := make(chan error, maxAborts)
    a.epochs[epoch] = a.desiredConfig
    a.abortCh[epoch] = abortCh
    a.currentConfig = a.desiredConfig

    // Start the proxy for this epoch; runWait reports the result on statusCh
    go a.runWait(a.desiredConfig, epoch, abortCh)
}
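reconcile relies on a.latestEpoch(), which is not reproduced in this article. Since the start-up log shows "Epoch 0 starting" for the first run, a plausible reading is that it returns the highest key in the epochs map and -1 when the map is empty. The standalone function below sketches that assumption; it is not copied from the Istio source.

```go
package main

import "fmt"

// latestEpoch returns the highest key in epochs, or -1 when no epoch is
// active, so that the first epoch started by reconcile is 0.
// Assumption: this mirrors what the agent's latestEpoch method does.
func latestEpoch(epochs map[int]interface{}) int {
	latest := -1
	for epoch := range epochs {
		if epoch > latest {
			latest = epoch
		}
	}
	return latest
}

func main() {
	epochs := map[int]interface{}{}
	fmt.Println("next epoch:", latestEpoch(epochs)+1) // empty map: the first epoch

	epochs[0] = "digest-a"
	epochs[1] = "digest-b"
	fmt.Println("next epoch:", latestEpoch(epochs)+1) // one past the highest active epoch
}
```

Because the statusCh branch deletes an epoch's entry when its process exits, the next epoch number depends on which epochs are still active rather than on a monotonically increasing counter.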

After setting up this state, reconcile starts a new goroutine that launches the proxy:

// runWait runs the start-up command as a go routine and waits for it to finish
func (a *agent) runWait(config interface{}, epoch int, abortCh <-chan error) {
    log.Infof("Epoch %d starting", epoch)
    err := a.proxy.Run(config, epoch, abortCh)
    a.statusCh <- exitStatus{epoch: epoch, err: err}
}

runWait calls a.proxy.Run(config, epoch, abortCh) and sends the error it returns on the agent's statusCh channel.

The Envoy start-up procedure can be summarized by walking through the proxy's Run method.

The envoy struct is defined as follows:

type envoy struct {
    config    meshconfig.ProxyConfig  // proxy configuration
    node      string
    extraArgs []string
    pilotSAN  []string
    opts      map[string]interface{}
    errChan   chan error
    nodeIPs   []string
}

istio.io/istio/pilot/pkg/proxy/envoy/proxy.go

func (e *envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    var fname string
    // Note: the cert checking still works, the generated file is updated if certs are changed.
    // We just don't save the generated file, but use a custom one instead. Pilot will keep
    // monitoring the certs and restart if the content of the certs changes.
    // 1. If a custom config file is specified, use it; otherwise generate the bootstrap config
    if len(e.config.CustomConfigFile) > 0 {
        // there is a custom configuration. Don't write our own config - but keep watching the certs.
        fname = e.config.CustomConfigFile
    } else {
        out, err := bootstrap.WriteBootstrap(&e.config, e.node, epoch, e.pilotSAN, e.opts, os.Environ(), e.nodeIPs)
        if err != nil {
            log.Errora("Failed to generate bootstrap config", err)
            os.Exit(1) // Prevent infinite loop attempting to write the file, let k8s/systemd report
            return err
        }
        fname = out
    }

    // spin up a new Envoy process
    args := e.args(fname, epoch)
    log.Infof("Envoy command: %v", args)

    /* #nosec */
    cmd := exec.Command(e.config.BinaryPath, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Start(); err != nil {
        return err
    }

    // Set if the caller is monitoring envoy, for example in tests or if envoy runs in same
    // container with the app.
    if e.errChan != nil {
        // Caller passed a channel, will wait itself for termination
        go func() {
            e.errChan <- cmd.Wait()
        }()
        return nil
    }

    // The done channel delivers the final exit status of the envoy process
    done := make(chan error, 1)
    go func() {
        done <- cmd.Wait()
    }()

    // Wait on abort and done, either to kill envoy or to return its exit status
    select {
    case err := <-abort:
        log.Warnf("Aborting epoch %d", epoch)
        if errKill := cmd.Process.Kill(); errKill != nil {
            log.Warnf("killing epoch %d caused an error %v", epoch, errKill)
        }
        return err
    case err := <-done:
        return err
    }
}
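The final select in Run is a small, reusable pattern: block until either the child process exits or an abort request arrives, and kill the process only in the abort case. The sketch below isolates that pattern without spawning a process. waitOrAbort and its kill callback are hypothetical stand-ins for the inline select and cmd.Process.Kill, and errAbort here is a local variable that only borrows the name used by the agent.

```go
package main

import (
	"errors"
	"fmt"
)

// errAbort stands in for the agent's abort error in this sketch.
var errAbort = errors.New("epoch aborted")

// waitOrAbort returns the exit error from done, or, if an abort arrives
// first, calls kill and returns the abort error.
func waitOrAbort(done, abort <-chan error, kill func() error) error {
	select {
	case err := <-abort:
		if errKill := kill(); errKill != nil {
			fmt.Println("kill failed:", errKill)
		}
		return err
	case err := <-done:
		return err
	}
}

func main() {
	done := make(chan error, 1)
	abort := make(chan error, 1)
	killed := false
	kill := func() error { killed = true; return nil }

	// Only the abort channel has a value, so the abort branch is taken.
	abort <- errAbort
	err := waitOrAbort(done, abort, kill)
	fmt.Println(err, killed)
}
```

Buffering done with capacity 1, as Run does, lets the goroutine that calls cmd.Wait finish even when the abort branch wins and nobody reads from done.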

bootstrap.WriteBootstrap generates the initial configuration file used by Envoy. The default template path is /var/lib/istio/envoy/envoy_bootstrap_tmpl.json; a different template can be supplied with the templateFile flag.

// istio.io/istio/pkg/bootstrap/bootstrap_config.go
DefaultCfgDir     = "/var/lib/istio/envoy/envoy_bootstrap_tmpl.json"

For the full contents, see file-envoy_bootstrap_tmpl-json

envoy_bootstrap_tmpl.json + meshconfig.ProxyConfig + epoch = the Envoy bootstrap file. The file name has the form "envoy-rev%d.json", where %d is replaced by the epoch.
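As a small illustration of the naming rule, the per-epoch file path can be derived from the config directory and the epoch. configFile is a hypothetical helper for this sketch; the actual path is produced inside bootstrap.WriteBootstrap.

```go
package main

import (
	"fmt"
	"path"
)

// configFile returns the bootstrap file path for an epoch, following the
// "envoy-rev%d.json" naming rule.
func configFile(configPath string, epoch int) string {
	return path.Join(configPath, fmt.Sprintf("envoy-rev%d.json", epoch))
}

func main() {
	fmt.Println(configFile("/etc/istio/proxy", 0))
	fmt.Println(configFile("/etc/istio/proxy", 13))
}
```

Epoch 13 yields /etc/istio/proxy/envoy-rev13.json, the file passed to envoy with -c in the sidecar process listing near the top of the article.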

Once the Envoy configuration has been written successfully, the Envoy start-up arguments are built, mainly by the following function:

func (e *envoy) args(fname string, epoch int) []string {
    startupArgs := []string{"-c", fname,
        "--restart-epoch", fmt.Sprint(epoch),
        "--drain-time-s", fmt.Sprint(int(convertDuration(e.config.DrainDuration) / time.Second)),
        "--parent-shutdown-time-s", fmt.Sprint(int(convertDuration(e.config.ParentShutdownDuration) / time.Second)),
        "--service-cluster", e.config.ServiceCluster,
        "--service-node", e.node,
        "--max-obj-name-len", fmt.Sprint(e.config.StatNameLength),
        "--allow-unknown-fields",
    }

    startupArgs = append(startupArgs, e.extraArgs...)

    if e.config.Concurrency > 0 {
        startupArgs = append(startupArgs, "--concurrency", fmt.Sprint(e.config.Concurrency))
    }

    return startupArgs
}
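The two timing flags take whole seconds, so the durations given to pilot-agent (45s and 1m0s in the listings above) show up as 45 and 60 on the envoy command line. The sketch below performs the same integer division on plain time.Duration values; it leaves out the protobuf duration conversion done by convertDuration, and the seconds and timingArgs helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// seconds renders a duration as whole seconds, the unit expected by
// --drain-time-s and --parent-shutdown-time-s.
func seconds(d time.Duration) string {
	return fmt.Sprint(int(d / time.Second))
}

// timingArgs builds the two timing flags from Go durations.
func timingArgs(drain, parentShutdown time.Duration) []string {
	return []string{
		"--drain-time-s", seconds(drain),
		"--parent-shutdown-time-s", seconds(parentShutdown),
	}
}

func main() {
	drain, err := time.ParseDuration("45s")
	if err != nil {
		panic(err)
	}
	parent, err := time.ParseDuration("1m0s")
	if err != nil {
		panic(err)
	}
	fmt.Println(timingArgs(drain, parent))
}
```

Integer division truncates, so a sub-second remainder in either duration is silently dropped.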

At this point the Envoy configuration file and command line are complete, and Envoy is started with exec.Command.

This concludes the agent's start-up and management flow for Envoy.

watcher

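The watcher logic is comparatively simple: it watches the certificates for changes and, when they change (with at most one notification per minimum delay, 10s), sends the agent the SHA-256 digest of the current configuration. If the agent finds that the configuration has changed, it restarts Envoy and increments the epoch.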

istio.io/istio/pilot/pkg/proxy/envoy/watcher.go

func (w *watcher) Run(ctx context.Context) {
    // kick start the proxy with partial state (in case there are no notifications coming)
    w.SendConfig() // sends the current digest to the agent

    // monitor certificates
    certDirs := make([]string, 0, len(w.certs))
    for _, cert := range w.certs {
        certDirs = append(certDirs, cert.Directory)
    }

    go watchCerts(ctx, certDirs, watchFileEvents, defaultMinDelay, w.SendConfig)

    <-ctx.Done()
}
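SendConfig itself is not shown in this article. Conceptually it has to reduce the watched certificate files to a single SHA-256 digest that the agent can compare with reflect.DeepEqual. A minimal sketch of such a digest follows. digestFiles is a hypothetical function, and skipping unreadable files is an assumption made here so that an absent optional secret does not cause a failure.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// digestFiles hashes the contents of the named files in dir, in order, and
// returns the SHA-256 sum. Unreadable files are skipped (an assumption of
// this sketch, not confirmed Istio behaviour).
func digestFiles(dir string, names []string) []byte {
	h := sha256.New()
	for _, name := range names {
		data, err := os.ReadFile(filepath.Join(dir, name))
		if err != nil {
			continue
		}
		h.Write(data) // hash.Hash.Write never returns an error
	}
	return h.Sum(nil)
}

func main() {
	dir, err := os.MkdirTemp("", "certs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	files := []string{"cert-chain.pem", "key.pem", "root-cert.pem"}
	key := filepath.Join(dir, "key.pem")

	if err := os.WriteFile(key, []byte("v1"), 0o600); err != nil {
		panic(err)
	}
	before := digestFiles(dir, files)

	// Rewriting one file changes the digest, which the agent would see as new config.
	if err := os.WriteFile(key, []byte("v2"), 0o600); err != nil {
		panic(err)
	}
	after := digestFiles(dir, files)

	fmt.Printf("digest changed: %v\n", string(before) != string(after))
}
```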

The body of watchCerts:

// watchCerts watches all certificate directories and calls the provided
// `updateFunc` method when changes are detected. This method is blocking
// so it should be run as a goroutine.
// updateFunc will not be called more than one time per minDelay.
func watchCerts(ctx context.Context, certsDirs []string, watchFileEventsFn watchFileEventsFn,
    minDelay time.Duration, updateFunc func()) {
    fw, err := fsnotify.NewWatcher()
    if err != nil {
        log.Warnf("failed to create a watcher for certificate files: %v", err)
        return
    }
    defer func() {
        if err := fw.Close(); err != nil {
            log.Warnf("closing watcher encounters an error %v", err)
        }
    }()

    // watch all directories
    for _, d := range certsDirs {
        if err := fw.Watch(d); err != nil {
            log.Warnf("watching %s encounters an error %v", d, err)
            return
        }
    }
    watchFileEventsFn(ctx, fw.Event, minDelay, updateFunc)
}

The watched certificates are files mounted into the pod at injection time from a secret in the namespace. The deployment manifest is as follows:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: helloworld-v2
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      labels:
        app: helloworld
        version: v2
    spec:
      containers:
      - image: istio/examples-helloworld-v2
        imagePullPolicy: IfNotPresent
        name: helloworld
        ports:
        - containerPort: 5000
        resources:
          requests:
            cpu: 100m
      - args:
        - proxy
        - sidecar
        - --configPath
        - /etc/istio/proxy
        - --binaryPath
        - /usr/local/bin/envoy
        - --serviceCluster
        - helloworld
        - --drainDuration
        - 45s
        - --parentShutdownDuration
        - 1m0s
        - --discoveryAddress
        - istio-pilot.istio-system:15007
        - --discoveryRefreshDelay
        - 1s
        - --zipkinAddress
        - zipkin.istio-system:9411
        - --connectTimeout
        - 10s
        - --proxyAdminPort
        - "15000"
        - --controlPlaneAuthPolicy
        - NONE
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: ISTIO_META_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ISTIO_META_INTERCEPTION_MODE
          value: REDIRECT
        - name: ISTIO_METAJSON_LABELS
          value: |
            {"app":"helloworld","version":"v2"}
        image: docker.io/istio/proxyv2:1.0.5
        imagePullPolicy: IfNotPresent
        name: istio-proxy
        ports:
        - containerPort: 15090
          name: http-envoy-prom
          protocol: TCP
        resources:
          requests:
            cpu: 10m
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 1337
        volumeMounts:
        - mountPath: /etc/istio/proxy
          name: istio-envoy
        - mountPath: /etc/certs/   # directory where the certificates are mounted; the watcher watches it
          name: istio-certs
          readOnly: true
      initContainers:
      - args:
        - -p
        - "15001"
        - -u
        - "1337"
        - -m
        - REDIRECT
        - -i
        - '*'
        - -x
        - ""
        - -b
        - "5000"
        - -d
        - ""
        image: docker.io/istio/proxy_init:1.0.5
        imagePullPolicy: IfNotPresent
        name: istio-init
        resources: {}
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
          privileged: true
      volumes:
      - emptyDir:
          medium: Memory
        name: istio-envoy
      - name: istio-certs    # comes from the istio.default secret
        secret:
          optional: true
          secretName: istio.default

Verify with kubectl:

# kubectl get secrets istio.default
NAME            TYPE                    DATA   AGE
istio.default   istio.io/key-and-cert   3      17d

# kubectl get secrets istio.default -o yaml
apiVersion: v1
data:
  cert-chain.pem: xxxx
  key.pem: xxxx==
  root-cert.pem: xxxx==
kind: Secret
metadata:
  annotations:
    istio.io/service-account.name: default
  creationTimestamp: 2019-01-15T08:24:26Z
  name: istio.default
  namespace: default
  resourceVersion: "16685160"
  selfLink: /api/v1/namespaces/default/secrets/istio.default
  uid: f863ce6f-189e-11e9-ab53-00163e0c1552
type: istio.io/key-and-cert

Certificates are generated and managed by Citadel; see Keys and Certificates for details. The following commands can be used to inspect a certificate:

# jq is a command-line tool for processing JSON; see
# https://www.ibm.com/developerworks/cn/linux/1612_chengg_jq/index.html
$ sudo yum install -y jq  

$ kubectl get secret -o json istio.default | jq -r '.data["cert-chain.pem"]' | base64 --decode | openssl x509 -noout -text

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            07:d1:25:13:1c:38:db:ef:89:8e:95:2e:d1:6c:b5:4d
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=k8s.cluster.local
        Validity
            Not Before: Jan 15 08:24:26 2019 GMT
            Not After : Apr 15 08:24:26 2019 GMT
        Subject: O=
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:d1:13:34:58:e7:b0:6e:b5:07:0e:bd:7f:d5:a0:
                    66:d9:4a:2a:6d:ec:bd:26:ab:22:26:31:c7:c9:48:
                    da:57:9a:5a:91:b3:7c:78:2c:c4:8d:14:4b:b1:b4:
                    a4:29:3d:26:d1:ad:8d:6e:6f:b0:27:64:31:93:cf:
                    43:be:f4:04:0a:d2:0f:e6:dc:45:4a:5d:38:65:c0:
                    08:44:25:5f:e8:2d:c3:2a:9a:6b:82:bb:27:81:59:
                    c7:f5:38:66:b1:f2:06:eb:94:46:34:47:c5:b1:9d:
                    01:59:04:e7:8e:df:bf:ed:17:f8:16:06:9d:85:c0:
                    e9:43:0f:3a:a0:b6:b9:64:50:3e:e1:26:e3:03:d4:
                    dc:43:08:ef:de:af:56:5c:d5:6c:c1:72:72:7b:f5:
                    05:f8:09:15:08:2d:f3:5b:c7:57:7d:1a:15:72:90:
                    3e:df:0a:e6:a1:e3:d9:81:9f:bb:9c:f2:c5:da:7f:
                    48:a6:4d:12:f7:5e:e8:21:99:1d:f0:95:d7:c5:1a:
                    34:d5:a7:56:79:4b:dd:82:a1:39:cc:d5:0b:e3:fa:
                    92:04:21:38:89:41:7c:9b:12:aa:c4:5f:93:c1:1d:
                    fc:bc:5a:6d:e5:3d:98:e1:9c:28:38:58:75:c6:e3:
                    27:5f:77:4b:10:b6:53:70:7b:25:fd:5c:44:28:67:
                    54:51
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name:
                URI:spiffe://cluster.local/ns/default/sa/default
    Signature Algorithm: sha256WithRSAEncryption
         68:fb:87:12:c6:d1:fb:c1:69:fa:ec:2e:30:ea:f7:4d:8f:9c:
         5b:54:a1:f9:a3:5f:ff:83:3f:76:c5:d6:9c:2b:cb:55:6a:66:
         49:b5:a2:bd:dd:71:06:7f:a4:f8:18:cd:0d:7a:3f:bf:4c:56:
         e0:35:5b:68:2d:71:72:e3:a2:7d:b9:90:f3:86:d0:1a:87:f6:
         31:7e:24:db:00:a8:69:df:54:7f:8b:b0:1d:7d:02:03:c0:26:
         1a:87:53:aa:e4:66:76:e9:80:1e:28:61:53:0c:53:c6:1e:80:
         7e:b0:2d:71:75:1b:42:9f:51:f4:5c:7b:53:ee:06:02:31:5d:
         71:2d:b5:4a:dd:58:25:a6:c4:24:ad:19:86:ac:24:87:99:ad:
         6b:be:c8:ae:84:7e:7d:86:0a:1d:44:a0:50:62:2d:8b:d6:79:
         ff:db:43:40:de:3b:ec:4e:2b:80:87:e5:1a:cb:1e:cf:e6:12:
         8e:96:10:46:f7:fa:ed:1c:bb:0b:11:35:41:c8:69:43:64:79:
         44:ea:a8:72:b7:27:2a:0d:a6:39:bb:34:b0:8b:e3:86:ba:3c:
         1e:b9:ee:b5:61:bb:c8:65:b0:8d:bd:a1:9c:29:64:7d:0b:2c:
         f9:9b:34:18:98:38:24:ad:85:b8:1e:59:41:09:1f:2e:a8:6d:
         ef:ee:0d:a8

In particular, the Subject Alternative Name field should be URI:spiffe://cluster.local/ns/default/sa/default
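The identity in the certificate follows the pattern spiffe://<trust domain>/ns/<namespace>/sa/<service account>, which can be read off the example value above. The short sketch below builds the expected URI for comparison with the openssl output; spiffeURI is a hypothetical helper, and the three-parameter split is inferred from that single example.

```go
package main

import "fmt"

// spiffeURI formats a workload identity as
// spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>.
func spiffeURI(trustDomain, namespace, serviceAccount string) string {
	return fmt.Sprintf("spiffe://%s/ns/%s/sa/%s", trustDomain, namespace, serviceAccount)
}

func main() {
	// openssl prints the same value with a "URI:" prefix.
	fmt.Println(spiffeURI("cluster.local", "default", "default"))
}
```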
