Kubernetes Node Local DNS Cache

栏目: IT技术 · 发布时间: 4年前

内容简介：I was playing around with my home Kubernetes cluster and decided to try outI definitely recommend going thru Node Local DNS cache setup, if you are interested in Kubernetes internals. Especially, If you are studying forFirst, let’s take a look at how typic

Overview

I was playing around with my home Kubernetes cluster and decided to try out Node Local DNS Cache . It’s a really cool piece of software, that helps with DNS load by caching most of responses on the node local DNS and solves Linux conntrack races, which cause intermittent delays of 5s for some DNS requests. You can read more about the contrack issue in DNS intermittent delays of 5s issue & racy conntrack and dns lookup timeouts .

I definitely recommend going thru Node Local DNS cache setup, if you are interested in Kubernetes internals. Especially, If you are studying for Certified Kubernetes Administrator certification , or you are involved in Kuberenetes operations.

First, let’s take a look at how typically DNS works in Kubernetes. By default, Pods in Kubernetes send requests to a kube-dns service. Internally this is done by setting pod’s /etc/resolv.conf nameserver field to kube-dns Service IP ( 10.96.0.10 by default). kube-proxy manages those Service IPs and translates them to a kube-dns endpoint via iptables or IPVS technologies. With Node Local DNS Cache, pods reach out to the new Node Local DNS Cache, without any connection tracking.If the request is in the pod’s cache, good we can return response directly. Otherwise, Node Local DNS Cache reaches out to kube-dns with TCP requests, thus avoiding conntrack bug.

You can read more about Node Local DNS in Node Local DNS cache and Graduate NodeLocal DNSCache to beta Kubernetes enhancement proposals. Let’s take a look at how to deploy it.

Deployment

I recommend that you go thru Using NodeLocal DNSCache in Kubernetes clusters page and take a look at generated manifests.

In order to deploy Node Local DNS, you will have to select link local IP address. Link local IP addresses are a special class of IP addresses in the range of 169.254.0.1 to 169.254.255.254 . Importantly, routers do not forward packets with link local addresses, as there is no guarantees that this range is unique beyond it’s network segment. The example deployment uses 169.254.20.10 as a default local link adress.

Deep Dive

Let’s go deeper in how Node Local DNS works. You can find the code in https://github.com/kubernetes/dns repository in cmd/node-cache directory. Node Local DNS uses CoreDNS under the hood, with some additional functionality for setting up networking, templating configs & reloading. In this section, I will go thru details step by step. I’ll assume that you are running kube-proxy in iptables mode, kube-dns Cluster IP is 10.96.0.10 and you have chosen 169.254.20.10 as a link local IP.

Networking

Firstly, Node Local DNS creates a network interface. It executes these Linux commands:

ip link add nodelocaldns
ip addr add 169.254.20.10 dev nodelocaldns
ip addr add 10.96.0.10 dev nodelocaldns

I believe this is needed to make the pod listen on those IPs, as it’s configured with hostNetwork: true . You can see the network interface by issuing ip addr on the node and looking for a link named nodelocaldns . It should give an output similar to this:

128: nodelocaldns: mtu 1500 qdisc noop state DOWN group default
link/ether 46:e7:18:86:15:23 brd ff:ff:ff:ff:ff:ff
inet 169.254.20.10/32 brd 169.254.20.10 scope global nodelocaldns
valid_lft forever preferred_lft forever
inet 10.96.0.10/32 brd 10.96.0.10 scope global nodelocaldns
valid_lft forever preferred_lft forever

Secondly, Node Local DNS will sync iptables rules. You can see those rules by executing iptables-save , look for 10.96.0.10 / 169.254.20.10 IP related rules. This is what I see on one of my nodes:

*raw

-A PREROUTING -d 10.96.0.10/32 -p udp -m udp --dport 53 -j NOTRACK                                                                      
-A PREROUTING -d 10.96.0.10/32 -p tcp -m tcp --dport 53 -j NOTRACK                                                                      
-A PREROUTING -d 169.254.20.10/32 -p udp -m udp --dport 53 -j NOTRACK                                                                   -A PREROUTING -d 169.254.20.10/32 -p tcp -m tcp --dport 53 -j NOTRACK                                                                   
-A OUTPUT -s 10.96.0.10/32 -p tcp -m tcp --sport 8080 -j NOTRACK                                                                        
-A OUTPUT -d 10.96.0.10/32 -p tcp -m tcp --dport 8080 -j NOTRACK                                                                        -A OUTPUT -d 10.96.0.10/32 -p udp -m udp --dport 53 -j NOTRACK                                                                          
-A OUTPUT -d 10.96.0.10/32 -p tcp -m tcp --dport 53 -j NOTRACK                                                                          
-A OUTPUT -s 10.96.0.10/32 -p udp -m udp --sport 53 -j NOTRACK                                                                          
-A OUTPUT -s 10.96.0.10/32 -p tcp -m tcp --sport 53 -j NOTRACK                                                                          
-A OUTPUT -s 169.254.20.10/32 -p tcp -m tcp --sport 8080 -j NOTRACK                                                                     
-A OUTPUT -d 169.254.20.10/32 -p tcp -m tcp --dport 8080 -j NOTRACK                                                                     -A OUTPUT -d 169.254.20.10/32 -p udp -m udp --dport 53 -j NOTRACK                                                                       
-A OUTPUT -d 169.254.20.10/32 -p tcp -m tcp --dport 53 -j NOTRACK                                                                       
-A OUTPUT -s 169.254.20.10/32 -p udp -m udp --sport 53 -j NOTRACK                                                                       
-A OUTPUT -s 169.254.20.10/32 -p tcp -m tcp --sport 53 -j NOTRACK                                                                                                                                 
...
*filter
...
-A INPUT -d 10.96.0.10/32 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -d 10.96.0.10/32 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -d 169.254.20.10/32 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -d 169.254.20.10/32 -p tcp -m tcp --dport 53 -j ACCEPT
...
-A OUTPUT -s 10.96.0.10/32 -p udp -m udp --sport 53 -j ACCEPT
-A OUTPUT -s 10.96.0.10/32 -p tcp -m tcp --sport 53 -j ACCEPT
-A OUTPUT -s 169.254.20.10/32 -p udp -m udp --sport 53 -j ACCEPT
-A OUTPUT -s 169.254.20.10/32 -p tcp -m tcp --sport 53 -j ACCEPT

Node Local DNS adds these rules right before any kube-proxy rules, so that they are evaluated first. -j NOTRACK makes TCP & UDP packets sent to the local or kube-dns IP address to be untracked by conntrack. This is how Node Local DNS cache avoids conntrack bug and takes over the packets to send to kube-dns IP.

That’s all from the networking configuration. Now let’s move on to CoreDNS configuration.

CoreDNS configuration

The deployments adds node-local-dns configmap. This config map is in Corefile format (except for special variables), which is a config format for CoreDNS . I really like the simplicity and readability of this format. This is how my deployment’s Corefile looks:

cluster.local:53 {
        errors
        cache {
                success 9984 30
                denial 9984 5
        }
        reload
        loop
        bind 169.254.20.10 10.96.0.10
        forward . __PILLAR__CLUSTER__DNS__ {
                force_tcp
        }
        prometheus :9253
        health 169.254.20.10:8080
        }
    in-addr.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 10.96.0.10
        forward . __PILLAR__CLUSTER__DNS__ {
                force_tcp
        }
        prometheus :9253
        }
    ip6.arpa:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 10.96.0.10
        forward . __PILLAR__CLUSTER__DNS__ {
                force_tcp
        }
        prometheus :9253
        }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 169.254.20.10 10.96.0.10
        forward . __PILLAR__UPSTREAM__SERVERS__ {
                force_tcp
        }
        prometheus :9253
        }

The important bit in this configuration is the cluster.local zone, which handles all Kubernetes cluster local DNS names. In this zone you can see that we configured caching, with 9984 record limit and 30/5 seconds time to live. If request is isn’t in the cache, it will use TCP (which avoids conntrack bug) to send the request to __PILLAR__CLUSTER__DNS .

This creates an interesting problem, how do we send DNS traffic to kube-dns ? We can’t use it’s IP as the traffic will go to Node Local DNS pod itself due to networking changes it did. The way Node Local DNS solves this is it adds new Service for existing kube-dns deployment. This is how the service looks:

apiVersion: v1
kind: Service
metadata:
  name: kube-dns-upstream
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "KubeDNSUpstream"
spec:
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    k8s-app: kube-dns

This service creates a new Cluster IP, which can be used to contact kube-dns pod and skip Node Local DNS iptables rules. The __PILLAR__CLUSTER__DNS is a special template variable, which Node Local DNS will template out. This is done by parsing KUBE_DNS_UPSTREAM_SERVICE_HOST environment variable, which is generated on pod startup if enableServiceLinks is enabled (by default it is). Keep in mind, if you recreate the service, you will have to restart the Node Local DNS pods.

In CoreDNS config, we also have .:53 zone, which handles the case if resolution request isn’t for service running inside Kubernetes. We cache the request and forward to __PILLAR__UPSTREAM__SERVERS upstream DNS server. Node Local DNS looks up __PILLAR__UPSTREAM__SERVERS value from kube-dns configmap. The example deployment doesn’t have it set, so it defaults to /etc/resolv.conf . Note that Node Local DNS uses dnsPolicy: Default , which makes /etc/resolv.conf to be same as what is on the node.

That is it for the deep dive. Now let’s take a look operations side of things.

Operations

In this section, let’s take a look at high availability & monitoring setup. Currently Node Local DNS has a limited high availability setting. Let’s take a look.

High Availability

The availability of this setup currently depends on Node Local DNS agent signal handling. During regular pod termination, the pod will remove dummy interface & iptables rules. This switches traffic from itself to a regular kube-dns pod running in the cluster. But if we forcefully kill the pod, removal of iptable rules won’t happen. This then breaks all the DNS traffic for that particular node, as nothing is listening on nodelocaldns interface.

To avoid unnecessary trouble, the default DaemonSet doesn’t have Memory/CPU limits set. I suggest that you leave it that way, to avoid out of memory terminations or any CPU throttling. Additionally the pods are marked with system-node-critical priority, which makes them almost guaranteed to be scheduled first and less likely to be evicted, when the node is out of resources.

If you want to be on the safe side, I recommend that you change DaemonSet update strategy to OnDelete and do maintenance/upgrade by draining Kubernetes nodes and deleting Node Local DNS pods one by one after the successful drain. It’s really easy to automate it.

Monitoring

It’s also important to have good monitoring in place. If you are running Prometheus or a monitoring system which supports Prometheus metrics format, I highly recommend you to scrape metrics on 9253 and 9353 ports. 9253 port has regular CoreDNS metrics, which contain the usual RED metrics: queries, durations & error counts. 9353 port has a coredns_nodecache_setup_errors / coredns_nodecache_setup_errors_total metric, which increases when there are errors syncing IP table rules or setting up/reloading configuration.

Additionally, I have built a Node Local DNS mixin , which has Prometheus Alerts. You can get them via jb install github.com/povilasv/nodelocaldns-mixin if you are using jsonnet to generate your Prometheus alerts. If not, you can grab an example from here , and tinker it for your Kubernetes/Prometheus setup. I plan to add CoreDNS dashboard and SLA / Error budget alerts, so stay tuned

Another good way to do monitoring, is adding black-box monitoring into the mix. One simple thing you can do is constantly send a DNS request and expect a timely response. Node Problem Detector can implement this. You can find an example Node Problem Detector plugin in this uSwitch’s open source repository .

Conclusion

That’s it. Hopefully by now you now how Node Local DNS magic works and you can deploy it and operate it reliably. Please keep in mind that Node Local DNS is currently in Beta .

Have some free time? Consider applying for Certified Kubernetes Administrator certification . I am a Certified Kubernetes Administrator myself and I think it’s worth it, you will learn a lot of Kubernetes & become kubectl ninja.

Thanks for reading, hope you enjoyed it & see you next time!

以上所述就是小编给大家介绍的《Kubernetes Node Local DNS Cache》，希望对大家有所帮助，如果大家有任何疑问请给我留言，小编会及时回复大家的。在此也非常感谢大家对码农网的支持！

查看所有标签

猜你喜欢:

Kubernetes Node Local DNS Cache

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

自品牌

[美] 丹·斯柯伯尔（Dan Schawbel） / 佘卓桓 / 湖南文艺出版社 / 2016-1-1 / 39.80元

什么是自品牌？如何利用新媒体推广自己？如何放大自己的职业优势？细化到如何巩固“弱联系”人脉？如何在团队里合作与生存？如何开创自己的事业？这些都是职场人不得不面临的问题，但少有人告诉你答案，你需要利用书里分享的高效方法独辟蹊径，把自己变成职场里高性价比的人才。这是一本教你利用新型社交媒体开发职业潜能的自我管理读本，不管你是新人还是老鸟，都可以通过打造自品牌在职场中脱颖而出。如果不甘平庸，就亮......一起来看看《自品牌》这本书的介绍吧!

码农工具