Kubernetes The Very Hard Way - O'Reilly Media

Kubernetes the Very Hard Way
Laurent Bernaille, Staff Engineer, Infrastructure
@lbernail

Datadog
- Over 350 integrations
- Over 1,200 employees
- Over 8,000 customers
- Runs on millions of hosts
- Trillions of data points per day
- 10,000s of hosts in our infra
- 10s of k8s clusters with 50-2,500 nodes
- Multi-cloud
- Very fast growth

Why Kubernetes?
- Dogfooding: improve our k8s integrations
- Multi-cloud: common API
- Immutable: move from Chef
- Community: large and dynamic

The very hard way?

It was much harder

This talk is about the fine print:
- "Of course, you will need an HA master setup"
- "Oh, and yes, you will have to manage your certificates"
- "By the way, networking is slightly more complicated, look into CNI / ingress controllers"

What happens after "Kube 101"?
1. Resilient and Scalable Control Plane
2. Securing the Control Plane
   a. Kubernetes and Certificates
   b. Exceptions?
   c. Impact of Certificate Rotation
3. Efficient Networking
   a. Giving pods IPs and routing them
   b. Ingresses: getting data into the cluster

Resilient and Scalable Control Plane

"Kube 101" control plane [diagram: single control plane; clients: in-cluster apps, kubelets, kubectl]

Making it resilient [diagram: replicated components: schedulers, controllers, kubectl]

Separate etcd [diagram: dedicated etcd cluster; controllers, kubectl]

Single active scheduler [diagram: master, controllers, kubectl]

Split apiservers behind a load balancer [diagram: load balancer, kubelets, schedulers, kubectl]

Split components [diagram: kubelets, schedulers, kubectl]

Sizing the control plane
- etcd + events etcd: 2 × (3 or 5 nodes), bound by disk and network I/O
- apiservers: X nodes, bound by RAM and network I/O
- controllers and schedulers: 2 nodes, bound by CPU

Kubernetes and Certificates

From "the hard way"

"Our cluster broke after 1y"

Certificates in Kubernetes
- Kubernetes uses certificates everywhere
- Very common source of incidents
- Our strategy: rotate all certificates daily

Certificate management [progressive diagram: Vault holds all the PKIs]
- etcd PKI: peer/server certificates for etcd; etcd client certificate for the apiserver
- kube PKI: apiserver certificate; client certificates for the controllers and the scheduler; client/server certificates for the kubelets
- SA keypair: the apiserver gets the service-account keys to verify tokens used by in-cluster apps
- apiservice PKI: certificates for webhooks and aggregated API services
- OIDC provider: OIDC authentication for kubectl users

Exception? Incident.

Kubelet: TLS bootstrap (setup)
1. Admin creates a bootstrap token (apiserver)
2. Admin adds the bootstrap token to Vault (kube kv)
3. Controllers get the signing key (kube PKI)

Kubelet: TLS bootstrap
1. Kubelet gets the bootstrap token (from Vault)
2. Kubelet authenticates to the apiserver with the token
3. Apiserver verifies the token and maps groups
4. Kubelet creates a CSR
5. Controllers verify RBAC for the CSR creator
6. Controllers sign the certificate
7. Kubelet downloads its certificate
8. Kubelet authenticates with the certificate
9. Kubelet registers the node
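To make step 4 concrete, here is a minimal sketch of the kind of CSR a kubelet submits, built with the Python cryptography library; the subject convention (O=system:nodes, CN=system:node:&lt;name&gt;) is the standard Kubernetes one, and the node name is a made-up example.

```python
# Sketch: build a kubelet-style CSR (step 4 above) with the
# "cryptography" library. The node name is hypothetical.
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.x509.oid import NameOID

node_name = "ip-10-0-1-23.ec2.internal"  # hypothetical node

key = ec.generate_private_key(ec.SECP256R1())
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        # Kubernetes expects node certs in O=system:nodes with
        # CN=system:node:<name>; this is what RBAC checks against.
        x509.NameAttribute(NameOID.ORGANIZATION_NAME, "system:nodes"),
        x509.NameAttribute(NameOID.COMMON_NAME, f"system:node:{node_name}"),
    ]))
    .sign(key, hashes.SHA256())
)
print(csr.public_bytes(serialization.Encoding.PEM).decode())
```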

Kubelet certificate issue
1. One day, some kubelets failed to start, or took tens of minutes to do so
2. Nothing in the logs
3. Everything looked good, but they could not get a cert
4. It turned out we had a lot of CSRs in flight
5. The signing controller was having a hard time evaluating them all
[graph: CSR resources in the cluster; lower is better!]
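An easy way to spot this condition is to count pending CSRs. A minimal sketch, assuming the official kubernetes Python client and a working kubeconfig (a CSR with no status conditions has been neither approved nor denied yet):

```python
# Sketch: count pending CertificateSigningRequests, assuming the
# official "kubernetes" Python client and a reachable cluster.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config()
api = client.CertificatesV1Api()

csrs = api.list_certificate_signing_request()
# No status conditions means the CSR is still pending evaluation.
pending = [c for c in csrs.items if not c.status.conditions]
print(f"{len(pending)} pending CSRs out of {len(csrs.items)}")
```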

Why? Kubelet authentication:
- Initial creation: bootstrap token, mapped to group "system:bootstrappers"
- Renewal: current node certificate, mapped to group "system:nodes"
Required RBAC permissions:
- CSR creation
- CSR auto-approval
[table: which groups hold the CSR creation and approval permissions]

Exception 2? Incident 2.

Temporary solution [diagram: admin gets a cert and key from Vault, creates the webhook with a self-signed cert as CA, and adds the self-signed cert and key to Vault (kube kv)]
One day, after 1 year:
- Creation of resources started failing (luckily, only a Custom Resource)
- The cert had expired
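A simple expiry check would have caught this early. A minimal sketch with the Python cryptography library (the file path and threshold are made-up examples):

```python
# Sketch: warn when a PEM certificate is close to expiry, using
# the "cryptography" library (>= 42 for not_valid_after_utc).
from datetime import datetime, timedelta, timezone
from cryptography import x509

CERT_PATH = "/etc/webhook/tls.crt"   # hypothetical location
WARN_BEFORE = timedelta(days=30)     # illustrative threshold

with open(CERT_PATH, "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

remaining = cert.not_valid_after_utc - datetime.now(timezone.utc)
if remaining < WARN_BEFORE:
    print(f"WARNING: certificate expires in {remaining.days} days")
else:
    print(f"OK: {remaining.days} days of validity left")
```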

Take-away
- Rotate server/client certificates
- It is not easy
- But "if it's hard, do it often": no more expiration issues

Impact of Certificate Rotation

Apiserver certificate rotation

Impact on etcd [graphs: apiserver restarts, etcd traffic, etcd slow queries]
- We have multiple apiservers and restart each of them daily
- Significant etcd network impact (caches are repopulated)
- Significant impact on etcd performance

Impact on load balancers [graphs: apiserver restarts, ELB surge queue]
- Significant impact on the LB as connections are re-established
- Mitigation: increase the accept queues on the apiservers (net.ipv4.tcp_max_syn_backlog, net.core.somaxconn)
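For reference, a minimal sketch of applying those two sysctls on an apiserver host (the values are illustrative, not the ones from the talk; requires root on Linux):

```python
# Sketch: raise the SYN backlog and accept queue mentioned above.
# Values are illustrative; equivalent to `sysctl -w ...`.
SYSCTLS = {
    "/proc/sys/net/ipv4/tcp_max_syn_backlog": "8192",
    "/proc/sys/net/core/somaxconn": "8192",
}

for path, value in SYSCTLS.items():
    with open(path, "w") as f:   # needs root
        f.write(value)
    print(f"{path} = {value}")
```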

Impact on apiserver clients [graphs: apiserver restarts, coredns memory usage]
- When an apiserver restarts, clients reconnect and refresh their caches
- Memory spike for the impacted apps
- No real mitigation today

Impact on traffic balance [graphs: 15 MB/s vs 2.5 MB/s; 2,300 vs 300 connections]
- Number of connections and traffic are very unbalanced, because connections are very long-lived
- More clients → bigger impact cluster-wide

Why? A simple simulation
Simulation over 48h:
- 5 apiservers
- 10,000 connections (4 × 2,500 nodes)
- Every 4h, one apiserver restarts
- Reconnections evenly dispatched
Cause:
- Cloud TCP load balancers use round-robin
- Long-lived connections
- No rebalancing
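A minimal sketch of that simulation (the restart order is an assumption; the slide does not specify it) shows how evenly redistributing long-lived connections on each restart still skews the overall distribution:

```python
# Sketch: 5 apiservers, 10,000 long-lived connections, one restart
# every 4h over 48h. On restart, the victim's connections are
# redistributed evenly across the surviving apiservers.
N_SERVERS, N_CONNS = 5, 10_000
RESTARTS = 48 // 4

conns = [N_CONNS // N_SERVERS] * N_SERVERS  # start evenly balanced

for step in range(RESTARTS):
    victim = step % N_SERVERS        # assume restarts cycle in order
    moved, conns[victim] = conns[victim], 0
    survivors = [i for i in range(N_SERVERS) if i != victim]
    share, rest = divmod(moved, len(survivors))
    for j, srv in enumerate(survivors):
        conns[srv] += share + (1 if j < rest else 0)

print("connections per apiserver after 48h:", conns)
```

After 12 simulated restarts the counts are far from the even 2,000 per server you would hope for, matching the imbalance in the graphs above.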

Kubelet certificate rotation

Pod graceful termination
1. Admin or controller deletes the pod (apiserver)
2. Kubelet asks containerd to stop the container
3. containerd sends SIGTERM to the container
4. After a timeout, containerd sends SIGKILL
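Graceful termination only helps if the application handles SIGTERM; a minimal Python sketch (the cleanup work is a placeholder):

```python
# Sketch: a containerized app exiting cleanly on SIGTERM, so it
# stops before the grace period expires and SIGKILL is sent.
import signal
import sys
import time

def handle_sigterm(signum, frame):
    # Placeholder for real cleanup: drain requests, flush buffers...
    print("SIGTERM received, shutting down gracefully")
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

while True:      # stand-in for the application's main loop
    time.sleep(1)
```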

Kubelet restarts impact graceful termination
- Same flow, but containerd sends SIGKILL after the timeout or when the context is cancelled
- A kubelet restart therefore ends graceful termination early
- Fixed upstream: "Do not SIGKILL container if container stop is …" (…/1099)

Impact on pod readiness [graphs: kubelet restarts on "system" nodes (coredns, other services); coredns endpoints NotReady]
On kubelet restart:
- Readiness probes are marked as failed
- Pods are removed from service endpoints
- Readiness must succeed again before pods are re-added
Issue upstream: "pod with readinessProbe will be not ready when kubelet …" (…/issues/78733)

Take-away
Restarting components is not transparent. It would be great if:
- Components could transparently reload certs (server & client)
- Clients could wait 0-Xs before reconnecting, to avoid a thundering herd
- Reconnections did not trigger memory spikes
- Cloud TCP load balancers supported a least-conn algorithm
- Connections were rebalanced (kill them after a while?)
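The second wish is easy to sketch client-side: add random jitter before reconnecting. A generic illustration (the 30s bound stands in for the "0-Xs" above), not something Kubernetes ships today:

```python
# Sketch: jittered reconnect, spreading a thundering herd of
# clients over a window instead of reconnecting all at once.
import random
import time

MAX_JITTER_S = 30  # hypothetical "X" in the 0-Xs window

def reconnect_with_jitter(connect):
    """Sleep a random 0..MAX_JITTER_S, then call connect()."""
    time.sleep(random.uniform(0, MAX_JITTER_S))
    return connect()

# Usage (open_watch is a hypothetical connection factory):
# reconnect_with_jitter(lambda: open_watch(apiserver_url))
```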

Efficient Networking

Network challenges
- Throughput: trillions of data points daily
- Latency: end-to-end pipeline
- Scale: 1,000-2,000 node clusters
- Topology: multiple clusters, access from standard VMs

Giving pods IPs & routing them

From "the Hard Way" [diagram: each node has a node IP and a pod CIDR assigned to it]

Small cluster? Static routes
- Node 1: IP 192.168.0.1, pod CIDR 10.0.1.0/24
- Node 2: IP 192.168.0.2, pod CIDR 10.0.2.0/24
- Routes (local or cloud provider): 10.0.1.0/24 → 192.168.0.1; 10.0.2.0/24 → 192.168.0.2
Limits:
- Local routes: nodes must be in the same subnet
- Cloud provider: number of routes
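The route table is purely mechanical, one route per node; a tiny sketch deriving it from the node list above:

```python
# Sketch: derive the static route table from the slide's node list.
nodes = {
    "node1": {"ip": "192.168.0.1", "pod_cidr": "10.0.1.0/24"},
    "node2": {"ip": "192.168.0.2", "pod_cidr": "10.0.2.0/24"},
}

# One route per node: pod CIDR -> node IP (local or cloud routes).
for name, n in nodes.items():
    print(f"{n['pod_cidr']} via {n['ip']}  # {name}")
```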

Mid-size cluster? Overlay
- Same layout, but traffic is tunneled between hosts with VXLAN
- Examples: Calico, Flannel
Limits:
- Overhead of the overlay
- Scaling route distribution (control plane)

Large cluster with a lot of traffic? Native pod routing
- Performance: no datapath overhead, simpler control plane
- Addressing: pod IPs are reachable from other clusters and from VMs

In practice
- On premise: BGP (Calico, kube-router), macvlan
- GCP: IP aliases
- AWS: additional IPs on ENIs (AWS EKS CNI plugin, Lyft CNI plugin, Cilium ENI IPAM)

How it works on AWS [diagram]:
- An agent attaches an ENI (eth1) and allocates IPs on it (ip1, ip2, ip3)
- kubelet → CRI (containerd) → CNI plugin creates a veth pair for the pod and assigns it an IP
- A routing rule steers traffic: "from ip1, use eth1"

Address space planning
Pod CIDR: /24 per node within 10/8 (8 bits), leaving 4 bits for the cluster, a 12-bit node prefix, and 8 bits for pods:
- 4 bits → up to 16 clusters
- 12-bit node prefix → up to 4,096 nodes
- 8-bit pod CIDR → up to 255 pods per node
Simple /24-per-node addressing leads to inefficient address usage (sig-network: remove the contiguous-range requirement for CIDR allocation).
But also:
- Address space for node IPs (another /20 per cluster for 4,096 nodes)
- Service IP range (a /20 would make sense for such a cluster)
- Total: one /15 for pods, two /20s for nodes and services!
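The bit arithmetic is easy to check with Python's standard ipaddress module (using the 10/8 layout and bit splits from the slide):

```python
# Sketch: check the 10/8 pod-address plan from the slide:
# 8 fixed bits + 4 cluster bits + 12 node bits + 8 pod bits = 32.
import ipaddress

base = ipaddress.ip_network("10.0.0.0/8")
CLUSTER_BITS, NODE_BITS, POD_BITS = 4, 12, 8

clusters = 2 ** CLUSTER_BITS       # 16 clusters
nodes = 2 ** NODE_BITS             # 4096 nodes per cluster
pods = 2 ** POD_BITS               # a /24 per node

cluster_prefix = base.prefixlen + CLUSTER_BITS  # /12 per cluster
print(f"{clusters} clusters, each a /{cluster_prefix} "
      f"({nodes} nodes x {pods} pod addresses)")

# Pod CIDR of the first node in the first cluster: 10.0.0.0/24
first_cluster = next(base.subnets(prefixlen_diff=CLUSTER_BITS))
print("first pod CIDR:", next(first_cluster.subnets(new_prefix=24)))
```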

Take-away
- Native pod routing has worked very well at scale
- A bit more complex to debug, but a much more efficient datapath
- The topic is still moving fast (Cilium introduced ENI IPAM recently); great relationship with Lyft / Cilium
- Plan your address space early

Ingresses

Ingress: cross-cluster, VM-to-cluster [diagram: services A-D spread across Cluster 1, Cluster 2, and classic VMs; which instances can reach which?]

Kubernetes default: LB service [diagram: the service-controller on the master configures the cloud load balancer; traffic enters through a NodePort on every node and kube-proxy forwards it to the pods; legend: data path, health checks, configuration]

Inefficient datapath & cross-application impacts [diagram: the load balancer spreads web traffic across every node's NodePort, including a node that only runs kafka, and kube-proxy adds an extra hop to reach web-1/2/3]

ExternalTrafficPolicy: Local? [diagram: same setup; health checks now fail on nodes without local web pods, but every node is still registered as a backend]

L7-proxy ingress controller [diagram: the ingress controller creates l7proxy deployments and updates their backends; data path: load balancer → NodePort → kube-proxy → l7proxy pods → app pods; configuration from watching ingresses/endpoints (ingress controller) and LoadBalancer services (service-controller)]

Challenges
Limits:
- All nodes as backends (1,000+)
- Inefficient datapath
- Cross-application impacts
Alternatives?
- ExternalTrafficPolicy: Local? The number of nodes remains the same, and there are issues with some CNI plugins
- K8s ingress? Still load-balancer based, need to scale the ingress pods, still an inefficient datapath

Our target: native pod routing for ingress [diagram: the ALB targets pods directly; healthchecker; legend: data path, health checks, configuration (from watching ingresses/endpoints on apiservers)]

Remaining challenges
- Limited to HTTP ingresses: no support for TCP/UDP
- Registration delay: registration with the LB is slow, while pod rolling updates are much faster; Ingress v2 should address this
- Mitigations: minReadySeconds, pod readinessGates

Workaround: TCP / registration delay not manageable → dedicated l7proxy [diagram: l7proxy pods running in host network on dedicated nodes, not managed by k8s, fronting the app pods]

Take-away
- Ingress solutions are not great at scale yet and may require workarounds
- Definitely a very important topic for us
- The community is working on v2 ingresses

Conclusion

A lot of other topics
- Accessing services (kube-proxy)
- DNS (it's always DNS!)
- Challenges with stateful applications
- How to DDoS <insert anything> with DaemonSets
- Node lifecycle / cluster lifecycle
- Deploying applications
- …

Getting started?
- "Deep Dive into Kubernetes Internals for Builders and Operators", Jérôme Petazzoni, LISA (link truncated in the source): a minimal cluster, showing the interactions between the main components
- "Kubernetes the Hard Way", Kelsey Hightower (github.com/kelseyhightower/kubernetes-the-hard-way): an HA control plane with encryption

You like horror stories?
- "Kubernetes the very hard way at Datadog": https://www.youtube.com/watch?v=2dsCwp_j0yQ
- "10 ways to shoot yourself in the foot with Kubernetes": https://www.youtube.com/watch?v=QKI-JRs2RIE
- "Kubernetes Failure Stories": https://k8s.af

Key lessons
- Self-managed Kubernetes is hard: if you can, use a managed service
- Networking is not easy (especially at scale)
- The main challenge is not technical: build a team; transforming practices and training users is very important

Thank you!
datadoghq.com
@lbernail
