Kubernetes: Container Orchestration and Micro-Services
University of Washington 590s
2016-11-16
Alexander Mohr <mohr@google.com>
Technical Lead / Manager on Google Container Engine and Kubernetes
Github: @alex-mohr  Email: mohr@google.com
Google Cloud Platform

Contents
1. Systems Projects at Google Seattle and Kirkland (2-3 mins)
2. Brief Docker Container Primer (5-10 mins)
3. Kubernetes: Container Orchestration (many mins)

Prelude: Systems Projects at Google Seattle and Kirkland
Seattle:
- Chrome Cloud (incl. Flywheel) (Matt Welch)
- Flume / Dataflow / Apache Beam (Craig Chambers)
- Compute Engine VM Hypervisor (Mike Dahlin)
- Kubernetes, Container Engine (Alex Mohr)
- App Engine Flex (Tomas Isdal)
- Cloud Storage (?)
Kirkland:
- Cloud Machine Learning (Mona Attariyan)
- Spanner (?)
- Compute Engine’s Control Plane (Mike Dahlin)
- Compute Engine’s Persistent Disk (?)
- Thialfi notifications (Atul Adya)
- FOO (Michael Piatek)
These are some of the (public) projects explicitly focused on systems. Other areas require systems knowledge too!

Contents
1. Prelude: Systems Projects at Google Seattle and Kirkland
2. Brief Docker Container Primer
   a. Runtime
   b. Building Images
   c. Shipping Images
3. Kubernetes: Container Orchestration

What are Containers? (Part 1: the Runtime)
- Virtualize the kernel’s syscall interface
  - no guest OS or hypervisor as with VMs
- Isolation (from each other and from the host)
  - chroots
  - namespaces
  - cgroups
- Packaging
  - hermetically sealed bundles
  - no external dependencies
  - no DLL hell
  - portable from dev laptop to on-prem & clouds
[Diagram labels: app, libs (one pair per container), kernel]

What are Containers? (Part 2: Building an Image)
% cat Dockerfile
FROM node:4.4
EXPOSE 8080
COPY server.js .
CMD node server.js
% docker build -t gcr.io/mohr-dev/hello-node:v1 .
[log spam]
% docker run -d -p 8080:8080 --name hello_tutorial gcr.io/mohr-dev/hello-node:v1
% curl http://localhost:8080/
Hello World!

What are Containers? (Part 3: Shipping an Image)
The magic:
% gcloud docker --authorize-only
% docker push gcr.io/mohr-dev/hello-node:v1
The push refers to a repository [gcr.io/mohr-dev/hello-node] (len: 1)
[...]
v1: digest: 75be861fa8c6df5a29c4d size: 12985
Then, from any other machine:
% docker pull gcr.io/mohr-dev/hello-node:v1
v1: Pulling from mohr-dev/hello-node
Digest: 75be861fa8c6df5a29c4d
Status: Image is up to date for gcr.io/mohr-dev/hello-node:v1
% docker run ARGS gcr.io/mohr-dev/hello-node:v1

Contents
1. Prelude: Systems Projects at Google Seattle and Kirkland
2. Brief Docker Container Primer
3. Kubernetes: Container Orchestration

Image by Connie Zhou
Failures
A 2000-machine cluster will have 1 to 10 machine failures per day.
This is not a problem: it's normal.
Images by Connie Zhou

Kubernetes
Greek for “Helmsman”; also the root of the words “governor” and “cybernetic”
- Manages container clusters
- Inspired and informed by Google’s experiences and internal systems
- Supports multiple cloud and bare-metal environments
- Supports multiple container runtimes
- 100% open source, written in Go
Manage applications, not machines

All you really care about
[Diagram labels: UI, API, Container Cluster]

The 10000 foot view
[Diagram labels: users, master, controllers, nodes, kubelet]

Container clusters: A story in two parts
1. Setting up the cluster
- Choose a cloud: GCE, AWS, Azure, Rackspace, on-premises, ...
- Choose a node OS: CoreOS, Atomic, RHEL, Debian, CentOS, Ubuntu, ...
- Provision machines: Boot VMs, install and run kube components, ...
- Configure networking: IP ranges for Pods, Services, SDN, ...
- Start cluster services: DNS, logging, monitoring, ...
- Manage nodes: kernel upgrades, OS updates, hardware failures...
Not the easy or fun part, but unavoidable
This is where things like Google Container Engine (GKE) really help

Container clusters: A story in two parts
2. Using the cluster
- Run Pods & Containers
- ReplicaSets & Deployments & DaemonSets & StatefulSets
- Services & Volumes & Secrets & Autoscalers
This is the fun part!
A distinct set of problems from cluster setup and management
Don’t make developers deal with cluster administration!
Accelerate development by focusing on the applications, not the cluster

Kubernetes: a Cloud OS?
Perhaps grandiose, but attempts at “Cloud OS” primitives:
- Scheduling: Decide where my containers should run
- Lifecycle and health: Keep my containers running despite failures
- Scaling: Make sets of containers bigger or smaller
- Naming and discovery: Find where my containers are now
- Load balancing: Distribute traffic across a set of containers
- Storage volumes: Provide data to containers
- Logging and monitoring: Track what’s happening with my containers
- Debugging and introspection: Enter or attach to containers
- Identity and authorization: Control who can do things to my containers

Workload portability
Goal: Avoid vendor lock-in
- Runs in many environments, including “bare metal” and “your laptop”
- The API and the implementation are 100% open
- The whole system is modular and replaceable

Workload portability
Goal: Write once, run anywhere*
- Don’t force apps to know about concepts that are cloud-provider-specific
- Examples of this:
  - Network model
  - Ingress
  - Service load-balancers
  - PersistentVolumes
* approximately

Workload portability
Result: Portability
- Build your apps on-prem, lift-and-shift into cloud when you are ready
- Don’t get stuck with a platform that doesn’t work for you
- Put your app on wheels and move it whenever and wherever you need

Networking

Docker
[Diagram: Docker hosts whose containers use private IPs, e.g. 172.16.1.1]

Port mapping
[Diagram: containers A (172.16.1.1), B (172.16.1.2) and C (172.16.1.1) on separate hosts, connected through SNAT and mapped ports (3306, 9376, 8000, 80, 11878)]
REJECTED

Kubernetes networking
- IPs are cluster-scoped
  - vs docker default private IP
- Pods can reach each other directly
  - even across nodes
- No brokering of port numbers
  - too complex, why bother?
- This is a fundamental requirement
  - can be L3 routed
  - can be underlayed (cloud)
  - can be overlayed (SDN)

Kubernetes networking
[Diagram: each node has its own pod subnet, e.g. 10.1.3.0/24 containing pod IP 10.1.3.1]

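A minimal sketch of the cluster-scoped addressing idea, assuming a cluster range of 10.1.0.0/16 split into one /24 per node. The range, the node names and both helper functions are hypothetical, chosen only so that the result lines up with the 10.1.3.0/24 subnet on the slide; a real cluster's allocator may work differently.

```python
import ipaddress


def assign_pod_subnets(cluster_cidr, nodes, prefix=24):
    """Give each node its own pod subnet carved out of cluster_cidr."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix)
    # zip pairs nodes with consecutive subnets, in order
    return dict(zip(nodes, subnets))


def pod_ip(subnet, index):
    """Return the address at the given offset inside a node's pod subnet."""
    return subnet.network_address + index


# Assumed cluster range and node names (not taken from the slides).
subnets = assign_pod_subnets("10.1.0.0/16", ["node-0", "node-1", "node-2", "node-3"])
```

Because every pod IP is unique across the cluster, a pod on node-0 can address a pod on node-3 directly, with no per-host port mapping.
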
Pods
- Small group of containers & volumes
- Tightly coupled
- The atom of scheduling & placement
- Shared namespace
  - share IP address & localhost
  - share IPC, etc.
- Managed lifecycle
  - bound to a node, restart in place
  - can die, cannot be reborn with same ID
- Example: data puller & web server
[Diagram: a Pod containing a File Puller and a Web Server that share a Volume; outside it, a Content Manager and Consumers]

Volumes
- Pod-scoped storage
- Support many types of volume plugins
  - Empty dir (and tmpfs)
  - Host path
  - Git repository
  - GCE Persistent Disk
  - AWS Elastic Block Store
  - Azure File Storage
  - iSCSI
  - Flocker
  - NFS
  - vSphere
  - GlusterFS
  - Ceph File and RBD
  - Cinder
  - FibreChannel
  - Secret, ConfigMap, DownwardAPI
  - Flex (exec a binary)
  - ...

Labels & Selectors

Labels
- Arbitrary metadata
- Attached to any API object
- Generally represent identity
- Queryable by selectors
  - think SQL ‘select ... where ...’
- The only grouping mechanism
  - pods under a ReplicaSet
  - pods in a Service
  - capabilities of a node (constraints)

Selectors
[Diagram: four pods, each labelled App: MyApp]
- Phase: prod, Role: FE
- Phase: prod, Role: BE
- Phase: test, Role: FE
- Phase: test, Role: BE
Selectors highlighted in turn:
- App = MyApp
- App = MyApp, Role = FE
- App = MyApp, Role = BE
- App = MyApp, Phase = prod
- App = MyApp, Phase = test

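The selector queries on these slides can be sketched as equality matching over label dictionaries. This is an illustrative model only: the pod names are made up, and `matches` and `select` are hypothetical helpers, not the Kubernetes API.

```python
# Four pods labelled as on the Selectors slides; the pod names are assumptions.
PODS = {
    "pod-1": {"App": "MyApp", "Phase": "prod", "Role": "FE"},
    "pod-2": {"App": "MyApp", "Phase": "prod", "Role": "BE"},
    "pod-3": {"App": "MyApp", "Phase": "test", "Role": "FE"},
    "pod-4": {"App": "MyApp", "Phase": "test", "Role": "BE"},
}


def matches(labels, selector):
    """True when every key/value in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods, selector):
    """Return the sorted names of the pods that the selector matches."""
    return sorted(name for name, labels in pods.items() if matches(labels, selector))


frontends = select(PODS, {"App": "MyApp", "Role": "FE"})
prod = select(PODS, {"App": "MyApp", "Phase": "prod"})
```

Adding a key to the selector narrows the group, much like adding a condition to a SQL `where` clause.
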
Replication

ReplicaSets
- A simple control loop
- Runs out-of-process wrt API server
- One job: ensure N copies of a pod
  - grouped by a selector
  - too few? start some
  - too many? kill some
- Layered on top of the public Pod API
- Replicated pods are fungible
  - No implied order or identity
[Diagram: a ReplicaSet (name = “my-rc”, selector = {“App”: “MyApp”}, template = { ... }, replicas = 4) asks the API Server “How many?”, hears “3”, starts 1 more, asks again, hears “4”: OK]

Control loops: the Reconciler Pattern
- Drive current state -> desired state
- Act independently
- APIs - no shortcuts or back doors
- Observed state is truth*
- Recurring pattern in the system
- Example: ReplicaSet
* Observations are really stale caches of what once was your view of truth.
[Diagram: loop of observe -> diff -> act]

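A rough sketch of one observe -> diff -> act pass for a ReplicaSet-style controller. `FakeCluster` is a hypothetical in-memory stand-in for the Pod API, and `reconcile_once` is an illustrative name; neither is the real controller code.

```python
import itertools


class FakeCluster:
    """Hypothetical in-memory stand-in for the Pod API (not a real client)."""

    def __init__(self):
        self.pods = {}                  # pod name -> labels
        self._ids = itertools.count(1)  # never reuses an ID

    def list_pods(self, selector):
        return [name for name, labels in self.pods.items()
                if all(labels.get(k) == v for k, v in selector.items())]

    def create_pod(self, labels):
        self.pods[f"pod-{next(self._ids)}"] = dict(labels)

    def delete_pod(self, name):
        del self.pods[name]


def reconcile_once(cluster, selector, template_labels, replicas):
    """Run one observe -> diff -> act pass and report the action taken."""
    observed = cluster.list_pods(selector)   # observe
    diff = replicas - len(observed)          # diff
    if diff > 0:                             # act: too few, start one
        cluster.create_pod(template_labels)
        return "start"
    if diff < 0:                             # act: too many, kill one
        cluster.delete_pod(observed[-1])
        return "kill"
    return "ok"
```

Calling `reconcile_once` repeatedly converges on the desired count one pod at a time, and the loop only ever talks to the cluster through the same list/create/delete calls any other client would use.
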
Services
- A group of pods that work together
  - grouped by a selector
- Defines access policy
  - “load balanced” or “headless”
- Can have a stable virtual IP and port
  - also a DNS name
- VIP is managed by kube-proxy
  - watches all services
  - updates iptables when backends change
  - default implementation - can be replaced!
- Hides complexity
[Diagram labels: Client, Virtual IP]

iptables kube-proxy
[Diagram sequence on Node X: kube-proxy, apiserver, iptables]
1. kube-proxy watches the apiserver for services & endpoints
2. kubectl run ...: the pods are scheduled
3. kubectl expose ...: new service! The apiserver updates kube-proxy
4. kube-proxy configures iptables, which now holds the VIP
5. new endpoints! The apiserver updates kube-proxy again
6. A Client connects to the VIP and iptables on the node handles the traffic

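As a toy model of what this sequence sets up, the sketch below keeps a table from VIP to backends and picks one backend per connection. `ProxyTable`, its methods and all addresses are hypothetical; the real kube-proxy writes iptables rules rather than running Python, and random choice is assumed here only for illustration.

```python
import random


class ProxyTable:
    """Toy model of the VIP table that kube-proxy keeps up to date."""

    def __init__(self, seed=None):
        self.rules = {}                  # (vip, port) -> list of "ip:port" backends
        self._rng = random.Random(seed)

    def on_endpoints_update(self, vip, port, backends):
        """Called when a watch reports a new service or changed endpoints."""
        self.rules[(vip, port)] = list(backends)

    def route(self, vip, port):
        """Pick a backend for a connection to the VIP, or None if none exist."""
        backends = self.rules.get((vip, port), [])
        return self._rng.choice(backends) if backends else None


# Assumed addresses, for illustration only.
table = ProxyTable(seed=0)
table.on_endpoints_update("10.0.0.10", 80, ["10.1.1.5:8080", "10.1.3.1:8080"])
backend = table.route("10.0.0.10", 80)
```

The client only ever sees the stable VIP; the set of backends behind it can change whenever the watch delivers new endpoints.
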
External services
- Service VIPs are only available inside the cluster
- Need to receive traffic from “the outside world”
- Service “type”
  - NodePort: expose on a port on every node
  - LoadBalancer: provision a cloud load-balancer
- DiY load-balancer solutions
  - socat (for nodePort remapping)
  - haproxy
  - nginx
- Ingress (L7 LB)

Ingress (L7 LB)
- Many apps are HTTP/HTTPS
- Services are L4 (IP + port)
- Ingress maps incoming traffic to backend services
  - by HTTP host headers
  - by HTTP URL paths
- HAProxy, NGINX, AWS and GCE implementations in progress
- Now with SSL!
Status: BETA in Kubernetes v1.2
[Diagram labels: Client, URL Map]

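The URL map can be pictured as a rule table keyed by host and path prefix. The sketch below assumes longest-prefix matching and uses made-up hosts and service names; it is not how any particular Ingress controller is implemented, and its plain string-prefix test is a simplification (for example, "/api" would also match "/apiary").

```python
# (host, path prefix, backend service): all values are hypothetical.
RULES = [
    ("foo.example.com", "/", "foo-svc"),
    ("bar.example.com", "/api", "bar-api-svc"),
    ("bar.example.com", "/", "bar-web-svc"),
]


def pick_backend(rules, host, path):
    """Return the service whose host matches and whose path prefix is longest."""
    candidates = [(prefix, service) for rule_host, prefix, service in rules
                  if rule_host == host and path.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: len(candidate[0]))[1]
```

A request for `bar.example.com/api/v1` lands on `bar-api-svc`, while any other path on that host falls through to `bar-web-svc`.
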
Rolling Update
Service
- app: MyApp
ReplicaSet
- name: my-app-v1
- replicas: 3
- selector:
  - app: MyApp
  - version: v1
ReplicaSet
- name: my-app-v2
- replicas: 0
- selector:
  - app: MyApp
  - version: v2
Replica counts, slide by slide:
Step  my-app-v1  my-app-v2
1     3          (not yet created)
2     3          0
3     3          1
4     2          1
5     2          2
6     1          2
7     1          3
8     0          3
9     (deleted)  3

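The replica counts shown across the Rolling Update slides follow a simple alternation: add one new pod, then remove one old pod. A minimal sketch of that stepping, where `rolling_update` is a hypothetical helper rather than what kubectl actually runs:

```python
def rolling_update(old_replicas, target):
    """Yield (old, new) replica counts after each single-pod step."""
    old, new = old_replicas, 0
    yield old, new                 # new ReplicaSet created with 0 replicas
    while new < target or old > 0:
        if new < target:           # scale the new ReplicaSet up by one
            new += 1
            yield old, new
        if old > 0:                # scale the old ReplicaSet down by one
            old -= 1
            yield old, new


steps = list(rolling_update(3, 3))
# steps == [(3, 0), (3, 1), (2, 1), (2, 2), (1, 2), (1, 3), (0, 3)]
```

Since the Service selects only on `app: MyApp`, it keeps routing to whichever mix of v1 and v2 pods exists at each step.
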
Deployments
- Updates-as-a-service
  - Rolling update is imperative, client-side
- Deployment manages replica changes for you
  - stable object name
  - updates are configurable, done server-side
  - kubectl edit or kubectl apply
- Aggregates stats
- Can have multiple updates in flight
Status: BETA in Kubernetes v1.2

DaemonSets
- Problem: how to run a Pod on every node?
  - or a subset of nodes
- Similar to ReplicaSet
  - principle: do one thing, don’t overload
- “Which nodes?” is a selector
- Use familiar tools and patterns
Status: BETA in Kubernetes v1.2
[Diagram label: Pod]

Jobs
- Run-to-completion, as opposed to run-forever
  - Express parallelism vs. required completions
  - Workflow: restart on failure
  - Build/test: don’t restart on failure
- Aggregates success/failure counts
- Built for batch and big-data work
Status: GA in Kubernetes v1.2

PersistentVolumes
- A higher-level storage abstraction
  - insulation from any one cloud environment
- Admin provisions them, users claim them
  - NEW: auto-provisioning (alpha in v1.2)
- Independent lifetime from consumers
  - lives until user is done with it
  - can be handed-off between pods
- Dynamically “scheduled” and managed, like nodes and pods
[Diagram label: Claim]

PersistentVolumes
[Diagram sequence: Cluster Admin, User, Claim, Binder, Pod, Recycler]
1. The Cluster Admin provisions PersistentVolumes
2. The User creates a Claim
3. The Binder binds the Claim to a volume
4. The User creates a Pod that uses the Claim
5. The User deletes the Pod; the Claim remains
6. The User creates another Pod that uses the same Claim
7. The User deletes that Pod
8. The User deletes the Claim
9. The Recycler reclaims the volume

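A small model of the claim lifecycle above, assuming the binder picks the smallest free volume that satisfies the request. The `Binder` class, the volume names and the sizes are all hypothetical, and real binding also weighs factors such as access modes that this sketch ignores.

```python
class Binder:
    """Toy claim binder; volume and claim sizes are in GiB."""

    def __init__(self, volumes):
        self.available = dict(volumes)  # volume name -> capacity
        self.bound = {}                 # claim name -> (volume name, capacity)

    def bind(self, claim, request_gib):
        """Bind the claim to the smallest free volume that is large enough."""
        fits = [(capacity, name) for name, capacity in self.available.items()
                if capacity >= request_gib]
        if not fits:
            return None                 # the claim stays pending
        capacity, name = min(fits)
        del self.available[name]
        self.bound[claim] = (name, capacity)
        return name

    def release(self, claim):
        """Deleting the claim returns its volume to the free pool."""
        name, capacity = self.bound.pop(claim)
        self.available[name] = capacity


binder = Binder({"pv-small": 5, "pv-big": 100})
volume = binder.bind("my-claim", 10)
```

Pods come and go without touching `bound`, which is the point of the independent lifetime: only deleting the claim frees the volume.
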
StatefulSets
- Goal: enable clustered software on Kubernetes
  - mysql, redis, zookeeper, ...
- Clustered apps need “identity” and sequencing guarantees
  - stable hostname, available in DNS
  - an ordinal index
  - stable storage: linked to the ordinal & hostname
  - discovery of peers for quorum
  - startup/teardown ordering
Status: ALPHA in Kubernetes v1.3

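The identity guarantees can be sketched as pure naming rules: each replica gets an ordinal, its hostname is the set name plus that ordinal, and its DNS name hangs off a governing service. The set name `zk`, the service name `zk-headless` and the `cluster.local` domain are assumptions made for the example.

```python
def pod_names(set_name, replicas):
    """Stable hostnames: the set name plus an ordinal index."""
    return [f"{set_name}-{ordinal}" for ordinal in range(replicas)]


def pod_dns(pod, service, namespace="default", cluster_domain="cluster.local"):
    """DNS name of one replica under its governing service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def startup_order(set_name, replicas):
    """Replicas start in ascending ordinal order."""
    return pod_names(set_name, replicas)


def teardown_order(set_name, replicas):
    """Replicas stop in descending ordinal order."""
    return list(reversed(pod_names(set_name, replicas)))


peers = [pod_dns(name, "zk-headless") for name in pod_names("zk", 3)]
```

A replica that is rescheduled keeps its ordinal, so its hostname and the storage tied to it stay the same and peers can rediscover it for quorum.
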
ConfigMaps
- Goal: manage app configuration
  - ...without making overly-brittle container images
- 12-factor says config comes from the environment
  - Kubernetes is the environment
- Manage config via the Kubernetes API
- Inject config as a virtual volume into your Pods
  - late-binding, live-updated (atomic)
  - also available as env vars
Status: GA in Kubernetes v1.2
[Diagram labels: API, ConfigMap, node, Pod]

Secrets
- Goal: grant a pod access to a secured something
  - don’t put secrets in the container image!
- 12-factor says config comes from the environment
  - Kubernetes is the environment
- Manage secrets via the Kubernetes API
- Inject secrets as virtual volumes into your Pods
  - late-binding, tmpfs - never touches disk
  - also available as env vars
[Diagram labels: API, Secret, node, Pod]

HorizontalPodAutoscalers
- Goal: Automatically scale pods as needed
  - based on CPU utilization (for now)
  - custom metrics in Alpha
- Efficiency now, capacity when you need it
- Operates within user-defined min/max bounds
- Set it and forget it
Status: GA in Kubernetes v1.2
[Diagram label: Stats]

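A sketch of the scaling arithmetic, assuming the autoscaler scales the replica count in proportion to observed versus target CPU utilization and then clamps to the user's bounds. `desired_replicas` is an illustrative function; tolerances, stabilization delays and other details of the real controller are left out.

```python
import math


def desired_replicas(current, observed_cpu, target_cpu, min_replicas, max_replicas):
    """Scale proportionally to observed/target utilization, then clamp."""
    wanted = math.ceil(current * observed_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, wanted))


# 4 pods at 90% CPU against a 60% target: scale up to 6.
scaled_up = desired_replicas(4, 90, 60, min_replicas=1, max_replicas=10)
```

The min/max clamp is what makes "set it and forget it" safe: a metrics spike cannot grow the set without limit, and a quiet period cannot shrink it below the floor.
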
Multi-Zone Clusters
- Goal: zone-fault tolerance for applications
- Zero API changes relative to kubernetes
  - Create services, ReplicaSets, etc. exactly as usual
- Nodes and PersistentVolumes are labelled with their availability zone
  - Fully automatic for GKE, GCE, AWS
  - Manual for on-premise and other cloud providers (for now)
Status: GA in Kubernetes v1.2
[Diagram labels: User, Federation, Master, Zone A, Zone B, Zone C]

Namespaces
- Problem: I have too much stuff!
  - name collisions in the API
  - poor isolation between users
  - don’t want to expose things like Secrets
- Solution: Slice up the cluster
  - create new Namespaces as needed
    - per-user, per-app, per-department, etc.
  - part of the API - NOT private machines
  - most API objects are namespaced
    - part of the REST URL path
  - Namespaces are just another API object
  - One-step cleanup - delete the Namespace
  - Obvious hook for policy enforcement (e.g. quota)

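To illustrate "part of the REST URL path", the sketch below builds paths for core v1 resources only. The helper `resource_path` and the example names are hypothetical, and the cluster-scoped set is a small illustrative subset, not a complete list.

```python
# Resources that live outside any namespace (illustrative subset).
CLUSTER_SCOPED = {"nodes", "namespaces", "persistentvolumes"}


def resource_path(resource, namespace=None, name=None):
    """Build the REST path for a core v1 resource."""
    parts = ["/api/v1"]
    if resource not in CLUSTER_SCOPED:
        if namespace is None:
            raise ValueError(f"{resource} is namespaced; a namespace is required")
        parts.append(f"namespaces/{namespace}")
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


pod_url = resource_path("pods", namespace="team-a", name="web-1")
```

Two teams can each own a pod called `web-1` because the namespace segment keeps their paths distinct.
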