Kubernetes On NVIDIA GPUs


KUBERNETES ON NVIDIA GPUS
DU-09016-001 v1.0 | November 2018
Installation Guide

TABLE OF CONTENTS
Chapter 1. Introduction
Chapter 2. Supported Platforms
Chapter 3. Installing Kubernetes
  3.1. Master Nodes
    3.1.1. Installing and Running Kubernetes
    3.1.2. Checking Cluster Health
    3.1.3. DGX Station
      3.1.3.1. For NetworkManager
      3.1.3.2. For systemd-resolved
  3.2. Worker Nodes
    3.2.1. Installing and Running Kubernetes
    3.2.2. Check Your Cluster State
  3.3. Run GPU Tasks
    3.3.1. Enable GPUs
    3.3.2. Run a GPU Workload
Chapter 4. Cluster Customization
Chapter 5. Using Distribution-Specific Features
  5.1. Exposing GPU Attributes In a Node
  5.2. Scheduling GPUs By Attribute
  5.3. Setting Up Monitoring
  5.4. CRI-O Runtime Preview Feature Support
    5.4.1. Install CRI-O
    5.4.2. Run the CRI-O Service
    5.4.3. Configure the Kubelet to Use CRI-O
    5.4.4. Run a GPU Task
Chapter 6. Troubleshooting
  6.1. Package Installation
  6.2. Cluster Initialization
  6.3. Monitoring Issues
Appendix A. DGX Systems
  A.1. DGX and NGC Images
  A.2. Install NVIDIA Container Runtime for Docker 2.0
    A.2.1. Uninstalling Old Versions
    A.2.2. Setting Up the Repository
    A.2.3. Install NVIDIA Container Runtime
Appendix B. Upgrading the Cluster
  B.1. Upgrading the Cluster from 1.9.7 to 1.9.10
    B.1.1. Upgrading the Control Plane
    B.1.2. Finalizing the Upgrade
  B.2. Upgrading the Cluster from 1.9.10 to 1.10.8
    B.2.1. Upgrading the Control Plane
    B.2.2. Finalizing the Upgrade
Appendix C. Support

Chapter 1. INTRODUCTION

Kubernetes is an open-source platform for automating the deployment, scaling, and management of containerized applications. Kubernetes on NVIDIA GPUs includes support for GPUs and enhancements to Kubernetes so users can easily configure and use GPU resources for accelerating workloads such as deep learning. This document serves as a step-by-step guide to installing Kubernetes and using it with NVIDIA GPUs.

Chapter 2. SUPPORTED PLATFORMS

Releases of Kubernetes up to and including 1.10.8 are supported on the following platforms. Note that there are certain prerequisites that must be satisfied before proceeding to install Kubernetes. These are detailed in the "Before You Begin" section.

On-Premises
‣ DGX-1 Pascal and Volta with OS Server v3.1.6
‣ DGX Station with OS Desktop v3.1.6

Cloud
‣ NVIDIA GPU Cloud virtual machine images available on Amazon EC2 and Google Cloud Platform

Cluster Topology
‣ One master CPU or GPU node
‣ At least one worker GPU node

Before You Begin
‣ Ensure that NVIDIA drivers are loaded.
‣ Ensure that a supported version of Docker is installed.
‣ Ensure that NVIDIA Container Runtime for Docker 2.0 is installed.
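One quick way to confirm these prerequisites on each node is sketched below; the commands are generic checks, and the CUDA image tag is an illustrative assumption.

   $ nvidia-smi                        # confirms the NVIDIA driver is loaded
   $ docker --version                  # confirms Docker is installed
   $ docker info | grep -i runtime     # "nvidia" should appear among the registered runtimes
   $ docker run --rm --runtime=nvidia nvidia/cuda:9.0-base nvidia-smi   # end-to-end GPU container check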

Chapter 3. INSTALLING KUBERNETES

Kubernetes can be deployed through different mechanisms. NVIDIA recommends using kubeadm to deploy Kubernetes.

The master nodes run the control plane components of Kubernetes. These include the API Server (the front end to the kubectl CLI), etcd (which stores the cluster state), and other components. Master nodes need to be set up with the following three components, of which only the kubelet has been customized with changes from NVIDIA:
‣ Kubelet
‣ Kubeadm
‣ Kubectl

We recommend that your master nodes not be equipped with GPUs and that they run only the master components, such as the following:
‣ Scheduler
‣ API server
‣ Controller Manager

Before proceeding to install the components, check that all the Kubernetes prerequisites have been satisfied (a short sketch of the swap and resolver checks appears below):
‣ Check network adapters and required ports.
‣ Disable swap so that the kubelet works correctly.
‣ Install dependencies such as the Docker container runtime. To install Docker on Ubuntu, follow the official instructions provided by Docker.
‣ The worker nodes must be provisioned with the NVIDIA driver.
‣ Ensure that NVIDIA Container Runtime for Docker 2.0 is installed on the machine.
‣ Run ps -ef | grep resolv to determine whether NetworkManager or systemd-resolved is being used.

If you are setting up a single-node GPU cluster for development purposes, or you want to run jobs on the master nodes as well, then you must install the NVIDIA Container Runtime for Docker.
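A minimal sketch of the swap and resolver checks mentioned above, assuming a standard Ubuntu /etc/fstab layout:

   $ sudo swapoff -a                              # disable swap immediately
   $ sudo sed -i '/ swap / s/^/#/' /etc/fstab     # comment out swap entries so swap stays off after reboot
   $ ps -ef | grep resolv                         # shows whether NetworkManager or systemd-resolved is running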

3.1. Master Nodes

Install the required components on your master node.

3.1.1. Installing and Running Kubernetes
1. Add the official GPG keys and the repository list file.
      $ curl -s …gpg | sudo apt-key add -
      $ curl -s -L https://nvidia.github.io/kubernetes/gpgkey | sudo apt-key add -
      $ curl -s -L …nvidia-kubernetes.list | sudo tee …
2. Update the package index.
      $ sudo apt update
3. Install the packages.
      $ VERSION=1.10.8+nvidia
      $ sudo apt install -y kubectl=${VERSION} kubelet=${VERSION} \
        kubeadm=${VERSION} helm=${VERSION}
4. Start your cluster.
      $ sudo kubeadm init --ignore-preflight-errors=all --config /etc/kubeadm/config.yml
   You may choose to save the token and the hash of the CA certificate printed as part of the output of kubeadm init, in order to join worker nodes to the cluster later. This will take a few minutes.
5. For NGC VMIs, issue a chmod command on the newly created file.
      $ sudo chmod 644 …

3.1.2. Checking Cluster Health
Check that all the control plane components are running on the master node:
      $ kubectl get all …
(The output should list the control plane pods, such as kube-scheduler-master, in the Running state.)
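If kubectl is not yet configured for the current user, the usual kubeadm approach is to copy the admin kubeconfig into place before running these checks; a sketch using the standard kubeadm default paths (not taken from this guide):

   $ mkdir -p $HOME/.kube
   $ sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config    # kubeadm writes the admin kubeconfig here by default
   $ sudo chown $(id -u):$(id -g) $HOME/.kube/config
   $ kubectl get nodes                                        # the master should appear, Ready once networking is up
   $ kubectl get pods -n kube-system                          # kube-apiserver, etcd, scheduler, etc. should be Running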

3.1.3. DGX Station
On DGX Station (and other Ubuntu 16.04 desktop systems), there is a known issue with Kubernetes 1.9.7 and Ubuntu 16.04 Desktop where the kube-dns service will not run. To work around this issue, take the following actions, depending on the DNS resolver service you are using. In most cases for Ubuntu 16.04 desktop systems, NetworkManager is the DNS resolver service and the procedure in For NetworkManager applies.
Run ps -ef | grep resolv to determine whether NetworkManager or systemd-resolved is being used.

3.1.3.1. For NetworkManager
1. Find the active interface.
      $ route | grep '^default' | grep -o '[^ ]*$'
   (Alternatively, use ifconfig.)
2. Determine the nameservers. For <interface>, use the active interface listed in the output of the previous command.
      $ nmcli device show <interface> | grep IP4.DNS
   For example:
      $ nmcli device show enp2s0f0 | grep IP4.DNS
      IP4.DNS[3]: 192.0.2.2
3. Create the configuration file and submit it to Kubernetes with the following commands:
      $ cat config.yml
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: kube-dns
        namespace: kube-system
      data:
        upstreamNameservers: |    # 3 nameservers maximum
          ["192.0.2.0", "192.0.2.1", "192.0.2.0"]
      $ kubectl create -f config.yml
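To confirm that the workaround took effect, standard kubectl queries can be used; the k8s-app=kube-dns label below is the upstream kube-dns default and an assumption here:

   $ kubectl get configmap kube-dns -n kube-system -o yaml      # upstreamNameservers should appear in the data section
   $ kubectl get pods -n kube-system -l k8s-app=kube-dns        # the kube-dns pods should reach the Running state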

3.1.3.2. For systemd-resolved
1. Add the following line to the kubelet service configuration file:
      Environment="KUBELET_RESOLVER_ARGS=--resolv-conf /run/systemd/resolve/resolv.conf"
2. Start the kubelet.
      $ sudo systemctl start kubelet

3.2. Worker Nodes

The procedures in this section do not need to be completed if the goal is a single-node cluster.

DGX and NGC Images
On DGX systems installed with nvidia-docker version 1.0.1, NVIDIA provides an option to upgrade the existing system environment to NVIDIA Container Runtime. Follow these instructions to upgrade your environment, then skip the following section and proceed to installing Kubernetes on the worker nodes.
If you are using the NGC images on AWS or GCP, then you may skip the following section and proceed to installing Kubernetes on the worker nodes.

3.2.1. Installing and Running Kubernetes
1. Add the official GPG keys and the repository list file.
      $ curl -s …gpg | sudo apt-key add -
      $ curl -s -L https://nvidia.github.io/kubernetes/gpgkey | sudo apt-key add -
      $ curl -s -L …nvidia-kubernetes.list | sudo tee …
2. Update the package index.
      $ sudo apt update
3. Install the packages.
      $ VERSION=1.10.8+nvidia
      $ sudo apt install -y kubectl=${VERSION} kubelet=${VERSION} \
        kubeadm=${VERSION} helm=${VERSION}
4. Before starting your cluster, retrieve the token and CA certificate hash you recorded from when kubeadm init was run on the master node. Alternatively, to retrieve the token, use the following command on the master node.
      $ sudo kubeadm token create --print-join-command
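If the CA certificate hash from kubeadm init was not recorded, it can be recomputed on the master node; this is the standard procedure from the upstream kubeadm documentation, assuming the default certificate path:

   $ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
       openssl rsa -pubin -outform der 2>/dev/null | \
       openssl dgst -sha256 -hex | sed 's/^.* //'        # prints the value to use after sha256: in the join command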

5. Join the worker node to the cluster with a command similar to the following. (The command below is an example that will not work for your installation.)
      $ sudo kubeadm join --token <token> <master-ip>:<master-port> \
        --discovery-token-ca-cert-hash sha256:<hash> --ignore-preflight-errors=all

3.2.2. Check Your Cluster State
Run the following command on the master node and make sure your GPU worker nodes appear and their state transitions to Healthy. It may take a few minutes for the status to change.
      $ kubectl describe nodes

3.3. Run GPU Tasks

3.3.1. Enable GPUs
As part of installing Kubernetes on the worker nodes, GPUs are enabled by default. It may take up to a minute or two for GPUs to be enabled on your cluster (that is, for Kubernetes to download and run containers).
Once Kubernetes has downloaded and run containers, run the following command to see GPUs listed in the resource section:
      $ kubectl describe nodes | grep -B 3 gpu
      Capacity:
       cpu:            8
       memory:         32879772Ki
       nvidia.com/gpu: 2

3.3.2. Run a GPU Workload
Make sure that GPU support has been properly set up by running a simple CUDA container. One is provided in the artifacts you downloaded (you will need a GPU with at least 8 GB of memory). There are also other examples available in the examples directory. A sketch of a comparable pod specification follows the sample output below.
1. Start the CUDA sample workload.
      $ kubectl create -f /etc/kubeadm/examples/pod.yml
2. When the pod is running, you can execute the nvidia-smi command inside the container.
      $ kubectl exec -it gpu-pod nvidia-smi

      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 384.125                Driver Version: 384.125                   |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
      | N/A   34C    P0    20W / 300W |     10MiB / 16152MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+

      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+
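The pod specification shipped at /etc/kubeadm/examples/pod.yml is not reproduced here; the following is a minimal sketch of a comparable GPU pod, with the pod name, image tag, and resource request chosen for illustration only:

   $ cat <<EOF | kubectl create -f -
   apiVersion: v1
   kind: Pod
   metadata:
     name: gpu-pod-example              # illustrative name; the shipped example uses gpu-pod
   spec:
     restartPolicy: Never
     containers:
     - name: cuda-container
       image: nvidia/cuda:9.0-base      # any CUDA-capable image works
       command: ["sleep", "100000"]     # keep the pod alive so nvidia-smi can be run inside it
       resources:
         limits:
           nvidia.com/gpu: 1            # request one GPU from the NVIDIA device plugin
   EOF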

Chapter 4. CLUSTER CUSTOMIZATION

After starting up your cluster, it is set up with a few basic utilities:
‣ The Flannel network plugin
‣ The NVIDIA Device Plugin, which allows you to enable GPU support
‣ Helm, the Kubernetes package manager
‣ The NVIDIA Monitoring Stack

Edit /etc/kubeadm/post-install.sh to change some or all of these auto-deployed utilities.

The Flannel Network Plugin
Kubernetes clusters need a pod network add-on installed. Flannel is deployed by default for multiple reasons:
‣ Recommended by Kubernetes
‣ Used in production clusters
‣ Integrates well with the CRI-O runtime
For more information and other networking options, refer to: https://kubernetes.io/podnetwork.

Helm
Helm is automatically installed and deployed with Kubernetes on NVIDIA GPUs. Helm helps you manage Kubernetes applications: through Helm charts it allows you to define, install, and upgrade even the most complex Kubernetes application. Charts are packages of pre-configured Kubernetes resources.
For more information on Helm, refer to: https://github.com/helm/helm.

Monitoring Stack
An integrated monitoring stack is deployed by default in your cluster to monitor health and get metrics from GPUs in Kubernetes.
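A few illustrative Helm 2 commands, shown only to give a feel for the workflow; the chart and release names are assumptions, not charts shipped with this distribution:

   $ helm list                                        # show releases currently deployed in the cluster
   $ helm search prometheus                           # search the configured chart repositories
   $ helm install stable/prometheus --name my-prom    # install a chart as a named release
   $ helm upgrade my-prom stable/prometheus           # upgrade the release to a newer chart version
   $ helm delete my-prom --purge                      # remove the release and its history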

The stack is deployed using Helm (the charts can be found at /etc/kubeadm/monitoring) and uses the NVIDIA Data Center GPU Manager (DCGM), Prometheus (via the Prometheus Operator), and Grafana for visualizing the various metrics.
You can change some of the configuration in the values file of the chart: …ts/exporter-node/values.yml

Tainting the Master Node (Optional)
If you are setting up a multi-node cluster and you do not want jobs to run on the master node (to avoid impacting control plane performance), set the master Kubernetes node to deny pods that cannot run on the master node, as in the sketch below:
      $ kubectl taint nodes <MasterNodeName> …
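A sketch of the conventional kubeadm master taint, assuming the default node-role key is the one intended here:

   $ kubectl taint nodes <MasterNodeName> node-role.kubernetes.io/master=:NoSchedule
   # To allow pods on the master again later, remove the taint:
   $ kubectl taint nodes <MasterNodeName> node-role.kubernetes.io/master:NoSchedule-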

Chapter 5. USING DISTRIBUTION-SPECIFIC FEATURES

Kubernetes on NVIDIA GPUs includes the following features that are not yet available in the upstream release of Kubernetes:
‣ GPU attributes exposed in a node
‣ Scheduling improvements
‣ GPU monitoring
‣ Support for the CRI-O runtime preview feature

5.1. Exposing GPU Attributes In a Node
Nodes now expose the attributes of your GPUs. These can be inspected by querying the Kubernetes API at the node endpoint. The GPU attributes currently advertised are:
‣ GPU memory
‣ GPU ECC
‣ GPU compute capabilities
Inspect GPU attributes in a node with the following commands:
      $ kubectl proxy --port 8000 &
      $ curl -s http://localhost:8000/api/v1/nodes | grep -B 7 -A 3 gpu-memory

5.2. Scheduling GPUs By Attribute
Pods can now specify device selectors based on the attributes that are advertised on the node. These can be specified at the container level. For example:

      apiVersion: v1
      kind: Pod
      metadata:
        name: gpu-pod
      spec:
        containers:
        - name: cuda-container
          image: nvidia/cuda:9.0-base

          command: ["sleep"]
          args: ["100000"]
          computeResourceRequests: ["nvidia-gpu"]
        computeResources:
        - name: "nvidia-gpu"
          resources:
            limits:
              nvidia.com/gpu: 1
          affinity:
            required:
            - key: "nvidia.com/gpu-memory"
              operator: "Gt"
              values: ["8000"]    # change value to the appropriate memory for your GPU

1. Create the pod and check its status.
      $ kubectl create -f /etc/kubeadm/examples/pod.yml
2. List the pods running in the cluster.
      $ kubectl get pods
3. Run the nvidia-smi command inside the container.
      $ kubectl exec -it gpu-pod nvidia-smi

5.3. Setting Up Monitoring
To set up monitoring, follow these steps.
1. Label the GPU nodes.
      $ kubectl label nodes <gpu-node-name> hardware-type=NVIDIAGPU
2. Ensure that the label has been applied.
      $ kubectl get nodes --show-labels
3. Install the monitoring charts. (This step is performed automatically at the end of installation.)
      $ helm install ….tgz --name prometheus-operator --namespace monitoring
      $ helm install /etc/kubeadm/monitoring/kube-prometheus-0.0.43.tgz --name kube-prometheus --namespace monitoring
4. Check the status of the pods. It may take a few minutes for the components to initialize and start running.
      $ kubectl get pods -n monitoring
   (The output should list the prometheus-operator and kube-prometheus pods in the Running state.)

5. Forward the port fo…
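A sketch of one way to forward a monitoring UI port with standard kubectl; Grafana and its default port 3000 are assumptions here, and the pod name is a placeholder:

   $ kubectl get pods -n monitoring | grep grafana                    # find the Grafana pod name
   $ kubectl port-forward -n monitoring <grafana-pod-name> 3000:3000  # then browse to http://localhost:3000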
