DEEP LEARNING WITH GPUS
Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA

TOPICS COVERED
  Convolutional Networks
  Deep Learning Use Cases
  GPUs
  cuDNN

MACHINE LEARNING
Training
  Train the model from supervised data
  Samples, Labels -> Training -> Model
Classification (inference)
  Run the new sample through the model to predict its class/function value
  Samples -> Model -> Labels

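The two phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the model (the deck itself is about neural networks), and the function names `train` and `classify` and the toy data are hypothetical.

```python
from collections import defaultdict


def train(samples, labels):
    """Training: build a model (one mean vector per class) from labeled samples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify(model, sample):
    """Inference: run a new sample through the model to predict its class."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda y: sq_dist(model[y], sample))


# Toy supervised data (hypothetical): two 2-D samples per class.
samples = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
labels = ["a", "a", "b", "b"]

model = train(samples, labels)          # Samples, Labels -> Training -> Model
print(classify(model, [0.9, 0.8]))      # Samples -> Model -> Labels
```

The model is built once from labeled data and then reused for every new sample, which is why training and inference can run on different hardware.
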
CONVOLUTIONAL NETWORKS
Local Receptive Fields
  Neurophysiologists David Hubel and Torsten Wiesel, 1962
  “Receptive fields, binocular interaction and functional architecture in the cat's visual cortex”, Journal of Physiology (London), 1962

CONVOLUTIONAL NETWORKS
Neocognitron: shared weights
  Kunihiko Fukushima, 1980
  “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position”, Biological Cybernetics, 1980

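Weight sharing means the same small set of weights is applied at every position of the input. A minimal sketch, assuming a 1-D signal and a hypothetical two-tap kernel (this is a simplification, not the Neocognitron itself), shows the consequence: shifting the input shifts the response by the same amount.

```python
def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [1.0, -1.0]                 # one shared set of weights (hypothetical)
signal = [0, 0, 1, 1, 0, 0, 0]       # a short pulse
shifted = [0] + signal[:-1]          # the same pulse, one step to the right

out = conv1d_valid(signal, kernel)
out_shifted = conv1d_valid(shifted, kernel)

# The response to the shifted input is the shifted response.
print(out_shifted[1:] == out[:-1])   # True
```

Because the kernel is reused everywhere, the number of weights is independent of the input size.
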
CONVOLUTIONAL NETWORKS
Training DNN with Backpropagation
  Yann LeCun et al, 1998
  MNIST: 0.7% error rate
  “Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE, 1998

High need for computational resources
Low ConvNet adoption rate until 2010

USE CASES
GTSRB: Traffic sign recognition
  The German Traffic Sign Recognition Benchmark, 2011

  Rank  Team                   Error rate  Model
  1     IDSIA, Dan Ciresan     0.56%       CNNs, trained using GPUs
  2     Human                  1.16%
  3     NYU, Pierre Sermanet   1.69%       CNNs
  4     CAOR, Fatin Zaklouta   3.86%       Random Forests

  http://benchmark.ini.rub.de/?section=gtsrb

USE CASES
ImageNet: natural image classification
  Alex Krizhevsky et al, 2012
  1.2M training images, 1000 classes
  Scored 15.3% Top-5 error rate with 26.2% for the second-best entry for classification task
  CNNs trained with GPUs
  http://www.image-net.org/challenges/LSVRC/

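The top-5 error counts a prediction as wrong only when the true class is not among the five highest-scoring classes. A minimal sketch of the metric, assuming per-class scores are available as plain lists (the function name `top_k_error` and the toy scores are hypothetical, and k=2 is used below only to keep the example small):

```python
def top_k_error(scores, true_labels, k=5):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, truth in zip(scores, true_labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Toy scores for 3 samples over 4 classes (hypothetical values).
scores = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.3, 0.1],
    [0.3, 0.2, 0.1, 0.4],
]
true_labels = [2, 1, 3]

print(top_k_error(scores, true_labels, k=1))  # top-1 error
print(top_k_error(scores, true_labels, k=2))  # top-2 error
```

Relaxing k can only keep the error the same or lower it, which is why top-5 figures are lower than top-1 figures for the same model.
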
USE CASES
ImageNet: results for 2010-2014
  [Chart: % teams using GPUs and top-5 error, by year, 2010 to 2014]

USE CASES
Dogs vs. Cats: Transfer Learning
  Dogs vs. Cats, 2014
  Train model on one dataset – ImageNet
  Re-train the last layer only on a new dataset – Dogs and Cats

  Rank  Team             Error rate  Model
  1     Pierre Sermanet  1.1%        CNNs, model transferred from ImageNet
  5     Maxim Milakov    1.9%        CNN, model trained on Dogs vs. Cats dataset only

  https://www.kaggle.com/c/dogs-vs-cats

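Re-training only the last layer can be sketched as follows. This is an illustration under stated assumptions, not the method used by either team: `frozen_features` is a hypothetical stand-in for layers pre-trained on the source dataset, the last layer is a single logistic unit fitted by stochastic gradient descent, and the data is a toy two-class problem.

```python
import math


def frozen_features(x):
    """Stand-in for the pre-trained layers; these are never updated below."""
    return [x[0] + x[1], x[0] - x[1]]


def train_last_layer(samples, labels, epochs=500, lr=0.1):
    """Fit only the final logistic layer on top of the frozen features."""
    feats = [frozen_features(x) for x in samples]   # computed once, reused every epoch
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                               # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


# Toy "new dataset" (hypothetical): class 1 when the first coordinate is larger.
samples = [[2.0, 1.0], [3.0, 0.0], [1.0, 2.0], [0.0, 3.0]]
labels = [1, 1, 0, 0]

w, b = train_last_layer(samples, labels)
print([predict(w, b, x) for x in samples])
```

Since the feature layers are fixed, their outputs can be computed once per sample, so only a small number of weights is trained on the new dataset.
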
USE CASES
Speech recognition
  Acoustic model is DNN
    Usually fully-connected layers
    Some try using convolutional layers with spectrogram used as input
    Both fit GPU perfectly
  Language model is weighted Finite State Transducer (wFST)
    Beam search runs fast on [...]
  Acoustic signal -> Acoustic Model -> Likelihood of phonetic units -> Language Model -> Most likely word sequence

It is all about supercomputing, right?

GPU
Tesla K40 and Tegra K1

                          NVIDIA Tesla K40     NVIDIA Jetson TK1
  CUDA cores              2880                 192
  Peak performance, SP    4.29 Tflops          326 Gflops
  Peak power consumption  235 W                10 W, for the whole board
  Deep Learning tasks     Training, Inference  Inference, Online Training

  http://www.nvidia.com/tesla
  http://www.nvidia.com/jetson-tk1
  elopment-platform.html

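One way to read the comparison above is peak single-precision throughput per watt. The short calculation below uses only the figures from the slide; the variable names are illustrative, and peak numbers say nothing about sustained performance on a real network.

```python
# Figures taken from the comparison above.
k40_gflops = 4.29 * 1000   # 4.29 Tflops expressed in Gflops
k40_watts = 235
tk1_gflops = 326
tk1_watts = 10             # for the whole Jetson TK1 board

k40_eff = k40_gflops / k40_watts
tk1_eff = tk1_gflops / tk1_watts

print(f"Tesla K40:  {k40_eff:.1f} Gflops/W")   # 18.3 Gflops/W
print(f"Jetson TK1: {tk1_eff:.1f} Gflops/W")   # 32.6 Gflops/W
```

On these peak figures the Jetson TK1 delivers roughly 1.8 times the throughput per watt, while the Tesla K40 delivers about 13 times the absolute throughput.
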
USE CASES
Pedestrian detection on Jetson TK1
  Ikuro Sato, Hideki Niihara, R&D Group, Denso IT Laboratory, Inc.
  Real-time pedestrian detection with depth, height, and body [...]
  orks-automotive-safety.pdf

How do we run DNNs on GPUs?

CUDNN
cuDNN (and cuBLAS)
  Library for DNN toolkit developers and researchers
  Contains building blocks for DNN toolkits
    Convolutions, pooling, activation functions, etc.
  Best performance, easiest to deploy, future proofing
  Jetson TK1 support coming soon!
  developer.nvidia.com/cuDNN
  cuBLAS (SGEMM for fully-connected layers) is part of the CUDA toolkit, developer.nvidia.com/cuda-toolkit

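A fully-connected layer applied to a batch of samples is a single matrix multiply, which is why SGEMM covers it. The sketch below shows that mapping in plain Python on nested lists; it does not call cuBLAS, the function names `gemm` and `fully_connected` are hypothetical, and the real SGEMM routine computes the more general C = alpha*A*B + beta*C in single precision.

```python
def gemm(a, b):
    """Plain matrix multiply C = A * B on nested lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def fully_connected(weights, bias, batch):
    """weights: outputs x inputs; batch: inputs x samples (one column per sample)."""
    out = gemm(weights, batch)
    return [[v + bi for v in row] for row, bi in zip(out, bias)]


# Hypothetical layer with 2 inputs and 3 outputs, applied to a batch of 2 samples.
weights = [[1, 2], [0, 1], [-1, 0]]
bias = [0, 1, 0]
batch = [[1, 0],
         [2, 1]]   # sample 0 is (1, 2), sample 1 is (0, 1)

result = fully_connected(weights, bias, batch)
print(result)      # [[5, 2], [3, 2], [-1, 0]]
```

Batching samples as columns turns many small matrix-vector products into one large matrix-matrix product, the shape of work that GEMM routines handle well.
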
CUDNN
Frameworks
  cuDNN is already integrated in major open-source frameworks
    Caffe - caffe.berkeleyvision.org
    Torch - torch.ch
  Theano - deeplearning.net/software/theano/index.html, already has GPU support, cuDNN support coming soon!

REFERENCES
  HPC by NVIDIA: www.nvidia.com/tesla
  Jetson TK1 Development Kit: www.nvidia.com/jetson-tk1
  Jetson Pro: -platform.html
  CUDA Zone: developer.nvidia.com/cuda-zone
  Parallel Forall blog: devblogs.nvidia.com/parallelforall
  Contact me: mmilakov@nvidia.com