Deep Learning With GPUs - NVIDIA


DEEP LEARNING WITH GPUS
Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA

TOPICS COVERED
Convolutional Networks
Deep Learning
Use Cases
GPUs
cuDNN

MACHINE LEARNING
Training: train the model from supervised data
Classification (inference): run the new sample through the model to predict its class/function value
Diagram: Samples + Labels -> Training -> Model; Samples -> Model -> Labels
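The two phases on this slide can be sketched with a deliberately tiny model. The nearest-centroid classifier below is only an illustration in plain Python, not a method from the talk; the function names `train` and `predict` and the toy data are assumptions.

```python
from collections import defaultdict

def train(samples, labels):
    """Training: build a model (one centroid per class) from labeled samples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(model, x):
    """Inference: run a new sample through the model to predict its class."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: sq_dist(model[y]))

# Toy supervised data: two features per sample, one label per sample.
samples = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]]
labels = ["cat", "cat", "dog", "dog"]

model = train(samples, labels)      # training phase
print(predict(model, [0.9, 0.8]))   # inference phase on a new sample
```

Training consumes samples together with labels and produces the model; inference consumes only samples and produces labels.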

CONVOLUTIONAL NETWORKS
Local Receptive Fields
Neurophysiologists David Hubel and Torsten Wiesel, 1962
“Receptive fields, binocular interaction and functional architecture in the cat's visual cortex”, Journal of Physiology (London), 1962

CONVOLUTIONAL NETWORKS
Neocognitron: shared weights
Kunihiko Fukushima, 1980
“Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position”, Biological Cybernetics, 1980
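Local receptive fields and shared weights combine into the convolution operation at the heart of a ConvNet layer. The following is a minimal plain-Python sketch, assuming stride 1 and no padding; the function name `conv2d_valid` and the toy image and kernel are illustrative, not taken from the slides.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding).

    Strictly this is cross-correlation (no kernel flip), which is the
    usual convention in ConvNet layers.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output unit sees only a kh x kw local receptive field,
            # and every output unit reuses the same kernel weights.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 edge detector: 4 weights shared by all 6 output units.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map responds only where the edge falls inside the receptive field, and moving the edge in the input moves the response with it, since the same weights are applied at every position.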

CONVOLUTIONAL NETWORKS
Training DNN with Backpropagation
Yann LeCun et al., 1998
MNIST: 0.7% error rate
“Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE, 1998
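Backpropagation is the chain rule applied layer by layer, from the loss back to the weights. As a rough sketch (not the network from the cited paper), the code below computes gradients for a hypothetical two-layer net with a tanh hidden layer and squared-error loss, then checks one of them against a finite-difference estimate. All names and weight values are assumptions chosen for illustration.

```python
import math

def forward(params, x):
    """Two-layer net: tanh hidden layer, linear output. Returns (hidden, output)."""
    w1, w2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    output = sum(w * h for w, h in zip(w2, hidden))
    return hidden, output

def loss(params, x, target):
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2

def backprop(params, x, target):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    w1, w2 = params
    hidden, output = forward(params, x)
    d_output = output - target                       # dL/d(output)
    grad_w2 = [d_output * h for h in hidden]         # dL/dw2
    # Propagate back through tanh: d tanh(a)/da = 1 - tanh(a)^2
    d_pre = [d_output * w * (1.0 - h * h) for w, h in zip(w2, hidden)]
    grad_w1 = [[d * xi for xi in x] for d in d_pre]  # dL/dw1
    return grad_w1, grad_w2

def numeric_grad_w1(params, x, target, i, j, eps=1e-6):
    """Central finite difference for one entry of w1, used only as a check."""
    w1, w2 = params
    plus = [row[:] for row in w1]
    minus = [row[:] for row in w1]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (loss((plus, w2), x, target) - loss((minus, w2), x, target)) / (2 * eps)

params = ([[0.5, -0.3], [0.1, 0.8]], [0.7, -0.2])
x, target = [1.0, 2.0], 0.5

grad_w1, grad_w2 = backprop(params, x, target)
check = numeric_grad_w1(params, x, target, 0, 1)
print(grad_w1[0][1], check)  # the two values should agree closely
```

Training then repeats a simple loop: compute gradients this way, take a small step against them, and move to the next batch of samples.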

High need for computational resources
Low ConvNet adoption rate until 2010

USE CASES
GTSRB: Traffic sign recognition
The German Traffic Sign Recognition Benchmark, 2011

Rank  Team                  Error rate  Model
1     IDSIA, Dan Ciresan    0.56%       CNNs, trained using GPUs
2     Human                 1.16%
3     NYU, Pierre Sermanet  1.69%       CNNs
4     CAOR, Fatin Zaklouta  3.86%       Random Forests

http://benchmark.ini.rub.de/?section=gtsrb

USE CASES
ImageNet: natural image classification
Alex Krizhevsky et al., 2012
1.2M training images, 1000 classes
Scored 15.3% Top-5 error rate, with 26.2% for the second-best entry for the classification task
CNNs trained with GPUs
http://www.image-net.org/challenges/LSVRC/
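The Top-5 error rate quoted on this slide counts a prediction as wrong only when the true class is absent from the model's five highest-scoring classes. A minimal sketch of that metric follows; the function name `top_k_error` and the three-class toy scores are illustrative assumptions, not ImageNet data.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Toy example with 3 classes, so k is kept small.
scores = [
    [0.1, 0.6, 0.3],   # true class 1 is ranked first
    [0.5, 0.2, 0.3],   # true class 2 is ranked second
    [0.7, 0.1, 0.2],   # true class 1 is ranked third
]
labels = [1, 2, 1]

print(top_k_error(scores, labels, k=1))
print(top_k_error(scores, labels, k=2))
```

Raising k can only lower the error, which is why Top-5 figures are always at or below the corresponding Top-1 figures.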

USE CASES
ImageNet: results for 2010-2014
[Chart: % of teams using GPUs and Top-5 error, by year, 2010 to 2014]

USE CASES
Dogs vs. Cats: Transfer Learning
Dogs vs. Cats, 2014
Train model on one dataset: ImageNet
Re-train the last layer only on a new dataset: Dogs and Cats

Rank  Team             Error rate  Model
1     Pierre Sermanet  1.1%        CNNs, model transferred from ImageNet
5     Maxim Milakov    1.9%        CNN, model trained on Dogs vs. Cats dataset only

https://www.kaggle.com/c/dogs-vs-cats
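Re-training only the last layer means the lower layers act as a fixed feature extractor while a small classifier on top is fitted to the new data. The sketch below illustrates that idea in plain Python under loud assumptions: `FROZEN_W` is a made-up stand-in for pretrained weights, the last layer is a logistic-regression unit, and the four-sample dataset is a toy. It is not the setup used in the competition entries.

```python
import math

# Hypothetical "pretrained" lower layer: these weights are frozen, never updated.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]

def features(x):
    """Frozen lower layers: ReLU(FROZEN_W @ x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(last_layer, x):
    w, b = last_layer
    return sigmoid(sum(wi * f for wi, f in zip(w, features(x))) + b)

def mean_log_loss(last_layer, data):
    total = 0.0
    for x, y in data:
        p = predict_prob(last_layer, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def retrain_last_layer(data, steps=200, lr=0.5):
    """Gradient descent on the last layer only; FROZEN_W is left untouched."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            f = features(x)
            err = predict_prob((w, b), x) - y   # d(log loss)/d(logit)
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# Toy "new dataset": label 1 when the first input dominates the second.
data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]

last_layer = retrain_last_layer(data)
print(mean_log_loss(([0.0, 0.0], 0.0), data), mean_log_loss(last_layer, data))
```

Only two weights and one bias are learned here, which is why this approach needs far less data and compute than training the whole network.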

USE CASES
Speech recognition
Acoustic model is a DNN
  Usually fully-connected layers
  Some try using convolutional layers with spectrogram used as input
  Both fit GPU perfectly
Language model is a weighted Finite State Transducer (wFST)
  Beam search runs fast on [...]
Diagram: Acoustic signal -> Acoustic Model -> Likelihood of phonetic units -> Language Model -> Most likely word sequence
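Beam search keeps only the few best partial hypotheses at each time step instead of exploring every possible sequence. The sketch below shows the pruning idea on a toy problem and is not a wFST decoder: the per-frame dictionaries stand in for acoustic model output, the `transition_log_prob` function stands in for a language model, and every name and number is an assumption for illustration.

```python
import math

def beam_search(step_log_probs, transition_log_prob, beam_width=2):
    """Keep only the `beam_width` best partial sequences after every step.

    step_log_probs: list of {unit: log-likelihood} dicts, one per time step
                    (stand-in for acoustic model output).
    transition_log_prob: function (prev_unit, unit) -> log-probability
                    (stand-in for the language model).
    """
    beams = [((), 0.0)]  # (sequence so far, total log score)
    for frame in step_log_probs:
        candidates = []
        for seq, score in beams:
            prev = seq[-1] if seq else None
            for unit, acoustic in frame.items():
                total = score + acoustic + transition_log_prob(prev, unit)
                candidates.append((seq + (unit,), total))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune everything else
    return beams[0]

def toy_transition(prev, unit):
    """Toy language model: repeating the previous unit is unlikely."""
    if prev is None:
        return 0.0
    return math.log(0.2) if prev == unit else math.log(0.8)

frames = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"a": math.log(0.5), "b": math.log(0.5)},
    {"a": math.log(0.3), "b": math.log(0.7)},
]

print(beam_search(frames, toy_transition, beam_width=2))
print(beam_search(frames, toy_transition, beam_width=1))
```

With a beam width of 1 the search is greedy and commits to the locally best first unit; a width of 2 keeps the runner-up alive long enough to find a better overall sequence in this example.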

It is all about supercomputing, right?

GPU
Tesla K40 and Tegra K1

                        NVIDIA Tesla K40     NVIDIA Jetson TK1
CUDA cores              2880                 192
Peak performance, SP    4.29 TFLOPS          326 GFLOPS
Peak power consumption  235 W                10 W, for the whole board
Deep Learning tasks     Training, Inference  Inference, Online Training

http://www.nvidia.com/tesla
http://www.nvidia.com/jetson-tk1
elopment-platform.html
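The figures on this slide allow a rough efficiency comparison between the two devices. The short sketch below derives single-precision GFLOPS per watt and per CUDA core from the slide's numbers only; the dictionary layout and function names are assumptions, and the comparison is approximate because the Jetson TK1 power figure covers the whole board.

```python
# Figures copied from the slide; the derived ratios are computed here.
boards = {
    "Tesla K40": {"cuda_cores": 2880, "peak_gflops": 4290.0, "peak_watts": 235.0},
    "Jetson TK1": {"cuda_cores": 192, "peak_gflops": 326.0, "peak_watts": 10.0},
}

def gflops_per_watt(board):
    """Peak single-precision throughput per watt of peak power."""
    return board["peak_gflops"] / board["peak_watts"]

def gflops_per_core(board):
    """Peak single-precision throughput per CUDA core."""
    return board["peak_gflops"] / board["cuda_cores"]

for name, board in boards.items():
    print(f"{name}: {gflops_per_watt(board):.1f} GFLOPS/W, "
          f"{gflops_per_core(board):.2f} GFLOPS/core")
```

On these peak numbers the Tesla K40 works out to roughly 18 GFLOPS per watt and the Jetson TK1 to roughly 33, which is consistent with the slide's point that deep learning is not only a supercomputing workload.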

USE CASES
Pedestrian detection on Jetson TK1
Ikuro Sato, Hideki Niihara, R&D Group, Denso IT Laboratory, Inc.
Real-time pedestrian detection with depth, height, and body [...]
orks-automotive-safety.pdf

How do we run DNNs on GPUs?

CUDNN
cuDNN (and cuBLAS)
Library for DNN toolkit developers and researchers
Contains building blocks for DNN toolkits
  Convolutions, pooling, activation functions, etc.
Best performance, easiest to deploy, future proofing
Jetson TK1 support coming soon!
developer.nvidia.com/cuDNN
cuBLAS (SGEMM for fully-connected layers) is part of the CUDA toolkit, developer.nvidia.com/cuda-toolkit
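SGEMM suits fully-connected layers because the forward pass for a whole batch is a single matrix product plus a bias. The sketch below is a plain-Python reference of the general GEMM form, C = alpha * A * B + beta * C, and makes no cuBLAS or cuDNN calls; the function names `gemm` and `fully_connected`, the row-major list layout, and the toy values are assumptions for illustration.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: returns alpha * (a @ b) + beta * c for row-major lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
         for j in range(cols)]
        for i in range(rows)
    ]

def fully_connected(batch, weights, bias):
    """Forward pass for a whole batch as one GEMM call.

    batch:   [batch_size][in_features]
    weights: [in_features][out_features]
    bias:    [out_features], copied into every row and added via beta * c
    """
    bias_rows = [bias[:] for _ in batch]
    return gemm(1.0, batch, weights, 1.0, bias_rows)

batch = [[1.0, 2.0], [0.0, -1.0]]                 # 2 samples, 2 inputs each
weights = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]     # 2 inputs -> 3 outputs
bias = [0.1, 0.2, 0.3]

activations = fully_connected(batch, weights, bias)
print(activations)
```

Larger batches make the matrices larger without changing the number of calls, which is the kind of dense, regular work a GPU handles well.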

CUDNN
Frameworks
cuDNN is already integrated in major open-source frameworks
  Caffe - caffe.berkeleyvision.org
  Torch - torch.ch
Theano - deeplearning.net/software/theano/index.html, already has GPU support, cuDNN support coming soon!

REFERENCES
HPC by NVIDIA: www.nvidia.com/tesla
Jetson TK1 Development Kit: www.nvidia.com/jetson-tk1
Jetson Pro: -platform.html
CUDA Zone: developer.nvidia.com/cuda-zone
Parallel Forall blog: devblogs.nvidia.com/parallelforall
Contact me: mmilakov@nvidia.com
