Introduction To GPU Computing


INTRODUCTION TO GPU COMPUTING

ADD GPUs: ACCELERATE SCIENCE APPLICATIONS
[Diagram] CPU + GPU

ACCELERATED COMPUTING IS GROWING RAPIDLY
11x GPU Developers: from 45,000 (2012) to 615,000 (2017)
485 Applications Accelerated
Available Everywhere: 730M CUDA-Enabled GPUs
2,200 Universities Teaching CUDA
[Charts: developer and application growth, 2011-2017]

SMALL CHANGES, BIG SPEED-UP
[Diagram] Application code: the compute-intensive functions (about 5% of the code) run on the GPU; the rest of the sequential code runs on the CPU

3 WAYS TO ACCELERATE APPLICATIONS
Libraries ("Drop-in" Acceleration) | Compiler Directives (Easily Accelerate Applications) | Programming Languages (Maximum Flexibility)


LIBRARIES: EASY, HIGH-QUALITY ACCELERATION
EASE OF USE: Using libraries enables GPU acceleration without in-depth knowledge of GPU programming
"DROP-IN": Many GPU-accelerated libraries follow standard APIs, thus enabling acceleration with minimal code changes
QUALITY: Libraries offer high-quality implementations of functions encountered in a broad range of applications
PERFORMANCE: NVIDIA libraries are tuned by experts

GPU-ACCELERATED LIBRARIES
"Drop-in" Acceleration for Your Applications
DEEP LEARNING: cuDNN, TensorRT, DeepStream SDK
LINEAR ALGEBRA: cuBLAS, cuSPARSE, cuSOLVER
SIGNAL, IMAGE & VIDEO: cuFFT, NVIDIA NPP, CODEC SDK
PARALLEL ALGORITHMS: nvGRAPH, NCCL, cuRAND
MATH: CUDA Math library

3 STEPS TO CUDA-ACCELERATED APPLICATION
Step 1: Substitute library calls with equivalent CUDA library calls: saxpy() becomes cublasSaxpy()
Step 2: Manage data locality; with CUDA: cudaMalloc(), cudaMemcpy(), etc.; with cuBLAS: cublasAlloc(), cublasSetVector(), etc.
Step 3: Rebuild and link the CUDA-accelerated library: gcc myobj.o -l cublas

DROP-IN ACCELERATION (STEP 1)
int N = 1<<20;

// Perform SAXPY on 1M elements: y[] = a*x[] + y[]
saxpy(N, 2.0, x, 1, y, 1);

DROP-IN ACCELERATION (STEP 1)
int N = 1<<20;

// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);    // Add "cublas" prefix and use device variables

DROP-IN ACCELERATION (STEP 2)
int N = 1<<20;

cublasInit();                            // Initialize cuBLAS

// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);

cublasShutdown();                        // Shut down cuBLAS

DROP-IN ACCELERATION (STEP 3)
int N = 1<<20;

cublasInit();
cublasAlloc(N, sizeof(float), (void**)&d_x);    // Allocate device vectors
cublasAlloc(N, sizeof(float), (void**)&d_y);

// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);

cublasFree(d_x);                                // Deallocate device vectors
cublasFree(d_y);
cublasShutdown();

DROP-IN ACCELERATION (STEP 4)
int N = 1<<20;

cublasInit();
cublasAlloc(N, sizeof(float), (void**)&d_x);
cublasAlloc(N, sizeof(float), (void**)&d_y);

cublasSetVector(N, sizeof(x[0]), x, 1, d_x, 1);    // Transfer data to GPU
cublasSetVector(N, sizeof(y[0]), y, 1, d_y, 1);

// Perform SAXPY on 1M elements: d_y[] = a*d_x[] + d_y[]
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);

cublasGetVector(N, sizeof(y[0]), d_y, 1, y, 1);    // Read data back from GPU

cublasFree(d_x);
cublasFree(d_y);
cublasShutdown();

ACCELERATING OCTAVE
Scientific Programming Language
Mathematics-oriented syntax
Drop-in compatible with many MATLAB scripts
Built-in plotting and visualization tools
Runs on GNU/Linux, macOS, BSD, and Windows
Free Software
Source: http://www.gnu.org/software/octave/

NVBLAS
Drop-in GPU Acceleration
Routine | Types   | Operation
gemm    | S,D,C,Z | Multiplication of 2 matrices
syrk    | S,D,C,Z | Symmetric rank-k update
herk    | C,Z     | Hermitian rank-k update
syr2k   | S,D,C,Z | Symmetric rank-2k update
her2k   | C,Z     | Hermitian rank-2k update
trsm    | S,D,C,Z | Triangular solve, multiple right-hand sides
trmm    | S,D,C,Z | Triangular matrix-matrix multiply
symm    | S,D,C,Z | Symmetric matrix-matrix multiply
hemm    | C,Z     | Hermitian matrix-matrix multiply
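To make the "drop-in" idea concrete, here is a minimal C sketch (not from the original slides) of an ordinary BLAS call that NVBLAS is designed to intercept at run time; the matrix size and setup are illustrative assumptions, and in practice the program is run with libnvblas.so preloaded and an nvblas.conf that names a fallback CPU BLAS.

/* Hypothetical sketch: plain host-side SGEMM, no CUDA code in the application. */
#include <stdlib.h>

/* Fortran-style BLAS symbol provided by any standard BLAS (and intercepted by NVBLAS). */
extern void sgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const float *alpha, const float *a, const int *lda,
                   const float *b, const int *ldb,
                   const float *beta, float *c, const int *ldc);

int main(void) {
    const int n = 4096;                       /* illustrative size */
    const float alpha = 1.0f, beta = 0.0f;
    float *A = calloc((size_t)n * n, sizeof(float));
    float *B = calloc((size_t)n * n, sizeof(float));
    float *C = calloc((size_t)n * n, sizeof(float));

    /* C = alpha*A*B + beta*C; with NVBLAS preloaded this is offloaded to the GPU,
       otherwise the CPU BLAS named in nvblas.conf handles it. */
    sgemm_("N", "N", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n);

    free(A); free(B); free(C);
    return 0;
}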

PERFORMANCE COMPARISON
CPU (OpenBLAS) vs GPU (NVBLAS)
Dell C4130, 128 GB, 36-core E5-2697 v4 @ 2.30GHz, 4x NVIDIA Tesla P100-SXM2 NVLink
[Charts] SGEMM and DGEMM throughput (GFLOPS) for N = 2048, 4096, 8192: CPU vs GPU with the NVBLAS library

3 WAYS TO ACCELERATE APPLICATIONS
Libraries ("Drop-in" Acceleration) | Compiler Directives (Easily Accelerate Applications) | Programming Languages (Maximum Flexibility)

OpenACC is a directives-based programming approach to parallel computing designed for performance and portability on CPUs and GPUs for HPC.

Add Simple Compiler Directive:
main()
{
  <serial code>
  #pragma acc kernels
  {
    <parallel code>
  }
}

TOP HPC APPS ADOPTING OPENACC
OpenACC: Performance Portability and Ease of Programming
ANSYS Fluent (R18.0 Radiation Solver), VASP, Gaussian, ORB5; 3 of the Top 10 HPC applications, 5 ORNL CAAR codes, 5 CSCS codes
CPU: (Haswell EP) Intel Xeon E5-2695 v3 @ 2.30GHz, 2 sockets, 28 cores; GPU: Tesla K80 12+12 GB, Driver 346.46

[Application speedups reported with OpenACC]
CFD: 12x speedup in 1 week
Medical imaging: 10x faster kernels, 2x faster app
40 days to 2 hours; 3x speedup
NekCEM (computational electromagnetics): 2.5x speedup, 60% less energy
Astrophysics: 40x speedup, 3x energy efficiency
4x speedup with a single CPU/GPU code
4.4x speedup with 4 weeks of effort

2 BASIC STEPS TO GET STARTED
Step 1: Annotate source code with directives:
!$acc data copy(util1,util2,util3) copyin(ip,scp2,scp2i)
!$acc parallel loop
...
!$acc end parallel
!$acc end data
Step 2: Compile & run:
pgf90 -ta=nvidia -Minfo=accel file.f

OpenACC DIRECTIVES EXAMPLE
!$acc data copy(A,Anew)                          ! Copy arrays into GPU memory within data region
iter = 0
do while ( err > tol .and. iter < iter_max )
  iter = iter + 1
  err = 0._fp_kind
!$acc kernels                                    ! Parallelize code inside region
  do j = 1,m
    do i = 1,n
      Anew(i,j) = .25_fp_kind * ( A(i+1,j) + A(i-1,j) + &
                                  A(i,j-1) + A(i,j+1) )
      err = max( err, Anew(i,j) - A(i,j) )
    end do
  end do
!$acc end kernels                                ! Close off parallel region
  IF (mod(iter,100) == 0 .or. iter == 1) print *, iter, err
  A = Anew
end do
!$acc end data                                   ! Close off data region, copy data back

HETEROGENEOUS ARCHITECTURES
Unified Memory
[Diagram] GPU 0, GPU 1 and GPU 2 (each with its own memory) and the CPU (with system memory) share a single unified address space
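As a concrete illustration of the unified-memory model sketched above (not from the original slides), the following minimal CUDA C sketch allocates one managed buffer that both the CPU and the GPU access through the same pointer; the kernel, sizes and launch configuration are illustrative assumptions.

// Minimal unified-memory sketch (illustrative).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= a;                       // device writes through the managed pointer
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));   // one pointer, visible to CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = 1.0f;    // host initializes the same allocation

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();                       // wait before the host reads results

    printf("data[0] = %f\n", data[0]);             // prints 2.000000
    cudaFree(data);
    return 0;
}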

OPENACC FOR EVERYONE
New PGI Community Edition Now Available: FREE
PROGRAMMING MODELS: OpenACC, CUDA Fortran, OpenMP, C/C++/Fortran compilers and tools
PLATFORMS: x86, OpenPOWER, NVIDIA GPU
UPDATES: 1-2 times a year (Community Edition); 6-9 times a year (other editions)
SUPPORT: user forums (Community Edition); PGI support (other editions)

RESOURCES
FREE compiler, success stories, guides, tutorials, videos, courses, code samples, talks, books, specification, teaching materials, Slack & StackOverflow
Success stories: https://www.openacc.org/success-stories
Resources: https://www.openacc.org/resources
Free Compiler: https://www.pgroup.com/products/community.htm

CUDA PROGRAMMING LANGUAGES

GPU PROGRAMMING LANGUAGES
Numerical analytics: MATLAB, Mathematica, LabVIEW, Octave
Fortran: CUDA Fortran, OpenACC
C, C++: CUDA C/C++, OpenACC
Python: CUDA Python, PyCUDA, Numba, Pyculib
C#: Altimesh Hybridizer, Alea GPU
Other: R, Julia

CUDA C
Standard C Code:
void saxpy_serial(int n, float a, float *x, float *y)
{
  for (int i = 0; i < n; ++i)
    y[i] = a*x[i] + y[i];
}

// Perform SAXPY on 1M elements
saxpy_serial(4096*256, 2.0, x, y);

Parallel C Code:
__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

// Perform SAXPY on 1M elements
saxpy_parallel<<<4096,256>>>(n, 2.0, x, y);

http://developer.nvidia.com/cuda-toolkit

CUDA C++: DEVELOP GENERIC PARALLEL CODE
CUDA C++ features enable sophisticated and flexible applications and middleware:
Class hierarchies, __device__ methods, templates, operator overloading, functors (function objects), device-side new/delete, and more
http://developer.nvidia.com/cuda-toolkit

template <typename T>
struct Functor {
  __device__ Functor(T _a) : a(_a) {}
  __device__ T operator()(T x) { return a*x; }
  T a;
};

template <typename T, typename Oper>
__global__ void kernel(T *output, int n) {
  Oper op(3.7);
  output = new T[n];                              // dynamic allocation
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n)
    output[i] = op(i);                            // apply functor
}

CUDA FORTRAN
Program the GPU using Fortran, a key language for HPC
Simple language extensions: kernel functions, thread/block IDs, device & data management, parallel loop directives
Familiar syntax: use allocate/deallocate; copy CPU-to-GPU with assignment (=)
http://developer.nvidia.com/cuda-fortran

module mymodule
contains
  attributes(global) subroutine saxpy(n, a, x, y)
    real :: x(:), y(:), a
    integer :: n, i
    attributes(value) :: a, n
    i = threadIdx%x + (blockIdx%x-1)*blockDim%x
    if (i <= n) y(i) = a*x(i) + y(i)
  end subroutine saxpy
end module mymodule

program main
  use cudafor; use mymodule
  real :: y(2**20)
  real, device :: x_d(2**20), y_d(2**20)
  x_d = 1.0; y_d = 2.0
  call saxpy<<<4096,256>>>(2**20, 3.0, x_d, y_d)
  y = y_d
  write(*,*) 'max error = ', maxval(abs(y-5.0))
end program main

PYTHON
Numba: a just-in-time compiler for Python functions (open source!)
Numba runs inside the standard Python interpreter
Can compile for GPU or CPU
Includes Pyculib

import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def Add(a, b):
    return a + b

# Initialize arrays
N = 100000
A = np.ones(N, dtype=np.float32)
B = np.ones(A.shape, dtype=A.dtype)
C = np.empty_like(A, dtype=A.dtype)

# Add arrays on GPU
C = Add(A, B)

PYTHON - PYCULIB
Python interface to CUDA libraries: cuBLAS (dense linear algebra), cuFFT (Fast Fourier Transform), and cuRAND (random number generation)
The code below generates 100,000 uniformly distributed random numbers on the GPU using the "XORWOW" pseudorandom number generator

import numpy as np
from pyculib import rand as curand

prng = curand.PRNG(rndtype=curand.PRNG.XORWOW)
rand = np.empty(100000)
prng.uniform(rand)
print(rand[:10])

JULIA
Up-and-coming scientific language
A cross between Python and Matlab
Interpreted (like Python) or compiled (like C/Fortran)
New approach to multi-processing/multi-node, or use MPI
Easy to combine with other languages
Works with Jupyter Notebooks!

JULIA - SIMPLE EXAMPLE
Simple matrix multiplication example with 64-bit integers (Int64)
Can also do elementwise multiplication (just like Matlab): A .* B

julia> A = [1 2 ; 3 4]
2x2 Array{Int64,2}:
 1  2
 3  4

julia> B = [10 11 ; 12 13]
2x2 Array{Int64,2}:
 10  11
 12  13

julia> A * B
2x2 Array{Int64,2}:
 34  37
 78  85

JULIA - GPU EXAMPLE
Options: JuliaGPU (github), native CUDA (new), GPUArrays
Simple native GPU example:

using CUDAdrv, CUDAnative

function kernel_vadd(a, b, c)
    # from CUDAnative: (implicit) CuDeviceArray type, and thread/block intrinsics
    i = (blockIdx().x-1) * blockDim().x + threadIdx().x
    c[i] = a[i] + b[i]
    return nothing
end

dev = CuDevice(0)
ctx = CuContext(dev)

# generate some data
len = 512
a = rand(Int, len)
b = rand(Int, len)

# allocate & upload on the GPU
d_a = CuArray(a)
d_b = CuArray(b)
d_c = similar(d_a)

# execute and fetch results
@cuda (1,len) kernel_vadd(d_a, d_b, d_c)    # from CUDAnative.jl
c = Array(d_c)

using Base.Test
@test c == a + b

destroy(ctx)

JULIA - GPU EXAMPLE
GPUArrays example: convolution

using GPUArrays, Colors, FileIO, ImageFiltering
using CLArrays
using GPUArrays: synchronize_threads
import GPUArrays: LocalMemory

img = load(...)          # image file (path garbled in the original)
a = CLArray(img)
out = similar(a)
k = ...                  # convolution kernel (garbled in the original)
c = similar(img)

convolution!(a, out, k)
Array(out)

outc = similar(img)
copy!(outc, out)

R
Very popular statistics language
Used heavily in Machine Learning
gpuR package

R - GPUR
gpuR package
Simple integer addition of two vectors with 1,000 values

A <- seq.int(from = 0, to = 999)
B <- seq.int(from = 1000, to = 1)
gpuA <- gpuVector(A)
gpuB <- gpuVector(B)

C <- A + B
gpuC <- gpuA + gpuB

all(C == gpuC)

MATLAB
Native support for most operations/functions
The next speaker will cover this

GET STARTED TODAY
These languages are supported on all CUDA-capable GPUs. You might already have a CUDA-capable GPU in your laptop or desktop PC!
CUDA C/C++, CUDA Python (http://developer.nvidia.com/how-to-cuda-python), Thrust C++ Template Library, and CUDA OpenCL support

THANK YOU
developer.nvidia.com

SIX WAYS TO SAXPY
Programming Languages for GPU Computing

SINGLE PRECISION ALPHA X PLUS Y (SAXPY)
Part of the Basic Linear Algebra Subroutines (BLAS) library
z = αx + y, where x, y, z are vectors and α is a scalar
GPU SAXPY in multiple languages and libraries: a menagerie* of possibilities, not a tutorial
*technically, a program chrestomathy: http://en.wikipedia.org/wiki/Chrestomathy

OpenACC COMPILER DIRECTIVES
Parallel C Code:
void saxpy(int n, float a, float *x, float *y)
{
  #pragma acc kernels
  for (int i = 0; i < n; ++i)
    y[i] = a*x[i] + y[i];
}
...
// Perform SAXPY on 1M elements
saxpy(1<<20, 2.0, x, y);

Parallel Fortran Code:
subroutine saxpy(n, a, x, y)
  real :: x(:), y(:), a
  integer :: n, i
!$acc kernels
  do i = 1,n
    y(i) = a*x(i) + y(i)
  enddo
!$acc end kernels
end subroutine saxpy
...
! Perform SAXPY on 1M elements
call saxpy(2**20, 2.0, x_d, y_d)

http://developer.nvidia.com/openacc or http://openacc.org

cuBLAS LIBRARY
Serial BLAS Code:
int N = 1<<20;
...
// Use your choice of BLAS library

// Perform SAXPY on 1M elements
blas_saxpy(N, 2.0, x, 1, y, 1);

Parallel cuBLAS Code:
int N = 1<<20;
...
cublasInit();
cublasSetVector(N, sizeof(x[0]), x, 1, d_x, 1);
cublasSetVector(N, sizeof(y[0]), y, 1, d_y, 1);

// Perform SAXPY on 1M elements
cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);

cublasGetVector(N, sizeof(y[0]), d_y, 1, y, 1);
cublasShutdown();

You can also call cuBLAS from Fortran, C++, Python, and other languages
http://developer.nvidia.com/cublas

CUDA C
Standard C:
void saxpy(int n, float a,
           float *x, float *y)
{
  for (int i = 0; i < n; ++i)
    y[i] = a*x[i] + y[i];
}

int N = 1<<20;

// Perform SAXPY on 1M elements
saxpy(N, 2.0, x, y);

Parallel C (CUDA):
__global__
void saxpy(int n, float a,
           float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

int N = 1<<20;
cudaMemcpy(d_x, x, N, cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, N, cudaMemcpyHostToDevice);

// Perform SAXPY on 1M elements
saxpy<<<4096,256>>>(N, 2.0, d_x, d_y);

cudaMemcpy(y, d_y, N, cudaMemcpyDeviceToHost);

http://developer.nvidia.com/cuda-toolkit

THRUST C++ TEMPLATE LIBRARY
Serial C++ Code with STL and Boost:
int N = 1<<20;
std::vector<float> x(N), y(N);
...

// Perform SAXPY on 1M elements
std::transform(x.begin(), x.end(),
               y.begin(), y.begin(),
               2.0f * _1 + _2);

www.boost.org/libs/lambda

Parallel C++ Code:
int N = 1<<20;
thrust::host_vector<float> x(N), y(N);
...
thrust::device_vector<float> d_x = x;
thrust::device_vector<float> d_y = y;

// Perform SAXPY on 1M elements
thrust::transform(d_x.begin(), d_x.end(),
                  d_y.begin(), d_y.begin(),
                  2.0f * _1 + _2);

http://thrust.github.com

CUDA FORTRAN
Standard Fortran:
module mymodule
contains
  subroutine saxpy(n, a, x, y)
    real :: x(:), y(:), a
    integer :: n, i
    do i = 1,n
      y(i) = a*x(i) + y(i)
    enddo
  end subroutine saxpy
end module mymodule

program main
  use mymodule
  real :: x(2**20), y(2**20)
  x = 1.0; y = 2.0
  ! Perform SAXPY on 1M elements
  call saxpy(2**20, 2.0, x, y)
end program main

Parallel Fortran (CUDA Fortran):
module mymodule
contains
  attributes(global) subroutine saxpy(n, a, x, y)
    real :: x(:), y(:), a
    integer :: n, i
    attributes(value) :: a, n
    i = threadIdx%x + (blockIdx%x-1)*blockDim%x
    if (i <= n) y(i) = a*x(i) + y(i)
  end subroutine saxpy
end module mymodule

program main
  use cudafor; use mymodule
  real, device :: x_d(2**20), y_d(2**20)
  x_d = 1.0; y_d = 2.0
  ! Perform SAXPY on 1M elements
  call saxpy<<<4096,256>>>(2**20, 2.0, x_d, y_d)
end program main

http://developer.nvidia.com/cuda-fortran

PYTHON
Standard Python:
import numpy as np

def saxpy(a, x, y):
    return [a * xi + yi
            for xi, yi in zip(x, y)]

x = np.arange(2**20, dtype=np.float32)
y = np.arange(2**20, dtype=np.float32)

cpu_result = saxpy(2.0, x, y)

http://numpy.scipy.org

Numba Parallel Python:
import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32, float32)'], target='cuda')
def saxpy(a, x, y):
    return a * x + y

N = 1048576

# Initialize arrays
A = np.ones(N, dtype=np.float32)
B = np.ones(A.shape, dtype=A.dtype)
C = np.empty_like(A, dtype=A.dtype)

# Run SAXPY on GPU
C = saxpy(2.0, A, B)

https://numba.pydata.org

ENABLING ENDLESS WAYS TO SAXPY
CUDA C, C++, Fortran feed the LLVM compiler for CUDA, which targets NVIDIA GPUs and x86 CPUs
CUDA compiler contributed to open-source LLVM
New language support: build front-ends for Java, Python, R, DSLs
New processor support: target other processors like ARM, FPGA, GPUs, x86

GPU-ACCELERATED LIBRARIES

cuBLAS
Dense Linear Algebra on GPUs
Up to 5x faster DeepBench SGEMM than CPU
Complete BLAS library plus extensions:
Supports all 152 standard routines for single, double, complex, and double complex
Supports half-precision (FP16) and integer (INT8) matrix multiplication operations
Batched routines for higher performance on small problem sizes
Host- and device-callable interface
XT interface supports distributed computations across multiple GPUs
https://developer.nvidia.com/cublas
Benchmark config: CUDA 8 (cuBLAS 8.0.88); Driver 375.66; P100 (PCIe, 16GB, base clocks), ECC off; host system: dual Intel Xeon Broadwell E5-2690 v4 with Ubuntu 14.04.5 and 256GB DDR4 memory; MKL 2017.3, Compiler v17.0.4; FP32 input, output and compute; CPU system: dual Intel Xeon Broadwell E5-2699 v4 (Turbo enabled) with Ubuntu 14.04.5 and 256GB DDR4 memory
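For reference (not from the original slides), a minimal host-side sketch of a cuBLAS matrix multiply using the modern handle-based (v2) API rather than the legacy cublasInit() interface shown earlier; the matrix size and the assumption that the operands already live in device memory are illustrative.

// Hedged sketch: single-precision GEMM with the cuBLAS v2 API.
#include <cublas_v2.h>
#include <cuda_runtime.h>

void gemm_example(const float *d_A, const float *d_B, float *d_C, int n) {
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C; all matrices are n x n, column-major,
    // and already resident in device memory.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n,
                &alpha, d_A, n,
                        d_B, n,
                &beta,  d_C, n);

    cublasDestroy(handle);
}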

cuFFT
Complete Fast Fourier Transforms Library
2x faster image & signal processing than CUDA 8
Complete multi-dimensional FFT library:
"Drop-in" replacement for the CPU FFTW library
Real and complex, single- and double-precision data types
Includes 1D, 2D and 3D batched transforms
Support for half-precision (FP16) data types
Supports flexible input and output data layouts
XT interface now supports up to 8 GPUs
[Chart] Speedup vs. CUDA 8 for 1D, 2D and 3D transforms across data sizes (up to about 2.5x)
Benchmark config: V100 and CUDA 9 (r384); Intel Xeon Broadwell, dual socket, E5-2698 v4 @ 2.6GHz, 3.5GHz Turbo, Ubuntu 14.04.5 x86_64 with 128GB system memory; P100 and CUDA 8 (r361); for cuBLAS CUDA 8 (r361): Intel Xeon Haswell, single socket, 16-core E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo, CentOS 7.2 x86-64 with 128GB system memory
https://developer.nvidia.com/cufft
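For reference (not from the original slides), a minimal sketch of a 1D complex-to-complex transform with cuFFT; the signal length, in-place execution and omitted data initialization are illustrative assumptions.

// Hedged sketch: 1M-point single-precision C2C FFT on the GPU.
#include <cufft.h>
#include <cuda_runtime.h>

void fft_example(void) {
    const int n = 1 << 20;                       // transform length (illustrative)
    cufftComplex *d_signal;
    cudaMalloc((void**)&d_signal, n * sizeof(cufftComplex));
    // ... fill d_signal with input data, e.g. cudaMemcpy from the host ...

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);         // one 1D transform of length n

    cufftExecC2C(plan, d_signal, d_signal,       // forward transform, in place
                 CUFFT_FORWARD);

    cufftDestroy(plan);
    cudaFree(d_signal);
}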

NPP
NVIDIA Performance Primitives Library
GPU-accelerated building blocks for image, video processing & computer vision
Over 2,500 image, signal processing and computer vision routines:
Color transforms, geometric transforms, move operations, linear filters, image & signal statistics, image & signal arithmetic, building blocks, image segmentation, median filter, BGR/YUV conversion, 3D LUT ...
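For reference (not from the original slides), a minimal sketch of one NPP signal-processing primitive: element-wise addition of two float buffers already resident on the GPU; the buffer length and error handling are illustrative.

// Hedged sketch: element-wise addition with an NPP signal primitive.
#include <npps.h>
#include <cuda_runtime.h>

void npp_add_example(const Npp32f *d_a, const Npp32f *d_b, Npp32f *d_sum, int n) {
    // d_a, d_b and d_sum are device pointers holding at least n floats each;
    // nppsAdd_32f computes d_sum[i] = d_a[i] + d_b[i] on the GPU.
    NppStatus status = nppsAdd_32f(d_a, d_b, d_sum, n);
    if (status != NPP_SUCCESS) {
        // illustrative: real code would report or propagate the error
    }
}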

