COMP 605: Introduction to Parallel Computing, Lecture: CUDA Shared Memory


COMP 605: Introduction to Parallel Computing
Lecture: CUDA Shared Memory
Mary Thomas
Department of Computer Science
Computational Science Research Center (CSRC)
San Diego State University (SDSU)
Posted: 04/25/17; Last Update: 04/25/17

Table of Contents
1. CUDA Shared Memory & Synchronization (K&H Ch5, S&K Ch5)
   - Shared Memory Model example: dot product (S&K Ch5)

CUDA SHMEM & Synchronization (S&K Ch5.3, K&H Ch5)

The CUDA Memory Model
- The kernel is executed by a batch of threads.
- Threads are organized into a grid of thread blocks.
- Each thread has its own registers; no other thread can access them. The kernel uses registers to store private thread data.
- Shared memory: allocated to thread blocks; promotes thread cooperation.
- Global memory: host and threads can read and write.
- Constant and texture memory: threads can only read (the host initializes them).
- Threads in the same block can share memory; this requires synchronization, which is essentially communication.
- Example: dot product:
  (x1, x2, x3, x4) · (y1, y2, y3, y4) = x1 y1 + x2 y2 + x3 y3 + x4 y4
Source: NVIDIA
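For reference, the same operation written as a plain serial loop; this is the baseline that the CUDA kernel later in this lecture parallelizes across threads and blocks (a minimal C sketch, not taken from the slides):

    // serial dot product: the baseline the GPU version parallelizes
    float dot_serial( const float *x, const float *y, int n ) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++)
            sum += x[i] * y[i];   // multiply one pair of components and accumulate
        return sum;
    }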

Programmer View of CUDA Memories
Each thread can:
- Read/write per-thread registers (~1 cycle)
- Read/write per-block shared memory (~5 cycles)
- Read/write per-grid global memory (~500 cycles)
- Read only per-grid constant memory (~5 cycles with caching)
Source: NVIDIA

CUDA Variable Type Qualifiers
- __device__ is optional when used with __shared__ or __constant__.
- Automatic variables without any qualifier reside in a register,
  except per-thread arrays, which reside in global memory.
- All threads have access to global memory.
Source: David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483/ECE498al
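A small sketch of how these qualifiers look in source; the declarations and names are illustrative (hypothetical), not from the slides:

    __constant__ float coeff[16];          // per-grid constant memory: read-only in kernels
    __device__   float dev_scratch[256];   // per-grid global (device) memory: read/write

    __global__ void qualifier_demo( float *out ) {   // launch with 128 threads per block
        __shared__ float tile[128];        // per-block shared memory: one copy per block launched
        float t = coeff[threadIdx.x % 16]; // automatic scalar without a qualifier: a register
        float per_thread[8];               // per-thread array: resides in global memory, not registers
        for (int i = 0; i < 8; i++)
            per_thread[i] = t + i;
        tile[threadIdx.x] = per_thread[7];
        dev_scratch[threadIdx.x % 256] = t;
        __syncthreads();                   // make the whole tile visible to the block
        out[blockIdx.x * blockDim.x + threadIdx.x] = tile[0];
    }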

A Common Programming Strategy
- Global memory resides in device memory (DRAM).
- Perform computation on the device by tiling the input data to take advantage of fast shared memory (a skeleton of the pattern is sketched below):
  - Partition the data into subsets that fit into shared memory.
  - Handle each data subset with one thread block:
    - Load the subset from global memory to shared memory, using multiple threads to exploit memory-level parallelism.
    - Perform the computation on the subset from shared memory; each thread can efficiently make multiple passes over any data element.
    - Copy the results from shared memory back to global memory.
Source: David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483/ECE498al
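A minimal skeleton of that pattern for a 1D input processed in block-sized tiles; the kernel, the TILE constant, and the smoothing computation are illustrative assumptions, not from the slides:

    #define TILE 256

    // launch with blockDim.x == TILE; neighboring threads reuse the same tile entries
    __global__ void tile_smooth( const float *in, float *out, int n ) {
        __shared__ float tile[TILE];                 // the per-block subset of the input
        int gid = blockIdx.x * blockDim.x + threadIdx.x;

        // 1) load the subset from global memory into shared memory
        tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;
        __syncthreads();                             // the whole tile must be loaded before use

        // 2) compute from shared memory; each tile entry is read by several threads
        int left  = (threadIdx.x == 0)        ? 0        : threadIdx.x - 1;
        int right = (threadIdx.x == TILE - 1) ? TILE - 1 : threadIdx.x + 1;
        float result = ( tile[left] + tile[threadIdx.x] + tile[right] ) / 3.0f;

        // 3) copy the result back to global memory
        if (gid < n) out[gid] = result;
    }

(For simplicity the neighbor indices are clamped to the tile, so values at tile boundaries are approximate.)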

CUDA Shared Memory
- For a __shared__ variable, the compiler creates a copy of the variable for each block launched.
- Low latency: the variable lives on the GPU, not in off-chip DRAM.
- Shared memory is most effective when used on a per-block basis.
- All threads in a block have access to the same shared memory, so synchronization is required to avoid race conditions.

Invocation Example

    __global__ void MatrixMulKernel(float* M, float* N, float* P, int Width) {
        __shared__ float subTileM[TILE_WIDTH][TILE_WIDTH];
        __shared__ float subTileN[TILE_WIDTH][TILE_WIDTH];

Source: David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483/ECE498al
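The rest of the kernel body is not reproduced in the transcription; a sketch of how a tiled matrix-multiply kernel of this shape typically continues, following the tiling strategy above (loop structure and variable names are assumptions, not the slide's exact code):

    __global__ void MatrixMulKernel(float* M, float* N, float* P, int Width) {
        __shared__ float subTileM[TILE_WIDTH][TILE_WIDTH];
        __shared__ float subTileN[TILE_WIDTH][TILE_WIDTH];

        int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
        int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
        float pValue = 0.0f;

        // march the two tiles across one row of M and one column of N
        for (int t = 0; t < Width / TILE_WIDTH; t++) {
            // each thread loads one element of each tile from global memory
            subTileM[threadIdx.y][threadIdx.x] = M[row * Width + t * TILE_WIDTH + threadIdx.x];
            subTileN[threadIdx.y][threadIdx.x] = N[(t * TILE_WIDTH + threadIdx.y) * Width + col];
            __syncthreads();                       // wait until both tiles are loaded

            for (int k = 0; k < TILE_WIDTH; k++)   // partial dot product from shared memory
                pValue += subTileM[threadIdx.y][k] * subTileN[k][threadIdx.x];
            __syncthreads();                       // wait before overwriting the tiles
        }
        P[row * Width + col] = pValue;             // assumes Width is a multiple of TILE_WIDTH
    }

Such a kernel would be launched with dim3 dimGrid(Width/TILE_WIDTH, Width/TILE_WIDTH) and dim3 dimBlock(TILE_WIDTH, TILE_WIDTH), assuming Width is a multiple of TILE_WIDTH.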

Shared Memory Model example: dot product (S&K Ch5)
Shared Memory Model: dot.cu (S&K Ch5)

Vector Dot Product
- X · Y = |X| |Y| cos(theta)
- Geometric interpretation: the dot product is the length of the projection of vector X onto vector Y.
- Componentwise: X · Y = sum_{i=1..n} A_i B_i = A_1 B_1 + A_2 B_2 + ... + A_n B_n

Shared Memory Model: dot.cu (S&K Ch5)
The dot product is a good example for shared memory and synchronization:
- Each thread multiplies a pair of vector elements (line 10) and repeats for its chunk of work (line 11).
- Each thread stores its local sum into its shared-memory cache entry (line 14).
- Synchronize the threads in this block (line 17).
- Reduction (lines 22-27): in each step, each active thread sums 2 entries.
- Store the block's result into the global array (line 29).

     1  __global__ void dot( float *a, float *b, float *c ) {
     2      // buffer of shared memory - store the running sums
     3      __shared__ float cache[threadsPerBlock];
     4      int tid = threadIdx.x + blockIdx.x * blockDim.x;
     5      int cacheIndex = threadIdx.x;
     6
     7      // each thread computes a running sum of products
     8      float temp = 0;
     9      while (tid < N) {
    10          temp += a[tid] * b[tid];
    11          tid += blockDim.x * gridDim.x;
    12      }
    13      // set the cache values in the shared buffer
    14      cache[cacheIndex] = temp;
    15
    16      // synchronize threads in this BLOCK
    17      __syncthreads();
    18
    19      // for reductions, threadsPerBlock must be a power of 2
    20      // because of the following code
    21      int i = blockDim.x / 2;
    22      while (i != 0) {
    23          if (cacheIndex < i)
    24              cache[cacheIndex] += cache[cacheIndex + i];
    25          __syncthreads();
    26          i /= 2;
    27      }
    28      if (cacheIndex == 0)
    29          c[blockIdx.x] = cache[0];
    30  }

Reduction Operation

    // for reductions, threadsPerBlock must be a power of 2
    // because of the following code
    int i = blockDim.x / 2;
    while (i != 0) {
        if (cacheIndex < i)
            cache[cacheIndex] += cache[cacheIndex + i];
        __syncthreads();
        i /= 2;
    }

    // only need one thread to write to global memory
    if (cacheIndex == 0)
        c[blockIdx.x] = cache[0];

Source: NVIDIA
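As a concrete trace, suppose blockDim.x = 8 and cache[0..7] hold the eight per-thread sums. With i = 4, threads 0-3 add cache[4..7] into cache[0..3]; with i = 2, threads 0-1 fold the remaining four partial sums into cache[0] and cache[1]; with i = 1, thread 0 adds cache[1] into cache[0], which then holds the total for the block. The __syncthreads() inside the loop guarantees every addition in one step finishes before the next step reads its results.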

Shared Memory Model: dot.cu (S&K Ch5)
- Allocate host memory (lines 11-13).
- Allocate device memory (lines 15-18).
- Populate the host arrays (lines 22-24).
- Copy from host to device global memory (lines 27, 29): only the input arrays a and b are needed.

     1  #include "./common/book.h"
     2  #define imin(a,b) (a<b?a:b)
     3  const int N = 33 * 1024;
     4  const int threadsPerBlock = 256;
     5  const int blocksPerGrid =
     6              imin( 32, (N+threadsPerBlock-1) / threadsPerBlock );
     7
     8  int main( void ) {
     9      float   *a, *b, c, *partial_c;
    10      float   *dev_a, *dev_b, *dev_partial_c;
    11      a = (float*)malloc( N*sizeof(float) );       // allocate memory on the cpu side
    12      b = (float*)malloc( N*sizeof(float) );
    13      partial_c = (float*)malloc( blocksPerGrid*sizeof(float) );
    14      // allocate the memory on the GPU
    15      HANDLE_ERROR( cudaMalloc( (void**)&dev_a, N*sizeof(float) ) );
    16      HANDLE_ERROR( cudaMalloc( (void**)&dev_b, N*sizeof(float) ) );
    17      HANDLE_ERROR( cudaMalloc( (void**)&dev_partial_c,
    18                                blocksPerGrid*sizeof(float) ) );
    19
    20
    21      // fill in the host memory with data
    22      for (int i=0; i<N; i++) {
    23          a[i] = i;
    24          b[i] = i*2;
    25      }
    26      // copy the arrays 'a' and 'b' to the GPU
    27      HANDLE_ERROR( cudaMemcpy( dev_a, a, N*sizeof(float),
    28                                cudaMemcpyHostToDevice ) );
    29      HANDLE_ERROR( cudaMemcpy( dev_b, b, N*sizeof(float),
    30                                cudaMemcpyHostToDevice ) );

Shared Memory Model: dot.cu (S&K Ch5)
- The host launches the kernel (line 1).
- The number of threads and blocks depends on N; blocksPerGrid is the smallest number of blocks of threadsPerBlock threads needed to cover N, capped at 32:
  const int blocksPerGrid = imin( 32, (N+threadsPerBlock-1) / threadsPerBlock );

     1      dot<<<blocksPerGrid,threadsPerBlock>>>( dev_a, dev_b,
     2                                              dev_partial_c );
     3
     4      // copy the array 'c' back from the GPU to the CPU
     5      HANDLE_ERROR( cudaMemcpy( partial_c, dev_partial_c,
     6                                blocksPerGrid*sizeof(float),
     7                                cudaMemcpyDeviceToHost ) );
     8
     9      // finish up on the CPU side
    10      c = 0;
    11      for (int i=0; i<blocksPerGrid; i++) {
    12          c += partial_c[i];
    13      }
    14
    15      #define sum_squares(x)  (x*(x+1)*(2*x+1)/6)
    16      printf( "Does GPU value %.6g = %.6g?\n", c,
    17              2 * sum_squares( (float)(N - 1) ) );
    18
    19      // free memory on the gpu side
    20      HANDLE_ERROR( cudaFree( dev_a ) );
    21      HANDLE_ERROR( cudaFree( dev_b ) );
    22      HANDLE_ERROR( cudaFree( dev_partial_c ) );
    23
    24      // free memory on the cpu side
    25      free( a );
    26      free( b );
    27      free( partial_c );
    28  }
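To build and run it, something like the following should work, assuming the CUDA toolkit is installed and the common/book.h header from the CUDA by Example sources sits next to dot.cu as in the include above:

    nvcc -o dot dot.cu
    ./dot

The program prints whether the GPU result matches the closed-form value 2 * sum_squares(N - 1).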

Shared Memory and Threading (2D matrix example)
- Each SM in Maxwell has 64KB of shared memory (48KB max per block).
- Shared memory size is implementation dependent!
- For TILE_WIDTH = 16, each thread block uses 2*256*4B = 2KB of shared memory.
- Can potentially have up to 32 thread blocks actively executing.
- This allows up to 8*512 = 4,096 pending loads (2 per thread, 256 threads per block).
- The next TILE_WIDTH, 32, would lead to 2*32*32*4B = 8KB of shared memory usage per thread block, allowing 8 thread blocks to be active at the same time.
- Using 16x16 tiling, we reduce the accesses to global memory by a factor of 16.
- The 150 GB/s bandwidth can now support (150/4)*16 = 600 GFLOPS.
Source: David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483/ECE498al
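Because the shared memory size is implementation dependent, it can be queried at run time; a minimal sketch using the CUDA runtime device-attribute API (assuming a reasonably recent toolkit):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main( void ) {
        int dev = 0, smemPerBlock = 0, smemPerSM = 0;
        cudaGetDevice( &dev );
        cudaDeviceGetAttribute( &smemPerBlock,
                                cudaDevAttrMaxSharedMemoryPerBlock, dev );
        cudaDeviceGetAttribute( &smemPerSM,
                                cudaDevAttrMaxSharedMemoryPerMultiprocessor, dev );
        printf( "shared memory: %d bytes per block, %d bytes per SM\n",
                smemPerBlock, smemPerSM );
        return 0;
    }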

