Testing GPU Memory Models


Testing GPU Memory Models

Daniel Poetzl

Joint work with Jade Alglave (UCL), Mark Batty (Cambridge), Alastair Donaldson (Imperial), Ganesh Gopalakrishnan (Utah), Tyler Sorensen (Utah), John Wickerson (Imperial)

Outline

1. Introduction
2. GPU Architectures
3. Weak Memory 101
4. Testing GPU Memory Models
5. Results

Introduction

Graphics Processing Units (GPUs)

- GPUs have traditionally been designed to accelerate graphics applications:
  - 3D games
  - Video processing
- General-purpose computing on GPUs (GPGPU) is becoming increasingly widespread:
  - Regular applications: weather forecasting, brute-force password cracking
  - Irregular applications: graph traversal
- Numerous papers are published each year that aim to accelerate traditional algorithms using GPUs

Graphics Processing Units (GPUs)

GPUs have found their way into many types of computer systems:

- Desktops and laptops
- Game consoles
- Mobile devices: iPhone 5S, Samsung Galaxy S
- Cars (video processing, infotainment systems, safety-critical uses!): the Audi self-driving car, the Tesla Motors Model S
- Supercomputers

GPUs in Supercomputers

- Green500 list of the most energy-efficient supercomputers
- All top ten places are occupied by GPU-accelerated systems (among them Wilkes, HA-PACS, and Piz Daint)
- For comparison:
  - Tianhe-2 (Guangzhou, 1st in the Top500): 1.9 GFLOPS/W
  - Stampede (Austin, 7th in the Top500): 1.1 GFLOPS/W

GPU Research

- The number of publications per year on GPU programming has grown continuously over the last years
- [Figure: number of publications per year on GPU programming, from 2006 onwards]

GPU Architectures

GPU Architectures

Nvidia's Maxwell (2014):

[Diagram: a Streaming Multiprocessor (SM) contains four processing blocks, each with a warp scheduler and processing elements PE1 ... PEn; pairs of processing blocks share an L1 cache, and the SM has a shared memory. The SMs (x5) sit above a common L2 cache and DRAM.]

Programming Model

CUDA

- CUDA is Nvidia's framework for general-purpose computing on GPUs
- Threads are hierarchically organized: a kernel consists of blocks, each block consists of warps, and each warp consists of 32 threads (T0 ... T31, T32 ... T63, T64 ... T95, T96 ... T127, ...)
- Different memory spaces: global, shared, local, constant, texture, parameter
- (A sketch of how this hierarchy is visible inside a kernel follows below.)
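As an illustration (not from the slides), a minimal CUDA kernel can recover its place in this hierarchy from the built-in index variables; the warp size of 32 is the value used on Nvidia GPUs:

    // Minimal sketch: where a thread sits in the kernel/block/warp hierarchy.
    __global__ void whoami(int *out) {
        int tid_in_block = threadIdx.x;                      // index within the block
        int global_tid   = blockIdx.x * blockDim.x + tid_in_block;
        int warp_id      = tid_in_block / 32;                // warp within the block
        int lane_id      = tid_in_block % 32;                // position within the warp
        out[global_tid]  = warp_id * 32 + lane_id;           // record one value per thread
    }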

Vector Addition

CPU implementation

- Summing two vectors of size N in C:

    void add(int *a, int *b, int *c) {
      for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
      }
    }

- Runtime: O(N)

Vector Addition

CUDA C implementation

- Summing two vectors of size N in CUDA C:

    __global__ void add(int *a, int *b, int *c) {
      int tid = blockIdx.x;
      if (tid < N)
        c[tid] = a[tid] + b[tid];
    }

- If the number of processing elements is greater than or equal to N, the runtime is O(1)
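For completeness, here is a hedged sketch of the host-side code that would launch this kernel with one block per element, matching the use of blockIdx.x above; the value of N and the lack of error handling are assumptions made for brevity:

    #include <cuda_runtime.h>
    #define N 1024

    int main(void) {
        int a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

        int *dev_a, *dev_b, *dev_c;
        cudaMalloc(&dev_a, N * sizeof(int));
        cudaMalloc(&dev_b, N * sizeof(int));
        cudaMalloc(&dev_c, N * sizeof(int));
        cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

        add<<<N, 1>>>(dev_a, dev_b, dev_c);    // N blocks of one thread each

        cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
        return 0;
    }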

CPUs vs. GPUs

- Key dimensions along which CPUs and GPUs differ:
  - Number of cores
  - Core complexity
  - Caches
  - Memory bandwidth
  - Cost of context switches
  - Explicit concurrency hierarchy
  - Different memory spaces

Weak Memory 101

Interleaved Execution

- A simple model of concurrency is Lamport's sequential consistency (SC), i.e. interleaved execution:

    // Init
    data = flag = 0

    // Producer
    data = 0x7f
    flag = 1

    // Consumer
    while (flag == 0) {}
    assert(data != 0)

- Example interleaving:

    data = 0x7f          (Producer)
    flag == 0 ?          (Consumer, loop again)
    flag = 1             (Producer)
    flag == 0 ?          (Consumer, exit loop)
    assert(data != 0)    (Consumer)

- The assertion is satisfied on all interleavings.
- However, multi- and manycore processors exhibit weak memory consistency, due to:
  - Out-of-order execution
  - Speculative execution
  - Caching
- The assertion can fail on those systems!
- The synchronization algorithms (Dekker, Peterson, ...) we've been taught in school do not work on multicore systems

Weak Memory Consistency

Caching

[Diagram sequence: Core 0 and Core 1, each with its own cache, connected to main memory, which initially holds data = 0 and flag = 0. The producer runs on Core 0 and the consumer on Core 1.]

- Core 0 executes data = 0x7f; the write lands in Core 0's cache (memory still holds data = 0, flag = 0).
- Core 0 executes flag = 1; this write also lands in Core 0's cache.
- The cache coherency protocol commits flag before data to main memory (memory now holds data = 0, flag = 1).
- Core 1 reads flag = 1 into its cache and exits the while loop.
- Core 1 then reads data = 0 from memory: the assertion fails.

Memory Barriers

- CPUs/GPUs provide memory barrier instructions to enforce ordering constraints on memory accesses
- Expensive: 100s of clock cycles
- Different types of barriers
- Fix for the example (on Nvidia GPUs; assuming the producer and consumer are in different blocks):

    // Producer
    data = 0x7f
    asm("membar.gl")
    flag = 1

    // Consumer
    while (flag == 0) {}
    asm("membar.gl")
    assert(data != 0)

- (A CUDA C sketch of this fix follows below.)
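A minimal CUDA C sketch of the same fix, not from the slides: __threadfence() is the CUDA intrinsic that compiles to a membar.gl fence, and the launch with one producer block and one consumer block is an assumption for illustration:

    __device__ volatile int data = 0;
    __device__ volatile int flag = 0;

    // Launch as e.g. mp_test<<<2, 1>>>(d_result): block 0 produces, block 1 consumes.
    __global__ void mp_test(int *result) {
        if (blockIdx.x == 0) {          // producer
            data = 0x7f;
            __threadfence();            // membar.gl: order data before flag
            flag = 1;
        } else {                        // consumer
            while (flag == 0) {}        // spin until the flag is set
            __threadfence();            // membar.gl: order the flag read before the data read
            *result = data;             // with the fences, this observes 0x7f
        }
    }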

Axiomatic Memory Models

- Executions are represented not as interleavings, but as execution graphs:

    [Execution graph for the producer/consumer example:
     data = 0x7f --po--> flag = 1 --rf--> read flag: 1 --po--> read data: 0 --fr--> data = 0x7f]

- An execution graph is acyclic if and only if it corresponds to an interleaving
- Axiomatic memory models give a set of formal rules defining which executions are possible on a certain architecture
- Full details: Herding Cats. Alglave et al. TOPLAS '14

Testing GPU Memory Models

Weak Memory Models

Which behaviors can be observed when threads concurrently access shared memory?

- As we've seen, we cannot expect sequential consistency (interleaved execution) on GPUs
- But what exactly can we expect?
  - Consult the manual: prose, ambiguous, little detail, sometimes plain wrong
  - We want a formal memory model!
- The formal memory model is based on:
  - Vendor documentation
  - Testing
  - Discussion with industry contacts

Test Framework

- We extended the diy and litmus tools to generate and run GPU litmus tests
- diy to generate tests:
  - Short assembly code snippets called litmus tests
  - Test generation based on an axiomatic modeling framework
- litmus to run tests:
  - Runs tests produced by diy many times
  - Adds additional code to create noise ("incantations") to make weak behaviors appear
- Pipeline: diy → litmus tests → litmus → CUDA code / OpenCL code

Test Generation

diy

- Executions are represented as directed graphs
- Non-SC executions have cycles:

    Rx1 --po--> Wy1 --rf--> Ry1 --po--> Wx1 --rf--> Rx1

- Which non-SC executions are possible on a certain chip?
- Key idea of diy:
  - Enumerate non-SC executions (i.e. cyclic execution graphs)
  - From each such graph, generate a test such that one of its executions is the execution from which it was generated

Test Generation

Example

- Execution graph:

    Rx1 --po--> Wy1 --rf--> Ry1 --po--> Wx1 --rf--> Rx1

- The slides build the generated litmus test one instruction per step; the final test is:

    P0          P1
    r1 = x      r2 = y
    y = 1       x = 1

    Final condition: r1 == 1 && r2 == 1
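To connect the generated test to actual GPU code, here is a hedged sketch of how this load-buffering test could look as a CUDA kernel with the two test threads in different blocks; in the real framework, litmus emits such harnesses (plus incantations) automatically, so the names below are illustrative:

    __global__ void lb_test(volatile int *x, volatile int *y, int *r1, int *r2) {
        if (blockIdx.x == 0) {     // P0
            *r1 = *x;              // r1 = x
            *y  = 1;               // y  = 1
        } else {                   // P1
            *r2 = *y;              // r2 = y
            *x  = 1;               // x  = 1
        }
        // The host then checks whether the weak outcome r1 == 1 && r2 == 1 occurred.
    }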

Running Tests

litmus

- Now that we can generate tests, we want to run them on the hardware!
- To make the weak behaviors appear, we need "incantations":
  - Put variables on different cache lines
  - Noise-maker threads that write random memory locations
  - Random launch parameters
  - Trigger bank conflicts
  - ...

Running Tests

Bank Conflicts

- Memory is divided into banks
- Banks are interleaved, not contiguous:

    Address   Bank
    0x0       0
    0x1       1
    0x2       2
    0x3       3
    0x4       0
    0x5       1
    0x6       2
    0x7       3

- Accesses to the same bank are serialized
- No bank conflict: threads 0-3 access addresses 0x0-0x3, which lie in four different banks
- Bank conflict: two threads access different addresses that map to the same bank, so their accesses are serialized
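As a small illustration (not from the slides) of the mapping in the table above, the bank of a word address is simply its index modulo the number of banks; the four-bank figure matches the slides, while Fermi and Kepler shared memory actually has 32 banks of 4-byte words:

    // Bank mapping sketch: word addresses are distributed cyclically over the banks.
    __host__ __device__ static inline int bank_of(unsigned word_addr, int num_banks) {
        return word_addr % num_banks;
    }
    // With 4 banks, bank_of(0x3, 4) == bank_of(0x7, 4) == 3: simultaneous accesses
    // to 0x3 and 0x7 by two threads would be serialized.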

Running Tests

Bank Conflicts

- Accesses to the same bank are serialized
- Accesses can be delayed due to a bank conflict:

    P0          P1
    x = 1       y = 1
    r3 = y      r4 = x

    Final condition: r3 == 1 && r4 == 0

- If x = 1 has a bank conflict, it may be delayed, leading to y = 1 being executed first
- The order in which accesses to the same bank are serialized is unspecified

Test Results

Read-Read Coherence (CoRR)

- Consider the following test, with P0 and P1 in different blocks, and initially x = 0:

    P0          P1
    x = 1       r1 = x
                r2 = x

    Final condition: r1 == 1 && r2 == 0

- Running this test 100,000 times with litmus on the GeForce GTX 660 yields the following histogram:

    Test CoRR Allowed
    Histogram (4 states)
    59875 : 1:r0=0; 1:r2=0;
      828 : 1:r0=1; 1:r2=0;
     2422 : 1:r0=0; 1:r2=1;
    36875 : 1:r0=1; 1:r2=1;

- The behavior is considered a bug:
  - It does not guarantee what is typically required by programming language standards (OpenCL, C++11)
  - OpenCL and C++11 require that there is a total order on all writes to a memory location (the coherence order)
  - No thread shall read values that contradict this order
- The bug occurred in all Nvidia chips of the Fermi and Kepler generations we tested
- Fixed in the new Maxwell architecture
- GPUs are fairly deterministic (compared to CPUs)
- By fixing the random test parameters, we can make the bug show up deterministically (on Fermi and Kepler GPUs):

    Test CoRR Allowed
    Histogram (1 state)
    100000 : 1:r0=1; 1:r2=0;
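For reference, a hedged CUDA C sketch of what the CoRR test looks like at the kernel level, with P0 and P1 in different blocks as on the slide; in the actual experiments the harness (including incantations) was generated by litmus:

    __global__ void corr_test(volatile int *x, int *r1, int *r2) {
        if (blockIdx.x == 0) {     // P0
            *x = 1;
        } else {                   // P1
            *r1 = *x;              // first read of x
            *r2 = *x;              // second read of x
        }
        // The buggy outcome is r1 == 1 && r2 == 0: the second read returns an
        // older value for x than the first read.
    }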

Compare-and-swap

Mutex idiom

- Consider the following test, with P0 and P1 in different blocks, and initially x = 0 and mutex = 1:

    P0                  P1
    x = 1               b = cas(&mutex, 0, 1)
    membar.gl           r = x
    mutex = 0

    Final condition: b == true && r == 0

- P0: write data to x, then unlock the mutex
- P1: attempt to lock the mutex; read x if successful
- Can P1 read a stale value from x when the CAS succeeds?
  - Yes! (on Fermi and Kepler)
  - CAS does not imply a memory fence on Nvidia GPUs
  - Several papers assume that it does and are thus wrong (among them the textbook CUDA by Example)
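Given that finding, a lock that needs the protected accesses to be ordered with respect to the CAS must add an explicit fence. A minimal sketch (an assumption of how one could write it, not the paper's code), using __threadfence() as the fence:

    // Lock/unlock sketch: 0 = unlocked, 1 = locked, matching the slide's convention.
    __device__ void mutex_lock(int *mutex) {
        while (atomicCAS(mutex, 0, 1) != 0) {}  // spin until we swap 0 -> 1
        __threadfence();   // CAS alone does not order the protected reads that follow
    }

    __device__ void mutex_unlock(int *mutex) {
        __threadfence();   // commit the protected writes before releasing the lock
        atomicExch(mutex, 0);
    }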

Summary

- diy to generate GPU litmus tests
- litmus to run GPU litmus tests
- Testing the hardware is a necessary first step towards building a formal memory model
- Documentation is insufficient: ambiguous, little detail, sometimes wrong
- Side effect: we find bugs in the hardware
- Test results serve as a basis for communication with industry contacts

Thank you!

