CUDA C Best Practices Guide - Nvidia


CUDA C Best Practices Guide
Design Guide
DG-05603-001 v11.1 | September 2020

Table of Contents

Preface
    What Is This Document?
    Who Should Read This Guide?
    Assess, Parallelize, Optimize, Deploy
        Assess
        Parallelize
        Optimize
        Deploy
    Recommendations and Best Practices

Chapter 1. Assessing Your Application
Chapter 2. Heterogeneous Computing
    2.1. Differences between Host and Device
    2.2. What Runs on a CUDA-Enabled Device?
Chapter 3. Application Profiling
    3.1. Profile
        3.1.1. Creating the Profile
        3.1.2. Identifying Hotspots
        3.1.3. Understanding Scaling
            3.1.3.1. Strong Scaling and Amdahl's Law
            3.1.3.2. Weak Scaling and Gustafson's Law
            3.1.3.3. Applying Strong and Weak Scaling
Chapter 4. Parallelizing Your Application
Chapter 5. Getting Started
    5.1. Parallel Libraries
    5.2. Parallelizing Compilers
    5.3. Coding to Expose Parallelism
Chapter 6. Getting the Right Answer
    6.1. Verification
        6.1.1. Reference Comparison
        6.1.2. Unit Testing
    6.2. Debugging
    6.3. Numerical Accuracy and Precision
        6.3.1. Single vs. Double Precision
        6.3.2. Floating Point Math Is not Associative
        6.3.3. IEEE 754 Compliance
        6.3.4. x86 80-bit Computations
Chapter 7. Optimizing CUDA Applications
Chapter 8. Performance Metrics
    8.1. Timing
        8.1.1. Using CPU Timers
        8.1.2. Using CUDA GPU Timers
    8.2. Bandwidth
        8.2.1. Theoretical Bandwidth Calculation
        8.2.2. Effective Bandwidth Calculation
        8.2.3. Throughput Reported by Visual Profiler
Chapter 9. Memory Optimizations
    9.1. Data Transfer Between Host and Device
        9.1.1. Pinned Memory
        9.1.2. Asynchronous and Overlapping Transfers with Computation
        9.1.3. Zero Copy
        9.1.4. Unified Virtual Addressing
    9.2. Device Memory Spaces
        9.2.1. Coalesced Access to Global Memory
            9.2.1.1. A Simple Access Pattern
            9.2.1.2. A Sequential but Misaligned Access Pattern
            9.2.1.3. Effects of Misaligned Accesses
            9.2.1.4. Strided Accesses
        9.2.2. L2 Cache
            9.2.2.1. L2 Cache Access Window
            9.2.2.2. Tuning the Access Window Hit-Ratio
        9.2.3. Shared Memory
            9.2.3.1. Shared Memory and Memory Banks
            9.2.3.2. Shared Memory in Matrix Multiplication (C = AB)
            9.2.3.3. Shared Memory in Matrix Multiplication (C = AAᵀ)
            9.2.3.4. Asynchronous Copy from Global Memory to Shared Memory
        9.2.4. Local Memory
        9.2.5. Texture Memory
            9.2.5.1. Additional Texture Capabilities
        9.2.6. Constant Memory
        9.2.7. Registers
            9.2.7.1. Register Pressure
    9.3. Allocation
    9.4. NUMA Best Practices
Chapter 10. Execution Configuration Optimizations
    10.1. Occupancy
        10.1.1. Calculating Occupancy
    10.2. Hiding Register Dependencies
    10.3. Thread and Block Heuristics
    10.4. Effects of Shared Memory
    10.5. Concurrent Kernel Execution
    10.6. Multiple contexts
Chapter 11. Instruction Optimization
    11.1. Arithmetic Instructions
        11.1.1. Division Modulo Operations
        11.1.2. Loop Counters Signed vs. Unsigned
        11.1.3. Reciprocal Square Root
        11.1.4. Other Arithmetic Instructions
        11.1.5. Exponentiation With Small Fractional Arguments
        11.1.6. Math Libraries
        11.1.7. Precision-related Compiler Flags
    11.2. Memory Instructions
Chapter 12. Control Flow
    12.1. Branching and Divergence
    12.2. Branch Predication
Chapter 13. Deploying CUDA Applications
Chapter 14. Understanding the Programming Environment
    14.1. CUDA Compute Capability
    14.2. Additional Hardware Data
    14.3. Which Compute Capability Target
    14.4. CUDA Runtime
Chapter 15. CUDA Compatibility and Upgrades
    15.1. CUDA Runtime and Driver API Version
    15.2. Standard Upgrade Path
    15.3. Flexible Upgrade Path
    15.4. CUDA Compatibility Platform Package
    15.5. Extended nvidia-smi
Chapter 16. Preparing for Deployment
    16.1. Testing for CUDA Availability
    16.2. Error Handling
    16.3. Building for Maximum Compatibility
    16.4. Distributing the CUDA Runtime and Libraries
        16.4.1. CUDA Toolkit Library Redistribution
            16.4.1.1. Which Files to Redistribute
            16.4.1.2. Where to Install Redistributed CUDA Libraries
Chapter 17. Deployment Infrastructure Tools
    17.1. Nvidia-SMI
        17.1.1. Queryable state
        17.1.2. Modifiable state
    17.2. NVML
    17.3. Cluster Management Tools
    17.4. Compiler JIT Cache Management Tools
    17.5. CUDA_VISIBLE_DEVICES
Appendix A. Recommendations and Best Practices
    A.1. Overall Performance Optimization Strategies
Appendix B. nvcc Compiler Switches
    B.1. nvcc

List of Figures

Figure 1. Timeline comparison for copy and kernel execution
Figure 2. Memory spaces on a CUDA device
Figure 3. Coalesced access
Figure 4. Misaligned sequential addresses that fall within five 32-byte segments
Figure 5. Performance of offsetCopy kernel
Figure 6. Adjacent threads accessing memory with a stride of 2
Figure 7. Performance of strideCopy kernel
Figure 8. Mapping persistent data accesses to set-aside L2 in the sliding-window experiment
Figure 9. The performance of the sliding-window benchmark with fixed hit-ratio of 1.0
Figure 10. The performance of the sliding-window benchmark with tuned hit-ratio
Figure 11. Block-column matrix multiplied by block-row matrix
Figure 12. Computing a row of a tile
Figure 13. Comparing synchronous vs. asynchronous copy from global memory to shared memory
Figure 14. Comparing performance of synchronous vs. asynchronous copy from global memory to shared memory
Figure 15. Using the CUDA Occupancy Calculator to project GPU multiprocessor occupancy
Figure 16. Sample CUDA configuration data reported by deviceQuery
Figure 17. Compatibility of CUDA Versions
Figure 18. Standard Upgrade Path
Figure 19. Flexible Upgrade Path
Figure 20. CUDA Compatibility Platform Package

List of Tables

Table 1. Salient Features of Device Memory
Table 2. Performance Improvements Optimizing C = AB Matrix Multiply
Table 3. Performance Improvements Optimizing C = AAᵀ Matrix Multiplication
Table 4. Useful Features for tex1D(), tex2D(), and tex3D() Fetches
Table 5. Formulae for exponentiation by small fractions

Preface

What Is This Document?

This Best Practices Guide is a manual to help developers obtain the best performance from NVIDIA CUDA GPUs.

