Evolution of the NVIDIA GPU Architecture
Jason Lowden
Advanced Computer Architecture
November 7, 2012

Agenda
- Introduction of the NVIDIA GPU
- Graphics Pipeline
- GPU Terminology
- Architecture of a GPU
  - Computing Elements
  - Memory Types
- Fermi Architecture
- Kepler Architecture
- GPUs as a Computational Device
- CUDA Programming
- Performance Comparison
- Relation to SMT, Vector Processors, and DSPs
- Summary

NVIDIA GPU History
- First GPU released in 1999
  - Used for graphics processing (GeForce and Quadro)
- CUDA architecture released in 2006
  - Designed for use by industry and academia as a computing device
  - A move toward commodity parallel processing
- Tesla GPU series released in 2007
- Fermi architecture released in 2009
- Kepler architecture released in 2012

Graphics Pipeline

Terminology
- Thread – the smallest grain of the hierarchy of device computation
- Block – a group of threads
- Grid – a group of blocks
- Warp – a group of 32 threads that execute simultaneously on the device
- Kernel – the function whose launch creates a grid for GPU execution
(A small sketch of this hierarchy follows.)
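As a minimal sketch of the hierarchy (the kernel name whoAmI and the launch sizes are illustrative assumptions, not from the slides), each thread can derive a unique global index from its block and thread coordinates:

#include <cstdio>

// Each thread reports where it sits in the grid/block/thread hierarchy.
__global__ void whoAmI( void )
{
    int globalId = blockIdx.x * blockDim.x + threadIdx.x;
    printf( "block %d, thread %d, global %d\n",
            blockIdx.x, threadIdx.x, globalId );
}

int main( void )
{
    // Launch a grid of 4 blocks, each holding 64 threads (two warps of 32).
    whoAmI<<<4, 64>>>();
    cudaDeviceSynchronize();   // wait for the asynchronous kernel to finish
    return 0;
}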

Architecture of a GPU
- Same components as a typical CPU; however, a GPU has
  - more computing elements
  - more types of memory
- Original GPUs had vertex and pixel shaders, built specifically for graphics
- Modern GPUs are slightly different
  - CUDA – Compute Unified Device Architecture

Computational Elements of a GPU
- Streaming Processor (SP) – the core of the design; the place where all of the computation takes place
- Streaming Multiprocessor (SM) – a group of streaming processors
  - In addition to the SPs, each SM also contains Special Function Units, Load/Store Units, instruction schedulers, and complex control logic

Streaming Multiprocessor Architecture

Types of GPU Memory
- Global – DRAM; slowest performance
- Texture – cached global memory; "bound" at runtime
- Constant – cached global memory
- Shared – local to a block of threads
(The sketch below shows how these map to CUDA qualifiers.)
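As a sketch of how three of these memory types appear in CUDA source (texture binding is omitted; the names smooth, coeffs, and the 256-thread block size are illustrative assumptions):

#define SIZE 256

__constant__ float coeffs[SIZE];        // constant: cached, read-only global memory

__global__ void smooth( const float* in, float* out )
{
    __shared__ float tile[SIZE];        // shared: local to one block of threads

    int id = threadIdx.x;
    tile[id] = in[id];                  // 'in' resides in global DRAM
    __syncthreads();                    // make the tile visible to the whole block

    out[id] = tile[id] * coeffs[id];    // combine shared and constant reads
}

The kernel assumes a launch of blocks of SIZE threads, e.g. smooth<<<1, SIZE>>>( dIn, dOut ).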

Architectural Memory Hierarchy

Fermi Architecture

Fermi Improvements
- Increased number of SPs per SM
- Unified request path for load/store instructions
- Implementation of a cache hierarchy
  - An L1 cache per SM, configurable against shared memory (a host-side sketch follows this list)
  - An L2 cache shared globally
- Register spilling
  - Occurs when the register requirements of a thread exceed what is available on the device
  - Previous generation: spill to DRAM (global memory)
  - Fermi: spill to the L1 cache
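As a sketch of selecting the configurable L1/shared split from the host (the kernel myKernel is a hypothetical stand-in; cudaFuncSetCacheConfig is the runtime call that expresses this preference):

#include <cuda_runtime.h>

__global__ void myKernel( float* data ) { /* hypothetical kernel body */ }

int main( void )
{
    // Favor a 48 KB L1 / 16 KB shared split for this kernel...
    cudaFuncSetCacheConfig( myKernel, cudaFuncCachePreferL1 );

    // ...or 16 KB L1 / 48 KB shared when the kernel leans on shared memory:
    // cudaFuncSetCacheConfig( myKernel, cudaFuncCachePreferShared );
    return 0;
}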

Summary

Kepler SM Overview
- Goal: improve GPU performance and power efficiency
  - Up to 3 times the performance per watt of Fermi
- Increased to 192 SPs per SM
- 32 Special Function Units
- Improved warp scheduling

Kepler SM Design

Warp Scheduler
- 4 warp schedulers per SM
- Each scheduler can issue up to 2 independent instructions from a warp when that warp is ready to issue

Kepler Memory Architecture
- Shared memory and L1 are still physically shared
  - New configuration available: 32 KB L1 / 32 KB shared (sketched after this list)
  - Shared memory bandwidth is doubled compared with Fermi
- Increased L2 size: double that of Fermi, at 1536 KB
- Introduction of a read-only data cache
  - In Fermi, this storage served only as the texture cache
  - 48 KB of storage
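As a sketch of using both Kepler additions (the kernel scale and its pointers are hypothetical; __ldg() requires compute capability 3.5, so compile with something like nvcc -arch=sm_35):

#include <cuda_runtime.h>

__global__ void scale( const float* __restrict__ in, float* out, float k )
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    // __ldg() loads through the 48 KB read-only data cache; marking the
    // pointer const __restrict__ often lets the compiler route loads there too.
    out[id] = k * __ldg( &in[id] );
}

int main( void )
{
    // Request the new even split: 32 KB L1 / 32 KB shared memory.
    cudaFuncSetCacheConfig( scale, cudaFuncCachePreferEqual );
    /* ... allocate, copy, and launch scale<<<blocks, threads>>>( ... ) ... */
    return 0;
}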

Warp Shuffle Instructions
- In Fermi, data could only be exchanged between threads through shared memory
  - This resulted in additional synchronization time
- Kepler adds the shuffle functions (sketched below), which
  - exchange data between threads without using shared memory
  - handle the store-and-load operation as a single step
- Data can only be shared within the same warp
- In NVIDIA's example, an FFT algorithm saw a 6% performance increase when using this instruction
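As a sketch of a warp-level sum that needs no shared memory (the function name warpSum is illustrative; this uses the Kepler-era __shfl_down intrinsic, which newer CUDA toolkits spell __shfl_down_sync with a lane mask):

__device__ float warpSum( float val )
{
    // Tree reduction across the 32 lanes of a warp: each step pulls a value
    // from the lane 'offset' positions higher, halving the span every time.
    for( int offset = 16; offset > 0; offset /= 2 )
        val += __shfl_down( val, offset );
    return val;   // lane 0 ends up holding the sum of all 32 lanes
}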

Kepler Hardware Features
- Dynamic parallelism
  - Any kernel can launch more kernels from within itself (sketched after this list)
  - Takes additional load off of the CPU
- Hyper-Q
  - 32 hardware-managed work queues; Fermi had 1 queue
- Grid Management Unit (GMU)
  - Needed to manage the number of grids that are executed
  - Introduced to handle all of the grids that can be active at one time
- NVIDIA GPUDirect™
  - CUDA-enabled GPUs can interact without the need for CPU intervention
  - The GPU can interact directly with the NIC
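As a sketch of dynamic parallelism (both kernel names and the child's work are hypothetical; device-side launches require compute capability 3.5 and relocatable device code, e.g. nvcc -arch=sm_35 -rdc=true file.cu -lcudadevrt):

__global__ void childKernel( float* data, int n )
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if( id < n ) data[id] *= 2.0f;   // placeholder child work
}

__global__ void parentKernel( float* data, int n )
{
    if( blockIdx.x == 0 && threadIdx.x == 0 )
    {
        // One parent thread sizes and launches the child grid on the device,
        // with no round trip to the CPU.
        int blocks = ( n + 255 ) / 256;
        childKernel<<<blocks, 256>>>( data, n );
    }
}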

Comparison of Kepler and Fermi

Use for Computation
- Historically, GPUs were used for graphics to offload work from the CPU
- Current trend: combine the CPU and GPU on a single chip
- Because the work is massively parallel, GPUs are ideal thanks to their large number of processing cores
  - However, they are only ideal when there are few data dependencies
- Introduction of CUDA and the Tesla GPUs

CUDA Programming
- Extensions to the C language, with some C++ support
- Programming support
  - Windows: Visual Studio
  - Linux/Mac: Eclipse
- Programming paradigm in which each computation takes place on a separate thread
- Requires an NVIDIA GPU for acceleration
  - Simulators are used for research purposes

Example – Vector Addition

C:

for( int i = 0; i < SIZE; i++ ) {
    c[ i ] = a[ i ] + b[ i ];
}

CUDA:

__global__ void addVectors( float* a, float* b, float* c ) {
    int id = threadIdx.x;
    if( id < SIZE ) {
        c[ id ] = a[ id ] + b[ id ];
    }
}

Programming Requirements
- Explicit memory operations to allocate and copy data from the CPU to the GPU
  - Some exceptions do apply
- All kernels execute asynchronously of the CPU
- Explicit synchronization barriers between the processors
(A host-side sketch of these steps follows.)
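As a sketch of the host-side steps for the vector-addition kernel above (the wrapper runAddVectors is hypothetical; SIZE must match the value compiled into the kernel):

#include <cuda_runtime.h>
#define SIZE 1024

__global__ void addVectors( float* a, float* b, float* c );   // from the example above

void runAddVectors( float* a, float* b, float* c )   // host arrays
{
    float *dA, *dB, *dC;
    size_t bytes = SIZE * sizeof( float );

    cudaMalloc( &dA, bytes );                              // explicit device allocation
    cudaMalloc( &dB, bytes );
    cudaMalloc( &dC, bytes );
    cudaMemcpy( dA, a, bytes, cudaMemcpyHostToDevice );    // explicit copies in
    cudaMemcpy( dB, b, bytes, cudaMemcpyHostToDevice );

    addVectors<<<1, SIZE>>>( dA, dB, dC );                 // asynchronous kernel launch
    cudaDeviceSynchronize();                               // explicit barrier

    cudaMemcpy( c, dC, bytes, cudaMemcpyDeviceToHost );    // copy the result back
    cudaFree( dA );  cudaFree( dB );  cudaFree( dC );
}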

Synchronization and Performance
- Synchronization primitives to meet data dependencies
  - __syncthreads() – synchronizes all threads in a block
  - Atomic operations – depending on the compute capability and CUDA version, these are possible on global and shared memory
- Performance is dictated by memory operations and synchronization cost
  - Memory coalescence
  - Warp divergence
(A sketch combining both primitives follows.)
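As a sketch that combines both primitives (the kernel blockSum and its arguments are illustrative assumptions):

__global__ void blockSum( const int* values, int n, int* total )
{
    __shared__ int blockTotal;                 // one shared accumulator per block

    if( threadIdx.x == 0 ) blockTotal = 0;
    __syncthreads();                           // every thread sees the cleared value

    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if( id < n )
        atomicAdd( &blockTotal, values[id] );  // atomic update in shared memory

    __syncthreads();                           // wait for all contributions
    if( threadIdx.x == 0 )
        atomicAdd( total, blockTotal );        // one atomic per block on global memory
}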

Performance Comparison

Relation to Other Architectures
- SMT
  - Many smaller cores, with less functionality, compute the results
  - Each core has a hardware context for a thread that can be switched out
- Vector processors
  - Compute in parallel results that a CPU could produce sequentially
  - Able to access large chunks of data from memory at a given time
  - Banks of shared memory, which can lead to bank conflicts
- Digital signal processors
  - As with DSP algorithms, many applications can use multiply-accumulate (MAC) elements; these are built into the GPU by design

Conclusions
- GPUs are massively parallel devices that can be used for general-purpose computing in addition to graphics processing
- As their cost continues to decrease, these devices become off-the-shelf components that can be used to build larger systems
- Beyond its compute capabilities, Kepler offers additional performance per watt, making for a more power-efficient design
- When used with other technologies, such as OpenCL, GPUs can serve in heterogeneous platforms

References
NVIDIA. Corporate timeline. [Online]. Available: http://www.nvidia.com/page/corporate_timeline.html
"Graphics pipeline," PCMag Encyclopedia. [Online]. Available: http://www.pcmag.com/encyclopedia_term/0,2542,t=graphics+pipeline&i=43933,00.asp
S. L. Alarcon, "CUDA Memories," unpublished.
NVIDIA. (2012, April 16). NVIDIA CUDA C Programming Guide. [Online]. Available: ne/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
NVIDIA. (2009). NVIDIA's Next Generation CUDA™ Compute Architecture: Fermi. [Online]. Available: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
NVIDIA. (2012). NVIDIA's Next Generation CUDA™ Compute Architecture: Kepler™ GK110. [Online]. Available: Kepler-GK110-Architecture-Whitepaper.pdf
NVIDIA. (2012). NVIDIA GeForce GTX 680. [Online]. Available: http://www.geforce.com/Active/en_US/en f

