GPU Accelerated Game AI


Selected Topics of AI, Prof. Dr. S.R. Radicke, Prof. Dr. J. Hahn, WS 2019/2020

Contents
- Motivation
  - Recent Media Articles
  - Nvidia launched their RTX GPUs
- CPU vs GPUs
- GPU Architecture Basics
- GPU Programming Model: CUDA
- Game AI on GPUs? Investigating Common AI Techniques
- Neural Networks and Deep Learning
- Nvidia's RTX Architecture: Real-Time Rendering now relies on AI!?

Motivation (1/5)
- GPUs have evolved much faster than CPUs in recent years:
  - Much higher theoretical data throughput (Intel i9-9900K: 1.2 TFLOPS; RTX 2080 Ti: 14.2 TFLOPS, Tensor Cores: 113.9 TFLOPS). Moore's Law is strong with this one!
  - Considerably larger memory bandwidth (Intel i9-9900K: max. 39.7 GB/s; RTX 2080 Ti: 616.0 GB/s)
  - Massively-parallel architecture
- GPUs can be programmed to perform arbitrary tasks: Nvidia CUDA, OpenCL, Compute Shaders
- Successful use in scientific applications: chemistry, physics, finance, weather, numerics, image processing and video coding, machine learning and data science
- It's not that easy, though: GPUs, due to their architecture, require specific programming.

Motivation (2/5) - Recent media articles
GPUs: Designed for gaming, now crucial to HPC and AI (May 2018)
- "... now powers everything from Adobe Premiere and databases to high-performance computing (HPC) and artificial intelligence (AI)."
- "... up to 5,000 cores ... the design lends itself to massive parallel processing."
- "The GPU is ideally suited to accelerate the processing of SQL queries."
- "... 34 of the 50 most popular HPC application packages offer GPU support."
Comparing Hardware for Artificial Intelligence: FPGAs vs. GPUs vs. ASICs (July 2018)
- "Deep Neural Networks (DNNs) ... are all about completing repetitive math algorithms or functions on a massive scale at blazing speeds."
- "... when the application is also performance-, power-, and latency-critical, FPGAs really shine versus GPUs."
- "FPGAs are also better than GPUs wherever custom data types exist or irregular parallelism tends to develop."
- "The architecture of GPUs is not as flexible in making changes to an existing system as are FPGAs."

Motivation (3/5) - Recent media articles
How the GPU became the heart of AI and machine learning (August 2018)
- "... the big story over the last couple of years has been the evolution of the GPU from purely the graphics pipeline to really being one of the three core components of deep learning and machine learning."
- "... new, amazing frameworks that are coming out all the time, the big ones being TensorFlow and Torch and a handful of others."
- "... if you have any access to, say, structured data ... a deep-learning model can actually give you a pretty magical amount of predictive power."
- "Google has re-orientated itself entirely around AI."
- "You can use the same GPU with videogames as you could use for training deep learning models."
- "... Nvidia came out with their first GPU that was designed for machine learning, their Volta ..."

Motivation (4/5) - Recent media articles
GPU Usage in Cryptocurrency Mining (February 2018)
- "Rapidly evolving technology has made cryptocurrency mining a reality on home computers."
- "To draw an analogy, the master (CPU) managing the whole organization (the computer system) has a dedicated employee (GPU) to take care of a specialized department (video-rendering functions)."
- "... the mining process requires higher efficiency in performing similar kinds of repetitive computations."
- "The mining device continuously tries to decode the different hashes repeatedly with only one digit changing in each attempt in the data that is getting hashed. A GPU fits better for such dedicated processing."
- "Despite technological advancements, certain challenges - like excessive power consumption and limited profit potential - remain which mar the efficiency of the mining hardware."
- "GPUs are now facing competition ... FPGAs and ASICs, which score better ... at performing hash calculations, an essential function to blockchain management in ..."

Motivation (5/5)
- Nvidia launched their new RTX GPUs in August 2018:
  - Based on the Turing architecture
  - Tensor cores designed for deep learning
  - Real-time ray tracing acceleration
  - Advanced (more intelligent) shading techniques
- NVIDIA's GPU Technology Conference (GTC) is a global conference series providing training, insights, and direct access to experts on the hottest topics in computing.
Sources: https://www.nvidia.com/en-us/gtc/, https://www.youtube.com/watch?v=tjf-1BxpR9c&t=29s

CPU vs GPUs (1/2)
- CPUs are designed to execute a single, arbitrary thread as fast as possible:
  - Out-of-order execution
  - Branch prediction
  - Memory pre-fetching
  - Multi-level cache hierarchy
- GPUs are designed to perform thousands of identical instructions on different data elements in parallel, e.g.:
  - Transforming every single vertex of a mesh to homogeneous clip space
  - Calculating texture and light for millions of pixels on the screen
  - Simulating thousands of independent particles
  - and so on

CPU vs GPUs (2/2)
https://www.youtube.com/watch?v=-P28LKWTzrI

GPU Architecture Basics (1/6)
- GPUs operate on many data elements in parallel. Example: a diffuse reflectance shader.
Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

GPU Architecture Basics (2/6)
- Comparison: CPU-style cores.
Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

GPU Architecture Basics (3/6)
- Four cores in parallel; sixteen cores in parallel.
Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

GPU Architecture Basics (4/6)
[Figure only.] Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

GPU Architecture Basics (5/6)
[Figure only.] Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

GPU Architecture Basics (6/6)
- Using 16 of these SIMD cores.
Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

But what about branches? (1/2)
[Figure only.] Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

But what about branches? (2/2)
[Figure only.] Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

Dealing with stalls: CPUs vs. GPUs
- Memory access is a bottleneck (latency: 200-600 clock cycles).
- CPUs avoid stalls (while waiting for data) through sophisticated caching and flow control:
  - Spend lots of transistors on caching and flow control
  - Longer time slices between thread switches (saving and restoring an execution context is expensive)
- GPUs hide stalls by doing computations for other threads in the meantime:
  - Spend most of the transistors on compute units
  - Frequent thread switches at no cost (all execution contexts are simultaneously available)
Source: OpenCL programming overview (Nvidia)

Hiding Shader Stalls (1/2)
[Figure only.] Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

Hiding Shader Stalls (2/2)
[Figure only.] Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

Storing Contexts
[Figure only.] Image source: From Shader Code to a Teraflop, Kayvon Fatahalian, SIGGRAPH 2008/2009

GPU Programming Model: CUDA (1/4)
- Kernels are C-style functions which are executed N times in parallel by N different CUDA threads.
- Threads are identified by three-dimensional thread and block indices.
- Memory hierarchy: local, shared and global memory.
Source: Nvidia CUDA C Programming Guide
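To make the kernel and thread-index model concrete, here is a minimal sketch of a CUDA kernel (not from the slides; the kernel name and parameters are illustrative): each thread derives a global index from its block and thread indices and processes one array element.

// Minimal CUDA kernel sketch: N threads, one array element per thread.
__global__ void VecAdd(const float* a, const float* b, float* c, int n)
{
    // Global index from block index, block size, and thread index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

A launch such as VecAdd<<<numBlocks, threadsPerBlock>>>(a, b, c, n); then spawns one thread per element.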

GPU Programming Model: CUDA (2/4)
- CUDA threads execute on a physically separate device that operates as a coprocessor to the host running the main program.
- Automatic scalability: more streaming multiprocessors or CUDA cores.
- Data need to be copied between host and device: synchronous or asynchronous, several mapping options.
- Synchronization must be done explicitly on multiple levels:
  - Between host and device
  - Between individual kernels
  - Between threads within a kernel
  - Memory fences/barriers within threads
A sketch of the resulting host-side pattern follows below.
Source: Nvidia CUDA C Programming Guide
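A hedged sketch of the copy-launch-synchronize-copy pattern described above, reusing the illustrative VecAdd kernel (the h_/d_ buffer names are assumptions):

// Host side: allocate device memory, copy inputs, launch, sync, copy back.
float *d_a, *d_b, *d_c;
cudaMalloc(&d_a, n * sizeof(float));
cudaMalloc(&d_b, n * sizeof(float));
cudaMalloc(&d_c, n * sizeof(float));
cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);  // host -> device
cudaMemcpy(d_b, h_b, n * sizeof(float), cudaMemcpyHostToDevice);
int threadsPerBlock = 256;
int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;      // round up
VecAdd<<<numBlocks, threadsPerBlock>>>(d_a, d_b, d_c, n);         // asynchronous launch
cudaDeviceSynchronize();                                          // explicit host-device sync
cudaMemcpy(h_c, d_c, n * sizeof(float), cudaMemcpyDeviceToHost);  // device -> host
cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);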

GPU Programming Model: CUDA (3/4)
- Example: Matrix multiplication using shared memory.
[Figure only.] Image source: Nvidia CUDA C Programming Guide

GPU Programming Model: CUDA (4/4)
- Example: Matrix multiplication with shared memory

// Matrix multiplication kernel called by MatMul()
__global__ void MatMulKernel(Matrix A, Matrix B, Matrix C)
{
    // Block row and column
    int blockRow = blockIdx.y;
    int blockCol = blockIdx.x;

    // Each thread block computes one sub-matrix Csub of C
    Matrix Csub = GetSubMatrix(C, blockRow, blockCol);

    // Each thread computes one element of Csub by accumulating results into Cvalue
    float Cvalue = 0;

    // Thread row and column within Csub
    int row = threadIdx.y;
    int col = threadIdx.x;

    // Loop over all the sub-matrices of A and B that are required to compute Csub.
    // Multiply each pair of sub-matrices together and accumulate the results.
    for (int m = 0; m < (A.width / BLOCK_SIZE); ++m) {
        // Get sub-matrix Asub of A
        Matrix Asub = GetSubMatrix(A, blockRow, m);
        // Get sub-matrix Bsub of B
        Matrix Bsub = GetSubMatrix(B, m, blockCol);

        // Shared memory used to store Asub and Bsub respectively
        __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
        __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

        // Load Asub and Bsub from device memory to shared memory.
        // Each thread loads one element of each sub-matrix.
        As[row][col] = GetElement(Asub, row, col);
        Bs[row][col] = GetElement(Bsub, row, col);

        // Synchronize to make sure the sub-matrices are loaded before starting the computation
        __syncthreads();

        // Multiply Asub and Bsub together
        for (int e = 0; e < BLOCK_SIZE; ++e)
            Cvalue += As[row][e] * Bs[e][col];

        // Synchronize to make sure that the preceding computation is done before loading
        // two new sub-matrices of A and B in the next iteration
        __syncthreads();
    }

    // Write Csub to device memory. Each thread writes one element.
    SetElement(Csub, row, col, Cvalue);
}

Source: Nvidia CUDA C Programming Guide
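The Matrix struct and the GetSubMatrix/GetElement/SetElement helpers are defined in the full programming-guide example. Following the same guide, the host-side launch for this kernel looks roughly like this (device allocation and copies omitted):

// One BLOCK_SIZE x BLOCK_SIZE thread block per sub-matrix of C.
dim3 dimBlock(BLOCK_SIZE, BLOCK_SIZE);
dim3 dimGrid(B.width / dimBlock.x, A.height / dimBlock.y);
MatMulKernel<<<dimGrid, dimBlock>>>(d_A, d_B, d_C);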

Game AI on GPUs? (1/7)
The AI in most modern games addresses three basic needs:
1. The ability to move characters
2. The ability to make decisions about where to move
3. The ability to think tactically or strategically
- Pac Man (1979) used a state machine for each enemy: either chasing you or running away; they took a semi-random route at each junction (a minimal sketch of this idea follows below).
- Goldeneye 007 (1997) used a sense simulation system: a character could see their colleagues and would notice if they were killed.
- Creatures (1997) has one of the most complex AI systems seen in a game: a neural network-based brain for each creature.
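As an illustration of such a per-enemy state machine (hypothetical code, not from the slides or the original game), the chase/flee behaviour reduces to a small enum-driven switch:

// Hypothetical sketch of a Pac-Man-style ghost state machine.
enum GhostState { CHASING, FLEEING };

struct Ghost { GhostState state; };

// Route-picking helpers are assumed to exist elsewhere.
void PickSemiRandomRouteTowardPlayer(Ghost& g);
void PickSemiRandomRouteAwayFromPlayer(Ghost& g);

void UpdateGhost(Ghost& g, bool powerPelletActive, bool atJunction)
{
    // Transition: a power pellet flips ghosts from chasing to fleeing.
    g.state = powerPelletActive ? FLEEING : CHASING;

    // Each state picks a semi-random route at junctions.
    switch (g.state) {
    case CHASING: if (atJunction) PickSemiRandomRouteTowardPlayer(g); break;
    case FLEEING: if (atJunction) PickSemiRandomRouteAwayFromPlayer(g); break;
    }
}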

Game AI on GPUs? (2/7)
Investigating common AI techniques:
- Movement
- Pathfinding
- Decision Making
Source: https://en.wikipedia.org/wiki/Pathfinding

Game AI on GPUs? (3/7)
Movement: One of the most fundamental requirements of AI is to move characters around in the game sensibly.
Suitable for GPUs?
- Can only be parallelized on a per-agent level (see the kernel sketch below)
- Interaction between agents would be very problematic
- Terrain / map / navmesh data would need to be "flattened"
- Output would always need to be sent back to the host
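A hedged sketch of what per-agent parallel movement could look like (hypothetical data layout and kernel, not from the slides): one CUDA thread steers one agent toward its target, with no agent-agent interaction, which is exactly the limitation noted above.

// Hypothetical per-agent "seek" update: one thread per agent.
struct Agent2D { float x, y; float tx, ty; };   // position and target

__global__ void MoveAgents(Agent2D* agents, int numAgents, float speed, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numAgents) return;

    Agent2D a = agents[i];
    float dx = a.tx - a.x, dy = a.ty - a.y;
    float len = sqrtf(dx * dx + dy * dy);
    if (len > 1e-6f) {                          // move along the normalized direction
        a.x += speed * dt * dx / len;
        a.y += speed * dt * dy / len;
    }
    agents[i] = a;                              // results must still be copied back to the host
}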

Game AI on GPUs? (4/7)
Pathfinding: The AI must be able to calculate a suitable route through the game level to get from where it is now to its goal.
- Many games rely on the A* algorithm.
- It requires the game level to be represented as a directed, non-negatively weighted graph.
Suitable for GPUs?
- Management of graphs and node lists would be highly inefficient (see the sketch below)
- Way too many branches in the code
- Some research exists, e.g.:
  - Zhou and Zeng, "Massively Parallel A* Search on a GPU", Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
  - Bleiweiss, "GPU Accelerated Pathfinding", NVIDIA, 2008
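To see why A* resists SIMD execution, consider a minimal CPU-style sketch (illustrative only, assuming a 4-connected grid with unit move costs and a Manhattan heuristic): the priority queue, the early exit, and the per-neighbour tests are all serial, data-dependent control flow.

#include <cmath>
#include <cstdlib>
#include <queue>
#include <vector>

struct Node { int idx; float f; };               // cell index and f = g + h
struct Cmp { bool operator()(const Node& a, const Node& b) const { return a.f > b.f; } };

// Manhattan distance heuristic on a W-wide grid.
static float Heuristic(int idx, int goal, int W) {
    return (float)(std::abs(idx % W - goal % W) + std::abs(idx / W - goal / W));
}

// Returns the cost of the shortest path, or -1 if the goal is unreachable.
float AStar(const std::vector<bool>& blocked, int W, int H, int start, int goal) {
    std::vector<float> g(W * H, INFINITY);
    std::priority_queue<Node, std::vector<Node>, Cmp> open;  // node list: hard to manage on a GPU
    g[start] = 0.0f;
    open.push({start, Heuristic(start, goal, W)});
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    while (!open.empty()) {                       // data-dependent loop length
        Node n = open.top(); open.pop();
        if (n.idx == goal) return g[goal];        // early exit: divergent on SIMD hardware
        for (int d = 0; d < 4; ++d) {             // per-neighbour branching
            int x = n.idx % W + dx[d], y = n.idx / W + dy[d];
            if (x < 0 || x >= W || y < 0 || y >= H) continue;
            int m = y * W + x;
            if (blocked[m]) continue;
            float ng = g[n.idx] + 1.0f;
            if (ng < g[m]) {                      // relax the edge
                g[m] = ng;
                open.push({m, ng + Heuristic(m, goal, W)});
            }
        }
    }
    return -1.0f;
}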

Game AI on GPUs? (5/7)
Decision Making: The ability of a character to decide what to do.
- Decision Trees
- Finite State Machines
- Behaviour Trees
Suitable for GPUs?
- Can only be parallelized on a per-agent level
- Internal and external knowledge is hard to organize, acquire and update
- By nature requires many branches (decisions), thus would run very inefficiently on GPUs (see the divergence sketch below)
- Output would always need to be sent back to the host
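A hedged illustration (hypothetical kernel, not from the slides) of that branching problem: when threads within a 32-thread warp take different branches, the taken paths execute serially, wasting SIMD lanes.

// Hypothetical per-agent decision kernel with data-dependent branching.
__global__ void DecideActions(const int* agentState, int* action, int numAgents)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numAgents) return;

    // Neighbouring agents in different states cause warp divergence:
    // the warp executes each taken branch one after another.
    switch (agentState[i]) {
    case 0:  action[i] = 10; break;   // e.g. patrol
    case 1:  action[i] = 20; break;   // e.g. attack
    case 2:  action[i] = 30; break;   // e.g. flee
    default: action[i] = 0;  break;   // e.g. idle
    }
}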

Game AI on GPUs? (6/7)
Learning: In principle has the potential to adapt to the player, learning their tricks and providing a consistent challenge.
- Online learning: performed during the game, while the player is playing. Allows characters to adapt dynamically to the player's style.
- Offline learning: the majority of learning in game AI is done offline, either between levels of the game or, more often, at the development studio. This allows more unpredictable learning algorithms to be tried out and their results to be tested exhaustively. The learning algorithms in games are usually applied offline; it is rare to find games that use any kind of online learning.
Suitable for GPUs?
- Many combinations can be tested in parallel
- Training of artificial neural networks works very well
- The trained network data can be stored and shipped with the game

Game AI on GPUs? (7/7)
Machine Learning: An algorithm plays Super Mario World
- Artificial Neural Network
- Genetic Algorithm
Several other such experiments exist:
- "I trained my model for around 2 million frames ..."
- "It's remarkable how consistent the behaviour of this simple network is."
- "I believe it can get very close to human level performance with many more hours of training data ..."
But what about AAA games!?
https://www.youtube.com/watch?v=qv6UVOQ0F44

Neural Networks and Deep Learning (1/4)
The inner workings of the human brain are often modeled around the concept of neurons (~100 billion) and the networks of neurons known as biological neural networks.
- Neurons communicate with one another across their synapses.
- Activation: a single neuron will pass a message on to another neuron if the sum of weighted input signals from one or more neurons (summation) into it is great enough to cause the message transmission.
- Each neuron applies a function or transformation to the weighted inputs.
- The thinking or processing that our brain carries out is the result of these neural networks in action.
Artificial Neural Networks (ANNs) are statistical models directly inspired by, and partially modelled on, biological neural networks.

Neural Networks and Deep Learning (2/4)
Artificial Neural Networks:
- Are capable of modelling and processing nonlinear relationships between inputs and outputs in parallel.
- Contain adaptive weights along paths between neurons that can be tuned by a learning algorithm that learns from observed data in order to improve the model.
- The cost function is what is used to learn the optimal solution to the problem being solved.
- In a simple model, the first layer is the input layer, followed by one hidden layer, and lastly by an output layer. Each layer can contain one or more neurons. (A single-neuron sketch follows below.)
Deep learning describes certain types of ANNs and related algorithms that process data through many layers of nonlinear transformations of the input data in order to calculate a target output.
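As a concrete illustration of the weighted-sum-plus-activation idea (illustrative code, not from the slides; the sigmoid is just one common choice of activation), a single artificial neuron can be sketched as:

#include <cmath>
#include <cstddef>

// One artificial neuron: weighted sum of inputs plus bias, then a nonlinearity.
float Neuron(const float* inputs, const float* weights, std::size_t n, float bias)
{
    float sum = bias;
    for (std::size_t i = 0; i < n; ++i)
        sum += weights[i] * inputs[i];       // summation of weighted inputs
    return 1.0f / (1.0f + std::exp(-sum));   // sigmoid activation
}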

Neural Networks and Deep Learning (3/4)
Deep Learning:
- Falls under the group of techniques known as feature learning or representation learning.
- The machine learning algorithms themselves learn the optimal parameters to create the best performing model.
- For neural network-based deep learning models, the number of layers is greater than in so-called shallow learning algorithms.
- Deep learning algorithms rely more on optimal model selection and optimization through model tuning.
- In addition to statistical techniques, neural networks and deep learning leverage concepts and techniques from signal processing as well, including ...
