Take GPU Processing Power Beyond Graphics with Mali GPU Computing

Roberto Mijat, Visual Computing Marketing Manager
August 2012

Introduction

Modern processor and SoC architectures endorse parallelism as a pathway to more performance, delivered more efficiently. GPUs provide superior computational power for massive data-parallel workloads. Modern GPUs are becoming increasingly programmable and can be used for general purpose processing; frameworks such as OpenCL and Android Renderscript enable this. To achieve uncompromised feature support and performance, you need a processor specifically designed for general purpose computation. After an introduction to the technology and how it is enabled, this paper explores the design considerations of the ARM Mali-T600 series of GPUs that make them the perfect fit for GPU Computing.

Copyright 2012 ARM Limited. All rights reserved. The ARM logo is a registered trademark of ARM Ltd. All other trademarks are the property of their respective owners and are acknowledged.
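To make "data-parallel workload" concrete before diving in, here is a minimal illustrative sketch (not from the original paper; the function name and sizes are invented). A loop whose iterations are all independent is exactly the shape of computation that SIMD units, multi-core CPUs and GPUs accelerate:

```c
#include <stddef.h>

/* Illustrative data-parallel workload: a "saxpy" loop computing
 * out[i] = a * x[i] + y[i]. No iteration depends on another, so the
 * work can be split freely across SIMD lanes, CPU cores, or the many
 * hardware threads a GPU keeps in flight. */
void saxpy(float a, const float *x, const float *y, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];   /* each iteration is independent */
}
```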

The rise of parallel computation

Parallelism is at the core of modern processor architecture design: it enables increased processing performance and efficiency. Superscalar CPUs implement instruction-level parallelism (ILP). Single Instruction Multiple Data (SIMD) architectures enable faster computation of vector data. Simultaneous multithreading (SMT) is used to mitigate memory latency overheads. Multi-core SMP can provide significant performance uplift and energy savings by executing multiple threads/programs in parallel. SoC designers combine diverse accelerators on the same die, sharing a unified bus matrix. All these technologies enable increased performance and more efficient computation by doing things in parallel, and all are well-established techniques in modern computing.

Portability and complexity

Today's computing platforms are complex heterogeneous systems (HMP). For example, the Samsung Exynos 4 Quad SoC, at the heart of the award-winning Samsung Galaxy S III smartphone, includes: an ARM Cortex-A9 quad-core CPU implementing VFP and 128-bit NEON Advanced SIMD, a quad-core Mali-400 MP 2D/3D graphics processor, a JPEG hardware codec, a multi-format video hardware codec and a cryptography engine.

Programming approaches for each processor (CPU, GPU, ISP, DSP, etc.) are all different. Optimizing code for a selected accelerator requires specialized expertise, and code written for one accelerator is typically not portable to other architectures. This leads to suboptimal utilization of the platform's processing potential. Writing parallel code that scales is also very difficult, and has proven elusive for most applications in the mobile industry today.

GPUs: Moving beyond graphics

Early GPUs were specifically designed to implement graphics programming languages such as OpenGL. Whilst this meant that OpenGL applications/operations would typically achieve good performance, it also meant that programmers were limited to the fixed functionality expressed by the API.
To address this limitation, GPU implementers made the pixel processor in the GPU programmable, via small programs called shaders. Over time, to handle increasing shader complexity, the GPU processing elements were redesigned to support more generalized mathematical, logic and flow control operations.

Enabling GPU Computing: Introduction to OpenCL

OpenCL (Open Computing Language) enables easier, better, portable programming of heterogeneous parallel processing systems, and unleashes the computational power of GPUs needed by emerging workloads. OpenCL creates a foundation layer for a parallel computing ecosystem and takes graphics processing power beyond graphics. It is defined by the Khronos Group, and it is a royalty-free open standard, interoperable with existing APIs.

The OpenCL framework includes:

- A framework (compiler, runtime, libraries) to enable general purpose parallel computing
- OpenCL C, a computing language portable across heterogeneous processing platforms (a superset of a subset of C99: features such as recursion and function pointers are removed, while vector data types and other parallel computing features are added)

- An API to define and control (interrogate and configure) the platform and coordinate parallel computation across processors

The developer identifies performance-critical areas in their application and rewrites them using the OpenCL C language and API. An OpenCL C function is known as a kernel. Kernels and supporting code are consolidated into programs, equivalent in principle to DLLs.

OpenCL implements a control-slave architecture, where the host processor (on which the application runs) offloads work to a computing resource. When a kernel is submitted for execution by the host, an index space is defined. The index space represents the set of data that the kernel will be applied to. It can have 1, 2 or 3 dimensions (hence the name NDRange, or N-dimensional range). The instance of a kernel executing on an individual entry in the index space is called a work-item. Work-items can be grouped into work-groups, which execute on a single compute unit.

Kernels can be compiled ahead of time and stored in the application as binaries, or JIT-compiled on the device, in which case the kernel code is embedded in the application as source (or a suitable intermediate representation). The kernel can be compiled to execute on any of the supported devices in the platform.

The application developer defines a context of execution, which is the environment the OpenCL C kernels execute in. The context includes the list of target devices, associated command queues, the memory accessible by the devices and its properties. Using the API, the application can queue commands such as: execution of kernel objects, movement of data between the host and the devices, synchronization to enforce ordered execution between commands, events to be triggered or waited upon, and execution barriers.
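The execution model above can be made concrete with a plain-C sketch. This is not the real OpenCL host API, and the kernel and helper names are invented for illustration: a runtime walks the index space in work-groups and applies the kernel once per work-item. In actual OpenCL C, the kernel would carry the `__kernel` qualifier and obtain its index with `get_global_id(0)`.

```c
#include <stddef.h>

/* Plain-C sketch of NDRange execution (illustrative, no real OpenCL
 * calls). A "kernel" runs once per work-item, each identified by its
 * global id; work-items are batched into work-groups, which a real
 * runtime dispatches one-per-compute-unit. */
typedef void (*kernel_fn)(size_t global_id, void *args);

static void enqueue_ndrange(kernel_fn k, void *args,
                            size_t global_size, size_t local_size)
{
    /* one outer iteration = one work-group */
    for (size_t group = 0; group * local_size < global_size; ++group) {
        for (size_t local = 0; local < local_size; ++local) {
            size_t gid = group * local_size + local;
            if (gid < global_size)   /* index space may not divide evenly */
                k(gid, args);        /* one work-item */
        }
    }
}

/* Example kernel: square each element of a buffer. In OpenCL C this
 * would be: __kernel void square(__global const int *in,
 *                                __global int *out)
 *           { size_t i = get_global_id(0); out[i] = in[i] * in[i]; } */
struct square_args { const int *in; int *out; };

static void square_kernel(size_t gid, void *p)
{
    struct square_args *a = p;
    a->out[gid] = a->in[gid] * a->in[gid];
}
```

A real host program would instead build the program object, create the kernel, and call `clEnqueueNDRangeKernel` on a command queue; the loop structure above is what that call conceptually performs.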

OpenCL enables general purpose computing to be carried out on the GPU. The ARM Mali-T600 series of GPUs has been specifically designed for general purpose GPU computing, and an OpenCL 1.1 Full Profile DDK is available from ARM. More information about OpenCL can be found on the Khronos website.

Android Renderscript

Renderscript is a high performance computation API for Android, officially introduced in Android 3.0 Honeycomb. Renderscript complements existing Android APIs by adding:

- A compute API for parallel processing similar to CUDA/OpenCL
- A scripting language based on C99 supporting vector data types (called ScriptC)

Earlier versions of Renderscript included an experimental graphics engine component; this has been deprecated since Android 4.1 Jelly Bean.

Like OpenCL, Renderscript implements a cross-platform control-slave architecture with runtime compilation. The majority of the application is written using the Dalvik APIs as usual, whilst performance-critical code - or code more suitable for parallel execution - is identified and rewritten using the ScriptC language.

A key design consideration of Renderscript is performance portability: the API is designed so that a script shows good performance across all devices, instead of peak performance on one device at the expense of others (naturally, intensive data-parallel algorithms will continue to be more suitable for acceleration by the GPU). The compilation infrastructure is based around LLVM. A first stage of compilation is performed offline: portable bitcode is generated, as well as all the necessary glue code to enable visibility of the script's data and functions from the Java application (the reflected layer). The APK package includes the Java application and associated files, assets and so forth, plus the Renderscript

portable binary. When Dalvik JIT-compiles the application, the intermediate bitcode is also compiled for the target processor. The compiled bitcode is cached to speed up future loading of the application, and re-compiled only if the scripts are updated. This split enables aggressive machine-independent optimization to be carried out offline, making the online JIT compilation lighter-weight and more suitable for energy-limited, battery-powered mobile devices.

Up until Android 4.1, Renderscript can only target the CPU (with VFP/NEON). In the near future, this will be extended to target other accelerators, such as GPUs.

ARM Mali-T600 series of GPUs: Designed for GPU Computing

To achieve optimal general purpose computational throughput you need a purposely designed processor, such as the Mali-T600 series of GPUs from ARM. These are designed to integrate the graphics and compute functionalities together, optimizing interoperation between the two at both the hardware and software driver levels.

ARM Mali-T600 GPUs are designed to work with the latest version (4) of AMBA (Advanced Microcontroller Bus Architecture), which features the Cache Coherent Interconnect (CCI). Data shared between processors in the system, a natural occurrence in heterogeneous computing, no longer requires costly (in cycles and joules) synchronization via external memory and explicit cache maintenance operations. All of this is now performed in hardware, and is enabled transparently inside the drivers provided by ARM. In addition to reducing memory traffic, CCI avoids superfluous sharing of data: only data genuinely requested by another master is transferred to it, down to the granularity of a cache line. There is no need to flush a whole buffer or data structure anymore.

Computing frameworks like Renderscript and OpenCL introduce significant additional requirements for precision and support of mathematical functions.
In addition to satisfying IEEE 754 precision requirements for single and double precision floating point, Mali-T600 GPUs implement the majority of these mathematical operations directly in hardware. In fact, over 60% of the floating point functions defined by the OpenCL specification are hardware accelerated (most trigonometric functions, power and exponent, square root and division), and all of them meet IEEE 754 precision requirements. Over 70% of integer operations are also implemented in hardware. Mali-T600 GPUs natively support 64-bit integer data types, something not common in competing architectures. Barriers and atomics are also implemented in hardware. In essence, the vast majority of operations take place in a single cycle (or a few cycles at most). This provides an immense step-up in performance for general purpose computation compared to the current generation of GPUs not purposely designed for it.

There is more. As well as task management and event dependencies being optimized in hardware, task dependency coordination is entirely designed into the hardware job manager unit. The software driver's responsibility is reduced to handing over the workload to the GPU: all scheduling, prioritization and runtime synchronization take place transparently, behind the scenes.

Typically, GPUs are designed to favor throughput over latency. Mali-T600 GPUs treat generic memory loads/stores as first-class operations with proper latency tolerance.

Developers typically use a blend of APIs during development. The Mali software driver infrastructure is tightly integrated and optimized: all APIs of the Mali software stack share the same high-level API objects, the same address space, and the same queues, dependencies and events. This approach reduces code footprint and significantly increases performance. Data structures are shared between APIs and devices, to avoid unnecessary memory copies.
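The hardware atomics mentioned above have the same semantics as C11 atomics. The sketch below is illustrative (it is plain C11, not Mali driver code or OpenCL C): an atomic fetch-and-add returns the old value and updates the counter in one indivisible step, which is what lets many work-items update a shared counter or histogram bin without locks.

```c
#include <stdatomic.h>

/* Illustrative counterpart of OpenCL's atomic_add (hypothetical helper
 * name). fetch_add reads the counter, adds 1, and writes it back as one
 * indivisible operation, so concurrent callers can never lose an update. */
int claim_slot(atomic_int *counter)
{
    /* returns the value the counter held before this increment */
    return atomic_fetch_add(counter, 1);
}
```

On Mali-T600 the equivalent OpenCL built-ins execute as hardware instructions rather than software emulation, which is why they stay in the "single cycle or a few cycles" regime described above.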

Use cases

In addition to the many scientific, academic, industrial and financial use cases, there is a wide variety of applications where general purpose GPU computing brings great benefits. Examples include:

- Computational Photography and Computer Vision: compensating for the limitations of the hardware sensor, image stabilization, HDR compensation, face and smile recognition, image editing, filters, landmark and context recognition, superimposition, transcoding, super-scaling, 2D-to-3D conversion
- Stream Data Processing: deep packet inspection, antivirus, encryption, compression, data analytics
- UIs, Gaming and 3D Modelling: voice recognition, gesture recognition, physics, AI, photorealistic ray tracing, modelling
- Augmented Reality
- And many, many more!

GPU computing can be used for any computationally intensive task, but will be most efficient where parallelism can be exploited (either parallelism within the task, or where multiple tasks can be executed simultaneously).

Conclusion

Modern processor and SoC architectures endorse parallelism as a pathway to get more performance more efficiently. GPUs deliver superior computational power for massive data-parallel workloads. Modern GPUs are becoming increasingly programmable and can be used for general purpose processing. OpenCL and Renderscript enable this technology, providing easier, better programming of heterogeneous parallel compute systems and unleashing the computational power of GPUs needed by emerging workloads.

To achieve optimal general purpose computational throughput you need a purposely designed GPU, such as the Mali-T600 series of GPUs from ARM. The ARM Mali-T600 series of GPUs is designed to integrate the graphics and compute functionalities together, optimizing interoperation between the two and delivering market-leading 3D graphics and general purpose parallel computation.

For more information: gpucompute-info@arm.com.
