The Future Is Heterogeneous Computing

The Future Is Heterogeneous Computing
Mike Houston
Principal Architect, Accelerated Parallel Processing
Advanced Micro Devices
Valencia, Spain, September 7, 2010
October 27th, 2010
Page 1 The Future Is Heterogeneous Computing Oct 27, 2010

Workload Example: Changing Consumer Behavior
- 20 hours of video uploaded to YouTube every minute
- Approximately 9 billion video files owned are high-definition
- 50 million digital media files added to personal content libraries every day
- 1000 images are uploaded to Facebook every second

Challenges for Next Generation Systems
- The Power Wall
  - Even more broadly constraining in the future!
- Complexity Management - HW and SW
  - Principles for managing exponential growth
- Parallelism, Programmability and Efficiency
  - Optimized SW for System-level Solutions
- System balance
  - Memory Technologies and System Design
  - Interconnect Design

The Power Wall
- Easy prediction: Power will continue to be the #1 design constraint for computer systems design.
- Why?
  - Vmin will not continue tracking Moore's Law
  - Integration of system-level components consumes chip power
    - A well-utilized 100 GB/sec DDR memory interface consumes 15 W for the I/O alone!
  - 2nd-order effects of power
    - Thermal, packaging & cooling (node-level & datacenter-level)
    - Electrical stability in the face of rising variability
- Thermal Design Points (TDPs) in all market segments continue to drop
- Lightly loaded and idle power characteristics are key parameters in the Operational Expense (OpEx) equation
- Percent of total world energy consumed by computing devices continues to grow year-on-year
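To put the memory-interface figure in perspective, the quoted numbers can be turned into an energy cost per transferred bit. The sketch below is only an illustration: the function name is hypothetical, and it assumes decimal gigabytes (10^9 bytes) and a fully utilized interface, neither of which the slide states.

```python
def io_energy_per_bit_pj(io_power_watts: float, bandwidth_gb_per_s: float) -> float:
    """Return the I/O energy per transferred bit, in picojoules.

    Assumption: 1 GB = 1e9 bytes and the interface is fully utilized.
    """
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8  # bytes/s -> bits/s
    joules_per_bit = io_power_watts / bits_per_second
    return joules_per_bit * 1e12  # joules -> picojoules


# Figures quoted on the slide: 15 W of I/O power at 100 GB/sec.
print(f"{io_energy_per_bit_pj(15.0, 100.0):.2f} pJ per bit")
```

Under these assumptions the slide's numbers work out to roughly 19 pJ for every bit that crosses the memory interface.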

Optimized SW for System-level Solutions
- Long history of SW optimizations for HW "characteristics"
  - Optimizing compilers
  - Cache / TLB blocking
  - Multi-processor coordination: communication & synchronization
  - Non-uniform memory characteristics: process and memory affinity
- Scarcity/Abundance principle favors increased use of abstractions
  - Abstraction leads to increased productivity but costs performance
  - Still allow experts to burrow down into lower-level "on the metal" details
- System-level Integration Era will demand even more
  - Many Core: user mode and/or managed runtime scheduling?
  - Heterogeneous Many Core: capability-aware scheduling?
- SW productivity versus optimization dichotomy
  - Exposed HW leads to better performance but requires a "platform characteristics aware programming model"

The Memory Wall - getting thicker
There has always been a Critical Balance between Data Availability and Processing

Situation: DRAM vs CPU Cycle Time Gap
  When?: Early 1990s
  Implication: Memory wait time dominates computing
  Industry Solutions: Non-blocking caches; O-o-O machines

Situation: SW Productivity Crisis (object oriented languages; managed runtime environments)
  When?: Mid 1990s
  Implication: Larger working sets; more diverse data types
  Industry Solutions: Larger caches; cache hierarchies; elaborate prefetch

Situation: Single Thread CMP Focus
  When?: 2005 and beyond
  Implication: Multiple working sets! Virtual machines! More memory accesses
  Industry Solutions: Huge caches; multiple memory controllers; extreme PHYs

Situation: New & Emerging Abstractions (browser-based runtimes; image/video as basic data types; throughput-based designs)
  When?: 2009 and beyond
  Implication: Even larger working sets; larger data types
  Industry Solutions: Accelerated Parallel Processing; chip stacking; TBD

Interconnect Challenges
- Coherence domain: knowing when to stop
  - Interesting implications for on-chip interconnect networks
- Industry mantra: "Never bet against Ethernet"
  - But current Ethernet is not well suited for lossless transmission
  - Troublesome for storage, messaging and more
- The more subtle and trickier problems
  - Adaptive routing, congestion management, QoS, end-to-end characteristics, and more
- Data centers of tomorrow are going to take great interest in this area

[Figure: six trend panels, each marked "we are here":
 - Moore's Law: integration (log scale) vs. time
 - The Power Wall: power budget (TDP) vs. time (Server: power; DT: eliminate fans; Mobile: battery)
 - The Frequency Wall: frequency vs. time
 - The IPC Complexity Wall: IPC vs. issue width
 - Locality: performance vs. cache size
 - Single-thread performance vs. time, with an open question mark (DFM, variability, reliability, wire delay)]

Parallel Programs and Amdahl's Law

\[ \text{Speed-up} = \frac{1}{SW + (1 - SW)/N} \]

SW: % Serial Work
N: Number of processors

[Figure: speed-up (0 to 140) vs. number of CPU cores (1, 2, 4, 8, 16, 32, 64, 128) for 0%, 10%, 35% and 100% serial work]

Assume 100W TDP Socket
- 10W for global clocking
- 20W for on-chip network/caches
- 15W for I/O (memory, PCIe, etc)
- This leaves 55W for all the cores: 850mW per core!
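The two calculations on this slide can be checked with a short script. This is a minimal sketch, not part of the original deck: the function names are hypothetical, and the core count behind the "850mW per core" figure is not stated on the slide, so the loop simply prints the per-core budget for every core count on the chart's x-axis.

```python
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Amdahl's Law: speed-up = 1 / (SW + (1 - SW) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)


def per_core_power_watts(tdp: float, clocking: float, network_caches: float,
                         io: float, n_cores: int) -> float:
    """Power left for each core after the fixed overheads are subtracted."""
    return (tdp - clocking - network_caches - io) / n_cores


# Serial fractions and core counts taken from the slide's chart.
serial_fractions = (0.0, 0.10, 0.35, 1.0)
for n in (1, 2, 4, 8, 16, 32, 64, 128):
    speedups = ", ".join(
        f"{int(sw * 100)}% serial: {amdahl_speedup(sw, n):.2f}x"
        for sw in serial_fractions
    )
    watts = per_core_power_watts(100.0, 10.0, 20.0, 15.0, n)
    print(f"N={n:3d}  {speedups}  per-core budget: {watts * 1000:.0f} mW")
```

With the slide's 100W budget, 64 cores leave about 859 mW each, which is close to the quoted 850mW; reading the slide as a 64-core example is an inference, not something the slide says.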

35 Years of Microprocessor Trend
[Figure: trend curves labeled (SpecINT), Frequency (MHz), Typical Power (Watts) and Number of Cores]
Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond and C. Batten
Dotted line extrapolations by C. Moore

The Power Wall - Again!
- Escalating multi-core designs will crash into the power wall just like single cores did due to escalating frequency
- Why?
  - In order to maintain a reasonable balance, core additions must be accompanied by increases in other resources that consume power (on-chip network, caches, memory and I/O BW, ...)
    - Spiral upwards effect on power
  - The use of multiple cores forces each core to actually slow down
    - At some point, the power limits will not even allow you to activate all of the cores at the same time
  - Small, low-power cores tend to be very weak on single-threaded general purpose workloads
    - Customer value proposition will continue to demand excellent performance on general purpose workloads
    - The transition to compelling general purpose parallel workloads will not be a fast one

What about Throughput Computing?
- Works around Amdahl's law by focusing on throughput of multiple independent tasks
  - Servers: Transaction processing; web clicks; search queries
  - Clients: Graphics; multimedia; sensory inputs (future)
  - HPC: Data-level parallelism
- New bottlenecks start to appear
  - At some point, the OS itself becomes the "serial component"
    - User mode scheduling and task-stealing runtimes
  - Memory BW: goal is to saturate the pipeline to memory
    - Large number of outstanding references
    - Large number of active and/or standby threads
  - Power: overall utilization goes up, so does power consumption
    - Still the #1 constraint in modern computer design
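One way to see why saturating the memory pipeline takes a "large number of outstanding references" is Little's Law (concurrency = throughput x latency). The sketch below is an illustration only: the slide does not cite Little's Law, the function name is hypothetical, and the 100 ns latency and 64-byte request size are assumed values chosen for the example. Only the 100 GB/sec bandwidth figure appears earlier in the deck.

```python
def outstanding_requests(bandwidth_bytes_per_s: float,
                         latency_s: float,
                         bytes_per_request: float) -> float:
    """Little's Law: requests in flight = request rate * latency."""
    requests_per_second = bandwidth_bytes_per_s / bytes_per_request
    return requests_per_second * latency_s


# Assumed example values: 100 GB/sec interface, 100 ns memory latency,
# 64-byte requests (a common cache-line size).
in_flight = outstanding_requests(100e9, 100e-9, 64)
print(f"about {in_flight:.0f} requests must be in flight to keep the interface busy")
```

Under these assumptions well over a hundred references must be outstanding at any moment, which is why throughput designs rely on many active or standby threads.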

Three Eras of Processor Performance
[Figure: three charts, each marked "we are here"]

Single-thread Performance vs. Time
  Enabled by: Moore's Law; voltage scaling; microarchitecture
  Constrained by: Power; complexity

Throughput Performance vs. Time (# of Processors)
  Enabled by: Moore's Law; desire for throughput; 20 years of SMP arch
  Constrained by: Power; parallel SW availability; scalability

Systems Era: Targeted Application Performance vs. Time (Data-parallel exploitation)
  Enabled by: Moore's Law; abundant data parallelism; power efficient GPUs
  Currently constrained by: Programming models; communication overheads

AMD x86 64-bit CMP Evolution

Year  Product                      Mfg. Process  CPU Core   HyperTransport Technology  Memory
2003  AMD Opteron                  90nm SOI      K8         3x 1.6GT/s                 2x DDR1 300
2005  Dual-Core AMD Opteron        90nm SOI      K8         3x 1.6GT/s                 2x DDR1 400
2007  Quad-Core AMD Opteron        65nm SOI      Greyhound  3x 2GT/s                   2x DDR2 667
2008  45nm Quad-Core AMD Opteron   45nm SOI      Greyhound  3x 4.0GT/s                 2x DDR2 800
2009  Six-Core AMD Opteron         45nm SOI      Greyhound  3x 4.8GT/s                 2x DDR2 1066
2010  AMD Opteron 6100 Series      45nm SOI      Greyhound  4x 6.4GT/s                 4x DDR3 1333

Max Power Budget Remains Consistent

AMD Opteron 6100 Series: Silicon and Package
[Figure: die layout showing Core 1 through Core 6 and L3 cache]
- 12 AMD64 x86 cores
- 18 MB on-chip cache
- 4 memory channels @ 1333 MHz
- 4 HT links @ 6.4 GT/sec

AMD Radeon HD5870 GPU Architecture

GPU Processing Performance Trend
[Figure: peak GigaFLOPS* (0 to 3000) by GPU generation; time axis labels Sep-05, Mar-06, Oct-06, Apr-07, Nov-07, Jun-08, Dec-08, Jul-09]
- R520: ATI Radeon X1800; ATI FireGL V7200, V7300, V7350
- R580: ATI Radeon X19xx; ATI FireStream
- R600: ATI Radeon HD 2900; ATI FireGL V7600, V8600, V8650
- RV670: ATI Radeon HD 3800; ATI FireGL V7700; AMD FireStream 9170
- RV770: ATI Radeon HD 4800; ATI FirePro V8700; AMD FireStream 9250, 9270
- Cypress: ATI Radeon HD 5870
Milestones marked on the chart: GPGPU via CTM; Unified Shaders; Double-precision floating point; Stream SDK (CAL IL/Brook); 2.5x ALU increase; 2.25x Perf.; OpenCL 1.1, DirectX 11
* Peak single-precision performance; for RV670, RV770 & Cypress divide by 5 for peak double-precision performance

[Figure: GPU GFLOPS/mm over time for ATI Radeon X1800 XT, ATI Radeon X1900 XTX, ATI Radeon HD2900 PRO, ATI Radeon HD3870, ATI Radeon HD4870 and ATI Radeon HD5870; surviving time axis labels include Nov-07, Jun-08 and Oct-09]

AMD Accelerated Parallel Processing (APP) Technology is
- Heterogeneous: Developers leverage AMD GPUs and CPUs for optimal application performance and user experience
- High performance: Massively parallel, programmable GPU architecture delivers unprecedented performance and power efficiency
- Industry Standards: OpenCL enables cross-platform development
[Figure labels: Sciences, Productivity, Digital Content Creation, Gaming, Government, Engineering]

Moving Past Proprietary Solutions for Ease of Cross-Platform Programming
[Diagram layers:
 - Open and Custom Tools: High Level Tools; High Level Language Compilers; Application Specific Libraries
 - Industry Standard Interfaces: DirectX, OpenCL, OpenGL
 - Hardware: AMD GPUs, AMD CPUs, Other CPUs/GPUs]
OpenCL
- Cross-platform development
- Interoperability with OpenGL and DX
- CPU/GPU backends enable balanced platform approach

Heterogeneous Computing: Next-Generation Software Ecosystem
[Diagram layers:
 - End-user Applications
 - Advanced Optimizations & Load Balancing
 - High Level Frameworks
 - Middleware/Libraries: Video, Imaging, Math/Sciences, Physics
 - Tools: HLL compilers, Debuggers, Profilers
 - OpenCL & Direct Compute
 - Hardware & Drivers: AMD Fusion, Discrete CPUs/GPUs]
Callouts:
- Increase ease of application development
- Load balance across CPUs and GPUs; leverage AMD Fusion performance advantages
- Drive new features into industry standards

AMD Balanced Platform Advantage
CPU is excellent for running some algorithms
- Ideal place to process if GPU is fully loaded
- Great use for additional CPU cores
GPU is ideal for data parallel algorithms like image processing, CAE, etc.
- Great use for AMD Accelerated Parallel Processing (APP) technology
- Great use for additional GPUs
[Figure labels: Graphics Workloads; Serial/Task-Parallel Workloads; Other Highly Parallel Workloads]
Delivers advanced performance for a wide range of platform configurations

Challenges: Extracting Parallelism

Coarse-grain data parallel code
  i = 0
  i++
  load x(i)
  fmul
  store
  cmp i (1000000)
  bc
  Loop 1M times for 1M pieces of data
  Maps very well to throughput-oriented data parallel engines

Fine-grain data parallel code
  i = 0
  i++
  load x(i)
  fmul
  store
  cmp i (16)
  bc
  Loop 16 times for 16 pieces of data
  Maps very well to integrated SIMD dataflow (i.e., SSE)

Nested data parallel code
  i,j = 0
  i++
  j++
  load x(i,j)
  fmul
  store
  cmp j (100000)
  bc
  cmp i (100000)
  bc
  2D array representing very large dataset
  Lots of conditional data parallelism. Benefits from closer coupling between CPU & GPU
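The three loop shapes on this slide can be written out in a high-level language to make the difference concrete. The following is a rough sketch under stated assumptions: the function names are hypothetical, the multiply-by-a-constant body stands in for the slide's load/fmul/store sequence, and the per-element predicate in the nested case is an assumed way of illustrating "conditional data parallelism" (the slide's pseudocode shows no explicit condition). The array sizes are scaled down so the script runs quickly.

```python
from typing import Callable, List


def scale_1d(x: List[float], k: float) -> List[float]:
    """Single loop over x: load x(i), fmul, store, compare, branch."""
    out = [0.0] * len(x)
    i = 0
    while i < len(x):          # cmp i (N); bc
        out[i] = x[i] * k      # load x(i); fmul; store
        i += 1                 # i++
    return out


def scale_2d_conditional(x: List[List[float]], k: float,
                         predicate: Callable[[float], bool]) -> List[List[float]]:
    """Nested loops over a 2D array; only elements passing predicate are scaled."""
    out = [row[:] for row in x]
    for i in range(len(x)):                # cmp i (rows); bc
        for j in range(len(x[i])):         # cmp j (cols); bc
            if predicate(x[i][j]):         # assumed data-dependent condition
                out[i][j] = x[i][j] * k
    return out


# Fine-grain: 16 pieces of data, the size of a few SIMD registers.
fine = scale_1d([float(v) for v in range(16)], 2.0)

# Coarse-grain: same loop, far more data (scaled down from the slide's 1M).
coarse = scale_1d([float(v) for v in range(100_000)], 2.0)

# Nested: small stand-in for a very large 2D dataset, scaling positives only.
grid = [[1.0, -1.0], [2.0, -2.0]]
nested = scale_2d_conditional(grid, 3.0, lambda v: v > 0)

print(len(fine), len(coarse), nested)
```

The fine-grain and coarse-grain cases share the same loop body and differ only in trip count, which is what decides whether SIMD units or a throughput-oriented engine is the better fit. The nested case adds per-element control flow, which is where the slide suggests closer CPU and GPU coupling helps.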

A New Era of Processor Performance
[Figure: axes labeled Microprocessor Advancement (CPU) and GPU Advancement, with a "Systems" label]

Now the AMD Fusion Era of Computing Begins

DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

This presentation contains forward-looking statements concerning AMD and technology partner product offerings which are made pursuant to the safe harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "strategy," "roadmaps," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that could cause actual results to differ materially from current expectations.

ATTRIBUTION
© 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Opteron, ATI, the ATI logo, Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc. Microsoft, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
