AMD EPYC for HPC

AMD EPYC for HPC
Overview, Strategies, and Best Practices for the HPC Community
Part 1, Summer 2021

Overview of AMD EPYC in the HPC space
- What's all the fuss?
- History of EPYC
- HPC community adoption
Buying/Using EPYC for HPC
- What problems are you trying to solve?
- What does someone new to EPYC need to know?
- What to buy and why?
- What traps to avoid?
Best Practices
- BIOS, networking, OS
- Applications

- Humble employee of .
- Lifelong Blue Devil
- Grew up in Durham, North Carolina
- Duke '05 grad
- 5 national titles, a million amazing memories
- Still can't believe we lost to UConn in the '99 and '04 Final Fours

- Principal Program Manager, Azure HPC (2017 – present); lead for Azure H-series (CPUs, RDMA networking)
- Director, HPC Solutions, Cycle Computing (2016-2017)
- National Center for Supercomputing Applications, University of Illinois (2009-2016)

This talk is not:
- A clever Azure marketing ploy
- An advertisement for AMD
- An anti-Intel rant
- A PhD-level thesis
This talk is:
- A contribution to the broader HPC community from a group that has deployed a lot of AMD EPYC for HPC/AI
- Digestible, pragmatic guidance for those thinking of buying, or who have already bought, AMD EPYC
- Recommendations and data to help answer common questions, save you time, and support HPC workloads
- An open invite to ask questions and get my best, most data-driven answers

TL;DR: the EPYC CPU is a credible alternative to Intel in the datacenter for buyers and users of HPC:
- Leadership memory bandwidth and I/O
- Competitive FLOPS
- x86 compatibility
- Very good power efficiency
- Highly competitive economics
All things we in the HPC world really like!

EPYC generations at a glance:
- 2012: "Piledriver" core uArch, "Abu Dhabi" SoC; up to 16 cores; 64 GB/s DRAM bandwidth; PCIe 2.0
- 2017: "Zen 1" core uArch, "Naples" SoC; up to 32 cores; 260 GB/s DRAM bandwidth; up to 4 MB L3/core; PCIe 3.0
- 2019: "Zen 2" core uArch, "Rome" SoC; up to 64 cores; 340 GB/s DRAM bandwidth; up to 16 MB L3/core; PCIe 4.0
- 2021: "Zen 3" core uArch, "Milan" SoC; up to 64 cores; 340 GB/s DRAM bandwidth; up to 32 MB L3/core; PCIe 4.0

TL;DR: Chiplets help increase fab yields, shorten schedules, lower cost, and improve socket-level performance and power efficiency.
[Figure: EPYC "Rome" die shot showing a central I/O die surrounded by four quadrants, each with 2 channels of DDR4]
Pros: all of the above are good!
Tradeoffs: users and developers need to think of EPYC CPUs as almost "clusters on a chip" and be aware of how best to overlay software on top of this kind of hardware. E.g., the layout above is more "4 x 2-channel memory" than a monolithic "8-channel".

Zen 3 ("Milan") vs. Zen 2 ("Rome")
Similarities:
- Same core counts
- Same 280 W max TDP
- Same PCIe 4.0 support
- Same 8-channel DDR4-3200 (2 channels per quadrant)
- Same 16 GT/s xGMI
Differences:
- 2x addressable L3 cache per core
- 19% higher IPC for Zen 3 vs. Zen 2
- Higher frequencies
- Better memory latencies

Top500 – new petascale and exascale systems: 89 PF peak, 552 PF peak, 1.5 EF peak, and 2 EF peak.

Azure HPC VM families powered by EPYC:
- HBv1 (HB-series, CPU-based HPC): EPYC Gen 1 "Naples", 100 Gb EDR InfiniBand, Q2 2019. Docs: https://bit.ly/3CQbIox
- HBv2 (CPU-based HPC): EPYC Gen 2 "Rome", 200 Gb HDR InfiniBand, Q1 2020. Docs: https://bit.ly/3iT23Wi
- HBv3 (CPU-based HPC): EPYC Gen 3 "Milan", 200 Gb HDR InfiniBand, Q1 2021. Docs: https://bit.ly/37REkQ5
- NDv4 (ND A100 v4-series, GPU-based HPC/AI): EPYC Gen 2 "Rome", 8 x NVIDIA A100 NVLINK 40 GB, 8 x 200 Gb GDR, Q3 2021. Docs: https://bit.ly/3xSd1zE

InfiniBand network core:
- Up to 200 Gb HDR
- Non-blocking fat tree topology
- Hardware offload of MPI collectives
- Supports all MPI implementations
- 1.3 microsecond latencies
- Bare-metal passthrough
- Dynamic Connected Transport
- Intelligent adaptive routing
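If you want to sanity-check the latency figure above on your own nodes, a minimal MPI ping-pong along these lines is usually enough. This is only a sketch, not the vendor's benchmark: it assumes mpi4py and NumPy are installed and that exactly two ranks are launched (e.g. `mpirun -np 2 python pingpong.py`).

```python
# Rough point-to-point latency check over the fabric (sketch, two ranks).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(1, dtype="b")   # 1-byte message
iters = 10000

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    else:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
t1 = MPI.Wtime()

if rank == 0:
    # Each iteration is one round trip; half of that is the one-way latency.
    print(f"one-way latency: {(t1 - t0) / iters / 2 * 1e6:.2f} microseconds")
```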

[Chart: application performance of Azure HC and HBv2 VMs relative to the highest published result on other public clouds (normalized to 1x) for CP2K (quantum chemistry), Graph500 (graph analytics), Star-CCM+ (CFD), WRF (weather), and NAMD (biophysics); the Azure results are up to 3.1x faster.]

First step: what are the most important problems you are trying to solve for, and how do you stack-rank them?
- What is the relevant level(s) of scale?
- Pure performance?
- Performance/ ?
- Cost/performance?
- Simplest possible HPC evolution for my users?
- A platform supported by ISVs and/or required SW toolchains?
- A platform for accelerators?
- Lowest possible cost?
- Something else?
Frequent answer from Azure HPC customers: "best performance and cost/performance for my main workloads, with as minimal user education as possible"

EPYC performance can be extremely good for a CPU
- A typical Haswell/Broadwell to Rome/Milan move will seem like an enormous leap for most workloads
- How good depends on what your workload scales with (memory bandwidth? L3? compute? frequency?)
Realize (and explain) that performance, or cost per job, is what matters
- Infrastructure doesn't scale by "cores"; you buy or rent servers (nodes)
- Clock frequency is not performance (don't just chase "MOAR GIGAHURTZ!!")
- Performance scales per server (or VM), or by N scalable network endpoints (MPI)
- It doesn't matter whether you used all the cores (do you worry about this for RAM? cache? CUDA cores in a GPU? RDMA bandwidth?)
- Exception scenario: you are using expensive software licensed per core
Affinitize processes explicitly and with an understanding of the hardware topology
- It is generally advisable to distribute processes evenly across physical L3 boundaries (4 cores per L3 on Rome, 8 cores per L3 on Milan); see the sketch below
- Don't just throw N processes at the server and assume the app/OS will automagically figure out placement for you
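As one illustration of "distribute evenly across L3 boundaries", here is a minimal sketch of how a pinning list could be generated. The constants and the helper name are hypothetical, and it assumes contiguous core numbering in which each group of consecutive cores shares one L3/CCX (typical of EPYC bare metal and Azure HB-series VMs); adapt it to the topology your OS actually reports.

```python
# Spread MPI ranks evenly across L3 slices instead of packing them.
CORES_PER_L3 = 8     # 4 on Rome, 8 on Milan (assumption: adjust per CPU)
TOTAL_CORES  = 120   # e.g. a hypothetical 120-core node/VM
N_RANKS      = 30    # MPI ranks you plan to launch

def spread_ranks_over_l3(total_cores, cores_per_l3, n_ranks):
    """Return one core ID per rank, distributing ranks evenly over L3 slices."""
    n_l3 = total_cores // cores_per_l3
    ranks_per_l3, rem = divmod(n_ranks, n_l3)
    if rem:
        raise ValueError("rank count does not divide evenly across L3 slices")
    pinning = []
    for l3 in range(n_l3):
        base = l3 * cores_per_l3
        # take the first `ranks_per_l3` cores of each L3 slice
        pinning.extend(base + i for i in range(ranks_per_l3))
    return pinning

cores = spread_ranks_over_l3(TOTAL_CORES, CORES_PER_L3, N_RANKS)
# Feed this comma-separated core list to your MPI launcher's explicit
# binding option (or to numactl) rather than relying on default placement.
print(",".join(str(c) for c in cores))
```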

ANSYS Fluent 2019.5, aircraft 14M-cell case, 1x Azure HBv2 VM, scaling from 1 to 4 processes per NUMA domain.
Per-core performance depends heavily on how the cores in use in a node sub-divide global shared assets that have a significant impact on performance:
- DRAM bandwidth
- L3 cache capacity and bandwidth
- On-die and inter-socket bandwidth ("GMI" and "xGMI")
- Power and thermal headroom to increase clock frequencies
- For MPI workloads, network bandwidth/latency
In one circled configuration the cores appear to be 2.5x faster than the cores in the other. Are they? No, they are the exact same cores in the exact same server, just getting different allocations of global shared assets.
- 1 process per NUMA domain: 63% of the best possible performance, but 1/4 of the cores per node and licenses used (0.63 / 0.25 ≈ 2.5, which is where the apparent 2.5x per-core advantage comes from)
- 4 processes per NUMA domain: 100% of the best possible performance, but 4x the cores and licenses used; still just 1 node of infrastructure

Even for compute-bound apps, per-core performance depends on whether, and to what degree, global shared assets are being exhausted. The same phenomenon will generally occur on other CPUs, too (e.g., Intel Xeon).

HPL on Azure HBv3 ("Milan") vs. bare metal:
- Cores per CCD: 1 | 2 | 4 | 6
- Cores used: 16C | 32C | 64C | 96C
- HBv3 HPL: 0.76 | 1.40 | 2.23 | 2.86
- Bare-metal HPL: 0.76032 | 1.4257152 | 2.25344 | 2.87232
- Expected HPL efficiency: 90% | 90% | 87% | 85%
- VM as a % of metal: 1x | 0.98x | 0.99x | 1x

Note the decline in expected and delivered HPL efficiency; this is due to gradually running out of data fabric (GMI) bandwidth.
Lesson: target an EPYC CPU model with a core count that returns commensurate value for the increase in cost.
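For clarity, the "VM as a % of metal" row is simply the ratio of the HPL result measured inside the HBv3 VM to the result measured on equivalent bare metal; a quick sketch of that arithmetic, with the values copied from the table above:

```python
# "VM as a % of metal" = HBv3 VM HPL / bare-metal HPL (values from the table).
cores_used = [16, 32, 64, 96]
vm_hpl     = [0.76, 1.40, 2.23, 2.86]
metal_hpl  = [0.76032, 1.4257152, 2.25344, 2.87232]

for cores, vm, metal in zip(cores_used, vm_hpl, metal_hpl):
    print(f"{cores} cores: VM delivers {vm / metal:.0%} of bare metal")
```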

TL;DR: EPYC packs in so much memory bandwidth, L3 cache, data fabric performance, etc., that for many HPC apps, even at ISO core counts, it will often outperform Xeon.
Disclaimers:
- This is not shown for "Azure vs. AWS" purposes (Azure Skylake in the HC-series would look similar to AWS's Skylake in this case)
- Nor is Skylake used as representative of all Intel Xeon (e.g., Ice Lake would do better than Skylake here)
- Nor is OpenFOAM indicative of every HPC workload
Optimizing OpenFOAM Performance and Cost on Azure HBv2 VMs - https://bit.ly/3xUbOYo

Big differences in 1-node performance can shrink at scale, and small differences in 1-node performance can grow at scale. Both scenarios can change the calculus of which CPU platform to invest in, and how to configure those platforms.

Do I need AVX-512? TL;DR: *likely* not a big deal.
- Few HPC apps support AVX-512 as is, and even fewer are heavily optimized for it
- Anything that supports AVX-512 likely also has an AVX2 binary (e.g., GROMACS)
- For those that are optimized, EPYC's core count advantage makes up the difference (with no need for AVX-512 support):
  - Scenario 1: (2 CPUs/server) x (28 cores per Cascade Lake 8280) x (32 ops/cycle) x (1.9 GHz SIMD-bound frequency) ≈ 3.4 teraFLOPS FP64 (peak)
  - Scenario 2: (2 CPUs/server) x (64 cores per Rome 7742) x (16 ops/cycle) x (2.2 GHz SIMD-bound frequency) ≈ 4.5 teraFLOPS FP64 (peak)
- Exception: you have an AVX-512 app *AND* it's licensed per core *AND* SW costs dominate TCO *AND* the problem is not communication bound
- Big picture: if your app is that purely compute bound, you probably want a GPU anyway
[Chart: scaling from 16 to 512 nodes comparing Frontera (AVX-512, CLX), Frontera (AVX2, CLX), and Azure HBv2 (AVX2, Rome)]
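The two peak-FLOPS scenarios above are just multiplication; a small sketch of that arithmetic, with the values taken from the slide and an illustrative function name:

```python
# Peak FP64 per server = sockets * cores * FLOPs/cycle * SIMD-bound frequency.
def peak_fp64_tflops(sockets, cores_per_socket, flops_per_cycle, simd_freq_ghz):
    """Theoretical peak FP64 throughput of one server, in teraFLOPS."""
    return sockets * cores_per_socket * flops_per_cycle * simd_freq_ghz / 1e3

# Scenario 1: dual-socket Cascade Lake 8280 with AVX-512 (32 FP64 ops/cycle)
print(peak_fp64_tflops(2, 28, 32, 1.9))   # ~3.4 TFLOPS
# Scenario 2: dual-socket EPYC Rome 7742 with AVX2 (16 FP64 ops/cycle)
print(peak_fp64_tflops(2, 64, 16, 2.2))   # ~4.5 TFLOPS
```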

Do I need to worry about MKL? TL;DR: very much an "it depends".
- In general, MKL will run just fine on EPYC
- If you have access to source, AMD's libraries are optimized and well supported for EPYC: https://developer.amd.com/amd-aocl/
- A backup option for MKL (prior to 2020) is to use Debug Mode Type 5 (not necessarily recommended, though)
- But some apps take a hard dependency on MKL and, as a result, deliver better performance, performance per unit of cost, and cost/performance on Intel Xeon

DGEMM on EPYC:
- Single-core DGEMM: MKL (Debug Mode enabled) 51.36 GigaFLOPS | MKL (Debug Mode disabled) 47.684 GigaFLOPS | BLIS 50.65 GigaFLOPS
- Multi-core DGEMM: MKL (Debug Mode enabled) 3239 GigaFLOPS | MKL (Debug Mode disabled) 1778 GigaFLOPS | BLIS 4020 GigaFLOPS
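To compare BLAS backends yourself, a rough DGEMM timing sketch like the one below is usually enough. It assumes NumPy is linked against the BLAS you want to test (MKL, BLIS, etc.); the "Debug Mode Type 5" mentioned above corresponds to setting the MKL_DEBUG_CPU_TYPE=5 environment variable before the process starts, which only affects MKL releases prior to 2020.

```python
# Minimal DGEMM throughput sketch; link NumPy against the BLAS under test.
import time
import numpy as np

n = 4000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b                      # DGEMM: roughly 2*n^3 floating-point operations
elapsed = time.perf_counter() - start

gflops = 2 * n**3 / elapsed / 1e9
print(f"DGEMM {n}x{n}: {gflops:.1f} GigaFLOPS")
```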

BIOS settings (1 of 2):
- L3 as NUMA: defines the NUMA boundary. Enabled: 1 NUMA domain for every L3 slice. Disabled: the number of NUMA domains follows how you define NPS (recommended).
- Nodes per Socket (NPS): determines how memory interleaving is done. NPS1: simplest presentation. NPS2: 2-way interleaving per socket (recommended). NPS4: 4-way interleaving per socket; NPS4 is not an option on 6-CCD EPYC parts.
- Determinism Mode: Performance brings every CPU in the cluster down to the lowest common denominator of silicon yield; Power lets the motherboard drive the CPU to its best frequencies based on the frequency/power curve of the given CPU (recommended).
- C-States: Enabled gives the best "Fmax" (recommended); Disabled limits "Fmax".
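To confirm how the NPS and "L3 as NUMA" choices actually present to the operating system, a quick look at Linux sysfs is enough; a minimal sketch, assuming the standard /sys/devices/system/node layout:

```python
# Count the NUMA nodes the OS sees and list the CPUs in each one (Linux).
import glob
import os

nodes = sorted(glob.glob("/sys/devices/system/node/node[0-9]*"))
print(f"{len(nodes)} NUMA node(s) visible to the OS")
for node in nodes:
    with open(os.path.join(node, "cpulist")) as f:
        print(f"{os.path.basename(node)}: CPUs {f.read().strip()}")
```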

BIOS settings (2 of 2):
- Preferred IO: key PCIe device, e.g. the InfiniBand NIC (recommended)
- LCLK for key PCIe device: set to 593 (improves NIC latency)
- cTDP: configurable power range
- Package Power Limit (PPL): hard governor of the socket power limit; depends on your datacenter power limit and how you are assessed OPEX costs
- Simultaneous Multi-Threading (SMT): Enabled = 2 threads/core; Disabled = 1 thread/core

Further reading:
- High Performance Computing (HPC) Tuning Guide for AMD EPYC 7003 Series Processors - https://bit.ly/3k0oiZL
- High Performance Computing (HPC) Tuning Guide for AMD EPYC 7002 Series Processors - https://bit.ly/3xRJzd1
- HPC Performance and Scalability Results with Azure HBv2 VMs - https://bit.ly/2XD7Ebj
- HPC Performance and Scalability Results with Azure HBv3 VMs - https://bit.ly/37PyCOM
- AMD Presentation to NASA, "Why AMD for HPC" - https://go.nasa.gov/3CVOMEz
- AMD Optimizing CPU Libraries (AOCL) - https://bit.ly/3m9afnf
- Optimizing OpenFOAM Performance and Cost on Azure HBv2 VMs - https://bit.ly/3xUbOYo

Thank you! Feedback and Q&A.
