MVAPICH2 On Azure HPC: A Seamless HPC Cloud Experience

MVAPICH2 on Azure HPC: A Seamless HPC Cloud Experience
Jithin Jose, Microsoft
MVAPICH User Group Meeting 2020

Agenda
- Overview of Azure HPC
- What's Unique in HPC Cloud
- HPC Software Ecosystem: MVAPICH2-X Azure
- Performance Characteristics
- Conclusion

Microsoft Azure HPC
- High Speed Networking: InfiniBand network; supports OFA verbs and all IB-based MPI libraries; only public cloud to offer IB
- Powerful Compute: compute-optimized SKUs and GPU SKUs
- Seamless Integration: seamless integration with existing HPC environments; scale out to the cloud

HPC Offerings in Azure
H-Series (InfiniBand):
- H16r (FDR)
- HB60rs (EDR)
- HC44rs (EDR)
- HB120rs v2 (HDR)
N-Series (GPU + InfiniBand)*:
- NC24r (2 x Tesla K80, FDR)
- NC24rs v2 (4 x Tesla P100, FDR)
- NC24rs v3 (4 x Tesla V100, FDR)
- ND24rs (4 x Tesla P40, FDR)
- ND40rs v2 (8 x Tesla V100, EDR)
The number in the SKU name indicates core count; "r" indicates RDMA support; "s" indicates Premium Storage support.
*GPU-only sizes not listed.

Outline: Overview of Azure HPC, What's Unique in HPC Cloud, HPC Software Ecosystem, Performance Characteristics, Conclusion

Unique Challenges for HPC in the Cloud
Performance/scalability challenges:
- Noise from host OS / host agents
- Host interrupts
- Core/NUMA mapping
- Traffic from other customers
- Guest agents

Host / VM Partitioning
[Figure: host-core and VM-core maps with host and VM NUMA nodes for NDv2 and HBv2]
HBv2:
- Host: AMD Rome with 128 cores (32 NUMA nodes, 4 cores per NUMA node)
- 120 cores for the VM, 8 cores reserved for the host
- NUMA nodes aligned to L3 caches, so no cache pollution
NDv2:
- Host: Intel Xeon (Skylake) with 48 cores
- 40 cores for the VM, 8 cores reserved for the host
MinRoot shapes the host cores; CPUGroups shapes the VM cores.
No hyperthreading; one VM per host.
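As a quick sanity check of the partitioning above, the CPU and NUMA layout visible inside the VM can be inspected with standard Linux tools (a minimal sketch; the exact counts reported depend on the VM size and image):

  # Summary of vCPUs and NUMA nodes exposed to the guest
  lscpu | grep -E 'CPU\(s\)|NUMA'
  # Per-node core lists and the NUMA distance matrix
  numactl --hardware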

Efficient Network Virtualization
Single Root I/O Virtualization (SR-IOV):
- Exposes all NIC features without any host intervention
- Offers bare-metal network performance
- Single VM per host; one VF per host
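One way to confirm that the InfiniBand virtual function is passed straight through to the guest is with standard OFED tools (a sketch; the device name mlx5_0 is assumed and may differ):

  # The Mellanox VF shows up as an ordinary PCI device inside the VM
  lspci | grep -i mellanox
  # Query device attributes and link state for the assumed device name
  ibstat mlx5_0
  ibv_devinfo -d mlx5_0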

Network Security
InfiniBand partition keys (PKEYs):
- Only VMs with the same partition key can communicate with each other
- Isolates customer traffic
- Multiple SLs are possible within the same PKEY
Partition keys in Azure:
- Single PKEY for all VMs in a VMSS (Virtual Machine Scale Set)
- Single PKEY for all VMs associated with an Availability Set
Check the PKEY:
  cat /sys/class/infiniband/mlx5_0/ports/1/pkeys/0
  0x801a
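To confirm that every VM in a scale set received the same partition key, the check above can be repeated across a hostfile (a minimal sketch; the hosts file and passwordless SSH between VMs are assumptions):

  # Print the PKEY reported by each node listed in ./hosts
  for h in $(cat hosts); do
      echo -n "$h: "
      ssh "$h" cat /sys/class/infiniband/mlx5_0/ports/1/pkeys/0
  done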

Congestion Control
- The Azure HPC InfiniBand network is non-blocking (no oversubscription)
- Static routing may still cause bottlenecks
- Solution: Adaptive Routing (AR)
  - Available on ConnectX-5 and later generation NICs
  - Configured per SL; AR enabled on all SLs

NUMA Mapping
- Deterministic pNUMA-to-vNUMA mapping; the distance map shows a 1:1 mapping
- Enables NUMA-aware designs: efficient process mapping, NUMA-aware MPI collectives
[Figure: MLC latency matrix on HBv2 (ns)]
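Because the pNUMA-to-vNUMA mapping is deterministic, process placement can be steered with MVAPICH2's CPU-binding environment variables (a hedged example; the variable names follow the MVAPICH2 user guide and should be verified against the installed version):

  # Spread ranks across NUMA nodes rather than packing them
  export MV2_CPU_BINDING_POLICY=scatter
  export MV2_CPU_BINDING_LEVEL=numanode
  # Alternatively, pin ranks to an explicit core list
  # export MV2_CPU_MAPPING=0:1:2:3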

Outline: Overview of Azure HPC, What's Unique in HPC Cloud, HPC Software Ecosystem (MVAPICH2-X Azure), Performance Characteristics, Conclusion

HPC Marketplace Images
CentOS-HPC images include:
- Mellanox OFED
- MPI libraries, including MVAPICH2 and MVAPICH2-X Azure
- HPC libraries
- Optimization configurations
Open-source GitHub repository: https://github.com/Azure/azhpc-images/
Use the pre-built HPC VM images, build a custom image based on them, or bring your own software stack.
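Creating an HBv2 VM from the CentOS-HPC marketplace image with the Azure CLI could look like the following (a sketch; the resource group, VM name, and image URN are placeholders, and the current URN should be confirmed with az vm image list):

  # Provision one HBv2 VM from the CentOS-HPC image (names are placeholders)
  az vm create \
      --resource-group my-hpc-rg \
      --name hbv2-node01 \
      --size Standard_HB120rs_v2 \
      --image OpenLogic:CentOS-HPC:7.7:latest \
      --admin-username azureuser \
      --generate-ssh-keys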

MVAPICH2-X Azure
- Available in all Azure CentOS-HPC images
- Targeted for Azure HB, HBv2, and HC VM instances
Feature highlights:
- Enhanced tuning for point-to-point and collectives
- XPMEM support
- DC support
- Co-operative protocol
- Hybrid RC/UD support
Blog post: te/mvapich2-on-azure-hpc-clusters/ba-p/1404305
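On the CentOS-HPC images the MPI stacks are typically exposed through environment modules, so a launch with the bundled MVAPICH2 could look like this (a sketch; the module name, rank count, and application binary are assumptions):

  # Load the MVAPICH2 build shipped with the image (module name assumed)
  module load mpi/mvapich2
  # Launch 240 ranks across the nodes in ./hosts with mpirun_rsh,
  # passing MVAPICH2 tuning variables on the command line
  mpirun_rsh -np 240 -hostfile hosts MV2_CPU_BINDING_POLICY=scatter ./my_app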

Outline: Overview of Azure HPC, What's Unique in HPC Cloud, HPC Software Ecosystem, Performance Characteristics, Conclusion

Experiment Setup
- HBv2 VMs, CentOS 7.7 HPC image
- MPI libraries: MVAPICH2 2.3.4, MVAPICH2-X 2.3
- Mellanox OFED 5.1
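The point-to-point results that follow come from the OSU micro-benchmarks; a representative two-node run under this setup might look as follows (a sketch; node names and benchmark paths are assumptions):

  # Inter-node latency and bandwidth between two HBv2 VMs
  mpirun_rsh -np 2 node001 node002 ./osu_latency
  mpirun_rsh -np 2 node001 node002 ./osu_bandwidth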

MPI Latency
[Figure: inter-node and intra-node MPI latency vs. message size for MVAPICH2 and MVAPICH2-X]
- MVAPICH2 and MVAPICH2-X achieve ~2 us latencies
- MVAPICH2-X offers better large-message latencies for intra-node transfers (XPMEM)

MPI Bandwidth / Bi-Bandwidth
[Figure: MPI bandwidth and bi-directional bandwidth vs. message size for MVAPICH2 and MVAPICH2-X]
- MVAPICH2 and MVAPICH2-X are close to line rate
- Both versions use the same protocols

MPI Allreduce
[Figure: MPI_Allreduce latency vs. message size with 960 processes (8 HBv2 nodes, 120 PPN), showing a 2.8x improvement for MVAPICH2-X]
- MVAPICH2-X XPMEM collectives offer better large-message allreduce latencies
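A launch matching the configuration above would use osu_allreduce at full node density (a sketch; the hostfile is assumed to encode the 120-ranks-per-node layout):

  # 960-rank MPI_Allreduce sweep across 8 HBv2 nodes (120 PPN)
  mpirun_rsh -np 960 -hostfile hosts ./osu_allreduce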

GPCNet on HBv2
[Figure: GPCNet congestion factor for latency, bandwidth, and allreduce, with and without congestion control]
- Measures the congestion factor with and without congestion control (CC)
- 128 HBv2 VMs, 120 PPN (15,360 MPI ranks)
- Congestion control is enabled in an upcoming firmware version
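GPCNet (https://github.com/netbench/GPCNET) derives the congestion factor from its congestion test, which runs the latency, bandwidth, and allreduce canary patterns alongside congestor traffic; a hedged launch sketch at the scale quoted above (the binary name and hostfile layout are assumptions):

  # 15,360 ranks across 128 HBv2 VMs at 120 PPN
  mpirun_rsh -np 15360 -hostfile hosts ./network_load_test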

MiniFE
- Finite element mini-application; proxy application for unstructured implicit FE codes
- Strong-scaling experiment
- Version: openmp-opt
- Problem size: nx=1024, ny=1024, nz=1024
[Figure: MiniFE execution time (s) on 4, 8, 16, and 32 nodes for MVAPICH2 and MVAPICH2-X]
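For reference, one of the strong-scaling data points in the chart could be reproduced with a launch like this (a sketch; miniFE takes the problem dimensions as key=value arguments, and the binary name corresponds to the openmp-opt build):

  # 32 nodes x 120 ranks per node, fixed 1024^3 problem (strong scaling)
  mpirun_rsh -np 3840 -hostfile hosts ./miniFE.x nx=1024 ny=1024 nz=1024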

CloverLeaf
- Hydrodynamics mini-app to solve the compressible Euler equations in 2D
- Dataset: clover_bm256.in (x cells: 15360, y cells: 15360, steps: 2955)
- Version: CloverLeaf MPI
[Figure: CloverLeaf execution time (s) on 4, 8, 16, and 32 nodes for MVAPICH2 and MVAPICH2-X]
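The reference MPI build of CloverLeaf reads its input from clover.in in the working directory, so the bm256 case could be run as follows (a sketch; the input-deck path and rank count are assumptions):

  # Select the 15360 x 15360 benchmark deck and launch across the hostfile
  cp InputDecks/clover_bm256.in clover.in
  mpirun_rsh -np 3840 -hostfile hosts ./clover_leaf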

WRF
- MVAPICH2 vs. MVAPICH2-X XPMEM, WRF 3.6 (https://github.com/hanschen/WRFV3)
- Benchmark: 12 km resolution case over the Continental U.S. (CONUS) domain
  https://www2.mmm.ucar.edu/wrf/WG2/benchv3/#_Toc212961288
- Update io_form_history in namelist.input to 102
  https://www2.mmm.ucar.edu/wrf/users/namelist_best_prac_wrf.html#io_form_history
[Figure: WRF execution time (s) with 120, 240, 480, and 960 processes]
* Courtesy: MVAPICH Team
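A run of the CONUS 12 km case under this setup amounts to adjusting namelist.input as noted above and launching wrf.exe (a sketch; the working directory is assumed to contain the benchmark inputs):

  # Reduce history I/O overhead as recommended for the benchmark
  sed -i 's/io_form_history *= *[0-9]*/io_form_history = 102/' namelist.input
  # 960-rank run (e.g. 8 HBv2 nodes at 120 PPN)
  mpirun_rsh -np 960 -hostfile hosts ./wrf.exe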

Outline: Overview of Azure HPC, What's Unique in HPC Cloud, HPC Software Ecosystem, Performance Characteristics, Conclusion

Conclusion
- The Azure HPC design offers bare-metal performance; SR-IOV efficiently exposes network features
- Out-of-the-box HPC VM images include MVAPICH2 and MVAPICH2-X
- MVAPICH2 and MVAPICH2-X offer great performance and scalability on Azure

Pointers
- AzureHPC deployment scripts: https://github.com/Azure/azurehpc
- Azure HPC/GPU VM sizes: sizes-hpc, sizes-gpu
- HPC marketplace images: te/azure-hpc-vm-images/ba-p/977094
- MVAPICH2 on Azure: te/mvapich2-on-azure-hpc-clusters/ba-p/1404305
- Adaptive Routing on Azure HPC: te/adaptive-routing-on-azure-hpc/ba-p/1205217

Thank You! jijos@microsoft.com

