Experiences With NECʼs New Vector System SX-Aurora


Experiences with NECʼs New Vector System SX-Aurora TSUBASA and Its Extension for the Future
Hiroaki Kobayashi
Special Advisor to President (for ICT Innovation)
Deputy Director for the HPC Strategy of Cyberscience Center
Chair of Computer and Mathematics Sciences Department
Professor of Graduate School of Information Sciences
Tohoku University
koba@tohoku.ac.jp
28th WSSP
October 9-10, 2018

Today’s Agenda
Quick Introduction of NEC’s New Vector System: SX-Aurora TSUBASA
  An X86-Attached SX Vector System Aiming at Standardization and Customization
  The New Execution Model of Scalar/OS Offloading
Early Evaluation of SX-Aurora TSUBASA
  Tohoku Univ.’s Application Kernels
  HPCG
  Vector Offloading Mechanism
On-going R&D
  Design Consideration of SX-Aurora TSUBASA for the Next Generation
  R&D of a Quantum Computing-Assisted HPC Infrastructure

The First Impression of SX-Aurora TSUBASA

NEC Brand-New Vector System
  Highest Mem. BW
  Largest Single Core Performance
  Standardization: Linux Environment
  New execution model centralized on vector computing
Source: NEC

Hardware Specification of SX-Aurora TSUBASA
[Figure: SX Vector Processor connected to an X86 Processor (Xeon) via PCIe. Source: Intel, 28-core version of Skylake]

A New Execution Model of SX-Aurora TSUBASA
[Figure: SX-Aurora TSUBASA Execution Model compared with the Conventional Execution Model of Accelerators]

Comparison between SX-ACE and SX-Aurora TSUBASA

                    SX-Aurora TSUBASA   SX-ACE        Improvement
Number of Cores     8                   4             2x
Memory Bandwidth    1.2 TB/sec          256 GB/sec    4.7x
OS                  Linux               Super-UX

[The values in the Total Flop/s in DP and ADB rows did not survive transcription.]

Comparison between Xeon Gold, SX-Aurora TSUBASA VE and V100

                     Intel Xeon Gold 6126         NEC Vector Engine Type 10B   NVIDIA Tesla V100
Frequency            2.6 GHz / 3.7 GHz (Turbo)    1.4 GHz                      1.245 GHz
No. of cores         12                           8                            5120
Performance/socket   1,996/2,840 GF (SP)          4.3 TF (SP)                  14 TF (SP)
                     998.4/1,420 GF (DP)          2.15 TF (DP)                 7 TF (DP)
Memory subsystem     DDR4-2666 DIMM               HBM2                         HBM2
                     16 GB x 6 channels           8 GB x 6 modules             4 GB x 4 modules
Memory bandwidth     128 GB/s                     1.22 TB/s                    900 GB/s
Memory capacity      96 GB                        48 GB                        16 GB
Price                ?
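The peak and bandwidth figures in the table can be combined into a bytes-per-flop (B/F) ratio, the balance metric used throughout the talk. The following is a minimal sketch, assuming B/F means memory bandwidth divided by double-precision peak and using the base-frequency (non-turbo) Xeon figure. The names `SPECS` and `bytes_per_flop` are illustrative and do not come from the slides.

```python
# Peak DP performance (GF) and memory bandwidth (GB/s) as listed in the table.
# Assumption: the Xeon entry uses the base-frequency (non-turbo) DP figure.
SPECS = {
    "Xeon Gold 6126": {"dp_gflops": 998.4, "bw_gbs": 128.0},
    "NEC VE Type 10B": {"dp_gflops": 2150.0, "bw_gbs": 1220.0},
    "NVIDIA Tesla V100": {"dp_gflops": 7000.0, "bw_gbs": 900.0},
}


def bytes_per_flop(bw_gbs: float, dp_gflops: float) -> float:
    """Return the memory balance in bytes per double-precision flop."""
    return bw_gbs / dp_gflops


if __name__ == "__main__":
    for name, spec in SPECS.items():
        bf = bytes_per_flop(spec["bw_gbs"], spec["dp_gflops"])
        print(f"{name:18s} B/F = {bf:.2f}")
```

Under these assumptions the vector engine comes out at roughly 0.57 B/F, against roughly 0.13 B/F for both the Xeon and the V100.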

You may be interested in the Post-K Processor
Becomes available in 2021?

A Similar Architecture with the Same Performance Is Available Right Now!
4.8 billion transistors

Benchmark Programs for Performance ...
[Table: application kernels (Land Mine: electromagnetic, FDTD; Earthquake: seismology, friction law; Turbulent ...) with their grid sizes (100x750x750, 2047x2047x256, 480x80x80x10), code B/F and actual B/F. The mapping of values to rows is not recoverable from the transcription.]

Tohoku Univ.’s Kernels Results
[Figure: speedup relative to SX-ACE for SX-Aurora TSUBASA, SX-ACE and Skylake on the Land mine, Earthquake, Turbulent and Turbine kernels, annotated per kernel with C B/F, S B/F, Vec. Length and Vectorization. The bar values are not reliably recoverable from the transcription.]

Performance Evaluation of SX-Aurora TSUBASA by Using the HPCG Benchmark
HPCG (High Performance Conjugate Gradients) is designed to exercise computational and data access patterns that more closely match a broad set of important applications.
HPL for top500 is increasingly unreliable as a true measure of system performance for a growing collection of important science and engineering applications.
HPCG is a complete, stand-alone code that measures the performance of basic operations in a unified code:
  Sparse matrix-vector multiplication.
  Sparse triangular solve.
  Vector updates.
  Global dot products.
  Local symmetric Gauss-Seidel smoother.
Driven by a multigrid preconditioned conjugate gradient algorithm that exercises the key kernels on a nested set of coarse grids.
Reference implementation is written in C++ with MPI and OpenMP support.

Benchmark   Kernel        Required B/F
HPL         DGEMM         0.1
HPGMG       GSRB          1
HPCG        SpMV, SYMGS   4
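The required B/F column can be read as a simple bandwidth-limited bound on sustained performance. The sketch below is a roofline-style estimate, not a method taken from the slides: it assumes attainable performance is the smaller of the peak and the memory bandwidth divided by the required B/F, and it plugs in the VE Type 10B figures (2.15 TF DP, 1.22 TB/s) from the comparison slide. The function name `attainable_gflops` is illustrative.

```python
# Required B/F per benchmark kernel, as listed on the slide.
REQUIRED_BF = {
    "HPL (DGEMM)": 0.1,
    "HPGMG (GSRB)": 1.0,
    "HPCG (SpMV, SYMGS)": 4.0,
}


def attainable_gflops(peak_gflops: float, bw_gbs: float, required_bf: float) -> float:
    """Bandwidth-limited bound: min(peak, bandwidth / required B/F)."""
    return min(peak_gflops, bw_gbs / required_bf)


if __name__ == "__main__":
    # Assumed machine: VE Type 10B figures quoted in the comparison slide.
    peak, bw = 2150.0, 1220.0
    for name, bf in REQUIRED_BF.items():
        bound = attainable_gflops(peak, bw, bf)
        share = 100.0 * bound / peak
        print(f"{name:20s} <= {bound:7.1f} GF ({share:5.1f}% of peak)")
```

With these inputs the HPCG bound is 305 GF, about 14% of peak, which illustrates why a kernel that needs 4 B/F cannot approach peak on any of the compared processors.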

Sustained Performance of HPCG Benchmark
[Figure: HPCG result (Gflop/s) of SX-Aurora TSUBASA across grid sizes (X, Y, Z)]

HPCG Benchmark Efficiency
[Figure: efficiency of SX-Aurora TSUBASA and SX-ACE across grid sizes (X, Y, Z); a value of 11% is labeled on the chart]

Evaluation of the New Execution Model: OS/Scalar Offloading from Vector Engine
[Figure: two offloading modes over PCIe. The vector engine is drawn with cores (268.8 GF each), LLC (8 MB) and HBM2 modules (8 GB each).]
VE Offloading Mode: offloading of OS and scalar operations (OS & scalar offloading, or OS only offloading)
VH Offloading Mode: offloading of vector operations (vector offloading)

Impressions of SX-Aurora TSUBASA
SX-Aurora TSUBASA has great potential to achieve high sustained performance for memory-intensive applications, but:
  Compiler development is still underway, which limits the sustained performance obtained from auto-vectorization and auto-parallelization. Use the latest compiler for the best performance!
  The compiler is also not fully exploiting the enlarged, core-shared capacity of the LLC. A software-controlled function is desired to make the best use of it for reducing off-chip memory transactions.
  For some applications, the LLC bandwidth to the cores becomes a bottleneck even with high hit rates (shared LLC of SX-Aurora: 2.66 against 1.2 for memory, vs. dedicated ADB of SX-ACE: 1 against 0.256, all in TB/s).

Unofficial Web Site of SX-Aurora
/06/15/how-to-install-sx-aurora-tsubasa/
Our website provides information about:
  How to set up software environments
  How to update software environments
  Events, etc.

Design Consideration of the Future Vector Systems
**This work is partially conducted with NEC, but the contents do not reflect any future products of NEC

Timeline of the Cyberscience Center HPC System Development and R&D for the Future
[Figure: timeline covering 2008 to 2020]
Systems & Facility
  SX-9 (29TF)
  LX 406Re2 (31TF)
  5-Cluster SX-ACE (707TF)
  Storage Systems (4PB)
  3D Tiled Display
  New HPC Building Construction (1,500 m2)
  Next System?
Projects
  Design and procurement process for enhancement of server, storage & visualization systems
  Design and procurement process of the next supercomputer system
  R&D for the next system
  Feasibility study for future HPC

Reinforce the academic and industry collaboration for HPC R&D at Tohoku University
Tohoku-Univ NEC Joint Research Division of High-Performance Computing
Founded in June 2014, 8-year period until 2022
Objectives
  R&D on HPC technologies to exploit high sustained performance of science and engineering applications on current HPC systems and to realize future HPC systems targeting 2021
  Evaluation and improvement of the current HPC environments through migration of SX-9 applications to SX-ACE
  Detailed evaluation and analysis of modern HPC systems, not only vector systems but also scalar-parallel and accelerator-based systems
  Feasibility study of a future highly balanced HPC system for high sustained performance of practical applications in the post-peta scale era
Activities: HPC R&D, HPC System Operation, Human Resource Development, HPC User Support
Faculty Members
  Hiroaki Kobayashi, Professor and division director
  Hiroyuki Takizawa, Professor
  Ryusuke Egawa, Associate Professor
  Akihiko Musa (NEC), Visiting Professor
  Mitsuo Yokokawa (Kobe Univ), Visiting Professor
  Shintaro Momose (NEC), Visiting Associate Professor
  Masayuki Sato, Assistant Professor
In collaboration with visiting researchers from NEC and the technical staff of Cyberscience Center

Scaling may be End, but Silicon is not End! And Use it Smart and Effective!
We are facing the end of Moore’s law due to physical limitations, and the transistor cost is hard to reduce. However, silicon is still the fundamental construction material for computing platforms, as plastic, steel and concrete are for automobiles, buildings and home appliances.
So we have to become much smarter in the design of future HEC systems.
Use the precious silicon budget (advanced device technologies) to effectively design mechanisms that can maximize the sustained performance and power efficiency of individual application domains.
[Figure labels: Tech scaling: "Tech. will be stopping!"; Fabrication cost: "Cost is increasing"]
Itʼs time to focus on Domain-Specific Architectures (DSAs) for computation-intensive, memory-intensive, I/O-intensive, low-precision computing, etc. applications to improve silicon/power efficiency!
New HPC system architecture design concept of Ensemble Architecture: make different DSAs combine and work together complementarily to realize general-purpose functionality as a single computing infrastructure.

Domain Specific Balanced Architecture Design Approach: Not Peak Performance, Turn Memory-BW into Sustained Performance!
[Figure: sustained performance plotted over the application spectrum, with LINPACK marked]
Flop/s-oriented, memory-limited design: for computation-intensive applications only!
Our target: high B/F oriented design. Need balanced improvement both in flop/s and BW for high efficiency in wide application areas!

What Does the Next Vector System Look Like in Year Around 2020-2021?
Aurora in 2018, Aurora-2 in 2021?
Vector Engine Spec.
  The 7nm technology becomes available? 5X more transistors than with 16nm tech?
  5X in # of cores, i.e. 50 VE cores feasible?
  Up to 15TF if the core performance is the same, but it should be lowered due to the power/thermal limitation of the chip.
Memory Subsystem
  2X in memory BW and 1.5X in memory capacity when using HBM3, under the assumption of the same chip size as Aurora TSUBASA: 3TB/s and 96GB?
Design targets of 0.5 B/F (20 cores of 6TF for memory-intensive applications) to 0.25 B/F (40 cores of 12TF for compute-intensive applications), to be competitive with contemporary HEC systems at that time, such as Post-K (JP), A21 (US), NERSC-9 (US), Crossroads (US), EU Exa-System (FR/GE), NUDT2020 (Ch)
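The design targets quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes a fixed 3 TB/s memory bandwidth and a per-core peak of 0.3 TF, the latter implied by "20 cores of 6TF". The function `design_point` and its parameter names are hypothetical and only serve the illustration.

```python
def design_point(cores: int, per_core_tflops: float = 0.3,
                 memory_bw_tbs: float = 3.0) -> tuple:
    """Return (peak TF, B/F) for a hypothetical core count.

    Assumptions: 0.3 TF per core (6 TF / 20 cores on the slide) and a
    3 TB/s memory subsystem shared by all cores.
    """
    peak_tflops = cores * per_core_tflops
    return peak_tflops, memory_bw_tbs / peak_tflops


if __name__ == "__main__":
    for cores in (20, 40, 50):
        peak, bf = design_point(cores)
        print(f"{cores:2d} cores: {peak:5.1f} TF peak, {bf:.2f} B/F")
```

The 20-core and 40-core points give 0.5 B/F and 0.25 B/F, matching the stated targets, while 50 cores at unchanged bandwidth would drop to 0.2 B/F.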

What Does the Next Vector System Look Like in Year Around 2020-2021? (Cont’d)
How are 20 to 40 cores integrated and connected?
  Single chip or multi-chip (SIP)?
  If SIP is employed, how are multiple chips connected?
    If EMIB is available, could BW be increased?
    Silicon photonics with WDM becomes available?
  Single SMP or clustered SMP?
    Crossbar, mesh, ring, etc., or their hierarchical and hybrid combinations?
    Coherency protocol of ADB (snoopy or directory)
32 cores, 9.6TF, 3TB/s, 0.3 B/F
[Figures: sources IBM and Intel]

Quantum Computer: Emerging Domain Specific Architecture
Quantum computing is drawing much attention recently as an emerging technology in the post-Moore era.
In particular, quantum computers for quantum annealing have been commercialized by D-Wave Systems, and their applications are being developed worldwide.
  Google, NASA, Volkswagen, Lockheed, Denso
The base model used to design and implement the D-Wave machines, quantum annealing on the Ising model, was proposed by Prof. Nishimori et al. of Tokyo Inst. Tech. in 1998.
Quantum annealing is a metaheuristic for finding the global minimum of a given objective function over a given set of candidate solutions (candidate states), by a process using quantum fluctuations.
An ideal solver for combinatorial problems!
[Figures: transverse magnetic field type quantum annealing; chip and system (D-Wave); simultaneous search to reach the optimal solution by quantum fluctuation]
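To make the optimization problem concrete, the sketch below minimizes an Ising energy with classical simulated annealing. This is a classical stand-in that uses thermal fluctuations, not the quantum annealing process performed by D-Wave hardware. The sign convention of the energy, the cooling schedule and all names (`ising_energy`, `brute_force_minimum`, `simulated_annealing`) are assumptions made for illustration.

```python
import itertools
import math
import random


def ising_energy(spins, couplings, fields):
    """E(s) = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i, with s_i in {-1, +1}."""
    energy = -sum(h * s for h, s in zip(fields, spins))
    for (i, j), j_ij in couplings.items():
        energy -= j_ij * spins[i] * spins[j]
    return energy


def brute_force_minimum(n, couplings, fields):
    """Exhaustively find the lowest energy (only feasible for small n)."""
    return min(ising_energy(s, couplings, fields)
               for s in itertools.product((-1, 1), repeat=n))


def simulated_annealing(n, couplings, fields, steps=2000,
                        t_start=2.0, t_end=0.05, seed=0):
    """Classical single-spin-flip annealing; returns (best_spins, best_energy)."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, couplings, fields)
    best_spins, best_energy = list(spins), energy
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        spins[i] = -spins[i]  # propose flipping one spin
        new_energy = ising_energy(spins, couplings, fields)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy  # accept the move
            if energy < best_energy:
                best_spins, best_energy = list(spins), energy
        else:
            spins[i] = -spins[i]  # reject the move and restore the spin
    return best_spins, best_energy


if __name__ == "__main__":
    # Hypothetical 4-spin ferromagnetic ring with no external field.
    couplings = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
    fields = [0.0, 0.0, 0.0, 0.0]
    print("exact minimum:", brute_force_minimum(4, couplings, fields))
    print("annealed:", simulated_annealing(4, couplings, fields))
```

Combinatorial problems are mapped onto such a model by encoding the objective function in the couplings and fields, so that a minimum-energy spin configuration corresponds to an optimal solution.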

Toward Realization of Quantum Computing-Assisted HPC Infrastructure
Tohoku University established an interdisciplinary priority research institute, named Q-HPC, for Quantum Computing-Accelerated HPC in 2018.
As Q-HPC members, we are starting a new 5-year research program named “R&D of Quantum Annealing-Assisted HPC Infrastructure”, supported by MEXT.
The infrastructure:
  Becomes an innovative infrastructure for developing next-generation applications in the fields of computational science, data sciences and their fusions
  Provides transparent access to not only classical HPC resources but also quantum computing ones in a unified fashion

Team Organization

An Example of Target Application: QA-Enhanced Real-Time Tsunami Inundation Forecasting and Optimal Evacuation Planning
[Figure: an integrated programming framework spanning the stages and platforms below]
Stages
  Fault Estimation with MCMC
  Tsunami Inundation Simulation
  Optimal Evacuation Planning with Quantum Annealing
Platforms
  Aurora TSUBASA Vector Host (Xeon)
  Aurora TSUBASA Vector Engine
  D-Wave Machine

An Example of Target Applications: Digital Twin Numerical Turbine

Let’s meet together again at the next WSSP at Tohoku Univ.
29th Workshop on Sustained Simulation Performance
Date: March 19-20, 2019
Place: Tohoku University
