AMBER and Kepler GPUs - NVIDIA


AMBER and Kepler GPUs
Julia Levites, Sr. Product Manager, NVIDIA
Ross Walker, Assistant Research Professor and NVIDIA CUDA Fellow
San Diego Supercomputer Center and Department of Chemistry & Biochemistry

Walker Molecular Dynamics Lab
http://www.wmd-lab.org/
Research areas: GPU acceleration, lipid force field development, QM/MM MD, automated refinement
Researchers / Postdocs: Andreas Goetz, Romelia Salomon, Jian Yin
Graduate Students: Ben Madej (UCSD/SDSC), Justin McCallum (Imperial College), Age Skjevik (UCSD/SDSC/Bergen), Davide Sabbadin (SDSC)
Undergraduate Researchers: Robin Betz, Matthew Clark, Mike Wu

What is Molecular Dynamics?
In the context of this talk: the simulation of the dynamical properties of condensed-phase biological systems.
- Enzymes / proteins, drug molecules, biological catalysts
- Classical energy function: force fields, parameterized (bonds, angles, dihedrals, VDW, charges)
- Integration of Newton's equations of motion (a minimal sketch follows below)
- Atoms modeled as points; electrons included implicitly within the parameterization
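The integration step can be made concrete with a short sketch. The following Python/NumPy velocity Verlet loop is illustrative only, assuming a toy force callable and reduced units; it is not AMBER's actual integrator:

    import numpy as np

    def velocity_verlet(x, v, force, mass, dt, n_steps):
        # x, v: (n_atoms, 3) positions and velocities; mass: (n_atoms, 1).
        # Each iteration depends on the previous one, which is why MD is
        # inherently serial in time (see "The Problem(s)" later in the talk).
        f = force(x)
        for _ in range(n_steps):
            v_half = v + 0.5 * dt * f / mass          # half-kick
            x = x + dt * v_half                       # drift
            f = force(x)                              # new forces
            v = v_half + 0.5 * dt * f / mass          # half-kick
        return x, v

    # Toy usage: one particle on a harmonic spring (k = 1, reduced units).
    x0, v0, m = np.array([[1.0, 0.0, 0.0]]), np.zeros((1, 3)), np.ones((1, 1))
    xf, vf = velocity_verlet(x0, v0, lambda x: -x, m, dt=0.01, n_steps=1000)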

Why Molecular Dynamics?
Atoms move!
- Life does NOT exist at the global minimum.
- We may be interested in studying time-dependent phenomena, such as molecular vibrations, structural reorganization, diffusion, etc.
- We may be interested in studying temperature-dependent phenomena, such as free energies, anharmonic effects, etc.
Ergodic Hypothesis
- The time average over a trajectory is equivalent to an ensemble average.
- This allows the use of MD for statistical mechanics studies.

Force Fields

What is AMBER?
An MD simulation package (12 versions as of 2012), distributed in two parts:
- AmberTools: preparatory and analysis programs, free under the GPL
- Amber: the main simulation programs, under academic licensing
The simulation package is independent from the accompanying force fields.
A set of MD force fields, in the public domain:
- Fixed-charge biomolecular force fields: ff94, ff99, ff99SB, ff03, ff11, ff12
- Experimental polarizable force fields, e.g. ff02EP
- Parameters for general organic molecules, solvents, carbohydrates (Glycam), etc.

The AMBER Development Team
A Multi-Institutional Research Collaboration
Principal contributors to the current codes:
David A. Case (Rutgers University), Tom Darden (NIEHS), Thomas E. Cheatham III (University of Utah), Carlos Simmerling (Stony Brook), Junmei Wang (UT Southwest Medical Center), Robert E. Duke (NIEHS and UNC-Chapel Hill), Ray Luo (UC Irvine), Mike Crowley (NREL), Ross Walker (SDSC), Wei Zhang (TSRI), Kenneth M. Merz (Florida), Bing Wang (Florida), Seth Hayik (Florida), Adrian Roitberg (Florida), Gustavo Seabra (Florida), Kim F. Wong (University of Utah), Francesco Paesani (University of Utah), Xiongwu Wu (NIH), Scott Brozell (TSRI), Thomas Steinbrecher (TSRI), Holger Gohlke (J.W. Goethe Universität), Lijiang Yang (UC Irvine), Chunhu Tan (UC Irvine), John Mongan (UC San Diego), Viktor Hornak (Stony Brook), Guanglei Cui (Stony Brook), David H. Mathews (Rochester), Celeste Sagui (North Carolina State), Volodymyr Babin (North Carolina State), Peter A. Kollman (UC San Francisco)

AMBER Usage
Approximately 850 site licenses (per version) across most major countries.

What can we do with Molecular Dynamics?
We can simulate time-dependent properties:
- Protein domain motions
- Small protein folds
- Spectroscopic properties
We can simulate ensemble properties:
- Binding free energies
- Drug design
- Biocatalyst design
- Reaction pathways
- Free energy surfaces

Why do we need Supercomputers? (Complex Equations)

$$U(R) = \sum_{\mathrm{bonds}} K_r (r - r_{eq})^2 + \sum_{\mathrm{angles}} K_\theta (\theta - \theta_{eq})^2 + \sum_{\mathrm{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{i<j}^{\mathrm{atoms}} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}}\right] + \sum_{i<j}^{\mathrm{atoms}} \frac{q_i q_j}{\epsilon R_{ij}}$$
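To see where the computational cost comes from, here is a hedged Python sketch of the nonbonded part of the potential above; the arrays A, B and q mirror the symbols in the formula and are illustrative, not AMBER data structures:

    import numpy as np

    def nonbonded_energy(coords, A, B, q):
        # Naive evaluation of sum_{i<j} [A_ij/R_ij^12 - B_ij/R_ij^6 + q_i*q_j/R_ij].
        # The double loop is O(N^2): for millions of atoms this term dominates
        # the cost, which is why production codes use cutoffs plus PME instead.
        e = 0.0
        n = len(coords)
        for i in range(n):
            for j in range(i + 1, n):
                r = np.linalg.norm(coords[i] - coords[j])
                e += A[i, j] / r**12 - B[i, j] / r**6 + q[i] * q[j] / r
        return e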

Why do we need Supercomputers? (Lots of Atoms)

Why do we need Supercomputers? (Lots of Time Steps)
- The maximum time per step is limited by the fastest motion in the system (vibration of bonds): 2 femtoseconds (0.000000000000002 seconds). Light travels 0.006 mm in 2 fs.
- Biological activity occurs on the nanosecond to microsecond timescale: 1 microsecond = 0.000001 seconds.
- SO WE NEED 500 million steps to reach 1 microsecond! (The arithmetic is spelled out below.)
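The step count follows directly from the numbers above; a few lines of Python make the arithmetic (and the wall-clock implication) explicit. The 100 ns/day rate is an illustrative assumption, of the order reported in the benchmark slides later in this deck:

    dt = 2e-15                    # 2 fs time step, in seconds
    target = 1e-6                 # 1 microsecond of simulated time
    print(round(target / dt))     # 500000000 -> 500 million steps

    ns_per_day = 100              # assumed sustained throughput (illustrative)
    print(1000 / ns_per_day)      # 1 us = 1000 ns -> 10 days of wall-clock time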

Moving Molecular Dynamics Forward?
(at the speed of memory / interconnect?)

Just build bigger supercomputers?
[Diagram: "Beliefs in moving to the exascale" - a spectrum of attitudes from Luddite (-R, -P) and Fanatic (-R, +P) through Atheist, Heretic and Believer (today's position) to True Believer, classified by node speed (R) and number of nodes (P).]
The problem: the immediate future is somewhere in the middle.

The Problem(s)
- Molecular dynamics is inherently serial: to compute step t+1 we must have computed all previous steps.
- We cannot simply make the system bigger, since bigger systems need more sampling (although many people conveniently forget this).
  100M atoms = 300M degrees of freedom (d.o.f.); 10 ns = 5,000,000 time steps; that is 60x fewer time steps than d.o.f.
- We can run ensembles of calculations, but these present their own challenges (both practical and political).

Better Science?
Bringing the tools the researcher needs into their own lab:
- Can we make a researcher's desktop look like a small cluster (remove the queue wait)?
- Can we make MD truly interactive (real-time feedback / experimentation)?
- Can we find a way for a researcher to cheaply increase the power of all their graduate students' workstations?
  - Without having to worry about available power (power cost?)
  - Without having to worry about applying for cluster time
  - Without having to have a full-time 'student' to maintain the group's clusters
GPUs offer a possible cost-effective solution.

Requirements
Any implementation that expects to gain widespread support must:
- Be simple / transparent to use.
  - Scientists want science first; technology is the enabler, NOT the science.
  - Whichever path is the easiest will be the one that is taken.
- Not make additional approximations.
- Have broad support.
- Have longevity (5 years minimum).

The Project
Develop a GPU-accelerated version of AMBER's PMEMD.
- San Diego Supercomputer Center: Ross C. Walker
- NVIDIA: Scott Le Grand, Duncan Poole
Funded as a pilot project (1 year) under the NSF SSE Program and renewed for 3 more years.

Project Info
AMBER Website: http://ambermd.org/gpus/

Goetz, A.W.; Williamson, M.J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R.C. "Routine microsecond molecular dynamics simulations with AMBER - Part I: Generalized Born", Journal of Chemical Theory and Computation, 2012, 8 (5), pp 1542-1555, DOI: 10.1021/ct200909j

Pierce, L.C.T.; Salomon-Ferrer, R.; de Oliveira, C.A.F.; McCammon, J.A.; Walker, R.C. "Routine access to millisecond timescale events with accelerated molecular dynamics", Journal of Chemical Theory and Computation, 2012, 8 (9), pp 2997-3002, DOI: 10.1021/ct300284c

Salomon-Ferrer, R.; Case, D.A.; Walker, R.C. "An overview of the Amber biomolecular simulation package", WIREs Comput. Mol. Sci., 2012, in press, DOI: 10.1002/wcms.1121

Le Grand, S.; Goetz, A.W.; Walker, R.C. "SPFP: Speed without compromise - a mixed precision model for GPU accelerated molecular dynamics simulations", Comput. Phys. Commun., 2013, 184, pp 374-380, DOI: 10.1016/j.cpc.2012.09.022

Salomon-Ferrer, R.; Goetz, A.W.; Poole, D.; Le Grand, S.; Walker, R.C. "Routine microsecond molecular dynamics simulations with AMBER - Part II: Particle Mesh Ewald", J. Chem. Theory Comput., (in review), 2013

Original Design Goals
- Transparent to the user.
  - Easy to compile / install.
  - AMBER input, AMBER output: simply requires a change in executable name.
- Cost-effective performance.
  - A C2050 should be equivalent to 4 or 6 standard IB nodes.
- Focus on accuracy.
  - Should NOT make any additional approximations we cannot rigorously defend.
  - Accuracy / precision should be directly comparable to the standard CPU implementation.

Version History
- AMBER 10 - released Apr 2008
  - Implicit solvent GB GPU support released as a patch, Sept 2009.
- AMBER 11 - released Apr 2010
  - Implicit and explicit solvent supported internally on a single GPU.
  - Oct 2010 - Bugfix.9 doubled performance on a single GPU, added multi-GPU support.
- AMBER 12 - released Apr 2012
  - Added umbrella sampling support, REMD, simulated annealing, aMD, IPS and extra points.
  - Aug 2012 - Bugfix.9: new SPFP precision model, support for Kepler I, GPU-accelerated NMR restraints, improved performance.
  - Jan 2013 - Bugfix.14: support for CUDA 5.0, Jarzynski on GPU, GBSA, Kepler II support.

Supported Features Summary
Supports 'standard' MD:
- Explicit solvent (PME), NVE/NVT/NPT
- Implicit solvent (Generalized Born)
- AMBER and CHARMM classical force fields
- Thermostats: Berendsen, Langevin, Andersen
- Restraints / constraints: standard harmonic restraints, SHAKE on hydrogen atoms
New in AMBER 12:
- Umbrella sampling
- REMD
- Simulated annealing
- Accelerated MD
- Isotropic Periodic Sum
- Extra points

Precision Models
- SPSP: single precision for the entire calculation, with the exception of SHAKE, which is always done in double precision.
- SPDP: a combination of single precision for calculation and double precision for accumulation (default before the AMBER 12.9 bugfix).
- DPDP: double precision for the entire calculation.
- SPFP (new!*): a single / double / fixed-precision hybrid, designed for optimum performance on Kepler I. Uses fire-and-forget atomic ops. Fully deterministic, faster and more precise than SPDP, with minimal memory overhead (default as of the AMBER 12.9 bugfix). Q24.40 fixed point for forces, Q34.30 for energies / virials. (A sketch of the fixed-point idea follows below.)

* Le Grand, S.; Goetz, A.W.; Walker, R.C. "SPFP: Speed without compromise - a mixed precision model for GPU accelerated molecular dynamics simulations", Comput. Phys. Commun., 2012, in review.
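The idea behind SPFP's fixed-point accumulation can be sketched in a few lines of Python. This is a hedged illustration of the Q24.40 format and of why integer accumulation is deterministic; it is not AMBER's CUDA implementation:

    FORCE_SCALE = 2 ** 40      # Q24.40: 24 integer bits, 40 fractional bits

    def accumulate_fixed(contributions):
        # Each single-precision contribution is scaled and rounded to an
        # integer; integer addition is associative, so the result is
        # bit-for-bit identical regardless of the order in which GPU
        # threads "fire and forget" their atomic additions.
        acc = 0
        for c in contributions:
            acc += int(round(c * FORCE_SCALE))
        return acc / FORCE_SCALE

    forces = [1.0e-3, -2.5e-4, 3.0e-7]
    assert accumulate_fixed(forces) == accumulate_fixed(reversed(forces))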

Supported System Sizes
[Table of supported system sizes not recoverable from the transcription.]

Running on GPUs
Details provided at: http://ambermd.org/gpus/
Compile (assuming nvcc 4.2 is installed):

    cd $AMBERHOME
    ./configure -cuda gnu
    make install
    make test

Running on the GPU: just replace the executable pmemd with pmemd.cuda:

    $AMBERHOME/bin/pmemd.cuda -O -i mdin ...

If process-exclusive compute mode is set on each GPU, pmemd just 'does the right thing'. (A sketch of launching several independent GPU runs follows below.)
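For several independent single-GPU jobs on one node, a common pattern is to pin each process to its own device via CUDA_VISIBLE_DEVICES. A minimal Python sketch, assuming pmemd.cuda is on the PATH and the usual AMBER input files exist (file names here are placeholders):

    import os
    import subprocess

    def run_pmemd(gpu_id, prefix):
        # Pin this pmemd.cuda process to one GPU so that several serial
        # runs can share a node without contending for the same device.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        cmd = ["pmemd.cuda", "-O", "-i", "mdin", "-p", "prmtop", "-c", "inpcrd",
               "-o", prefix + ".out", "-r", prefix + ".rst", "-x", prefix + ".nc"]
        return subprocess.Popen(cmd, env=env)

    # Four independent simulations, one per GPU:
    jobs = [run_pmemd(i, "run%d" % i) for i in range(4)]
    for j in jobs:
        j.wait()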

Performance (AMBER 12)

Implicit Solvent Performance (SPFP)
[Chart: GB throughput in ns/day for TRPCage (304 atoms), Myoglobin (2,492 atoms) and Nucleosome (25,095 atoms). As expected, the performance differential is larger for bigger systems.]

Explicit Solvent Performance (JAC-DHFR NVE Production)
Throughput in ns/day:
- 4xK10 (2 cards): 97.95
- 3xK10 (1.5 cards): 82.70
- 2xK10 (1 card): 67.02
- 1xK10 (0.5 card): 52.94
- 2xM2090 (1 node): 58.28
- 1xM2090: 43.74
- 1xGTX Titan: 111.34
- 4xGTX680 (1 node): 118.88
- 3xGTX680 (1 node): 101.31
- 2xGTX680 (1 node): 86.84
- 1xGTX680: 72.49
- 1xGTX580: 54.46
- 96xE5-2670 (6 nodes): 41.55
- 64xE5-2670 (4 nodes): 58.19
- 48xE5-2670 (3 nodes): 50.25
- 32xE5-2670 (2 nodes): 38.49
- 16xE5-2670 (1 node): 21.13
(Chart annotation: "Single $1000 GPU". Legend: Kepler K20 GPU, Kepler K10 GPU, M2090 GPU, GTX GPU, Gordon CPU.)

Explicit Solvent Benchmark (Cellulose NVE Production)
Throughput in ns/day:
- 4xK10 (2 cards): 6.44
- 3xK10 (1.5 cards): 5.45
- 2xK10 (1 card): 4.33
- 1xK10 (0.5 card): 3.26
- 2xM2090 (1 node): 3.75
- 1xM2090: 2.67
- 1xGTX Titan: 7.85
- 4xGTX680 (1 node): 7.56
- 3xGTX680 (1 node): 6.47
- 2xGTX680 (1 node): 5.30
- 1xGTX680: 4.34
- 1xGTX580: 3.16
- 128xE5-2670 (8 nodes): 5.72
- 96xE5-2670 (6 nodes): 4.73
- 64xE5-2670 (4 nodes): 3.73
- 48xE5-2670 (3 nodes): 2.95
- 32xE5-2670 (2 nodes): 2.05
- 16xE5-2670 (1 node): 1.12
(Chart annotation: "Single $1000 GPU". Legend: Kepler K20 GPU, Kepler K10 GPU, M2090 GPU, GTX GPU, Gordon CPU.)

Performance Example (TRP Cage)
[Figure: CPU (8x E5462) vs GPU (C2050) comparison.]


Interactive MD?
Single nodes are now fast enough that GPU-enabled cloud nodes actually make sense as a back end now.


Recommended Hardware
Supported GPUs (examples, not exhaustive):
- Hardware version 3.5: Kepler K20 / K20X, GTX Titan
- Hardware version 3.0: Kepler K10, GTX670 / 680 / 690
- Hardware version 2.0: Tesla M2050 / M2070 / M2075 / M2090 (and C variants), GTX560 / 570 / 580 / 590, GTX465 / 470 / 480
- C1060 / S1070 / GTX280 etc. also supported

Graphics card (survey)
- Tesla: 42%
- GeForce: 30%
- Quadro: 9%
- Other: 1%
- Don't know: 18%
(Survey question: "For the research group you belong to, please write in the specific card models if you know them.")

Recommended Hardware
See the following page for continuous updates:
http://ambermd.org/gpus/recommended_hardware.htm#hardware

DIY 4-GPU System
- Antec P280 Black ATX Mid Tower Case - http://tinyurl.com/a2wtkfr - $126.46
- SILVERSTONE ST1500 1500W ATX12V/EPS12V Power Supply - http://tinyurl.com/alj9w93 - $299.99
- AMD FX-8350 Eight-Core CPU - http://tinyurl.com/b9teunj - $189.15
- Corsair Vengeance 16GB (2x8GB) DDR3 1600 MHz Desktop Memory - http://tinyurl.com/amh4jyu - $96.87
- GIGABYTE GA-990FXA-UD7 AM3+ AMD Motherboard - http://tinyurl.com/b8yvykv - $216.00
- Seagate Barracuda 7200 3 TB 7200RPM Internal Bare Drive ST3000DM001 - http://tinyurl.com/a4ccfvj - $139.99
- 4x EVGA GeForce GTX 680 4096 MB GDDR5 - http://tinyurl.com/d82lq8d - $534.59 each (or K20X or GTX Titan)
Total price: $3,206.82
Note: cards in this system run at x8, so you can only run single-GPU AMBER runs (but you can run 4 simultaneously at full speed). If you want to be able to run MPI 2-GPU runs, then only place 2 cards, in the x16 slots.

Single Workstation
Based on Exxact Model Quantum TXR410-512R (available as an AMBER MD Workstation with AMBER 12 preinstalled):
(A) SuperServer Tower / 4U convertible chassis, supports up to 3x 5.25-inch bays, 8x 3.5-inch hot-swap HDD bays, up to 4x double-width GPU, 1620W redundant power supplies
(B) SuperServer Intel Patsburg-based motherboard, supports up to 2x Sandy Bridge EP (Socket R) series CPU, 2x 10/100/1000 NIC, dedicated IPMI port, 4x PCIe 3.0 x16 slots, 2x PCIe 3.0 x8 slots, up to 512GB DDR3 1600MHz ECC/REG memory
(C) Intel Xeon E5-2620 2.00 GHz 15MB cache 7.20GT/sec LGA 2011 6-core processor (2)
(D) Certified 4GB 240-pin DDR3 SDRAM ECC registered DDR3 1600 MHz server memory (8)
(E) Certified 2TB 7200RPM 64MB cache 3.5-inch SATA enterprise-class HDD in a RAID 1 configuration (2)
(G) GeForce GTX 680 4GB, GTX Titan, or K20X 6GB 384-bit GDDR5 PCI Express 3.0 accelerator (4)
(H) CentOS 6
Price: $6,500 (GTX 680), $8,500 (GTX Titan), $20,000 (K20X)

Clusters
- # of CPU sockets: 2
- Cores per CPU socket: 4 (1 CPU core drives 1 GPU)
- CPU speed (GHz): 2.0+
- System memory per node (GB): 16 to 32
- GPUs: Kepler K10, K20, K20X; Fermi M2090, M2075, C2075
- # of GPUs per CPU socket: 1-2 (4 GPUs on 1 socket is good for doing 4 fast serial GPU runs)
- GPU memory preference (GB): 6
- GPU-to-CPU connection: PCIe 2.0 x16 or higher
- Server storage: 2 TB+
- Network configuration: InfiniBand QDR or better (optional)
Scale to multiple nodes with the same single-node configuration.

Acknowledgements
San Diego Supercomputer Center
University of California San Diego
National Science Foundation
- NSF Strategic Applications Collaboration (AUS/ASTA) Program
- NSF SI2-SSE Program
NVIDIA Corporation (hardware)
People: Romelia Salomon, Andreas Goetz, Scott Le Grand, Mike Wu, Matthew Clark, Robin Betz, Jason Swails, Ben Madej
NVIDIA: Duncan Poole, Mark Berger, Sarah Tariq

AMBER User Survey - 2011
GPU momentum is growing!
AMBER machines: GPUs and CPUs 49%, CPUs only 50%, don't know 1%.
GPU experience: less than 6 months 30%, almost 1 year 34%, 1-2 years 22%, 2-3 years 6%, don't know 8%.
49% of AMBER machines have GPUs; 85% of users have up to 2 years of GPU experience.

Testimonials
"The whole lab loves the GPU cluster. Students are now able to run AMBER simulations that would not have been feasible on our local CPU-based resources before. Research throughput of the group has been enhanced significantly."
Jodi Hadden, Chemistry Graduate Student, Woods Computing Lab, Complex Carbohydrate Research Center, University of Georgia

GPU Accelerated Apps Momentum
Key codes are GPU accelerated!
Molecular dynamics: Abalone (GPU-only code), ACEMD (GPU-only code), AMBER, CHARMM, DL_POLY, GROMACS, HOOMD-blue (GPU-only code), LAMMPS, NAMD
Quantum chemistry: ABINIT, BigDFT, CP2K, GAMESS, Gaussian (in development), NWChem (in development), Quantum ESPRESSO, TeraChem (GPU-only code), VASP
Check many more apps at www.nvidia.com/teslaapps

Test Drive K20 GPUs!
Experience the acceleration: run AMBER on a Tesla K20 GPU today.
Sign up for a FREE GPU Test Drive on remotely hosted clusters: www.nvidia.com/GPUTestDrive

GTC 2013: Registration is Open!
March 18-21, 2013, San Jose, CA
- Four days
- Three keynotes
- 400 sessions
- One day of pre-conference developer tutorials
- 150 research posters
- Lots of networking events and opportunities
Visit www.gputechconf.com for more info.

