Implementing Fast Parallel Linear System Solvers In OpenFOAM Based On CUDA


Daniel P. Combest, Dr. P.A. Ramachandran, and Dr. M.P. Dudukovic
Chemical Reaction Engineering Laboratory (CREL)
Department of Energy, Environmental, and Chemical Engineering, Washington University, St. Louis, MO
Optimization, HPC, and Pre- and Post-Processing I Session
6th OpenFOAM Workshop, Penn State University, June 15th 2011

Objectives

Introduction to The GPU and CUDA
What exactly is CUDA?
Defined as: Compute Unified Device Architecture, i.e. a parallel computing architecture used in graphics processing units (GPU), developed by Nvidia.
What is CUDA C/C++?
A language that provides an interface so that parallel algorithms can be run on CUDA-enabled Nvidia GPUs.

Introduction to The GPU and CUDA
GPU vs. CPU Calculations
[Figure: CPU-GPU comparison of floating-point operations per second [1]]

Introduction to The GPU and CUDA
Why are we interested?
- Larger problems require more computing resources (LES, coupled physics)
- GPUs are fast when used properly
- They are relatively cheap
Where can GPUs be applied?
- Where parallel algorithms live
- Linear algebra, i.e. sparse matrix math
Why don't we compile everything to work on the GPU?
- Only programs written in the CUDA language can be parallelized on the GPU, so we cannot just recompile OpenFOAM.

Integrating CUSP into OpenFOAM
"Cusp is a library for sparse linear algebra and graph computations on CUDA. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems." [2]
Provided template solvers:
- (Bi-) Conjugate Gradient (-Stabilized)
- GMRES
Matrix storage:
- CSR, COO, HYB, DIA
Provided preconditioners:
- Jacobi (diagonal) preconditioner
- Sparse approximate inverse preconditioner
- Smoothed-aggregation algebraic multigrid preconditioner
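To make the combination of a Conjugate Gradient solver with a Jacobi (diagonal) preconditioner concrete, here is a minimal host-side sketch in plain C++. It is not CUSP's implementation and does not use the CUSP API. The names `CooMatrix`, `multiply`, `dot` and `jacobiPcg` are hypothetical. The stopping test is a simple relative 2-norm of the residual, which is an assumption made for brevity and differs from OpenFOAM's normalized residual. The matrix is assumed symmetric positive definite, with each diagonal entry stored exactly once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical COO container: entry k is A(row[k], col[k]) = val[k].
struct CooMatrix {
    std::vector<int> row, col;
    std::vector<double> val;
};
using Vec = std::vector<double>;

// y = A x for a COO matrix.
Vec multiply(const CooMatrix& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t k = 0; k < A.val.size(); ++k)
        y[A.row[k]] += A.val[k] * x[A.col[k]];
    return y;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Jacobi-preconditioned CG. x holds the initial guess on entry and the
// solution on exit. Returns the number of iterations performed.
int jacobiPcg(const CooMatrix& A, const Vec& b, Vec& x, double tol, int maxIter) {
    const std::size_t n = b.size();
    assert(x.size() == n);

    // Preconditioner: inverse of the diagonal of A.
    Vec invDiag(n, 1.0);
    for (std::size_t k = 0; k < A.val.size(); ++k)
        if (A.row[k] == A.col[k]) invDiag[A.row[k]] = 1.0 / A.val[k];

    Vec r = b;                                  // r = b - A x
    const Vec Ax = multiply(A, x);
    for (std::size_t i = 0; i < n; ++i) r[i] -= Ax[i];

    Vec z(n);
    for (std::size_t i = 0; i < n; ++i) z[i] = invDiag[i] * r[i];
    Vec p = z;
    double rz = dot(r, z);

    const double bNorm = std::sqrt(dot(b, b));
    const double stop = tol * (bNorm > 0.0 ? bNorm : 1.0);

    int it = 0;
    while (it < maxIter && std::sqrt(dot(r, r)) > stop) {
        const Vec Ap = multiply(A, p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        for (std::size_t i = 0; i < n; ++i) z[i] = invDiag[i] * r[i];
        const double rzNew = dot(r, z);
        const double beta = rzNew / rz;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rzNew;
        ++it;
    }
    return it;
}
```

Every step in the loop is a sparse matrix-vector product, a dot product, or a vector update, which is why this class of solver maps well onto the GPU.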

Integrating CUSP into OpenFOAM
http://code.google.com/p/thrust/
"Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity." [3]

Integrating CUSP into OpenFOAM
OpenFOAM solve(): Ax = b
Thrust methods:
- lduMatrix is converted to COO using thrust::copy() in C++
- COO is transferred to the GPU in CUDA code
Cusp-based solver on GPU: Ax = b
cusp methods:
- COO is converted to other formats on the GPU and passed to the CUSP-based solver with convergence criteria
- Residual is calculated using the OpenFOAM normalized residual method
Thrust methods:
- The x vector and solver performance data are passed back to OpenFOAM
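The LDU-to-COO step can be sketched on the host as follows. This is an illustration only: the slides state that the conversion uses thrust::copy(), whereas this sketch uses plain std::vector loops so that it runs without CUDA. The struct names `LduMatrix` and `CooMatrix` and the function `lduToCoo` are hypothetical. The sketch assumes the usual LDU addressing, in which `upper[f]` is the coefficient at (lowerAddr[f], upperAddr[f]) and `lower[f]` is the coefficient at (upperAddr[f], lowerAddr[f]).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side mirror of an LDU matrix (names are illustrative).
struct LduMatrix {
    std::vector<double> diag;    // one coefficient per cell
    std::vector<double> lower;   // per face: A(upperAddr[f], lowerAddr[f])
    std::vector<double> upper;   // per face: A(lowerAddr[f], upperAddr[f])
    std::vector<int> lowerAddr;  // per face: lower-numbered (owner) cell
    std::vector<int> upperAddr;  // per face: higher-numbered (neighbour) cell
};

// Hypothetical COO container: entry k is A(row[k], col[k]) = val[k].
struct CooMatrix {
    std::vector<int> row, col;
    std::vector<double> val;
};

CooMatrix lduToCoo(const LduMatrix& m) {
    const std::size_t nCells = m.diag.size();
    const std::size_t nFaces = m.lowerAddr.size();
    assert(m.upperAddr.size() == nFaces);
    assert(m.lower.size() == nFaces && m.upper.size() == nFaces);

    CooMatrix coo;
    coo.row.reserve(nCells + 2 * nFaces);
    coo.col.reserve(nCells + 2 * nFaces);
    coo.val.reserve(nCells + 2 * nFaces);

    // Diagonal entries first.
    for (std::size_t i = 0; i < nCells; ++i) {
        coo.row.push_back(static_cast<int>(i));
        coo.col.push_back(static_cast<int>(i));
        coo.val.push_back(m.diag[i]);
    }
    // Each face contributes one upper-triangle and one lower-triangle entry.
    for (std::size_t f = 0; f < nFaces; ++f) {
        coo.row.push_back(m.lowerAddr[f]);
        coo.col.push_back(m.upperAddr[f]);
        coo.val.push_back(m.upper[f]);

        coo.row.push_back(m.upperAddr[f]);
        coo.col.push_back(m.lowerAddr[f]);
        coo.val.push_back(m.lower[f]);
    }
    return coo;
}
```

The entries come out unsorted. Some sparse libraries expect COO entries ordered by row, so a sort step may be needed before handing the arrays to a solver.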

Preliminary Results
A test problem: 2D heat equation, $\nabla^2 T = 0$
Vary N from 10 to 2000, where $N^2 = \text{nCells}$

Preliminary Results
Solver Settings
All CG solvers
Tolerance 1e-10;
MaxIter ;
nPreSweeps 0;
nPostSweeps 2;
cacheAgglomeration true;
nCellsInCoarsestLevel sqrt(nCells);
agglomerator faceAreaPair;
mergeLevels 1;

Preliminary Results
Setup
- CUDA version 4.0
- CUSP version 0.2
- Thrust version 1.4
- Ubuntu 10.04
- CPU: Dual Intel Xeon Quad Core E5430 2.66 GHz
- Motherboard: Tyan S5396
- RAM: 24 GB
- GPU: Tesla C2050, 3 GB DDR5
  - 515 Gflops peak double precision
  - 1.03 Tflops peak single precision
  - 14 MP * 32 cores/MP = 448 cores
- Host-device memory bandwidth: 1566 MB/sec (motherboard specific)

Preliminary Results
Solve Time
[Figure: Solve() time comparison. Time versus nCells for cusplink SmAPCG, GAMG, cusplink DPCG, cusplink CG, DPCG-parallel4, DPCG-parallel6, DPCG, and CG.]

Preliminary Results
Solution Speedup
[Figure: Speedup comparison. Speedup = Ts/Tp versus nCells; legible series include DPCG-parallel6-s161, cusplink DPCG, and cusplink CG.]

Preliminary Results
Solution Speedup
[Figure: Speedup comparison. Speedup = Ts/Tp versus nCells; legible series include cusplink CG, cusplink DPCG, GAMG, GAMG6, and cusplink SmAPCG.]

Preliminary Results
Solution Speedup
[Figure: Speedup comparison. Speedup = Ts/Tp versus nCells; legible series include cusplink CG, cusplink DPCG, GAMG6, and GAMG.]

Preliminary Results

Important Considerations

Next Steps

Take Home Messages
- The GPU only solves the Ax = b system.
- We have double precision.
- GPUs have been integrated into OpenFOAM using Thrust and CUSP.
- As cusp and thrust improve, nothing needs to be changed in this code; only cusp and thrust need to be updated.
- They have been shown to be faster in the cases provided, because the work is mostly solving Ax = b.
- Residuals are calculated the same way as in OpenFOAM.
- Multi-GPU still needs attention.
- The results show that memory bandwidth is still an issue with this particular setup, and results could be faster with a different setup.

Acknowledgements
Funding and Support
- Nvidia Professor Partnership Program
- Chemical Reaction Engineering Laboratory (CREL) MRE Fund (http://crelonweb.eec.wustl.edu/)
- OpenFOAM Developers Community
Advisors
- Dr. Ramachandran
- Dr. Dudukovic

Sources
1. Nvidia CUDA Programming Guide, Version 4.0, 2011. Nvidia Corporation.
2. Nathan Bell and Michael Garland, Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations, 2010, http://cusp-library.googlecode.com, Version 0.1.0
3. Jared Hoberock and Nathan Bell, Thrust: A Parallel Template Library, 2010, http://www.meganewtons.com/, Version 1.3.0

Thanks for your attention!
Questions?
Contact Info:
Dan Combest
dcombest@seas.wustl.edu

Preliminary Results
Solution Speedup
[Figure: Speedup = Ts/Tp versus nCells; legible series include cusplink CG, cusplink DPCG, GAMG, and GAMG6.]

Residual Scaling
For the matrix system A x = b, the residual is defined as
    res = b - A x
We then apply residual scaling with the following normalisation factor procedure:
    Type xRef = gAverage(x);
    wA = A x;
    pA = A xRef;
    normFactor = gSum(cmptMag(wA - pA) + cmptMag(source - pA)) + matrix.small_;
and the scaled residual is:
    residual = gSum(cmptMag(source - wA))/normFactor;
I will save you from complications with vectors and tensors in my block solver. :-)
Enjoy,
Hrv
Source: 57903-residuals-convergence-segregatedsolvers.html
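The normalisation procedure quoted above can be written out for scalar fields in plain C++ as a rough sketch. It is not OpenFOAM code: `CooMatrix`, `multiply` and `scaledResidual` are hypothetical names, the global reductions gSum and gAverage are replaced by serial sums over a single process, and the value 1e-20 for the small constant is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical COO container: entry k is A(row[k], col[k]) = val[k].
struct CooMatrix {
    std::vector<int> row, col;
    std::vector<double> val;
};
using Vec = std::vector<double>;

// y = A x for a COO matrix.
Vec multiply(const CooMatrix& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t k = 0; k < A.val.size(); ++k)
        y[A.row[k]] += A.val[k] * x[A.col[k]];
    return y;
}

// Scaled residual following the quoted procedure (serial, scalar case).
// 'small' guards against division by zero; its default is an assumption.
double scaledResidual(const CooMatrix& A, const Vec& x, const Vec& source,
                      double small = 1e-20) {
    const std::size_t n = x.size();
    assert(n > 0 && source.size() == n);

    // xRef: every component set to the average of x (serial gAverage).
    double xAvg = 0.0;
    for (double v : x) xAvg += v;
    xAvg /= static_cast<double>(n);
    const Vec xRef(n, xAvg);

    const Vec wA = multiply(A, x);     // wA = A x
    const Vec pA = multiply(A, xRef);  // pA = A xRef

    double normFactor = small;
    double res = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        normFactor += std::fabs(wA[i] - pA[i]) + std::fabs(source[i] - pA[i]);
        res += std::fabs(source[i] - wA[i]);
    }
    return res / normFactor;
}
```

Because both the CPU and GPU solvers report this same quantity, a tolerance such as 1e-10 means the same thing in either case.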

