Implementing Fast Parallel Linear System Solvers in OpenFOAM Based on CUDA
Daniel P. Combest, Dr. P.A. Ramachandran, and Dr. M.P. Dudukovic
Chemical Reaction Engineering Laboratory (CREL), Department of Energy, Environmental, and Chemical Engineering, Washington University, St. Louis, MO.
Optimization, HPC, and Pre- and Post-Processing I Session. 6th OpenFOAM Workshop, Penn State University. June 15th 2011
Objectives
Introduction to the GPU and CUDA
What exactly is CUDA?
Compute Unified Device Architecture, i.e. a parallel computing architecture used in graphics processing units (GPUs), developed by Nvidia.
What is CUDA C/C++?
A language that provides an interface so that parallel algorithms can be run on CUDA-enabled Nvidia GPUs.
Introduction to the GPU and CUDA
GPU vs. CPU Calculations
[Figure: CPU-GPU comparison of floating-point operations per second [1]]
Introduction to the GPU and CUDA
Why are we interested?
Larger problems require more computing resources (LES, coupled physics).
GPUs are fast when used properly.
They are relatively cheap.
Where can GPUs be applied?
Where parallel algorithms live: linear algebra, i.e. sparse matrix math.
Why don't we compile everything to work on the GPU?
Only code written in the CUDA language can be parallelized on the GPU, so we cannot just recompile OpenFOAM.
Integrating CUSP into OpenFOAM
“Cusp is a library for sparse linear algebra and graph computations on CUDA. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems.” [2]
Provided template solvers: (Bi-)Conjugate Gradient (-Stabilized), GMRES
Matrix storage: CSR, COO, HYB, DIA
Provided preconditioners:
Jacobi (diagonal) preconditioner
Sparse approximate inverse preconditioner
Smoothed-aggregation algebraic multigrid preconditioner
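To make the combination of a conjugate gradient solver with a Jacobi (diagonal) preconditioner concrete, the following is a minimal sketch in plain Python on a COO matrix. It is not Cusp's implementation or API: the names `coo_matvec` and `jacobi_pcg`, the relative 2-norm stopping test, and the small test matrix are assumptions made for illustration only.

```python
import math


def coo_matvec(rows, cols, vals, x):
    """Return y = A x for a matrix A stored as COO triplets."""
    y = [0.0] * len(x)
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def jacobi_pcg(rows, cols, vals, b, tol=1e-10, max_iter=1000):
    """Solve A x = b (A symmetric positive definite) with Jacobi-preconditioned CG.

    Returns the solution and the number of iterations performed.
    """
    n = len(b)
    # Jacobi preconditioner: apply M^-1 = 1 / diag(A).
    diag = [0.0] * n
    for r, c, v in zip(rows, cols, vals):
        if r == c:
            diag[r] += v
    inv_diag = [1.0 / d for d in diag]

    x = [0.0] * n
    res = list(b)                                  # residual for x = 0
    z = [inv_diag[i] * res[i] for i in range(n)]   # preconditioned residual
    p = list(z)                                    # first search direction
    rz = sum(ri * zi for ri, zi in zip(res, z))
    b_norm = math.sqrt(sum(bi * bi for bi in b)) or 1.0

    for iteration in range(max_iter):
        # Assumed stopping test: relative 2-norm of the residual.
        if math.sqrt(sum(ri * ri for ri in res)) / b_norm < tol:
            return x, iteration
        Ap = coo_matvec(rows, cols, vals, p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        res = [ri - alpha * api for ri, api in zip(res, Ap)]
        z = [inv_diag[i] * res[i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(res, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical 3x3 test system: tridiagonal [-1, 2, -1] with solution (1, 1, 1).
rows = [0, 0, 1, 1, 1, 2, 2]
cols = [0, 1, 0, 1, 2, 1, 2]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
solution, iterations = jacobi_pcg(rows, cols, vals, [1.0, 0.0, 1.0])
```

On the GPU, the matrix-vector product and the vector updates are the parts that run in parallel; the loop structure stays the same.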
Integrating CUSP into OpenFOAM
http://code.google.com/p/thrust/
“Thrust is a CUDA library of parallel algorithms with an interface resembling the C++ Standard Template Library (STL). Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity.” [3]
Integrating CUSP into OpenFOAM
OpenFOAM solve(): A, x, b
Thrust methods:
The lduMatrix is converted to COO using thrust::copy() in C++.
The COO matrix is transferred to the GPU in CUDA code.
Cusp-based solver on GPU (cusp methods):
COO is converted to other formats on the GPU and passed to the CUSP-based solver with the convergence criteria.
The residual is calculated using the OpenFOAM normalized residual method.
Integrating CUSP into OpenFOAM
OpenFOAM solve(): A, x, b
Cusp-based solver on GPU: A, x, b
The x vector and the solver performance data are passed back to OpenFOAM using Thrust methods.
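The LDU-to-COO conversion step can be sketched as follows. This is plain Python rather than the Thrust-based C++ used in the talk, and the slides do not show the actual conversion code. The sketch assumes the usual OpenFOAM LDU convention, in which face f connects cells `lower_addr[f]` and `upper_addr[f]`, the upper coefficient sits at (row lower_addr[f], column upper_addr[f]) and the lower coefficient at the transposed position. The function name `ldu_to_coo` and the row-sorting step are illustrative assumptions.

```python
def ldu_to_coo(diag, lower, upper, lower_addr, upper_addr):
    """Convert LDU storage (diagonal plus per-face coefficients) to COO triplets.

    Returns (rows, cols, vals) sorted by row and then by column.
    """
    # Diagonal coefficients: one entry per cell.
    triplets = [(cell, cell, d) for cell, d in enumerate(diag)]
    for face, (l, u) in enumerate(zip(lower_addr, upper_addr)):
        # Upper-triangle coefficient: row l, column u.
        triplets.append((l, u, upper[face]))
        # Lower-triangle coefficient: row u, column l.
        triplets.append((u, l, lower[face]))
    # Assumption: the consumer of the COO data expects row-sorted entries.
    triplets.sort(key=lambda t: (t[0], t[1]))
    rows = [t[0] for t in triplets]
    cols = [t[1] for t in triplets]
    vals = [t[2] for t in triplets]
    return rows, cols, vals


# Hypothetical 3-cell chain with two internal faces (0-1 and 1-2).
rows, cols, vals = ldu_to_coo(
    diag=[2.0, 2.0, 2.0],
    lower=[-3.0, -4.0],
    upper=[-1.0, -2.0],
    lower_addr=[0, 1],
    upper_addr=[1, 2],
)
```

The number of COO entries is nCells + 2 × nFaces, which is the amount of matrix data that has to cross the host-device link.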
Preliminary Results
A test problem: the 2D heat equation, $\nabla^2 T = 0$.
Vary N from 10 to 2000, where $N^2 = \text{nCells}$.
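As an illustration of how the system size grows with N, the sketch below assembles a 5-point stencil matrix for an N × N grid in COO form. The slides do not state the discretisation or the boundary conditions, so the unit spacing, the fixed-value boundaries folded into the diagonal, the sign convention (negated Laplacian, so that the matrix is positive definite), and the name `laplacian_2d_coo` are all assumptions.

```python
def laplacian_2d_coo(n):
    """COO triplets of the negated 5-point Laplacian on an n x n grid of cells."""
    rows, cols, vals = [], [], []
    for j in range(n):
        for i in range(n):
            cell = j * n + i
            # Diagonal coefficient (assumes fixed-value boundaries on all sides).
            rows.append(cell)
            cols.append(cell)
            vals.append(4.0)
            # Off-diagonal coefficients for neighbours that lie inside the grid.
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    rows.append(cell)
                    cols.append(jj * n + ii)
                    vals.append(-1.0)
    return rows, cols, vals


rows, cols, vals = laplacian_2d_coo(10)
n_cells = max(rows) + 1   # N^2 unknowns
nnz = len(vals)           # 5 N^2 - 4 N nonzeros for this stencil
```

Under these assumptions, the largest case (N = 2000) has 4,000,000 cells and just under 20 million nonzeros.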
Preliminary Results
Solver Settings
All CG solvers
Tolerance 1e-10;
MaxIter ;
nPreSweeps 0;
nPostSweeps 2;
cacheAgglomeration true;
nCellsInCoarsestLevel sqrt(nCells);
agglomerator faceAreaPair;
mergeLevels 1;
Preliminary Results
Setup
CUDA version 4.0
CUSP version 0.2
Thrust version 1.4
Ubuntu 10.04
CPU: Dual Intel Xeon Quad Core E5430 2.66 GHz
Motherboard: Tyan S5396
RAM: 24 GB
GPU: Tesla C2050, 3 GB GDDR5
515 Gflops peak double precision
1.03 Tflops peak single precision
14 MP × 32 cores/MP = 448 cores
Host-device memory bandwidth: 1566 MB/sec (motherboard specific)
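To put the host-device bandwidth figure in context, the following back-of-the-envelope sketch estimates how long one transfer of a COO system would take at 1566 MB/sec. It is an estimate under stated assumptions, not a measurement from the talk: a 5-point stencil with 5N² − 4N nonzeros, 8-byte double-precision values, 4-byte integer indices, 1 MB taken as 10^6 bytes, and one transfer of the matrix and b to the device plus x back per solve. The function name `coo_transfer_estimate` is hypothetical.

```python
def coo_transfer_estimate(n, bandwidth_mb_per_s=1566.0, value_bytes=8, index_bytes=4):
    """Estimate bytes moved and transfer time for one solve on an n x n grid.

    Returns (nnz, total_bytes, seconds).
    """
    n_cells = n * n
    nnz = 5 * n * n - 4 * n                             # assumed 5-point stencil
    matrix_bytes = nnz * (value_bytes + 2 * index_bytes)  # value + row + column
    vector_bytes = 2 * n_cells * value_bytes             # b to device, x back
    total_bytes = matrix_bytes + vector_bytes
    seconds = total_bytes / (bandwidth_mb_per_s * 1e6)    # 1 MB = 1e6 bytes assumed
    return nnz, total_bytes, seconds


cuda_cores = 14 * 32   # multiprocessors x cores per multiprocessor
nnz, total_bytes, seconds = coo_transfer_estimate(2000)
```

Under these assumptions the largest test case moves roughly 0.38 GB per solve, which takes on the order of a quarter of a second at the quoted bandwidth.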
Preliminary Results
Solve Time
[Figure: solve() time comparison, time versus nCells, for cusplink SmAPCG, GAMG, cusplink DPCG, cusplink CG, DPCG-parallel4, DPCG-parallel6-s231, DPCG, and CG.]
Preliminary Results
Solution Speedup
[Figure: speedup comparison, speedup (Ts/Tp) versus nCells. Legible legend entries: DPCG-parallel6-s161, cusplink DPCG, cusplink CG.]
[Figure: speedup comparison, speedup (Ts/Tp) versus nCells. Legible legend entries: cusplink CG, cusplink DPCG, GAMG, GAMG6, cusplink SmAPCG.]
[Figure: speedup comparison, speedup (Ts/Tp) versus nCells. Legible legend entries: cusplink CG, cusplink DPCG, GAMG6, GAMG.]
Important Considerations
Next Steps
Take Home Messages
The GPU only solves the Ax = b system.
We have double precision.
GPUs have been integrated into OpenFOAM using Thrust and CUSP.
As Cusp and Thrust improve, nothing needs to be changed in this code; only Cusp and Thrust need to be updated.
The GPU solvers have been shown to be faster in the cases provided, because the work in these cases is mostly solving Ax = b.
Residuals are calculated the same way as in OpenFOAM.
Multi-GPU still needs attention.
The results show that memory bandwidth is still an issue with this particular setup, and results could be faster with another setup.
Acknowledgements
Funding and Support
Nvidia Professor Partnership Program
Chemical Reaction Engineering Laboratory (CREL) MRE Fund (http://crelonweb.eec.wustl.edu/)
OpenFOAM Developers Community
Advisors
Dr. Ramachandran
Dr. Dudukovic
Sources
1. Nvidia CUDA Programming Guide, Version 4.0, 2011. Nvidia Corporation.
2. Nathan Bell and Michael Garland, Cusp: Generic Parallel Algorithms for Sparse Matrix and Graph Computations, 2010, http://cusp-library.googlecode.com, Version 0.1.0.
3. Jared Hoberock and Nathan Bell, Thrust: A Parallel Template Library, 2010, http://www.meganewtons.com/, Version 1.3.0.
Thanks for your attention!
Questions?
Contact Info: Dan Combest, dcombest@seas.wustl.edu
Preliminary Results
Solution Speedup
[Figure: speedup (Ts/Tp) versus nCells. Legible legend entries: cusplink CG, cusplink DPCG, GAMG, GAMG6.]
Residual Scaling
For matrix A x = b, residual is defined as
res = b - Ax
We then apply residual scaling with the following normalisation factor procedure:
Type xRef = gAverage(x);
wA = A x;
pA = A xRef;
normFactor = gSum(cmptMag(wA - pA) + cmptMag(source - pA)) + matrix.small_;
and the scaled residual is:
residual = gSum(cmptMag(source - wA))/normFactor;
I will save you from complications with vectors and tensors in my block solver. :-)
Enjoy,
Hrv
Source: 57903-residuals-convergence-segregatedsolvers.html