Numerical Solutions Of Differential Equations On Fpga-enhanced Computers


NUMERICAL SOLUTIONS OF DIFFERENTIAL EQUATIONS ON
FPGA-ENHANCED COMPUTERS

A Dissertation
by
CHUAN HE

Submitted to the Office of Graduate Studies of
Texas A&M University
in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2007

Major Subject: Electrical Engineering

NUMERICAL SOLUTIONS OF DIFFERENTIAL EQUATIONS ON
FPGA-ENHANCED COMPUTERS

A Dissertation
by
CHUAN HE

Submitted to the Office of Graduate Studies of
Texas A&M University
in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Approved by:

Co-Chairs of Committee:  Mi Lu
                         Wei Zhao
Committee Members:       Guan Qin
                         Gwan Choi
                         Jim Ji
Head of Department:      Costas N. Georghiades

May 2007

Major Subject: Electrical Engineering

ABSTRACT

Numerical Solutions of Differential Equations on FPGA-Enhanced Computers. (May 2007)
Chuan He, B.S., Shandong University; M.S., Beijing University of Aeronautics and Astronautics
Co-Chairs of Advisory Committee: Dr. Mi Lu, Dr. Wei Zhao

Conventionally, to speed up scientific or engineering (S&E) computation programs on general-purpose computers, one may elect to use faster CPUs, more memory, systems with more efficient (though complicated) architectures, better software compilers, or even coding in assembly language. With the emergence of Field Programmable Gate Array (FPGA) based Reconfigurable Computing (RC) technology, numerical scientists and engineers now have another option: using FPGA devices as core components to address their computational problems. The hardware-programmable, low-cost, but powerful "FPGA-enhanced computer" has become an attractive approach for many S&E applications.

A new computer architecture model for FPGA-enhanced computer systems and its detailed hardware implementation are proposed for accelerating the solutions of computationally demanding and data-intensive numerical PDE problems. New FPGA-optimized algorithms/methods for rapid execution of representative numerical methods such as Finite Difference Methods (FDM) and Finite Element Methods (FEM) are designed, analyzed, and implemented on it. Linear wave equations arising in seismic data processing applications are adopted as the target PDE problems to demonstrate the effectiveness of this new computer model. Their sustained computational performance is compared with that of pure software programs running on commodity CPU-based general-purpose computers. Quantitative analysis is performed across a hierarchical set of aspects: customized/extraordinary computer arithmetic or function units,

compact but flexible system architecture and memory hierarchy, and hardware-optimized numerical algorithms or methods. The property of in-system hardware reconfigurability of the new system is emphasized, aiming at effectively accelerating the execution of complex multi-stage numerical applications. Methodologies for accelerating the target PDE problems as well as other numerical PDE problems, such as heat equations and Laplace equations, utilizing programmable hardware resources are summarized, implying the broad applicability of the proposed FPGA-enhanced computers.

DEDICATION

To my wonderful and loving wife

TABLE OF CONTENTS

ABSTRACT
DEDICATION
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
2 BACKGROUND AND RELATED WORK
  2.1 Application Background: Seismic Data Processing
  2.2 Numerical Solutions of PDEs on High-Performance Computing (HPC) Facilities
  2.3 Application-Specific Computer Systems
  2.4 FPGA and Existing FPGA-Based Computers
    2.4.1 FPGA and FPGA-Based Reconfigurable Computing
    2.4.2 Hardware Architecture of Existing FPGA-Based Computers
    2.4.3 Floating-Point Arithmetic on FPGAs
    2.4.4 Numerical Algorithms/Methods on FPGAs
3 HARDWARE ARCHITECTURE OF FPGA-ENHANCED COMPUTERS FOR NUMERICAL PDE PROBLEMS
  3.1 SPACE System for Seismic Data Processing Applications
  3.2 Universal Architecture of FPGA-Enhanced Computers
  3.3 Architecture of FPGA-Enhanced Computer Cluster
4 PSTM ALGORITHM ON FPGA-ENHANCED COMPUTERS
  4.1 PSTM Algorithm and Its Implementation on PC Clusters
  4.2 The Design of Double-Square-Root (DSR) Arithmetic Unit
    4.2.1 Hybrid DSR Arithmetic Unit
    4.2.2 Fixed-Point DSR Arithmetic Unit
    4.2.3 Optimized 6th-Order DSR Travel-Time Solver
  4.3 PSTM Algorithm on FPGA-Enhanced Computers
  4.4 Performance Comparisons
5 FDM ON FPGA-ENHANCED COMPUTER PLATFORM
  5.1 The Standard Second Order and High Order FDMs
    5.1.1 2nd-Order FD Schemes in Second Derivative Form
    5.1.2 High Order Spatial FD Approximations
    5.1.3 High Order Time Integration Scheme
  5.2 High Order FD Schemes on FPGA-Enhanced Computers
    5.2.1 Previous Work and Their Common Pitfalls
    5.2.2 Implementation of Fully-Pipelined Laplace Computing Engine
    5.2.3 Sliding Window Data Buffering System
    5.2.4 Data Buffering for High Order Time Integration Schemes
    5.2.5 Data Buffering for 3D Wave Modeling Problems
    5.2.6 Extension to Elastic Wave Modeling Problems
    5.2.7 Damping Boundary Conditions
  5.3 Numerical Simulation Results
    5.3.1 Wave Propagation Test in Constant Media
    5.3.2 Acoustic Modeling of Marmousi Model
  5.4 Optimized FD Schemes with Finite Accurate Coefficients
  5.5 Accumulation of Floating-Point Operands
  5.6 Bring Them Together: Efficient Implementation of the Optimized FD Computing Engine
6 FEM ON FPGA-ENHANCED COMPUTER PLATFORM
  6.1 Floating-Point Summation and Vector Dot-Product on FPGAs
    6.1.1 Floating-Point Summation Problem and Related Works
    6.1.2 Numerical Error Bounds of the Sequential Accumulation Method
    6.1.3 Group-Alignment Based Floating-Point Summation Algorithm
    6.1.4 Formal Error Analysis and Numerical Experiments
    6.1.5 Implementation of Group-Alignment Based Summation on FPGAs
    6.1.6 Accurate Vector Dot-Product on FPGAs
  6.2 Matrix-Vector Multiply on FPGAs
  6.3 Dense Matrix-Matrix Multiply on FPGAs
7 CONCLUSIONS
  7.1 Summary of Research Work
  7.2 Methodologies for Accelerating Numerical PDE Problems on FPGA-Enhanced Computers
REFERENCES
VITA

LIST OF TABLES

1 FPGA-Based Reconfigurable Supercomputers
2 Error Property of the Hybrid CORDIC Unit with Different Guarding Bits
3 Rounding Error of the Conversion Stage with Different Fraction Word Width
4 Errors of the Fixed-Point CORDIC Unit with Different Word-Width and Guarding Bits
5 Performance Comparison of PSTM on FPGA and PC
6 Performance Comparison for Different FD Schemes
7 Performance Comparison for High-Order Time-Integration Schemes
8 Comparison of FP Operations and Operands for Different FD Schemes
9 Comparison of Caching Performance for Different FD Schemes
10 Size of Wave Modeling Test Problems
11 Performance Comparison for FD Schemes on FPGA and PC
12 Coefficients of 3 FD Schemes with 9-Point Stencils
13 Errors for the New Summation Algorithm
14 Comparison of Single-Precision Accumulators

LIST OF FIGURES

1 Demonstration of Seismic Reflection Survey
2 Coupling FPGAs with Commodity CPUs
3 The SPACE Acceleration Card
4 Architecture of FPGA-Enhanced Computer
5 FPGA-Enhanced PC Cluster
6 2D Torus Interconnection Network on Existent PC Cluster
7 The Relationship Between the Source, Receiver, and Scatter Points
8 Hardware Structure of the Hybrid DSR Travel-Time Solver
9 Output Format of the Conversion Stage
10 Hardware Structure of the Fixed-Point DSR Travel-Time Solver
11 Hardware Structure of the PSTM Computing Engine
12 A Vertical In-Line Unmigrated Section
13 The Vertical In-Line Migrated Section
14 (2, 2) FD Stencil for the 2D Acoustic Equation
15 Second-Order FD Stencil for the 3D Laplace Operator
16 (2, 4) FD Stencil for the 2D Acoustic Equation
17 4th-Order FD Stencil for the 3D Laplace Operator
18 Dispersion Relations of the 1D Acoustic Wave Equation and Its FD Approximations
19 Dispersion Errors of Different FD Schemes
20 Stencils for (2-4) and (4-4) FD Schemes
21 2D 4th-Order Laplacian Computing Engine
22 Stripped 2D Operands Entering the Computing Engine via Three Ports
23 Stripped 2D Operands Entering the Computing Engine via Two Ports
24 Block Diagram of the Buffering System for 2D (2, 2) FD Scheme
25 Sliding Window for 2D (2, 4) FD Scheme
26 Function Blocks of the 2D (2, 4) FD Scheme
27 Block Diagram and Dataflow for 2D (4, 4) FD Scheme
28 Function Blocks of the Hybrid 3D (2, 4-4-2) FD Schemes
29 Marmousi Model Snapshots (t = 0.6 s, 1.2 s, 1.8 s, and 2.4 s; Shot at x = 5 km)
30 Numerical Dispersion Errors for the Maximum 8th-Order FD Schemes with 23, 16, or 8 Mantissa Bits
31 Structure of Constant Multiplier
32 Comparisons of Dispersion Relations for Different FD Approximations
33 Dispersion Errors for Different FD Approximations
34 Binary Tree Based Reduction Circuit for Accumulation
35 Structure of Group-Alignment Based Floating-Point Accumulator
36 Structure of 1D 8th-Order Laplace Operator
37 Structure of 1D 8th-Order Finite-Accurate Optimized FD Scheme
38 Conventional Hardwired Floating-Point Accumulators: (a) Accumulator with Standard Floating-Point Adder and Output Register; (b) Binary Tree Based Reduction Circuit
39 Structure of Group-Alignment Based Floating-Point Summation Unit
40 Implementation for Matrix-Vector Multiply in Row Order
41 Matrix-Vector Multiply in Column Order
42 Implementation for Matrix-Vector Multiply in Column Order
43 Blocked Matrix-Matrix Multiply
44 Blocked Matrix-Matrix Multiply Scheme

1. INTRODUCTION

Evaluating scientific or engineering problems governed by Partial Differential Equations (PDEs) numerically is in general computationally demanding and data intensive. In typical numerical methods such as Finite Difference Methods (FDM), Finite Element Methods (FEM), or Finite Volume Methods (FVM), PDEs are discretized in space to bring them into a finite-dimensional subspace and then solved by standard linear algebra subroutines. Spatial discretization of realistic Scientific and Engineering (S&E) problems can easily result in millions, even billions, of discrete grid points, which correspond to large linear systems of equations with the same number of unknowns. If the problem is time-dependent, simulating its transient behavior may require solving the linear system for hundreds to thousands of discrete time steps. Furthermore, if the problem is nonlinear, we have to resort to iterative methods to guarantee the convergence of the numerical solutions, which means solving multiple linear systems in each time-evolution step. Typical applications of numerical PDE problems include, but are not limited to, Computational Fluid Dynamics (CFD), computational physics, computational chemistry, weather forecasting/climate modeling, and seismic data processing/reservoir simulation.

The last two decades have seen rapid improvements in the performance and complexity of digital computers. Without any doubt, the most convenient computing resources for solving numerical PDE problems are commodity computers. With the continuing evolution of Very Large-Scale Integration (VLSI) semiconductor manufacturing technology, modern commodity CPUs consist of hundreds of millions of transistors and work at internal clock rates up to several GHz. Their low price and flexible program-controlled execution mode make commodity CPU-based general-purpose computers feasible for almost all kinds of applications.
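The discretize-then-time-step pattern described above can be made concrete with a toy example. The sketch below is mine, not code from the dissertation; the grid size, diffusion coefficient, and point-source initial condition are arbitrary illustrative choices. It applies an explicit finite-difference update to the 1D heat equation u_t = a·u_xx, showing how the per-step stencil cost is multiplied by hundreds of time steps in a transient simulation.

```python
# Hedged sketch: explicit finite-difference time stepping for the 1D heat
# equation u_t = a * u_xx on the unit interval with zero Dirichlet ends.
# All parameter values here are illustrative, not from the dissertation.

def step(u, a, dx, dt):
    """One explicit Euler update of the interior points (ends stay fixed)."""
    r = a * dt / dx**2            # must satisfy r <= 0.5 for stability
    new = u[:]                    # copy; boundary values are untouched
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i-1] - 2*u[i] + u[i+1])
    return new

n, a = 41, 1.0
dx = 1.0 / (n - 1)
dt = 0.4 * dx**2 / a              # stable choice: r = 0.4
u = [0.0] * n
u[n // 2] = 1.0                   # point "source" in the middle of the grid
for _ in range(100):              # a real transient run takes hundreds to
    u = step(u, a, dx, dt)        # thousands of such steps
```

Each step touches every grid point, so a 3D problem with millions of points repeats this stencil work millions of times per step, which is exactly the workload profile the rest of the thesis targets.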
However, because a large portion of the silicon area inside CPUs is committed to sophisticated control logic, the transistor utilization and energy efficiency of such devices are generally poor. Moreover, although the nominal speeds of commodity CPUs are skyrocketing, in reality only a small fraction of their peak performance can be delivered for our computationally intensive and data-demanding problems. Solving such problems may easily take hours, days, weeks, or even months. Some large-scale problems remain unsolvable in an acceptable period of time even on today's fastest supercomputer platforms.

(This dissertation follows the style of Microprocessors and Microsystems.)

An alternative is to build special-purpose computer systems for the specific problems at hand using Application-Specific Integrated Circuits (ASICs) as the core components. Compared with commodity CPU-based general-purpose computers, a majority of the in-chip transistors can be devoted to useful computations/operations, so such application-specific systems can achieve much higher computational performance (100X to 1000X) with better transistor utilization as well as energy efficiency. However, this approach presents problems such as lack of flexibility, long development periods, and high Non-Recurrent Engineering (NRE) costs. If the high cost cannot be amortized over a mass-produced product, this approach is expensive.

The emergence of Field Programmable Gate Array (FPGA) devices gives people another option: constructing a new class of FPGA-based computers. An FPGA is an "off-the-shelf" digital logic device, the same as a commodity CPU. Inside an FPGA chip there are numerous regularly distributed, island-like programmable hardware resources such as logic slices, SRAM blocks, hardwired multiplier blocks, or even processor blocks. Design engineers can configure/program these hardware resources at runtime to perform a variety of basic digital logic functions such as AND, OR, NOT, and FLIP-FLOP. Multiple similar or different programmable slices can cooperate to implement complex arithmetic or functionality with the help of the surrounding programmable interconnection paths.
This so-called In-System Programmability (ISP) consumes considerable silicon area and causes FPGA-based hardware implementations to run at much slower clock speeds than ASIC chips. However, FPGA devices have the potential to accommodate tens, even hundreds, of similar or different arithmetic/function units, so the aggregate

computational performance may still be much higher than what a commodity CPU provides. Furthermore, users can utilize the same FPGA-based system to accelerate different problems thanks to the ISP property. From this point of view, an FPGA-based computer works as "middleware" between the pure hardware-based approach (ASICs) and the pure software-based approach (commodity CPUs). It has the potential to provide users with high computational power while maintaining acceptable flexibility.

This thesis explores the feasibility of utilizing FPGA resources to accelerate computationally demanding and data-intensive numerical solutions of PDE problems. The research work can be divided into three main parts. The first part proposes a new hardware architecture model, the "FPGA-enhanced Reconfigurable Computer", which takes the distinguishing features of numerical PDE problems into account so that significant performance improvements can be expected. It begins with an introduction of the motivation for FPGA-enhanced computers, related work, and other background information. It then discusses the system architecture of this new computer model and its detailed implementation as a single workstation as well as a parallel cluster system.

The second and third parts of this thesis discuss methods to accelerate FDM and FEM on the proposed FPGA-enhanced computer platform. Here, I select linear wave equations arising in seismic data processing applications as the target PDE problems. A hierarchical set of aspects, including customized/extraordinary computer arithmetic or function units, compact but efficient system structure and memory hierarchy, and FPGA-optimized software algorithms/numerical methods, is proposed and analyzed together with detailed implementations.
A great variety of experiments comparing the sustained computational performance of these numerical methods running on FPGA-enhanced computers against commodity CPU-based general-purpose computers are carried out to show the superiority of this new computer system. All results can also be applied to accelerate the solutions of other numerical PDE problems such as heat equations and Laplace equations, implying the broad applicability of the proposed FPGA-enhanced computers.

2. BACKGROUND AND RELATED WORK

2.1 Application Background: Seismic Data Processing

Seismic reflection surveying is the most widely used geophysical exploration technique in the petroleum industry and has played a key role in locating underground oil and gas reservoirs for more than sixty years. The basic equipment for a field survey is a source producing impulsive seismic waves, an array of geophones receiving underground echoes, and a multi-channel wave-signal displaying and recording system. The method is quite simple in concept: the Earth is modeled as a stratified medium with material properties such as velocity, density, and anisotropy. Impulsive seismic waves are excited on the ground and propagate downward into the Earth. When they encounter interfaces between rock layers, the waves are reflected back and recorded by an array of geophones deployed on the ground. The elapsed times and amplitudes of the reflections can be used to determine the depths and attitudes of underground rock layers. The main purpose of seismic data processing is to reduce the noise embedded in the reflected seismic signals and convert them into interpretable images of subsurface structures [1].

Figure 1. Demonstration of Seismic Reflection Survey
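As a concrete illustration of how elapsed travel times translate into reflector depths (a textbook-style sketch of mine, not code from the dissertation; the velocity and time values are made up): for a flat reflector under a constant-velocity overburden, a zero-offset two-way time t gives depth d = v·t/2, and a common textbook form of the double-square-root (DSR) travel-time expression, for which Section 4 builds a hardware solver, is shown as well.

```python
import math

# Hedged sketch: straight-ray travel-time relations behind the survey
# geometry described above. All constants are illustrative.

def reflector_depth(two_way_time_s, velocity_mps):
    """Zero-offset case: the echo travels down and back, so d = v * t / 2."""
    return velocity_mps * two_way_time_s / 2.0

def dsr_travel_time(tau_s, hs_m, hr_m, velocity_mps):
    """Double-square-root travel time (common textbook form): tau is the
    one-way vertical time, hs/hr the horizontal distances from the image
    point to the source and the receiver."""
    down = math.sqrt(tau_s**2 + (hs_m / velocity_mps) ** 2)
    up = math.sqrt(tau_s**2 + (hr_m / velocity_mps) ** 2)
    return down + up

# A 2 s echo through 2500 m/s rock places the reflector at 2500 m depth.
print(reflector_depth(2.0, 2500.0))   # -> 2500.0
```

Evaluating the DSR expression for every (source, receiver, image point) triple is what makes prestack migration so expensive, and why the dissertation devotes a custom arithmetic unit to it.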

Two main procedures dominate the seismic data processing flow: seismic modeling and seismic migration. The mathematical model of energy propagating inside the Earth is the acoustic wave equation (2.1) or the elastic wave equations (2.2), both of which are hyperbolic PDEs.

\[ \frac{\partial^2 P(x,y,z,t)}{\partial t^2} = \rho(x,y,z)\,\nu^2(x,y,z)\,\nabla\cdot\!\left(\frac{1}{\rho(x,y,z)}\,\nabla P(x,y,z,t)\right) + F \tag{2.1} \]

\[ \rho\,\partial_t v_i = \partial_i\sigma_{ii} + \partial_j\sigma_{ij} + \partial_k\sigma_{ik} \tag{2.2a} \]
\[ \partial_t\sigma_{ii} = \lambda\,(\partial_i v_i + \partial_j v_j + \partial_k v_k) + 2\mu\,\partial_i v_i \tag{2.2b} \]
\[ \partial_t\sigma_{ij} = \mu\,(\partial_i v_j + \partial_j v_i) \tag{2.2c} \]

Seismic modeling is a forward problem that simulates the scattered field arising when an impulsive source excites an underground region with known physical properties. Seismic migration is an inverse problem that estimates these physical properties using measured data as initial or boundary conditions. In the conventional seismic data processing flow, modeling and migration are two intimately related procedures. They are performed iteratively, with one's output acting as input to the other: the process starts from an initial estimate of the underground parameters. Migration methods are then used to collapse diffractions produced by underground scattering points or faults and to move dipping reflectors to their true subsurface locations. After extracting parameters by analyzing the migrated underground image, modeling procedures are employed to produce a so-called ground "seismogram". By comparing the ground seismogram with the measured dataset, it is possible to identify where the previous estimates of the underground parameters are inaccurate and to revise the estimates correspondingly. The process repeats until the difference between two consecutive iterations is sufficiently small.
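For the constant-density special case of the acoustic equation, the classic second-order-in-time, second-order-in-space ("(2, 2)") leapfrog update is easy to state in code. The sketch below is mine, with illustrative grid parameters, and uses plain Python lists for self-containment; the dissertation's Section 5 implements this same family of stencils as pipelined FPGA hardware.

```python
# Hedged sketch: (2, 2) leapfrog update for the constant-density 2D acoustic
# equation P_tt = v^2 (P_xx + P_yy). Grid size, velocity, and step counts
# are illustrative choices, not values from the dissertation.

def laplacian(p, i, j, dx):
    """Second-order 5-point approximation of (P_xx + P_yy) at (i, j)."""
    return (p[i-1][j] + p[i+1][j] + p[i][j-1] + p[i][j+1] - 4*p[i][j]) / dx**2

def acoustic_step(prev, cur, v, dt, dx):
    """Leapfrog: next = 2*cur - prev + (v*dt)^2 * Laplacian(cur)."""
    n = len(cur)
    nxt = [row[:] for row in cur]          # edges stay fixed (zero boundary)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            nxt[i][j] = (2*cur[i][j] - prev[i][j]
                         + (v*dt)**2 * laplacian(cur, i, j, dx))
    return nxt

n, v, dx = 51, 2000.0, 10.0
dt = 0.4 * dx / v                          # Courant number 0.4: stable in 2D
prev = [[0.0]*n for _ in range(n)]
cur = [[0.0]*n for _ in range(n)]
cur[n//2][n//2] = 1.0                      # impulsive source at the center
for _ in range(50):
    prev, cur = cur, acoustic_step(prev, cur, v, dt, dx)
```

Notice that each update reads five neighbors and two time levels per point while performing only a handful of floating-point operations, which is the memory-bound behavior the later chapters attack with on-chip data buffering.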
Mathematically, seismic migration is not a well-posed problem; i.e., the boundary conditions provided by the measured dataset are not enough to produce a unique solution. In general, three to five such iterations are necessary to obtain an acceptable migrated underground image [2].

Seismic modeling/migration is the central and culminating step of seismic data processing and may easily devour 70 to 90 percent of the CPU time of the entire

process flow. Both are time-consuming procedures: even for the simplest acoustic cases, the workload for solving large-scale 3D wave propagation problems can easily exceed the capability of most contemporary computer systems. Although the computational power of commodity computers keeps doubling every 18 months following Moore's Law, because ever more sophisticated migration methods are adopted, the total elapsed time for processing a middle-scale 3D region has stayed between one week and one month for almost three decades. Older migration methods such as phase-shift or Kirchhoff migration are computationally efficient and affordable for most customers. However, their imaging resolution is not good enough to clearly depict complex underground structures with lateral velocity variations or steep dips. Newer methods such as frequency-space migration or Reverse Time Migration (RTM), which directly solve the acoustic/elastic wave equations, provide greatly improved accuracy. However, their intensive computational workloads for 3D full-volume imaging are hard to sustain, even for institutes able to afford the high costs of operating and maintaining supercomputers or large PC-cluster systems. In reality, finishing a designated seismic processing task in a reasonable time period is still much more important than obtaining a better result using improved modeling/migration methods [3].

2.2 Numerical Solutions of PDEs on High-Performance Computing (HPC) Facilities

In the past decade, numerical computing efforts have grown rapidly with the performance improvements of general-purpose computers and parallel computing environments. Although the nominal frequency of commodity CPUs is skyrocketing, the realized fraction of peak performance for our target problems is in general very poor.
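The low realized fraction of peak performance can be made concrete with a back-of-the-envelope "arithmetic intensity" estimate. The sketch below uses my own illustrative numbers, not figures from the dissertation: a 5-point Laplacian stencil performs only a handful of floating-point operations per grid point, so with no cache reuse its throughput is capped by memory bandwidth long before the CPU's floating-point units saturate.

```python
# Hedged sketch: roofline-style estimate for a naive 5-point stencil.
# The flop and byte counts are the obvious naive ones, chosen for
# illustration, not taken from the dissertation's tables.

def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Flops performed per byte moved to/from memory."""
    return flops_per_point / bytes_per_point

def sustained_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Sustained rate is bounded by min(peak, intensity * bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)

# Naive 5-point stencil: roughly 6 flops per point; with no cache reuse it
# moves 5 input operands plus 1 result at 4 bytes each (single precision).
ai = arithmetic_intensity(6, 24)           # 0.25 flop/byte
# A hypothetical machine with 10 Gflop/s peak and 10 GB/s memory bandwidth
# then sustains at most 2.5 Gflop/s: a quarter of peak, set entirely by the
# memory system rather than by the floating-point units.
print(sustained_gflops(ai, 10.0, 10.0))    # -> 2.5
```

Caching raises the effective intensity by reusing operands, which is why the data-buffering schemes of Chapter 5 matter as much as the arithmetic units themselves.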
The main reason for this inefficiency is that a large portion of the transistors in today's commodity CPUs are devoted to control logic or to providing flexible data flow, which makes the prevalent von Neumann computer architecture model ill-suited to most numerical computing applications. Consequently, many scientific/engineering problems are extremely time-consuming or even unsolvable on contemporary general-purpose computers, especially when large 2D or 3D geometrical regions are designated as the computational domain. It seems that there will always be an urgent demand for more powerful and faster computer systems.

Sustained Floating-Point (FP) performance, often expressed in Megaflops or Gigaflops, is a key factor in measuring the computational performance of a computer system. Numerical computing applications are generally data intensive as well as computationally demanding. Correspondingly, numerical algorithms/methods often exhibit a low ratio of FP operations to memory accesses, require considerable memory space for intermediate results, and tend to perform irregular indirect addressing. These intrinsic properties inevitably result in poor caching behavior on general-purpose computers, given their complex system architecture and memory hierarchy. A huge gap therefore exists between the theoretical peak FP performance of a commodity CPU and the realistic running speed of a program. Pure software methods, from high-level parallelism on PC-cluster systems [4] to low-level memory and disk optimization [5] or even instruction reordering [6], have been exhausted in attempts to accelerate the execution of numerical applications. However, even with careful hand optimization, only a few numerical subroutines such as dense matrix multiply or the Fast Fourier Transform (FFT) can achieve 80 to 90 percent of a CPU's peak performance. For most others, 20 to 30 percent is very common, and the CPU utilization of some realistic applications drops to 10 percent or even below 5 percent [7]. Consequently, many large numerical simulation tasks cannot be executed routinely except at institutes that can afford the high costs of operating and maintaining supercomputers or large PC-cluster systems.

2.3 Application-Specific Computer Systems

Application-specific computers are computer systems customized for particular uses, rather than for general-purpose computation.
Application-specific computing is an active R&D area, both in academia and in industry. Traditionally, it aims at building for each algorithm/application a specific hardware device, the Application-Specific Integrated

Circuit (ASIC) chip, to achieve a better implementation in terms of hardware resources, performance, cost, and energy requirements. While general-purpose c

