Analysis of Monte Carlo Accelerated Iterative Methods for Sparse Linear Systems


RESEARCH ARTICLE — DOI: 10.1002/nla.2088
Received: 4 February 2016 | Revised: 26 November 2016 | Accepted: 9 December 2016

Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

Michele Benzi (1), Thomas M. Evans (2), Stuart R. Slattery (2), Steven P. Hamilton (2), Massimiliano Lupo Pasini (1)

(1) Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
(2) Oak Ridge National Laboratory, 1 Bethel Valley Rd., Oak Ridge, TN 37831, USA

Correspondence: Michele Benzi, Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA. Email: benzi@mathcs.emory.edu

Funding information: UT-Battelle, LLC, Grant/Award Number: DE-AC05-00OR22725; Department of Energy (Office of Science), Grant/Award Number: ERKJ247

SUMMARY

We consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. These methods are expected to have considerable potential for resiliency to faults when implemented on massively parallel machines. We establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners, including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.

KEYWORDS: iterative methods, Monte Carlo methods, preconditioning, resilience, Richardson iteration, sparse approximate inverses, sparse linear systems

1 INTRODUCTION

The next generation of computational science applications will require numerical solvers that are both reliable and capable of high performance on projected exascale platforms.
In order to meet these goals, solvers must be resilient to soft and hard system failures, provide high concurrency on heterogeneous hardware configurations, and retain numerical accuracy and efficiency. In this paper, we focus on the solution of large sparse systems of linear equations, for example, of the kind arising from the discretization of partial differential equations (PDEs). A possible approach is to try to adapt existing solvers (such as preconditioned Krylov subspace or multigrid methods) to the new computational environments, and indeed, several efforts are under way in this direction; see, for example, the previous studies [1-5] and references therein. An alternative approach is to investigate new algorithms that can naturally address issues of resiliency, particularly fault tolerance and hard processor failures. An example is provided by the recently proposed Monte Carlo synthetic acceleration (MCSA) methods; see the previous studies [6, 7]. In these methods, an underlying (deterministic) stationary Richardson iterative method is combined with a stochastic, Monte Carlo (MC)-based "acceleration" scheme. Ideally, the accelerated scheme will converge to the solution of the linear system in far fewer (outer) iterations than the basic scheme without MC acceleration, with the added advantage that most of the computational effort is now relegated to the MC portion of the algorithm, which is highly parallel and offers a more straightforward path to resiliency than standard, deterministic solvers. In addition, a careful combination of the Richardson and MC parts of the algorithm makes it possible to circumvent the well-known problem of slow MC error reduction; see one study [6]. Numerical evidence presented in one study [6] suggests that MCSA can be competitive, for certain classes of problems, with established deterministic solvers such as preconditioned conjugate gradients and the generalized minimal residual method (GMRES).
So far, however, no theoretical analysis of the convergence properties of these solvers has been carried out. In particular, it is not clear a priori whether the method, applied to a particular linear system, will converge. Indeed, the convergence of the underlying preconditioned Richardson iteration is not sufficient, in general, to guarantee the convergence

of the MCSA-accelerated iteration. In other words, it is quite possible that the stochastic acceleration part of the algorithm may actually cause the hybrid method to diverge or stagnate. In this paper, we address this fundamental issue, discussing both necessary and sufficient conditions for convergence. We also discuss the choice of splitting, or preconditioner, and illustrate our findings by means of numerical experiments. We do not specifically consider in this paper the resiliency issue, which will be addressed elsewhere.

The paper is organized as follows. In Section 2, we provide an overview of existing MC linear solver algorithms. In Section 3, we discuss the convergence behavior of stochastic solvers, including a discussion of classes of matrices for which convergence can be guaranteed. Section 4 provides some numerical results illustrating properties of the various approaches, and in Section 5, we give our conclusions.

2 STOCHASTIC LINEAR SOLVERS

Linear solvers based on stochastic techniques have a long history, going back to the famed 1928 paper by Courant, Friedrichs, and Lewy on finite difference schemes for PDEs [8]. Many authors have considered linear solvers based on MC techniques, with important early contributions by Curtiss [9] and by Forsythe and Leibler [10]. More recent works include [11-13], among others. Until recently, these methods have had mixed success at best, due to their generally inferior performance when compared to state-of-the-art deterministic solvers like multigrid or preconditioned Krylov methods. Current interest in resilient solvers, where some performance may be traded off for increased robustness in the presence of faults, has prompted a fresh look at methods incorporating MC ideas [6, 7, 14].

As mentioned in the work of Dimov and Alexandrov [13], MC methods may be divided into two broad classes: direct methods, such as those described in the works of Dimov and Alexandrov [13, 15], and iterative methods, which refer to techniques such as those presented in Halton [12, 16]; see also the previous studies [6, 14]. The first type consists of purely stochastic schemes; therefore, the resulting error with respect to the exact solution consists of just a stochastic component. In contrast, the iterative MC methods utilize more traditional iterative algorithms alongside the stochastic approach, generating two types of error: a stochastic one and a systematic one. In practice, it may be difficult to separate the two components; nevertheless, awareness of this intrinsic structure is useful, as it allows algorithm designers some flexibility in the choice of what part of the algorithm to target for refinement (e.g., trading off convergence speed for resilience by balancing the number of "deterministic" outer iterations against the number of random walks to be used within each iteration).

Consider a system of linear equations of the form

    Ax = b,    (1)

where A ∈ R^{n×n} and x, b ∈ R^n. Equation 1 can be recast as a fixed point problem:

    x = Hx + f,    (2)

where H = I − A and f = b. Assuming that the spectral radius ρ(H) < 1, the solution to Equation 2 can be written in terms of a power series in H (Neumann series):

    x = \sum_{\ell=0}^{\infty} H^{\ell} f.

Denoting the kth partial sum by x^{(k)}, the sequence of approximate solutions {x^{(k)}} converges to the exact solution regardless of the initial guess x^{(0)}. By restricting the attention to a single component of x, we obtain

    x_i = f_i + \sum_{\ell=1}^{\infty} \sum_{k_1=1}^{n} \sum_{k_2=1}^{n} \cdots \sum_{k_\ell=1}^{n} H_{i,k_1} H_{k_1,k_2} \cdots H_{k_{\ell-1},k_\ell} f_{k_\ell}.    (3)

The last equation can be reinterpreted as the realization of an estimator defined on a random walk. Let us start by considering a random walk whose state space S is labeled by the set of indices of the forcing term f:

    S = \{1, 2, \ldots, n\} \subset \mathbb{N}.

Each ith step of the random walk has a random variable k_i associated with it. The realization of k_i represents the index of the component of f that is visited in the current step of the random walk. The construction of random walks is accomplished by considering the directed graph associated with the matrix H. The nodes of this graph are labeled 1 through n, and there is a directed edge from node i to node j if and only if H_{i,j} ≠ 0. Starting from a given node, the random walk consists of a sequence of nodes obtained by jumping from one node to the next along directed edges, choosing the next node at random according to a transition probability matrix constructed from H or from H^T; see below. Note that it may happen that a row of H (or of H^T) is all zero; this happens when there are no outgoing (respectively, incoming) edges from (respectively, to) the corresponding node. In this case, that particular random walk is terminated and another node is selected as the starting point for the next walk. The transition probabilities and the selection of the initial state of the random walk can be accomplished according to different modalities, leading to two distinct methods: the forward and adjoint methods. These methods are described next.

2.1 Forward method

Given a functional J, the goal is to compute its action on a vector x by constructing a statistical estimator whose expected value equals J(x). Each statistical sampling is represented by a random walk, and it contributes to build an estimate of the expected value. Towards this goal, it is necessary to introduce an initial probability distribution and a transition matrix

so that random walks are well defined. Recalling Riesz's representation theorem, one can write

    J(x) = \langle h, x \rangle = \sum_{i=1}^{n} h_i x_i,

where h ∈ R^n is the Riesz representative in R^n of the functional J. Such a representative can be used to build the initial probability p̃ : S → [0, 1] of the random walk as

    \tilde{p}(k_0 = i) = \tilde{p}_{k_0} = \frac{|h_i|}{\sum_{i=1}^{n} |h_i|}.

It is important to highlight that the role of the vector h is confined to the construction of the initial probability, and that h is not used afterwards in the stochastic process. A possible choice for the transition probability matrix P, with p(·, i) : S → [0, 1], is

    P_{ij} = p(k_\ell = j \mid k_{\ell-1} = i) = \frac{|H_{ij}|}{\sum_{k=1}^{n} |H_{ik}|}, \qquad \ell = 0, 1, 2, \ldots,

where i ∈ S and k_\ell ∈ S represents the state reached at a generic \ellth step of the random walk. A related sequence of random variables w_{ij} can be defined as

    w_{ij} = \frac{H_{ij}}{P_{ij}}.

The probability distribution of the random variables w_{ij} is represented by the transition matrix that governs the stochastic process. The w_{ij} quantities just introduced can be used to build one more sequence of random variables. At first, we introduce the quantities W_\ell:

    W_0 = \frac{h_{k_0}}{\tilde{p}_{k_0}}, \qquad W_\ell = W_{\ell-1} w_{k_{\ell-1},k_\ell}.

In probability theory, a random walk is itself envisioned as a random variable that can assume multiple values consisting of different realizations. Indeed, given a starting point, there are in general many choices (one for each nonzero in the corresponding row of P) to select the second state, and from there on recursively. The actual feasibility of a path and the frequency with which it is selected depend on the initial probability and on the transition matrix. By introducing ν as a realization of a random walk, we define

    X(\nu) = \sum_{\ell=0}^{\infty} W_\ell f_{k_\ell}

as the random variable associated with a specific realization ν. We can thus define the estimator θ as

    \theta = E[X] = \sum_{\nu} P_\nu X(\nu),    (4)

where ν ranges over all possible realizations and P_ν is the probability associated with a specific realization of the random walk. It can be proved (see the works of Halton [12, 16]) that

    E[W_\ell f_{k_\ell}] = \langle h, H^{\ell} f \rangle,

and

    \theta = E\left[ \sum_{\ell=0}^{\infty} W_\ell f_{k_\ell} \right] = \langle h, x \rangle.

A possible choice for h is a vector of the standard basis, h = e_i. This would correspond to setting the related initial probability to a Kronecker delta:

    \tilde{p}(k_0 = j) = \delta_{ij}.

By doing so, we have k_0 = i and

    \theta = x_i = f_i + \sum_{\ell=1}^{\infty} \sum_{k_1=1}^{n} \cdots \sum_{k_\ell=1}^{n} P_{i,k_1} P_{k_1,k_2} \cdots P_{k_{\ell-1},k_\ell} \, w_{i,k_1} w_{k_1,k_2} \cdots w_{k_{\ell-1},k_\ell} f_{k_\ell}.

As regards the variance, we recall the following relation:

    \mathrm{Var}\left[ \sum_{\ell=0}^{\infty} W_\ell f_{k_\ell} \right] = E\left[ \sum_{\ell=0}^{\infty} W_\ell^2 f_{k_\ell}^2 \right] - \left( E\left[ \sum_{\ell=0}^{\infty} W_\ell f_{k_\ell} \right] \right)^2.    (5)

Hence, the variance can be computed as the difference between the second moment of the random variable and the square of its first moment.

In order to apply the central limit theorem (CLT) to the estimators defined above, we must require that the estimators have both finite expected value and finite variance. This is equivalent to checking the finiteness of the expected value and second moment. Therefore, we have to impose the following conditions:

    E\left[ \sum_{\ell=0}^{\infty} W_\ell f_{k_\ell} \right] < \infty,    (6)

    E\left[ \sum_{\ell=0}^{\infty} W_\ell^2 f_{k_\ell}^2 \right] < \infty.    (7)

The forward method presented above, however, has the limitation of employing an entire set of realizations to estimate just a single entry of the solution at a time. Hence, in order to estimate the entire solution vector for Equation 1, we have to employ a separate set of realizations for each entry in the solution vector. This limitation can be circumvented by the adjoint method, which we describe below.

Remark 1. It is important to note that in order to construct the random walks, access to the individual entries of H is required. Hence, H needs to be formed explicitly and, therefore, must be sparse in order to have a practical algorithm.
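The forward estimator just described can be sketched in a few lines of NumPy. This is a hedged illustration, not code from the paper: the function name, walk count, the fixed cap on walk length, and the small test matrix are all choices made here. MAO-style transition probabilities P_ij ∝ |H_ij| (defined formally in Section 3.2) are used.

```python
import numpy as np

def forward_mc(H, f, i, n_walks=5000, max_steps=30, rng=None):
    """Forward MC estimate of x_i for x = Hx + f, with h = e_i so that
    W_0 = 1 and k_0 = i. Walks are truncated at max_steps (an
    illustrative choice; the terms decay like ||H||^ell here)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = H.shape[0]
    absH = np.abs(H)
    row_sums = absH.sum(axis=1)          # P_ij proportional to |H_ij|
    est = 0.0
    for _ in range(n_walks):
        state, W, contrib = i, 1.0, f[i]
        for _ in range(max_steps):
            if row_sums[state] == 0.0 or abs(W) < 1e-12:
                break                    # absorbing state or negligible weight
            P = absH[state] / row_sums[state]
            nxt = rng.choice(n, p=P)
            W *= H[state, nxt] / P[nxt]  # weight update w_ij = H_ij / P_ij
            contrib += W * f[nxt]        # accumulate W_ell * f_{k_ell}
            state = nxt
        est += contrib
    return est / n_walks

# Small contraction example (||H||_inf = 0.6 < 1):
H = np.array([[0.1, 0.3, 0.0],
              [0.2, 0.0, 0.4],
              [0.0, 0.3, 0.2]])
f = np.array([1.0, 2.0, 0.5])
x_exact = np.linalg.solve(np.eye(3) - H, f)
x0_mc = forward_mc(H, f, 0)
```

Averaging a few thousand truncated walks reproduces x_0 to roughly two digits here; note that a separate batch of walks would be needed for each component, which is exactly the limitation pointed out above.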

2.2 Adjoint method

A second MC method can be derived by considering the linear system adjoint to Equation 1:

    A^T y = d,    (8)

where y and d are the adjoint solution and source term. Equation 8 can be recast in a manner similar to Equation 2:

    y = H^T y + d.

Note that ρ(H^T) = ρ(H) < 1; hence, convergence of the Neumann series (fixed point iteration) for Equation 1 guarantees convergence for the adjoint system 8. Exploiting the following inner product equivalence:

    \langle A^T y, x \rangle = \langle y, Ax \rangle,    (9)

it follows that ⟨x, d⟩ = ⟨y, f⟩. By writing the Neumann series for the solution to Equation 8, we have

    y = \sum_{\ell=0}^{\infty} (H^T)^{\ell} d,    (10)

and focusing on a single entry of the solution vector:

    y_i = d_i + \sum_{\ell=1}^{\infty} \sum_{k_1=1}^{n} \cdots \sum_{k_\ell=1}^{n} H^T_{i,k_1} H^T_{k_1,k_2} \cdots H^T_{k_{\ell-1},k_\ell} d_{k_\ell}.

The undetermined quantities in the dual problem 8 are y and d. Therefore, two constraints are required: the first constraint is Equation 9, and as a second constraint, we select d to be one of the standard basis vectors. Applying this choice of d to 9, we get

    \langle y, f \rangle = \langle x, d \rangle = x_i.

In order to give a stochastic interpretation of the adjoint method similar to the one obtained for the forward method, we introduce the initial probability

    \tilde{p}(k_0 = i) = \frac{|f_i|}{\|f\|_1}

and the initial weight

    W_0 = \frac{f_{k_0}}{|f_{k_0}|} \|f\|_1.

The transition probability is defined as

    P_{ij} = p(k_\ell = j \mid k_{\ell-1} = i) = \frac{|H^T_{ij}|}{\sum_{k=1}^{n} |H^T_{ik}|} = \frac{|H_{ji}|}{\sum_{k=1}^{n} |H_{ki}|},

and the sequence of weights as follows:

    w_{ij} = \frac{H^T_{ij}}{P_{ij}} = \frac{H_{ji}}{P_{ij}}.

By reformulating the fixed point scheme in its statistical interpretation, the following formula holds for the estimator of the solution vector associated with the adjoint method: it is the vector θ ∈ R^n such that

    \theta_i = E\left[ \sum_{\ell=0}^{\infty} W_\ell \delta_{k_\ell,i} \right] = \sum_{\ell=0}^{\infty} \sum_{k_0=1}^{n} \sum_{k_1=1}^{n} \cdots \sum_{k_\ell=1}^{n} f_{k_0} P_{k_0,k_1} P_{k_1,k_2} \cdots P_{k_{\ell-1},k_\ell} \, w_{k_0,k_1} \cdots w_{k_{\ell-1},k_\ell} \delta_{k_\ell,i}, \qquad i = 1, \ldots, n.

This estimator is known in the literature as the collision estimator. The forward method adds a contribution to the component of the solution vector where the random walk began, based on the value of the source vector in the state in which the walk currently resides. The adjoint method, on the other hand, adds a contribution to the component of the solution vector where the random walk currently resides, based on the value of the source vector in the state in which the walk began. The Kronecker delta at the end of the series represents a filter, indicating that only a subset of realizations contribute to the ith component of the solution vector.

The variance is given by

    \mathrm{Var}\left[ \sum_{\ell=0}^{\infty} W_\ell \delta_{k_\ell,i} \right] = E\left[ \sum_{\ell=0}^{\infty} W_\ell^2 \delta_{k_\ell,i} \right] - \left( E\left[ \sum_{\ell=0}^{\infty} W_\ell \delta_{k_\ell,i} \right] \right)^2, \qquad i = 1, \ldots, n.    (11)

Along the same lines as the development for the forward method, we must impose finiteness of the expected value and second moment. Therefore, the following conditions must be verified:

    E\left[ \sum_{\ell=0}^{\infty} W_\ell \delta_{k_\ell,i} \right] < \infty, \qquad i = 1, \ldots, n,    (12)

and

    E\left[ \sum_{\ell=0}^{\infty} W_\ell^2 \delta_{k_\ell,i} \right] < \infty, \qquad i = 1, \ldots, n.    (13)

The main advantage of this method, compared to the forward one, consists in the fact that a single set of realizations is used to estimate the entire solution vector. Unless only a small portion of the problem is of interest, this property often leads to the adjoint method being favored over the forward method. In other terms, the adjoint method should be preferred when approximating the solution globally over the entire computational domain, whereas the forward method is especially useful when approximating the solution locally.
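A matching sketch of the adjoint (collision) estimator, again with names and parameters that are our own illustrative choices: a single batch of walks, started from the distribution |f_i|/‖f‖_1 with initial weight sign(f_{k_0})‖f‖_1 and moved along the graph of H^T, tallies into every component of the solution.

```python
import numpy as np

def adjoint_mc(H, f, n_walks=10000, max_steps=30, rng=None):
    """Adjoint (collision-estimator) MC for x = Hx + f: one set of
    walks estimates the whole solution vector (sketch)."""
    rng = np.random.default_rng(1) if rng is None else rng
    n = H.shape[0]
    absHT = np.abs(H.T)              # walks follow the graph of H^T
    row_sums = absHT.sum(axis=1)     # = column sums of |H|
    f_norm = np.abs(f).sum()
    p0 = np.abs(f) / f_norm          # initial probability ~ |f_{k0}|
    x = np.zeros(n)
    for _ in range(n_walks):
        state = rng.choice(n, p=p0)
        W = np.sign(f[state]) * f_norm   # W_0 = sign(f_{k0}) * ||f||_1
        x[state] += W                    # collision tally at k_0
        for _ in range(max_steps):
            if row_sums[state] == 0.0 or abs(W) < 1e-12:
                break
            P = absHT[state] / row_sums[state]
            nxt = rng.choice(n, p=P)
            W *= H[nxt, state] / P[nxt]  # w_ij = H^T_ij / P_ij = H_ji / P_ij
            state = nxt
            x[state] += W                # tally W_ell at the visited state
    return x / n_walks

H = np.array([[0.1, 0.3, 0.0],
              [0.2, 0.0, 0.4],
              [0.0, 0.3, 0.2]])
f = np.array([1.0, 2.0, 0.5])
x_mc = adjoint_mc(H, f)
x_exact = np.linalg.solve(np.eye(3) - H, f)
```

In contrast to the forward sketch, one call returns an estimate of all n components at once, which is the "global approximation" advantage discussed above.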

In the literature, another estimator is employed along with the adjoint MC method, the so-called expected value estimator. Its formulation is as follows: it is the vector θ ∈ R^n such that

    \theta_i = E\left[ f_i + \sum_{\ell=0}^{\infty} W_\ell H^T_{k_\ell,i} \right] = f_i + \sum_{\ell=0}^{\infty} \sum_{k_0=1}^{n} \sum_{k_1=1}^{n} \cdots \sum_{k_\ell=1}^{n} f_{k_0} P_{k_0,k_1} P_{k_1,k_2} \cdots P_{k_{\ell-1},k_\ell} \, w_{k_0,k_1} \cdots w_{k_{\ell-1},k_\ell} H^T_{k_\ell,i}.    (14)

Hence, the expected value estimator averages the deterministic contribution of the iteration matrix over all the potential states j that might be reached from the current state k_\ell. The variance in this case becomes

    \mathrm{Var}\left[ \sum_{\ell=0}^{\infty} W_\ell H^T_{k_\ell,i} \right] = E\left[ \sum_{\ell=0}^{\infty} W_\ell^2 (H^T_{k_\ell,i})^2 \right] - \left( E\left[ \sum_{\ell=0}^{\infty} W_\ell H^T_{k_\ell,i} \right] \right)^2, \qquad i = 1, \ldots, n.    (15)

2.3 Hybrid stochastic/deterministic methods

The direct methods described in Sections 2.1 and 2.2 suffer from a slow rate of convergence due to the 1/√N behavior dictated by the CLT (N here is the number of random walks used to estimate the solution). Furthermore, when the spectral radius of the iteration matrix is close to unity, each individual random walk may require a large number of transitions to approximate the corresponding components in the Neumann series. To offset the slow convergence of the CLT, schemes have been proposed that combine traditional fixed point iterative methods with the stochastic solvers. The first such method, due to Halton, was termed the sequential Monte Carlo (SMC) method and can be written as follows:

    r^l = f - (I - H) x^l,
    \delta x^{l+1} \approx (I - H)^{-1} r^l \quad \text{(computed by an MC linear solver)},
    x^{l+1} = x^l + \delta x^{l+1}.

The MC linear solver method is used to compute the update δx^{l+1}. This algorithm is equivalent to a Richardson iteration accelerated by a correction obtained by approximately solving the error-residual equation

    (I - H) \delta x^{l+1} = r^l.    (16)

If this equation were to be solved exactly, the corresponding approximation x^{l+1} = x^l + δx^{l+1} would be the exact solution to the linear system. This is of course impractical, because solving 16 is equivalent to solving the original linear system 1. Instead, the correction is obtained by solving Equation 16 only approximately, using an MC method. Because MC is only applied within a single iteration, the CLT is only applicable within that iteration rather than to the overall convergence behavior of the algorithm. This allows a trade-off between the amount of time and effort spent on the inner (stochastic) and outer (deterministic) iterations, which can take into account the competing goals of reliability and rapid convergence.

A further extension of Halton's method, termed Monte Carlo synthetic acceleration (MCSA), has been recently introduced in the previous studies [6, 14]. The MCSA algorithm can be written as

    x^{l+1/2} = H x^l + f,
    r^{l+1/2} = f - (I - H) x^{l+1/2},
    \delta x^{l+1/2} \approx (I - H)^{-1} r^{l+1/2} \quad \text{(computed by an MC linear solver)},
    x^{l+1} = x^{l+1/2} + \delta x^{l+1/2}.

As with SMC, an MC linear solver is used to compute the updating contribution δx^{l+1/2}. In this approach, an extra step of Richardson iteration is added to smooth out some of the high-frequency noise introduced by the MC process. This way, the deterministic and stochastic components of the algorithm act in a complementary fashion.

Obviously, a minimum requirement is that the linear system can be written in the form 2 with ρ(H) < 1. This is typically achieved by preconditioning. That is, we find an invertible matrix P such that H = I − P^{-1}A satisfies ρ(H) < 1, and we apply the method to the fixed point problem 2, where H = I − P^{-1}A and f = P^{-1}b. In other words, the underlying deterministic iteration is a preconditioned Richardson iteration. Various choices of the preconditioner are possible; a detailed discussion of this issue is deferred until Section 3.5. Here, we note only that because we need explicit knowledge of the entries of H, not all preconditioning choices are viable; in particular, P needs to be such that H = I − P^{-1}A retains a high degree of sparsity. Unless otherwise specified, below, we assume that the transformation of the original linear system 1 to the fixed point form 2 with ρ(H) < 1 has already been carried out.
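The MCSA outer loop can be exercised with any approximate inner solver. In the sketch below all names are ours, and the Monte Carlo inner solver is replaced by a deterministic truncated Neumann correction purely so that the example is small and reproducible; the structure (Richardson half-step, residual, approximate correction) is the point, not the inner solver itself.

```python
import numpy as np

def mcsa(H, f, inner_solve, n_iters=50, tol=1e-10):
    """MCSA outer iteration: Richardson half-step, residual of the
    error-residual equation, then an approximate correction (sketch)."""
    n = len(f)
    x = np.zeros(n)
    for _ in range(n_iters):
        x_half = H @ x + f                    # smoothing Richardson step
        r = f - (np.eye(n) - H) @ x_half      # residual r = f - (I-H)x
        x = x_half + inner_solve(H, r)        # x^{l+1} = x^{l+1/2} + dx
        if np.linalg.norm(f - (np.eye(n) - H) @ x) < tol:
            break
    return x

def neumann_inner(H, r, terms=5):
    """Deterministic stand-in for the MC inner solver:
    dx = sum_{k < terms} H^k r, a truncated Neumann series."""
    dx, p = np.zeros_like(r), r.copy()
    for _ in range(terms):
        dx += p
        p = H @ p
    return dx

H = np.array([[0.1, 0.3, 0.0],
              [0.2, 0.0, 0.4],
              [0.0, 0.3, 0.2]])
f = np.array([1.0, 2.0, 0.5])
x = mcsa(H, f, neumann_inner)
x_exact = np.linalg.solve(np.eye(3) - H, f)
```

With this contraction (‖H‖_∞ = 0.6), each outer sweep reduces the error by roughly a factor ‖H‖^6, so only a handful of outer iterations are needed; plugging in a stochastic inner solver changes the per-iteration cost and noise, not the outer structure.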

3 CONVERGENCE BEHAVIOR OF STOCHASTIC METHODS

Interestingly, the convergence requirements imposed by the MC estimator and the corresponding variance can be reformulated in a purely deterministic setting. For instance, the condition of finiteness of the expected value turns out to be equivalent to requiring

    \rho(H) < 1,    (17)

where H is the iteration matrix of the fixed point scheme. Indeed, we can see from Equations 4 and 10 that the expected value is expressed in terms of power series of H, and the condition ρ(H) < 1 is a necessary and sufficient condition for the Neumann series to converge.

Next, we address the finiteness requirement for the second moment. Equations 5 and 11 for the forward and the adjoint method, respectively, show that the second moment can be reinterpreted as a power series with respect to the matrices defined as follows:

    \hat{H}_{ij} = \frac{H_{ij}^2}{P_{ij}} \quad \text{(forward method)}, \qquad \hat{H}_{ij} = \frac{H_{ji}^2}{P_{ij}} \quad \text{(adjoint method)}.

In order for the corresponding power series to converge, we must require

    \rho(\hat{H}) < 1.    (18)

Hence, condition 17 is required for a generic fixed point scheme to reach convergence, whereas the extra condition 18 is typical of the stochastic schemes studied in this work. Moreover, because the finiteness of the variance automatically entails the finiteness of the expected value, we can state that Equation 18 implicitly entails Equation 17, whereas the converse is not true in general.

3.1 Necessary and sufficient conditions

Here, we report some results presented in the previous studies [17, 18], concerning necessary and sufficient conditions for convergence. In particular, these papers discuss suitable choices for constructing the transition probability matrix P. The construction of the transition probability must obviously satisfy the following constraints (called transition conditions):

    P_{ij} \ge 0, \qquad \sum_{j=1}^{N} P_{ij} = 1.

One additional requirement relates the sparsity pattern of H to that of the transition probabilities:

    Forward method: H_{ij} ≠ 0 ⟹ P_{ij} ≠ 0;
    Adjoint method: H_{ji} ≠ 0 ⟹ P_{ij} ≠ 0.

The following auxiliary result can be found in one study [18].

Lemma 1. Consider a generic vector g = (g_1, g_2, …, g_N)^T where at least one element is nonzero, g_k ≠ 0 for some k ∈ {1, …, N}. Then, the following statements hold:

a. for any probability distribution vector β = (β_1, β_2, …, β_N)^T satisfying the transition conditions, \sum_{k=1}^{N} \frac{g_k^2}{\beta_k} \ge \left( \sum_{k=1}^{N} |g_k| \right)^2; moreover, the lower bound is attained for the probability vector defined by \beta_k = \frac{|g_k|}{\sum_{k=1}^{N} |g_k|};

b. for all c > 1, there always exists a probability vector β such that \sum_{k=1}^{N} \frac{g_k^2}{\beta_k} > c.

Consider now a generic realization of a random walk, truncated at a certain kth step:

    \nu_k : r_0 \to r_1 \to r_2 \to \cdots \to r_k,

and the corresponding statistical estimator associated with the forward MC method:

    X(\nu_k) = \frac{H_{r_0,r_1} H_{r_1,r_2} \cdots H_{r_{k-1},r_k}}{P_{r_0,r_1} P_{r_1,r_2} \cdots P_{r_{k-1},r_k}} f_{r_k}.

Then, the following result holds (see one study [18]).

Theorem 1 (forward method version). Let H ∈ R^{n×n} be such that ‖H‖_∞ < 1. Consider ν_k as the realization of a random walk ν truncated at the kth step.
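For small dense examples, Ĥ can be formed explicitly, which makes the role of Lemma 1 easy to see. The sketch below (our code; all names are illustrative) builds Ĥ for the forward method under two choices of P and checks that the |H_ij|-proportional choice, the Lemma 1 minimizer applied row by row, gives the smaller ∞-norm and hence the better computable bound on ρ(Ĥ).

```python
import numpy as np

def h_hat(H, P, adjoint=False):
    """The H-hat matrix of the text: H_ij^2 / P_ij (forward) or
    H_ji^2 / P_ij (adjoint); entries with P_ij = 0 are set to 0."""
    num = (H.T if adjoint else H) ** 2
    return np.where(P > 0, num / np.where(P > 0, P, 1.0), 0.0)

def weighted_probs(H):
    """Row probabilities proportional to |H_ij| (the Lemma 1 minimizer)."""
    absH = np.abs(H)
    return absH / absH.sum(axis=1, keepdims=True)

def uniform_probs(H):
    """Uniform probabilities over the nonzeros of each row."""
    nz = (H != 0).astype(float)
    return nz / nz.sum(axis=1, keepdims=True)

H = np.array([[0.1, 0.3, 0.0],
              [0.2, 0.0, 0.4],
              [0.0, 0.3, 0.2]])
rho = lambda M: np.max(np.abs(np.linalg.eigvals(M)))
Hh_w = h_hat(H, weighted_probs(H))   # row sums equal (sum_j |H_ij|)^2
Hh_u = h_hat(H, uniform_probs(H))
```

Here ‖Ĥ‖_∞ is 0.36 for the weighted choice versus 0.40 for the uniform one, so both satisfy ρ(Ĥ) < 1 on this example, with the weighted choice attaining the Lemma 1 lower bound in every row.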

Then, there always exists a transition matrix P such that Var(X(ν_k)) → 0 and Var_ν(X(ν_k)) is bounded as k → ∞.

If we introduce the estimator associated with the adjoint MC:

    X(\nu_k) = \frac{H^T_{r_0,r_1} H^T_{r_1,r_2} \cdots H^T_{r_{k-1},r_k}}{P_{r_0,r_1} P_{r_1,r_2} \cdots P_{r_{k-1},r_k}} \, \mathrm{sign}(f_{r_0}) \|f\|_1,

then we can state a theorem analogous to Theorem 1 (see one study [18]).

Theorem 2 (adjoint method version). Let H ∈ R^{n×n} with ‖H‖_1 < 1. Consider ν_k as the realization of a random walk ν truncated at the kth step. Then, there always exists a transition matrix P such that Var(X(ν_k)) → 0 and Var_ν(X(ν_k)) is bounded as k → ∞.

These results represent sufficient (but not necessary) conditions for the convergence of the forward and adjoint MC and can be easily verified if H is explicitly available. However, in many cases, the conditions ‖H‖_∞ < 1 or ‖H‖_1 < 1 may be too restrictive. The connection between Lemma 1 and Theorems 1 and 2 will be explained in the next section, dedicated to the definition of transition probabilities.

3.2 Construction of transition probabilities

The way the transition probability is defined has a significant impact on the properties of the resulting algorithm, and in many circumstances, the choice can make the difference between convergence or divergence of the stochastic scheme. Two main approaches have been considered in the literature: uniform probabilities and weighted probabilities. We discuss these next.

3.2.1 Uniform probabilities

With this approach, the transition matrix P is such that all the transitions corresponding to each row have equal probability of occurring:

    Forward method: P_{ij} = 1 / (number of nonzeros in row i of H) if H_{ij} ≠ 0, and P_{ij} = 0 if H_{ij} = 0;
    Adjoint method: P_{ij} = 1 / (number of nonzeros in column i of H) if H_{ji} ≠ 0, and P_{ij} = 0 if H_{ji} = 0.

The MC approach resorting to this definition of the transition matrix, in accordance with one study [11], is called uniform MC.

3.2.2 Weighted probabilities

An alternative definition of transition matrices aims to associate nonzero probability with the nonzero entries of H according to their magnitude. For instance, we may employ the following definition:

    Forward method: P_{ij} = p(k_\ell = j \mid k_{\ell-1} = i) = \frac{|H_{ij}|^p}{\sum_{k=1}^{n} |H_{ik}|^p},

    Adjoint method: P_{ij} = p(k_\ell = j \mid k_{\ell-1} = i) = \frac{|H_{ji}|^p}{\sum_{k=1}^{n} |H_{ki}|^p},

where p ∈ N. The case p = 1 is called MC almost optimal (MAO). The reason for the "almost optimal" designation can be understood by looking at Lemma 1, as the quantity \sum_{k=1}^{N} \frac{g_k^2}{\beta_k} is minimized when the probability vector is defined as \beta_k = \frac{|g_k|}{\sum_{k=1}^{N} |g_k|}. Indeed, Lemma 1 implies that the almost optimal probability minimizes the ∞-norm of Ĥ for the forward method and the 1-norm of Ĥ for the adjoint method, because the role of g in Lemma 1 is played by the rows of H in the former case and by the columns of H in the latter one. This observation provides us with easily computable upper bounds for ρ(Ĥ).

3.3 Classes of matrices with guaranteed convergence

On the one hand, sufficient conditions for convergence of MC linear solvers are very restrictive; see, for example, the previous studies [17, 18]. On the other hand, the necessary and sufficient condition in one study [18] requires knowledge of ρ(Ĥ), which is not readily available. Note that explicit computation of ρ(Ĥ) is quite expensive, comparable to the cost of solving the original linear system. Although ensuring that ρ(H) < 1 (by means of appropriate preconditioning) is in many cases possible, guaranteeing that ρ(Ĥ) < 1 is generally much more problematic. Here, we identify matrix types for which both conditions can be satisfied by an appropriate choice of splitting, so that convergence of the MC scheme is guaranteed.

3.3.1 Strictly diagonally dominant (SDD) matrices

One of these categories is represented by SDD matrices.
We investigate under which conditions diagonal preconditioning is enough to ensure convergence. We recall the following definitions.

Definition 1. A matrix A ∈ R^{n×n} is SDD by rows if

    |a_{ii}| > \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}|, \qquad i = 1, \ldots, n.    (19)

Definition 2. A matrix A ∈ R^{n×n} is SDD by columns if A^T is SDD by rows, that is,

    |a_{jj}| > \sum_{\substack{i=1 \\ i \neq j}}^{n} |a_{ij}|, \qquad j = 1, \ldots, n.    (20)
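Definitions 1 and 2 are straightforward to check numerically; this small helper (ours, purely for illustration) flags a matrix that is SDD by rows but not by columns, so the two notions really are distinct.

```python
import numpy as np

def sdd_by_rows(A):
    """Strict diagonal dominance by rows (Definition 1)."""
    d = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - d
    return bool(np.all(d > off))

def sdd_by_columns(A):
    """Strict diagonal dominance by columns (Definition 2): A^T SDD by rows."""
    return sdd_by_rows(A.T)

# SDD by rows (2 > 1.5, 5 > 4, 3 > 1) but not by columns
# (first column: 2 < 3).
A = np.array([[ 2.0, -1.5,  0.0],
              [-3.0,  5.0, -1.0],
              [ 0.0, -1.0,  3.0]])
```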

Suppose A is SDD by rows. Then we can apply left diagonal (Jacobi) preconditioning, obtaining an iteration matrix H = I − P^{-1}A such that ‖H‖_∞ < 1. Introducing a MAO transition probability for the forward method:

    P_{ij} = \frac{|H_{ij}|}{\sum_{k=1}^{n} |H_{ik}|},

we have that the entries of Ĥ are defined as follows:

    \hat{H}_{ij} = \frac{H_{ij}^2}{P_{ij}} = |H_{ij}| \left( \sum_{k=1}^{n} |H_{ik}| \right).

Consequently,

    \sum_{j=1}^{n} \hat{H}_{ij} = \sum_{j=1}^{n} |H_{ij}| \left( \sum_{k=1}^{n} |H_{ik}| \right) = \left( \sum_{j=1}^{n} |H_{ij}| \right)^2 < 1, \qquad i = 1, \ldots, n.

This implies that ρ(Ĥ) ≤ ‖Ĥ‖_∞ < 1, guaranteeing that the forward MC converges. However, nothing can be said a priori about the convergence of the adjoint method. On the other hand, if (20) holds, we can apply right diagonal (Jacobi) preconditioning, which results in an iteration matrix H = I − AP^{-1} such that ‖H‖_1 < 1. In this case, by a similar reasoning, we conclude that the adjoint method converges, owing to ‖Ĥ‖_1 < 1; however, nothing can be said a priori about the forward method. Finally, it is clear that if A is SDD by rows and by columns, then a (left or right) diagonal preconditioning will result in the convergence of both the forward and the adjoint MC schemes.

3.3.2 Generalized diagonally dominant (GDD) matrices

Another class of matrices for which the convergence of MC solvers is ensured is that of GDD matrices. We recall the following definition.

Definition 3. A square matrix A ∈ C^{n×n} is said to be GDD if

    |a_{ii}| d_i \ge \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}| d_j, \qquad i = 1, \ldots, n,

for some positive vector d = (d_1, …, d_n)^T.

A proper subclass of the class of GDD matrices is represented by the nonsingular M-matrices. Recall that A is a nonsingular M-matrix if it is of the form A = rI − B, where B is nonnegative and r > ρ(B). It can be shown (see, e.g., one study [19]) that a matrix A ∈ R^{n×n} is a nonsingular M-matrix if and only if there exists a positive diagonal matrix Δ such that AΔ is SDD by rows. Clearly, every nonsingular M-matrix is GDD.

It is well known (see, e.g., the work of Axelsson [20]) that the classical Jacobi, block Jacobi, and Gauss–Seidel splittings are convergent if A is a nonsingular M-matrix. However, this is not enough to ensure the convergence of MC schemes based on the corresponding fixed point (preconditioned Richardson) iteration, because in general, we cannot expect that ρ(Ĥ) < 1. Nevertheless, if A is an M-matrix, there exist efficient methods to determine a diagonal scaling of A so that the resulting matrix is SDD by rows. Note that the scaled matrix is still an M-matrix; therefore, applying left Jacobi preconditioning to this matrix will guarantee that both ρ(H) < 1 and ρ(Ĥ) < 1. In one study [21], the author presents a procedure to determine whether a given matrix A ∈ C^{n×n} is GDD (in which case the diagonal scaling that makes A SDD is produced) or not. This procedure, which in practice converges very fast, turns a generalized diagonally dominant matrix into a strictly diagonally dominant one.
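The iterative procedure of [21] is not reproduced here, but the M-matrix characterization quoted above can be illustrated directly: for a nonsingular M-matrix A, the vector d = A^{-1}e is positive, and Δ = diag(d) makes AΔ SDD by rows, because each row of AΔ sums to (Ad)_i = 1 while its off-diagonal entries are nonpositive. This is our own dense (and therefore only illustrative) construction, not the fast algorithm cited in the text.

```python
import numpy as np

def mmatrix_sdd_scaling(A):
    """For a nonsingular M-matrix A, return (A @ Delta, d) with
    d = A^{-1} e > 0; A @ Delta is then SDD by rows. Illustrative
    only: it uses a dense solve, unlike the cited procedure."""
    d = np.linalg.solve(A, np.ones(A.shape[0]))
    return A @ np.diag(d), d

# A shifted 1D Laplacian: an M-matrix (shift 1.9 > rho(B) ~ 1.80 for the
# path-graph adjacency B) whose interior rows (1.9 < 2) are NOT SDD.
n = 6
A = 1.9 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
AD, d = mmatrix_sdd_scaling(A)
```

After the scaling, left Jacobi preconditioning of AΔ yields an H with ‖H‖_∞ < 1, which by the argument above guarantees ρ(Ĥ) < 1 for the forward MC as well.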

