FAST ALGORITHMS TO COMPUTE MATRIX-VECTOR PRODUCTS FOR PASCAL MATRICES

ZHIHUI TANG†, RAMANI DURAISWAMI‡, AND NAIL GUMEROV§

Abstract. The Pascal matrix arises in a number of applications. We present a few ways to decompose the Pascal matrices of size $n \times n$ into products of matrices with structure. Based on these decompositions, we propose fast algorithms to compute the product of a Pascal matrix and a vector with complexity $O(n \log n)$. We also present a strategy to stabilize the proposed algorithms. Finally, we also present some interesting properties of the Pascal matrices that help us to compute quickly the product of the inverse of a Pascal matrix and a vector, and fast algorithms for generalized Pascal matrices.

Key words. Matrix-vector product, Pascal matrices, matrix decomposition, structured matrices, Toeplitz matrices, Hankel matrices, Vandermonde matrices

Introduction. If we arrange the binomial coefficients

$$C_n^r = \frac{n!}{r!\,(n-r)!}, \qquad r = 0, 1, 2, \cdots, n, \quad n = 0, 1, \cdots \tag{0.1}$$

in staggered rows,

$$\begin{array}{ccccccc}
1 &&&&&&\\
1 & 1 &&&&&\\
1 & 2 & 1 &&&&\\
1 & 3 & 3 & 1 &&&\\
1 & 4 & 6 & 4 & 1 &&\\
1 & 5 & 10 & 10 & 5 & 1 &\\
1 & 6 & 15 & 20 & 15 & 6 & 1
\end{array} \tag{0.2}$$

we obtain the famous Pascal triangle. However, Pascal was not the first to discover this triangle; it had been described about 500 years earlier by the Chinese mathematician Yang Hui, and therefore in China it is known as "Yang Hui's triangle". It has also been discovered in India, Persia and Europe. For a fascinating introduction, see [Edwards02].

The Pascal matrix arises in many applications such as binomial expansion, probability, combinatorics, linear algebra [Edelman03], electrical engineering [Biolkova99] and order statistics [Aggarwala2001]. As described in previous work [Brawer92, Call93, Zhang97, Zhang98, Edelman03], three different types of $n \times n$ matrices can be obtained by arranging the Pascal triangle into a matrix and truncating appropriately.

(This work was supported by NSF Awards 0086075 and 0219681. † Department of Mathematics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742 (ztang@umiacs.umd.edu). ‡ Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742 (ramani@umiacs.umd.edu). § Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742 (gumerov@umiacs.umd.edu).)

These include the lower triangular matrix

$$PL_n = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0\\
1 & 1 & 0 & 0 & \cdots & 0\\
1 & 2 & 1 & 0 & \cdots & 0\\
1 & 3 & 3 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
C_{n-1}^{0} & C_{n-1}^{1} & C_{n-1}^{2} & C_{n-1}^{3} & \cdots & C_{n-1}^{n-1}
\end{bmatrix}, \tag{0.3}$$

the upper triangular matrix

$$PU_n = \begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & C_{n-1}^{0}\\
0 & 1 & 2 & 3 & \cdots & C_{n-1}^{1}\\
0 & 0 & 1 & 3 & \cdots & C_{n-1}^{2}\\
0 & 0 & 0 & 1 & \cdots & C_{n-1}^{3}\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & 0 & \cdots & C_{n-1}^{n-1}
\end{bmatrix}, \tag{0.4}$$

and the symmetric matrix

$$PS_n = \begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & C_{n-1}^{0}\\
1 & 2 & 3 & 4 & \cdots & C_{n}^{1}\\
1 & 3 & 6 & 10 & \cdots & C_{n+1}^{2}\\
1 & 4 & 10 & 20 & \cdots & C_{n+2}^{3}\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
C_{n-1}^{0} & C_{n}^{1} & C_{n+1}^{2} & C_{n+2}^{3} & \cdots & C_{2n-2}^{n-1}
\end{bmatrix}. \tag{0.5}$$

Since all these matrices have strong structure, they have many interesting and useful properties [Brawer92, Call93, Zhang97, Zhang98, Edelman03]. We encountered them while working to develop fast translation operators in two dimensions and rotation operators in three dimensions for the fast multipole method [Tang03]. The fast translation operators are of crucial importance for the fast multipole method.

The main results of this paper are fast algorithms, with computational complexity $O(n \log n)$, for computing the matrix-vector product

$$y = Px, \tag{0.6}$$

where $P$ is one of these Pascal matrices, or a matrix related to them, of size $n \times n$, and $x$ and $y$ are vectors of length $n$.

The paper is organized as follows. In Section 1 we describe some classical results about fast matrix-vector products for matrices with Toeplitz, Hankel and Vandermonde structure. In Section 2 we present our results on decompositions of the Pascal matrices into products of structured matrices, which allow fast computation of the matrix-vector product. If implemented naively, these algorithms are not numerically stable. We provide a modification to stabilize the algorithms in Section 3. Numerical examples are given in Section 4 to demonstrate the effectiveness of the stabilizing technique and the accuracy of the algorithm. In Section 5 we extend the algorithms to the so-called generalized Pascal matrices. Some concluding remarks are provided in Section 6.

1. Some classical results. The multiplication of a matrix and a vector arises in many problems in engineering and applied mathematics, and is a fundamental operation in much of scientific computation. For a dense matrix $A$ of size $n \times n$, its product $Ax$ with an arbitrary input vector $x$ requires $O(n^2)$ work by standard matrix-vector multiplication. In many applications, $n$ is very large, and moreover, for the same matrix, the multiplication has to be done over and over again with different input vectors, for example, in iterative methods for solving linear systems. In such cases, one seeks to identify special properties of the matrices in order to reduce the computational work. One special class of matrices is the structured matrices, which are dense but depend on only $O(n)$ parameters. They often appear in communications, control, optimization, and signal processing. The multiplication of any of these matrices with an arbitrary input vector can often be done in $O(n \log^k n)$ time, where usually $0 \le k \le 2$, depending on the structure. Since the structured matrices depend on only $O(n)$ parameters, they require reduced storage of only $O(n)$ memory. Examples of structured matrices are Fourier matrices, circulant matrices, Toeplitz matrices, Hankel matrices, Vandermonde matrices, etc. To make the paper self-contained we provide the following resume of fast matrix-vector product algorithms for these matrices.

1.1. Fourier matrices. The most important class of matrices in all fast algorithms is the Fourier matrices.

Definition 1.1. A Fourier matrix of order $n$ is defined as

$$F_n = \frac{1}{\sqrt{n}}\begin{bmatrix}
1 & 1 & 1 & \cdots & 1\\
1 & \omega_n & \omega_n^{2} & \cdots & \omega_n^{n-1}\\
1 & \omega_n^{2} & \omega_n^{4} & \cdots & \omega_n^{2(n-1)}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \cdots & \omega_n^{(n-1)(n-1)}
\end{bmatrix}, \tag{1.1}$$

where

$$\omega_n = e^{-\frac{2\pi i}{n}} \tag{1.2}$$

is an $n$th root of unity.

It is well known that the product of this matrix with any vector is the so-called discrete Fourier transform, which can be computed efficiently using the fast Fourier transform (FFT) algorithm [Cooley65]. Notice that the Fourier matrix is a unitary matrix, that is, $F_n^* F_n = I$; therefore, the conjugate transpose $F_n^*$ is also a unitary matrix. The corresponding efficient matrix-vector product is the inverse fast Fourier transform (IFFT) [Van92, Golub96].

Theorem 1.2. The FFT and IFFT can be done in $O(n \log n)$ time and $O(n \log n)$ memory references.

A proof can be found in [Cooley65] or [Van92]. This theorem is the basis for a number of other efficient algorithms, for example, the product of a circulant matrix and a vector.
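To make the correspondence concrete, here is a small numerical check (a NumPy sketch of ours, not code from the paper) that multiplying by the unitary matrix $F_n$ of (1.1) is exactly one FFT call, and that $F_n^*$ undoes it:

```python
import numpy as np

n = 8
omega = np.exp(-2j * np.pi / n)              # n-th root of unity, as in (1.2)
j, k = np.indices((n, n))
F = omega ** (j * k) / np.sqrt(n)            # unitary Fourier matrix (1.1)

x = np.random.rand(n) + 1j * np.random.rand(n)

# O(n^2) product versus O(n log n) FFT; norm="ortho" matches the 1/sqrt(n) factor
assert np.allclose(F @ x, np.fft.fft(x, norm="ortho"))
# F is unitary, so applying F* (the IFFT) recovers x
assert np.allclose(F.conj().T @ (F @ x), x)
```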

1.2. Circulant matrices.

Definition 1.3. A matrix of the form

$$C_n = C(x_1, \ldots, x_n) = \begin{bmatrix}
x_1 & x_n & x_{n-1} & \cdots & x_2\\
x_2 & x_1 & x_n & \cdots & x_3\\
x_3 & x_2 & x_1 & \cdots & x_4\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
x_n & x_{n-1} & x_{n-2} & \cdots & x_1
\end{bmatrix} \tag{1.3}$$

is called a circulant matrix.

It is easy to see that a circulant matrix is completely determined by the entries in the first column. All other columns are obtained by a cyclic shift of the previous column. It has the following important property.

Theorem 1.4. Circulant matrices $C_n(x)$ can be diagonalized by the Fourier matrix,

$$C_n(x) = F_n^* \cdot \operatorname{diag}\!\left(\sqrt{n}\, F_n x\right) \cdot F_n, \tag{1.4}$$

where $x = (x_1, \ldots, x_n)'$ is the first column.

A proof can be found in [Bai2000]. Given this theorem, we have the following fast matrix-vector product algorithm for a circulant matrix. Given a circulant matrix $C_n$ with first column $x$, and a vector $y$, the product

$$C_n y \tag{1.5}$$

can be computed efficiently using the following four steps:
1. compute $f = \mathrm{FFT}(y)$,
2. compute $g = \mathrm{FFT}(x)$,
3. compute the elementwise vector-vector product $h = f .\!\ast g$,
4. compute $z = \mathrm{IFFT}(h)$ to obtain $C_n y$.

Since the FFT and the IFFT can be done in $O(n \log n)$ time and memory references, $C_n y$ can be obtained in $O(n \log n)$ time and memory references [Bai2000, Lu98].
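The four steps fit in a few lines. The following sketch (our illustration, with our function name) uses NumPy's unnormalized FFT/IFFT pair, which is what the recipe above assumes, and checks it against the dense $O(n^2)$ product:

```python
import numpy as np

def circulant_matvec(c, y):
    """Compute C_n(c) @ y, where c is the first column of the circulant
    matrix, via the four FFT steps; O(n log n) instead of O(n^2)."""
    f = np.fft.fft(y)            # step 1
    g = np.fft.fft(c)            # step 2
    h = f * g                    # step 3: elementwise product
    return np.fft.ifft(h)        # step 4

# small check against the dense product
n = 6
c = np.random.rand(n)
y = np.random.rand(n)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(C @ y, circulant_matvec(c, y))
```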

1.3. Toeplitz matrices. Given a fast matrix-vector product algorithm for circulant matrices, it is easy to obtain one for a Toeplitz matrix, since a Toeplitz matrix can be embedded into a circulant matrix.

Definition 1.5. A matrix of the form

$$T_n = T(x_{-(n-1)}, \cdots, x_0, \cdots, x_{n-1}) = \begin{bmatrix}
x_0 & x_1 & x_2 & \cdots & x_{n-1}\\
x_{-1} & x_0 & x_1 & \cdots & x_{n-2}\\
x_{-2} & x_{-1} & x_0 & \cdots & x_{n-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
x_{-(n-1)} & x_{-(n-2)} & x_{-(n-3)} & \cdots & x_0
\end{bmatrix} \tag{1.6}$$

is called a Toeplitz matrix.

A Toeplitz matrix is completely determined by its first column and first row, and thus depends on $2n - 1$ parameters. The entries of $T_n$ are constant down the diagonals parallel to the main diagonal. It arises naturally in problems involving trigonometric moments. Sometimes we denote a Toeplitz matrix by its first column vector and first row vector,

$$c = \begin{bmatrix} c_0 & c_1 & c_2 & \cdots & c_{p-1} \end{bmatrix}', \qquad r = \begin{bmatrix} c_0 & r_1 & r_2 & \cdots & r_{p-1} \end{bmatrix}', \tag{1.7}$$

writing

$$\operatorname{Toep}(c, r') = \operatorname{Toep}\left( \begin{bmatrix} c_0\\ c_1\\ c_2\\ \vdots\\ c_{p-1} \end{bmatrix}, \begin{bmatrix} c_0\\ r_1\\ r_2\\ \vdots\\ r_{p-1} \end{bmatrix}' \right). \tag{1.8}$$

Theorem 1.6. [Bai2000, Kailath99] The product of any Toeplitz matrix and any vector can be done in $O(n \log n)$ time and memory references.

Proof. Given a Toeplitz matrix $T_n$ and a vector $y$, to compute the product $T_n y$, the Toeplitz matrix can first be embedded into a $2n \times 2n$ circulant matrix $C_{2n}$ as follows,

$$C_{2n} = \begin{bmatrix} T_n & S_n\\ S_n & T_n \end{bmatrix}, \tag{1.9}$$

where

$$S_n = \begin{bmatrix}
0 & x_{-(n-1)} & x_{-(n-2)} & \cdots & x_{-1}\\
x_{n-1} & 0 & x_{-(n-1)} & \cdots & x_{-2}\\
x_{n-2} & x_{n-1} & 0 & \cdots & x_{-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
x_1 & x_2 & x_3 & \cdots & 0
\end{bmatrix}. \tag{1.10}$$

Then $T_n y$ can be obtained from

$$C_{2n} \begin{bmatrix} y\\ 0_n \end{bmatrix} = \begin{bmatrix} T_n & S_n\\ S_n & T_n \end{bmatrix} \begin{bmatrix} y\\ 0_n \end{bmatrix} = \begin{bmatrix} T_n y\\ S_n y \end{bmatrix}, \tag{1.11}$$

which can be implemented in $O(n \log n)$ time and memory references.

1.4. Hankel matrices.

Definition 1.7. A matrix of the form

$$H_n = H(x_{-(n-1)}, \cdots, x_0, \cdots, x_{n-1}) = \begin{bmatrix}
x_{-(n-1)} & x_{-(n-2)} & x_{-(n-3)} & \cdots & x_0\\
x_{-(n-2)} & x_{-(n-3)} & x_{-(n-4)} & \cdots & x_1\\
x_{-(n-3)} & x_{-(n-4)} & x_{-(n-5)} & \cdots & x_2\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
x_0 & x_1 & x_2 & \cdots & x_{n-1}
\end{bmatrix} \tag{1.12}$$

is called a Hankel matrix.

A Hankel matrix is completely determined by its first column and last row and thus depends on $2n - 1$ parameters. The entries of $H_n$ are constant along the diagonals that are perpendicular to the main diagonal. It arises naturally in problems involving power moments. It has the following property [Golub96].

Theorem 1.8. The product of a Hankel matrix and any vector can be done in $O(n \log n)$ time and memory references.

Proof. Notice that if

$$I_p = \begin{bmatrix}
0 & 0 & \cdots & 0 & 1\\
0 & 0 & \cdots & 1 & 0\\
\vdots & \vdots & & \vdots & \vdots\\
0 & 1 & \cdots & 0 & 0\\
1 & 0 & \cdots & 0 & 0
\end{bmatrix} \tag{1.13}$$

is the backward identity permutation matrix, then $I_p H_n$ is a Toeplitz matrix for any Hankel matrix $H_n$, and $I_p T_n$ is a Hankel matrix for any Toeplitz matrix $T_n$. The product $H_n y$ for any vector $y$ can be computed as follows [Kailath99]: first compute the product $(I_p H_n) y$ of the Toeplitz matrix $I_p H_n$ and the vector $y$ as in (1.11), then apply the permutation to obtain $I_p (I_p H_n) y$, which is what we want, since $I_p I_p = I$.
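Both reductions are compact in code. The sketch below is our NumPy illustration (the function names are ours): the Toeplitz product uses the circulant embedding (1.9)-(1.11), and the Hankel product flips rows to reuse it:

```python
import numpy as np

def toeplitz_matvec(col, row, y):
    """T @ y for the Toeplitz matrix with first column `col` and first row
    `row` (row[0] == col[0]), via the 2n x 2n circulant embedding (1.9):
    the circulant's first column is col, a zero, then row reversed."""
    n = len(y)
    c = np.concatenate([col, [0.0], row[:0:-1]])
    z = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([y, np.zeros(n)])))
    return z[:n]                     # the top half is T @ y, as in (1.11)

def hankel_matvec(hc, hr, y):
    """H @ y for the Hankel matrix with first column `hc` and last row `hr`
    (hr[0] == hc[-1]): I_p H is Toeplitz with first column hc reversed and
    first row hr, so multiply and apply I_p (a flip) once more."""
    return toeplitz_matvec(hc[::-1], hr, y)[::-1]

# checks against dense O(n^2) products
n = 5
col = np.random.rand(n); row = np.random.rand(n); row[0] = col[0]
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)] for i in range(n)])
y = np.random.rand(n)
assert np.allclose(T @ y, toeplitz_matvec(col, row, y))

hc = np.random.rand(n); hr = np.random.rand(n); hr[0] = hc[-1]
h = np.concatenate([hc, hr[1:]])     # antidiagonal generator of H
H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
assert np.allclose(H @ y, hankel_matvec(hc, hr, y))
```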

1.5. Vandermonde matrices.

Definition 1.9. Suppose $\{x_i,\; i = 0, 1, \ldots, n\} \subset \mathbb{C}$; a matrix of the form

$$V = V(x_0, x_1, \ldots, x_n) = \begin{bmatrix}
1 & 1 & \cdots & 1\\
x_0 & x_1 & \cdots & x_n\\
x_0^2 & x_1^2 & \cdots & x_n^2\\
\vdots & \vdots & \ddots & \vdots\\
x_0^n & x_1^n & \cdots & x_n^n
\end{bmatrix} \tag{1.14}$$

is called a Vandermonde matrix.

A Vandermonde matrix is completely determined by its second row, and so depends on $n + 1$ parameters. All rows are powers of the second row, from the power 0 to the power $n$. It is a fact that

$$\det V = \prod_{0 \le j < i \le n} (x_i - x_j), \tag{1.15}$$

so a Vandermonde matrix is nonsingular if and only if the $n + 1$ parameters $x_0, x_1, \ldots, x_n$ are distinct. In this paper we assume this is the case whenever we need the inverse of this matrix.

A Fourier matrix is a special case of a Vandermonde matrix. The transpose of a Vandermonde matrix arises naturally in polynomial evaluation and polynomial interpolation. There exist efficient algorithms for the fast matrix-vector product for a Vandermonde matrix, its transpose, its inverse, and the transpose of its inverse. All of them are of complexity $O(n \log^2 n)$, although there are associated stability problems [Driscoll97, Moore93]. The basic idea is to factor the matrices into products of sparse matrices, Toeplitz matrices and the like, so that the FFT can be applied to speed up the computations. We state these facts as a theorem below. The details can be found in [Driscoll97, GohbergL94, GohbergC94, Lu98, Moore93, Pan92].

Theorem 1.10. The product of any Vandermonde matrix, its transpose, its inverse, or the transpose of its inverse with any vector is of complexity $O(n \log^2 n)$.

In later sections, we give representations of the Pascal matrices in factored forms in terms of Vandermonde matrices. There exist a number of algorithms for the product of a Vandermonde matrix and a vector, and techniques to overcome the instability problems associated with the algorithm [Driscoll97, Moore93]. However, in this paper we do not recommend use of those factored representations involving Vandermonde matrices when alternate representations are available, due to the inferior complexity and the reported instability of the Vandermonde algorithms.
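The connection to polynomial evaluation is easy to see in code. The following NumPy sketch (ours, with hypothetical nodes; the fast $O(n \log^2 n)$ algorithms themselves are in the cited references) shows that multiplying by $V'$ evaluates, at the nodes $x_i$, the polynomial whose coefficients form the input vector:

```python
import numpy as np

nodes = np.array([0.5, 1.0, 2.0, 3.0])        # hypothetical distinct nodes x_0, ..., x_3
a = np.array([1.0, -2.0, 0.5, 4.0])           # coefficients of p(t), lowest degree first

V = np.vander(nodes, increasing=True).T       # V_ij = x_j^i: rows are powers of the nodes
# (V' a)_i = sum_j a_j x_i^j = p(x_i): a transposed Vandermonde product is evaluation
assert np.allclose(V.T @ a, np.polyval(a[::-1], nodes))
```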

2. Fast algorithms. In this section we present decompositions of the Pascal matrices which allow fast computation of the matrix-vector product.

2.1. Decomposition of the lower triangular Pascal matrix. We will use the following identity in our implementation of the fast algorithms.

Theorem 2.1. The Pascal matrix $PL$ can be decomposed as

$$PL = \operatorname{diag}(v_1) \cdot T \cdot \operatorname{diag}(v_2), \tag{2.1}$$

where the vectors $v_1$ and $v_2$ are

$$v_1 = \begin{bmatrix} 1\\ 1\\ 2!\\ 3!\\ \vdots\\ (p-1)! \end{bmatrix}, \qquad
v_2 = \begin{bmatrix} 1\\ \frac{1}{1!}\\ \frac{1}{2!}\\ \frac{1}{3!}\\ \vdots\\ \frac{1}{(p-1)!} \end{bmatrix}, \tag{2.2}$$

and the matrix

$$T = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0\\
\frac{1}{1!} & 1 & 0 & \cdots & 0\\
\frac{1}{2!} & \frac{1}{1!} & 1 & \cdots & 0\\
\frac{1}{3!} & \frac{1}{2!} & \frac{1}{1!} & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\frac{1}{(p-1)!} & \frac{1}{(p-2)!} & \frac{1}{(p-3)!} & \cdots & 1
\end{bmatrix} \tag{2.3}$$

is a Toeplitz matrix.

Proof. Notice that the $(n, m)$ entry $PL_{nm}$ of the Pascal matrix is

$$PL_{nm} = \begin{cases} C_{n-1}^{m-1} & \text{if } n \ge m,\\ 0 & \text{if } n < m, \end{cases} \tag{2.4}$$

where $C_{n-1}^{m-1} = \frac{(n-1)!}{(n-m)!\,(m-1)!}$. That is, every entry in the $n$-th row of the Pascal matrix has a common factor $(n-1)!$, and every entry in the $m$-th column has a common factor $\frac{1}{(m-1)!}$. We can take out the common factor $(n-1)!$ of the $n$-th row and the common factor $\frac{1}{(m-1)!}$ of the $m$-th column: multiply from the left by a diagonal matrix which is the identity except that the $n$-th diagonal entry is $(n-1)!$, and multiply from the right by a diagonal matrix which is the identity except that the $m$-th diagonal entry is $\frac{1}{(m-1)!}$. This can be done for every row and column:

$$PL = \begin{bmatrix}
\frac{0!}{0!\,0!} & & & & \\
\frac{1!}{1!\,0!} & \frac{1!}{0!\,1!} & & & \\
\frac{2!}{2!\,0!} & \frac{2!}{1!\,1!} & \frac{2!}{0!\,2!} & & \\
\frac{3!}{3!\,0!} & \frac{3!}{2!\,1!} & \frac{3!}{1!\,2!} & \ddots & \\
\vdots & \vdots & \vdots & \ddots & \\
\frac{(p-1)!}{(p-1)!\,0!} & \frac{(p-1)!}{(p-2)!\,1!} & \frac{(p-1)!}{(p-3)!\,2!} & \cdots & 1
\end{bmatrix}. \tag{2.5}$$

Therefore we have factored the Pascal matrix into a product with the Toeplitz matrix $T$ in the middle, $p$ diagonal matrices on the left of $T$, and $p$ diagonal matrices on the right of $T$. Multiplying the diagonal matrices on the left and on the right respectively, we end up with the diagonal matrices $\operatorname{diag}(v_1)$ and $\operatorname{diag}(v_2)$.

With the notation for Toeplitz matrices introduced earlier, $T$ can be written as

$$T = \operatorname{Toep}\left( \begin{bmatrix} 1\\ 1\\ \frac{1}{2}\\ \frac{1}{6}\\ \vdots\\ \frac{1}{(p-1)!} \end{bmatrix}, \begin{bmatrix} 1\\ 0\\ 0\\ \vdots\\ 0 \end{bmatrix}' \right). \tag{2.6}$$
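In code, the whole algorithm of Theorem 2.1 is short. The following NumPy sketch (our illustration; `pascal_lower_matvec` is our name, and the stabilization of Section 3 is omitted) anticipates the three-step procedure described next:

```python
import numpy as np
from math import factorial

def pascal_lower_matvec(x):
    """Multiply the p x p lower triangular Pascal matrix by x using the
    decomposition PL = diag(v1) . T . diag(v2) of Theorem 2.1; the Toeplitz
    product uses the circulant embedding, so the cost is O(p log p)."""
    p = len(x)
    inv_fact = np.array([1.0 / factorial(k) for k in range(p)])
    fact = np.array([float(factorial(k)) for k in range(p)])
    u = inv_fact * x                                   # diag(v2) x
    # T = Toep(c, r') with c = (1, 1/1!, ..., 1/(p-1)!)', r = (1, 0, ..., 0)'
    c = np.concatenate([inv_fact, [0.0], np.zeros(p - 1)])   # 2p circulant embedding
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([u, np.zeros(p)])))[:p]
    return fact * w.real                               # diag(v1) (T u)

# check against the dense Pascal matrix for a small p
p = 6
PL = np.array([[factorial(n) // (factorial(m) * factorial(n - m)) if n >= m else 0
                for m in range(p)] for n in range(p)])
x = np.random.rand(p)
assert np.allclose(PL @ x, pascal_lower_matvec(x))
```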

From Theorem 2.1 it is clear that the multiplication of a Pascal matrix $PL$ and a vector $x$ can be done in three steps: first calculate the elementwise multiplication $u = v_2 .\!\ast x$, which requires $p$ multiplications; then calculate the product $w = Tu$ of the Toeplitz matrix $T$ and the vector $u$ as in (1.11), which requires $O(p \log p)$ work by Theorem 1.6; and finally calculate another elementwise multiplication $v_1 .\!\ast w$ to obtain the product $PL\,x$. Therefore we have the following.

Theorem 2.2. The multiplication of the $p \times p$ lower triangular Pascal matrix and a vector of length $p$ can be done in $O(p \log p)$ operations.

While we usually use the above decomposition to build fast algorithms for the product of the Pascal matrix and a vector, we have found some other ways to factor the matrix, which we state here. They may be useful in other contexts, such as in analytical work.

2.1.1. Alternate decomposition 1.

Lemma 2.3. The Pascal matrix $PL$ can be decomposed as

$$PL = V_2 V_1^{-1}, \tag{2.7}$$

where

$$V_1 = \begin{bmatrix}
1 & 1 & \cdots & 1\\
x_1 & x_2 & \cdots & x_p\\
x_1^2 & x_2^2 & \cdots & x_p^2\\
x_1^3 & x_2^3 & \cdots & x_p^3\\
\vdots & \vdots & \ddots & \vdots\\
x_1^{p-1} & x_2^{p-1} & \cdots & x_p^{p-1}
\end{bmatrix} \tag{2.8}$$

and

$$V_2 = \begin{bmatrix}
1 & 1 & \cdots & 1\\
x_1 + 1 & x_2 + 1 & \cdots & x_p + 1\\
(x_1 + 1)^2 & (x_2 + 1)^2 & \cdots & (x_p + 1)^2\\
(x_1 + 1)^3 & (x_2 + 1)^3 & \cdots & (x_p + 1)^3\\
\vdots & \vdots & \ddots & \vdots\\
(x_1 + 1)^{p-1} & (x_2 + 1)^{p-1} & \cdots & (x_p + 1)^{p-1}
\end{bmatrix} \tag{2.9}$$

are Vandermonde matrices, and $\{x_i,\; i = 1, 2, \cdots, p\}$ are distinct numbers.

Proof. Because $PL$ is the matrix of binomial coefficients, the binomial expansion $(x_i + 1)^{n-1} = \sum_{m=1}^{n} C_{n-1}^{m-1} x_i^{m-1}$ shows that

$$PL\,V_1 = V_2. \tag{2.10}$$

Notice that $\{x_i,\; i = 1, 2, \cdots, p\}$ are distinct and otherwise arbitrary. Hence $V_1$ is nonsingular and its inverse exists. Thus we have

$$PL = V_2 V_1^{-1}. \tag{2.11}$$

If we let $V_1$ be a matrix with a single column, formula (2.10) provides a tool to obtain the matrix-vector product $V_2$ of $PL$ and $V_1$, which can be used as a true value to test against results from other methods. In addition, since by Theorem 1.10 a Vandermonde matrix and its inverse can be multiplied by vectors in $O(p \log^2 p)$ time, this decomposition also allows a fast matrix-vector product for the Pascal matrix. However, it is slower than the previous decomposition. Furthermore, many existing algorithms for Vandermonde matrices are not stable (see [Moore93] and [Driscoll97]). Therefore it is not generally the preferred algorithm.

2.1.2. Alternate decomposition 2.

Lemma 2.4. A Pascal matrix can be decomposed as the product of $p - 1$ matrices

$$PL = A_1 A_2 \cdots A_{p-1}, \tag{2.12}$$

where

$$A_i = \begin{bmatrix}
1 & & & & \\
0 & 1 & & & \\
\vdots & \ddots & \ddots & & \\
0 & \cdots & 1 & 1 & \\
0 & \cdots & 0 & 1 & 1
\end{bmatrix} \tag{2.13}$$

has $p$ 1's as its diagonal entries and $i$ 1's as its subdiagonal entries, starting from position $(p, p-1)$ and moving up.

Proof. We prove this by mathematical induction. It is trivial for $p = 2$. Now assume that for $p = n$,

$$PL^{(n)} = A_1^{(n)} A_2^{(n)} \cdots A_{n-1}^{(n)}, \tag{2.14}$$

where $PL^{(n)}$ is the Pascal matrix of size $n \times n$, and $A_i^{(n)}$ is the $A_i$ defined in (2.13) of size $n \times n$. For $p = n + 1$, we need to prove that

$$PL^{(n+1)} = A_1^{(n+1)} A_2^{(n+1)} \cdots A_n^{(n+1)}. \tag{2.15}$$

It is easy to see that

$$A_i^{(n+1)} = \begin{bmatrix} 1 & 0\\ 0 & A_i^{(n)} \end{bmatrix} \qquad \text{for } i = 1, 2, \cdots, n - 1. \tag{2.16}$$

That is, we need to prove

$$PL^{(n+1)} = \begin{bmatrix} 1 & 0\\ 0 & A_1^{(n)} \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & A_2^{(n)} \end{bmatrix} \cdots \begin{bmatrix} 1 & 0\\ 0 & A_{n-1}^{(n)} \end{bmatrix} A_n^{(n+1)}. \tag{2.17}$$

By assumption, we have

$$\begin{bmatrix} 1 & 0\\ 0 & PL^{(n)} \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & A_1^{(n)} \end{bmatrix} \begin{bmatrix} 1 & 0\\ 0 & A_2^{(n)} \end{bmatrix} \cdots \begin{bmatrix} 1 & 0\\ 0 & A_{n-1}^{(n)} \end{bmatrix}. \tag{2.18}$$

Therefore, we need only prove that

$$PL^{(n+1)} = \begin{bmatrix} 1 & 0\\ 0 & PL^{(n)} \end{bmatrix} A_n^{(n+1)}. \tag{2.19}$$

It is easy to see that the entries in the first column of both sides are ones, and the entries in the first row of both sides are the same. For all other entries, we need to prove that

$$PL_{ij}^{(n+1)} = PL_{(i-1)(j-1)}^{(n)} + PL_{(i-1)j}^{(n)}. \tag{2.20}$$

This is trivial for all entries in the upper triangular part of the matrices, since those entries are zeros. It is also true for the entries on the diagonal, since $PL_{(i-1)j}^{(n)} = 0$ while $PL_{ij}^{(n+1)}$ and $PL_{(i-1)(j-1)}^{(n)}$ are all ones. What is left to prove is the lower triangular part of the matrices. We know

$$PL_{ij}^{(n+1)} = C_{i-1}^{j-1}, \tag{2.21}$$

and

$$PL_{(i-1)(j-1)}^{(n)} + PL_{(i-1)j}^{(n)} = C_{i-2}^{j-2} + C_{i-2}^{j-1} = C_{i-1}^{j-1}. \tag{2.22}$$

Therefore for $p = n + 1$, we have

$$PL^{(n+1)} = A_1^{(n+1)} A_2^{(n+1)} \cdots A_n^{(n+1)}. \tag{2.23}$$

This completes the proof.

This decomposition requires $O(p^2)$ operations for the matrix-vector product. However, all of these operations are additions and there are no multiplications, which may be suitable for some computer architectures; see the sketch below.
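Each factor $A_i$ only adds one neighbor into each of its last $i$ entries, so the whole product is a cascade of in-place additions. A Python sketch (ours; the function name is our choice):

```python
def pascal_lower_matvec_additions(x):
    """Multiply the lower triangular Pascal matrix by x via the
    factorization PL = A_1 A_2 ... A_{p-1} of Lemma 2.4: each factor only
    adds a neighbor into an entry, so the O(p^2) work uses no multiplications."""
    y = list(x)
    p = len(y)
    for i in range(p - 1, 0, -1):              # apply A_{p-1} first, then A_{p-2}, ...
        for j in range(p - 1, p - 1 - i, -1):  # rows touched by the i subdiagonal 1's
            y[j] += y[j - 1]
    return y

# For x = (1, 0, ..., 0)' this reproduces the first column of PL: all ones.
assert pascal_lower_matvec_additions([1, 0, 0, 0]) == [1, 1, 1, 1]
# For x = (0, 1, 0, 0)' it reproduces the second column (0, 1, 2, 3)'.
assert pascal_lower_matvec_additions([0, 1, 0, 0]) == [0, 1, 2, 3]
```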

2.2. The upper triangular Pascal matrix. We have shown a fast algorithm for multiplying a lower triangular Pascal matrix and a vector. It is easy to see that the upper triangular Pascal matrix $PU$ is the transpose of the lower triangular Pascal matrix $PL$. Therefore it can be decomposed in a way similar to the lower triangular Pascal matrix. Indeed, applying the transpose to the different decompositions (2.1), (2.7), and (2.12) of the lower triangular Pascal matrix, we obtain decompositions which allow fast matrix-vector products. Therefore we also have the following theorem, similar to Theorem 2.2.

Theorem 2.5. The multiplication of the upper triangular Pascal matrix and a vector can be done in $O(p \log p)$ operations.

2.3. The symmetric Pascal matrix. The following lemma gives the Cholesky decomposition of the matrix $PS$ [Edelman03].

Lemma 2.6.

$$PS = PL \cdot PU. \tag{2.24}$$

A number of proofs can be found in [Edelman03]. This identity implies that all decompositions that admit a fast matrix-vector product for the Pascal matrices $PL$ and $PU$ can be utilized for a fast matrix-vector product with the Pascal matrix $PS$, since we can first multiply $PU$ by the vector to obtain a vector, and then multiply it by $PL$.

We also have the following factorization.

Lemma 2.7. The matrix $PS$ can be decomposed as

$$PS = \operatorname{diag}(v) \cdot H \cdot \operatorname{diag}(v), \tag{2.25}$$

where

$$v = \begin{bmatrix} 1\\ \frac{1}{1!}\\ \frac{1}{2!}\\ \frac{1}{3!}\\ \vdots\\ \frac{1}{(p-1)!} \end{bmatrix}, \qquad
H = \begin{bmatrix}
0! & 1! & \cdots & (p-1)!\\
1! & 2! & \cdots & p!\\
\vdots & \vdots & \ddots & \vdots\\
(p-1)! & p! & \cdots & (2p-2)!
\end{bmatrix} \tag{2.26}$$

is a Hankel matrix.

Proof. This lemma can be proved in a way similar to the proof of (2.1). Notice that the $(n, m)$ entry $PS_{nm}$ of the matrix $PS$ is

$$PS_{nm} = C_{n+m-2}^{n-1}, \tag{2.27}$$

where $C_{n+m-2}^{n-1} = \frac{(n+m-2)!}{(n-1)!\,(m-1)!}$. That is, every entry in the $n$-th row of the matrix has a common factor $\frac{1}{(n-1)!}$, and every entry in the $m$-th column has a common factor $\frac{1}{(m-1)!}$. We can take out the common factor $\frac{1}{(n-1)!}$ of the $n$-th row and the common factor $\frac{1}{(m-1)!}$ of the $m$-th column: multiply from the left by a diagonal matrix which is the identity except that the $n$-th diagonal entry is $\frac{1}{(n-1)!}$, and multiply from the right by a diagonal matrix which is the identity except that the $m$-th diagonal entry is $\frac{1}{(m-1)!}$. This can be done for every row and column:

$$PS = \begin{bmatrix}
\frac{0!}{0!\,0!} & \frac{1!}{0!\,1!} & \cdots & \frac{(p-1)!}{0!\,(p-1)!}\\
\frac{1!}{1!\,0!} & \frac{2!}{1!\,1!} & \cdots & \frac{p!}{1!\,(p-1)!}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{(p-1)!}{(p-1)!\,0!} & \frac{p!}{(p-1)!\,1!} & \cdots & \frac{(2p-2)!}{(p-1)!\,(p-1)!}
\end{bmatrix}. \tag{2.28}$$

Therefore we have factored the Pascal matrix into a product with the Hankel matrix $H$ in the middle, $p$ diagonal matrices on the left, and $p$ diagonal matrices on the right. Multiplying the diagonal matrices on the left and on the right respectively, we end up with the diagonal matrices $\operatorname{diag}(v)$ and $\operatorname{diag}(v)$.

It is clear from the lemmas above that the multiplication of the matrix $PS$ and any vector $x$ can be done either by successively applying $PU$ and $PL$ to $x$, or by first calculating the elementwise vector product $v .\!\ast x$, then applying the Hankel matrix $H$ to the result, and finally doing another elementwise vector product with $v$ to arrive at the result $PS \cdot x$. In either process, the matrices involved are either Toeplitz matrices or Hankel matrices. By Theorems 1.6 and 1.8, we have the following.

Theorem 2.8. The multiplication of the matrix $PS$ and any vector can be done in $O(p \log p)$ operations.

Although we can use both decompositions to do the matrix-vector product, the first one is less efficient than the second. The reason is that the cost of the product of a Toeplitz matrix and a vector is the same as that of a Hankel matrix and a vector, and the first decomposition requires two such Toeplitz products while the second requires a single Hankel product. We recommend use of the second decomposition.
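Following the second decomposition, a NumPy sketch (our function name; unstabilized, so the factorials limit it to modest $p$):

```python
import numpy as np
from math import factorial

def pascal_symmetric_matvec(x):
    """Multiply the p x p symmetric Pascal matrix PS by x using Lemma 2.7,
    PS = diag(v) . H . diag(v), where the Hankel generator is k! for
    k = 0, ..., 2p-2. Flipping H gives a Toeplitz matrix, applied by
    circulant embedding and the FFT, so the total cost is O(p log p)."""
    p = len(x)
    v = np.array([1.0 / factorial(k) for k in range(p)])
    h = np.array([float(factorial(k)) for k in range(2 * p - 1)])  # antidiagonals of H
    u = v * x                                          # diag(v) x
    # I_p H is Toeplitz with first column h[p-1::-1] and first row h[p-1:]
    col, row = h[p - 1::-1], h[p - 1:]
    c = np.concatenate([col, [0.0], row[:0:-1]])       # 2p circulant embedding
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([u, np.zeros(p)])))[:p]
    return v * w.real[::-1]                            # flip back, then diag(v)

# check against the dense PS for a small p (entries C(n+m, n), 0-indexed)
p = 5
PS = np.array([[factorial(n + m) // (factorial(n) * factorial(m)) for m in range(p)]
               for n in range(p)])
x = np.random.rand(p)
assert np.allclose(PS @ x, pascal_symmetric_matvec(x))
```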

3. Stability and Implementation Issues. We have discussed how to multiply the Pascal matrices with a vector efficiently through matrix decompositions. Notice that the entries of the Toeplitz or Hankel matrices in these decompositions have very different magnitudes, and instability problems can arise if the decomposition is implemented naively. In this section we discuss the implementation of these fast algorithms and provide the modifications required to achieve numerical stability.

Fast algorithms for computing the product of a Pascal matrix and a vector are based on the decompositions (2.1) and (2.25). We analyze the stability problem of the fast algorithm for the lower triangular Pascal matrix as an example and present techniques to stabilize it.

Given a Pascal matrix $PL$ of size $p \times p$, we can factor it into

$$PL = \operatorname{diag}(v_1) \cdot T(c_p, r_p') \cdot \operatorname{diag}(v_2), \tag{3.1}$$

where

$$v_1 = \begin{bmatrix} 1 & 1! & 2! & 3! & \cdots & (p-1)! \end{bmatrix}', \tag{3.2}$$

$$v_2 = \begin{bmatrix} 1 & \tfrac{1}{1!} & \tfrac{1}{2!} & \tfrac{1}{3!} & \cdots & \tfrac{1}{(p-1)!} \end{bmatrix}', \tag{3.3}$$

$$c_p = \begin{bmatrix} 1 & \tfrac{1}{1!} & \tfrac{1}{2!} & \tfrac{1}{3!} & \cdots & \tfrac{1}{(p-1)!} \end{bmatrix}', \tag{3.4}$$

$$r_p = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \end{bmatrix}'. \tag{3.5}$$

For a vector

$$a = \begin{bmatrix} a_0 & a_1 & a_2 & \cdots & a_{p-1} \end{bmatrix}', \tag{3.6}$$

the product

$$PL\,a = \operatorname{diag}(v_1) \cdot T(c_p, r_p') \cdot \operatorname{diag}(v_2) \cdot a \tag{3.7}$$

requires three matrix-vector products, of which two involve diagonal matrices and one involves a Toeplitz matrix. Therefore the product $PL\,a$ can be done in $O(p \log p)$ time instead of the $O(p^2)$ time required by the straightforward matrix-vector product. A naive implementation of the above method shows that the precision gets worse as $p$ gets larger. The reason for this is that the entries of the Toeplitz matrix and of the vector $v_2$ are of very different magnitudes, varying approximately from $1$ to $\frac{1}{(p-1)!}$. When we compute the matrix-vector product, we need to compute the FFT of the two vectors

$$u = \begin{bmatrix} 1 & \tfrac{1}{1!} & \tfrac{1}{2!} & \tfrac{1}{3!} & \cdots & \tfrac{1}{(p-1)!} & 0 & 0 & \cdots & 0 \end{bmatrix}', \tag{3.8}$$

$$x = \begin{bmatrix} a_0 & \tfrac{1}{1!}a_1 & \tfrac{1}{2!}a_2 & \tfrac{1}{3!}a_3 & \cdots & \tfrac{1}{(p-1)!}a_{p-1} & 0 & 0 & \cdots & 0 \end{bmatrix}'. \tag{3.9}$$

For a large $p$, when we compute the FFTs of $u$ and $x$, the final result would be the same if we simply treated entries such as $\frac{1}{(p-1)!}$ as zeros, and this causes the numerical instability. Therefore we need to find a way to increase the effect of the entries of smaller magnitude by bringing all nonzero terms in $u$ and $x$ to the same magnitude. This can be done by multiplying or dividing the entries by some constant factors while still preserving the same structure, viz. a Toeplitz matrix. Indeed, the Pascal matrix can be expressed by introducing a new parameter $\alpha$ as follows,

$$PL = \operatorname{diag}(v_1(\alpha)) \cdot \operatorname{Toep}(c_p(\alpha), r_p') \cdot \operatorname{diag}(v_2(\alpha)), \tag{3.10}$$

where

$$v_1(\alpha) = \begin{bmatrix} 1 & \tfrac{1}{\alpha} & \tfrac{2}{\alpha^2} & \tfrac{6}{\alpha^3} & \cdots & \tfrac{(p-1)!}{\alpha^{p-1}} \end{bmatrix}', \tag{3.11}$$

$$v_2(\alpha) = \begin{bmatrix} 1 & \alpha & \tfrac{\alpha^2}{2} & \tfrac{\alpha^3}{6} & \cdots & \tfrac{\alpha^{p-1}}{(p-1)!} \end{bmatrix}', \tag{3.12}$$

$$c_p(\alpha) = \begin{bmatrix} 1 & \alpha & \tfrac{\alpha^2}{2} & \tfrac{\alpha^3}{6} & \cdots & \tfrac{\alpha^{p-1}}{(p-1)!} \end{bmatrix}', \tag{3.13}$$

$$r_p(\alpha) = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \end{bmatrix}'. \tag{3.14}$$

With this factorization, it is possible to implement a fast, numerically stable algorithm by selecting a proper value of $\alpha$. We need to select $\alpha$ so that the magnitudes of the maximum and the minimum of the nonzero entries in the vectors $v_2(\alpha)$ and $c_p(\alpha)$ are approximately the same. The fast Fourier transform is applied to the two vectors

$$x(\alpha) = \begin{bmatrix} a_0 & \alpha a_1 & \tfrac{\alpha^2}{2} a_2 & \tfrac{\alpha^3}{6} a_3 & \cdots & \tfrac{\alpha^{p-1}}{(p-1)!} a_{p-1} & 0 & 0 & \cdots & 0 \end{bmatrix}' \tag{3.15}$$

and

$$u(\alpha) = \begin{bmatrix} 1 & \alpha & \tfrac{\alpha^2}{2} & \tfrac{\alpha^3}{6} & \cdots & \tfrac{\alpha^{p-1}}{(p-1)!} & 0 & 0 & \cdots & 0 \end{bmatrix}'. \tag{3.16}$$

Assuming all entries of $a = \begin{bmatrix} a_0 & a_1 & a_2 & \cdots & a_{p-1} \end{bmatrix}'$ are of the same magnitude, the entries in $x(\alpha)$ and the entries in $u(\alpha)$ are of the same magnitude. We want to choose one value of $\alpha$ so that all nonzero entries of $x(\alpha)$ and $u(\alpha)$ are as close to each other as possible. Let us consider a typical entry

$$f(m) = \frac{\alpha^m}{m!}, \qquad m = 0, 1, 2, \ldots, p-1. \tag{3.17}$$

We want the maximum and the minimum of this function to be as close as possible. We will iteratively find an $\alpha$ which satisfies this criterion. To start the iteration we need an approximate guess for $\alpha$; the following analysis provides this guess.

If $\alpha \ge p - 1$, then

$$f_{\min} = 1, \qquad f_{\max} = \frac{\alpha^{p-1}}{(p-1)!}. \tag{3.18}$$

In this case, we should choose $\alpha = p - 1$. If $1 \le \alpha < p - 1$, then when $0 \le m \le \alpha$,

$$f_{\min} = 1, \qquad f_{\max} = \frac{\alpha^{[\alpha]}}{([\alpha])!}; \tag{3.19}$$

when $\alpha < m \le p - 1$,

$$f_{\min} = \frac{\alpha^{p-1}}{(p-1)!}, \qquad f_{\max} = \frac{\alpha^{[\alpha]}}{([\alpha])!}. \tag{3.20}$$

Comparing these two cases, it is easy to see that the proper value of $\alpha$ should satisfy $1 \le \alpha < p - 1$, and we need to select $\alpha$ to achieve

$$\min_{\alpha}\left( \max\left( \frac{\alpha^\alpha}{\alpha!},\; \frac{\alpha^\alpha}{\alpha!} \cdot \frac{(p-1)!}{\alpha^{p-1}} \right) \right). \tag{3.21}$$

Using Stirling's formula,

$$\frac{\alpha^\alpha}{\alpha!} \approx \frac{e^\alpha \alpha^\alpha}{(2\pi)^{0.5}\,\alpha^{\alpha + 0.5}} = \frac{e^\alpha}{(2\pi\alpha)^{0.5}}, \tag{3.22}$$

and

$$\frac{\alpha^\alpha}{\alpha!} \cdot \frac{(p-1)!}{\alpha^{p-1}} \approx \frac{e^\alpha}{(2\pi\alpha)^{0.5}} \cdot \frac{(2\pi)^{0.5}\,(p-1)^{p-1+0.5}\,e^{-(p-1)}}{\alpha^{p-1}} = \sqrt{\frac{p-1}{\alpha}}\left(\frac{p-1}{\alpha e}\right)^{p-1} e^{\alpha}. \tag{3.23}$$

So when $\frac{p-1}{\alpha e} \approx 1$ we achieve our objective; that is, when

$$\alpha = \frac{p-1}{e}, \tag{3.24}$$

the magnitudes of the nonzero entries of $x(\alpha)$ and $u(\alpha)$ are about the closest.

This provides an initial value for the proper $\alpha$. We obtain the best $\alpha$ by searching in this vicinity. For each fixed $p$, we can pre-compute and test a few values of $\alpha$ to build a look-up table containing the best values $\alpha(p)$ that achieve numerical stability.

We would also like to note that the modification does not have much effect on the complexity of the algorithm: once $p$ is known, we can select an $\alpha$ from the look-up table and compute $c_p(\alpha)$ and the FFT of $u(\alpha)$, and store them before we start the computation of the matrix-vector product. The vectors $x(\alpha)$ and $v_1(\alpha)$ can be computed from $c_p(\alpha)$. Note that if the multiplication is to be done with several vectors, the FFT of $u(\alpha)$ only needs to be computed once. This reduces the number of FFTs to two each time, which naturally speeds up the multiplication even further.

This technique will eventually stop working, since for a very large $p$ the entries will again be of very different magnitudes even with this scaling. For high precision requirements and large $p$, we suggest the following technique to address the stability problem: the corresponding Toeplitz matrix can be subdivided into blocks of smaller Toeplitz matrices, each of which consists of elements of similar magnitude, so the above method can be used on each block to stabilize the computation with the same complexity. Furthermore, if we view each Toeplitz block as one entry of the whole matrix, the whole matrix is again a Toeplitz matrix,

$$\begin{bmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & A_0 & A_{-1} & A_{-2} & A_{-3} & A_{-4} & \cdots\\
\cdots & A_1 & A_0 & A_{-1} & A_{-2} & A_{-3} & \cdots\\
\cdots & A_2 & A_1 & A_0 & A_{-1} & A_{-2} & \cdots\\
\cdots & A_3 & A_2 & A_1 & A_0 & A_{-1} & \cdots\\
& \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}, \tag{3.25}$$

where $\{A_i,\; i = \ldots, -2, -1, 0, 1, 2, \ldots\}$ are Toeplitz matrices. Therefore the fast algorithm can be applied to each individual block as well as to the whole matrix.

Since the corresponding decompositions for the transpose $PU$ of the lower triangular Pascal matrix and for the symmetric Pascal matrix $PS$ are very similar, the process of introducing the parameter $\alpha$ is the same. For the matrix $PU$, we have

$$PU(\alpha) = \operatorname{diag}(v_3(\alpha)) \cdot \operatorname{Toep}(c_p(\alpha), r_p(\alpha)') \cdot \operatorname{diag}(v_4(\alpha)), \tag{3.26}$$

where

$$v_3(\alpha) = \begin{bmatrix} 1 & \alpha & \tfrac{\alpha^2}{2} & \tfrac{\alpha^3}{6} & \cdots & \tfrac{\alpha^{p-1}}{(p-1)!} \end{bmatrix}', \tag{3.27}$$

$$c_p(\alpha) = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \end{bmatrix}', \tag{3.28}$$

$$r_p(\alpha) = \begin{bmatrix} 1 & \alpha & \tfrac{\alpha^2}{2} & \tfrac{\alpha^3}{6} & \cdots & \tfrac{\alpha^{p-1}}{(p-1)!} \end{bmatrix}', \tag{3.29}$$

$$v_4(\alpha) = \begin{bmatrix} 1 & \tfrac{1}{\alpha} & \tfrac{2}{\alpha^2} & \tfrac{6}{\alpha^3} & \cdots & \tfrac{(p-1)!}{\alpha^{p-1}} \end{bmatrix}'; \tag{3.30}$$

and for the matrix $PS$, the factorization (2.25) is rescaled by $\alpha$ in the same way.
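A sketch of the stabilized product for $PL$ (our NumPy illustration; it uses the initial guess (3.24) directly rather than the look-up table of searched values):

```python
import numpy as np
from math import factorial, e

def pascal_lower_matvec_stable(x, alpha=None):
    """Stabilized fast PL product: the decomposition (3.10) rescales the
    Toeplitz generator by powers of alpha so that its nonzero entries have
    comparable magnitude. alpha = (p-1)/e is the initial guess (3.24); the
    paper refines it by a small search, which this sketch skips."""
    p = len(x)
    if alpha is None:
        alpha = (p - 1) / e
    k = np.arange(p)
    pow_a = alpha ** k
    fact = np.array([float(factorial(m)) for m in range(p)])
    v1 = fact / pow_a                                  # v1(alpha), (3.11)
    v2 = pow_a / fact                                  # v2(alpha) = c_p(alpha), (3.12)-(3.13)
    u = v2 * x                                         # diag(v2(alpha)) x
    c = np.concatenate([v2, np.zeros(p)])              # embedding of Toep(c_p(alpha), r_p')
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(np.concatenate([u, np.zeros(p)])))[:p]
    return v1 * w.real

# For moderately large p the unscaled algorithm loses accuracy while the
# scaled one stays near machine precision (cf. Table 4.1).
p = 30
PL = np.array([[float(factorial(n) // (factorial(m) * factorial(n - m))) if n >= m else 0.0
                for m in range(p)] for n in range(p)])
x = np.random.rand(p)
err = np.linalg.norm(PL @ x - pascal_lower_matvec_stable(x)) / np.linalg.norm(PL @ x)
print(err)   # small relative error, unlike the unscaled variant
```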

 p    Naive algorithm   Stabilized algorithm   Stabilizing α   α from Eq. (3.24)
 6    2.5352e-16        1.8608e-16             1.15            1.84
 9    4.5658e-15        5.0705e-16             2               2.94
 12   1.2296e-12        1.3944e-15             2.2             4.05
 15   1.4068e-10        2.3761e-15             3.9             5.15
 18   4.3558e-08        1.2296e-14             3.95            6.25
 21   7.2048e-05        4.9564e-14             4.7             7.36
 24   0.23882           1.4088e-13             5.1             8.46
 27   62.46             2.5018e-13             5.8             9.56
 30   7.0563e+05        3.8519e-13             6.4             10.67
 33   8.4567e+08        2.0082e-12             6.15            11.77
 36   8.6394e+12        6.9394e-12             7.1             12.88

Table 4.1. Accuracy comparison of the naive algorithm and the stabilized algorithm. The initial guess is seen to overestimate the correct value.
