Default Bayesian Analysis For Hierarchical Spatial .

3y ago
27 Views
2 Downloads
235.80 KB
23 Pages
Last View : 24d ago
Last Download : 3m ago
Upload by : Amalia Wilborn
Transcription

Technical Report RM 680, Department of Statistics and Probability, Michigan State UniversityDefault Bayesian Analysis for HierarchicalSpatial Multivariate ModelsSarat C. Dass, Chae Young Lim and Tapabrata Maiti Department of Statistics & ProbabilityMichigan State University, East Lansing, MI 48824Email: {sdass,lim,maiti}@stt.msu.eduAbstractIn recent years, multivariate spatial models have been proven to be an effectivetool for analyzing spatially related multidimensional data arising from a common underlying spatial process. Currently, the Bayesian analysis is perhaps the only solutionavailable in this framework where prior selection plays an important role in the inference. The present article contributes towards the development of Bayesian inferentialmethodology for hierarchical spatial multivariate generalized linear mixed models. Thetwo main contributions of this article are the development of a shrinkage-type defaultprior and innovative computational techniques for the Gibbs sampling implementation.The default prior elicitation is non-informative but results in a proper posterior on therelated parameter spaces. This elicitation not only provides robust inference (withrespect to prior choice), but also provides improved estimation. In the computationalstep, we have developed a transformation of the parameters that avoids sampling fromrestricted domains, thus providing more stability and efficiency in the Gibbs implementation. The methodology has been extended to the case of missing responses in themulti-dimensional setup. Both simulations and real examples are provided to validateand illustrate the proposed methodology.Keywords: Generalized linear mixed models, conditional autoregressive models, default Bayesian analysis, health disparity Authors’ names are in alphabetical order.1

1IntroductionMany spatial problems, particularly those concerning environmental, health and socio-economicvariables, are inherently multivariate, meaning that two or more such variables are recordedat each spatial location simultaneously. Multivariate spatial analysis is becoming more relevant and prevalent with the advent of geographic information systems (GIS) that allow thedisplay of spatial data at varying spatial resolutions. Sain and Cressie (2007) viewed thedevelopments of spatial analysis in two main categories: models for geostatistical data (thatis, the indices of data points belong in a continuous set) and models for lattice data (datawith indices in a discrete or countable set), while specifically mentioning that the latter isnot as developed.The issue of “health disparity” is central to the distribution of federal and state aidbased on socio-economic indicators. Health disparity studies analyze how health status ofindividuals vary across various socio-economic groups and spatial locations, in particular inrelation to a specific disease. Multiple response variables are available as indicators of healthstatus, and as a result, models for multivariate spatial lattice data are an indispensable toolfor analyzing health disparity data. Recently, Sain and Cressie (2007), Kim et al. (2001),Gelfand and Vounatsou (2003), Jin et al. (2005) explored multivariate spatial models forlattice data, adopting the Bayesian framework as the natural inferential approach. The onlyexception, Sain (2009) developed the maximum likelihood estimation procedure for a specialcase, namely the multivariate conditional autoregressive (CAR) normal model of Sain andCressie (2007).Although the Bayesian inferential framework is a natural choice for spatial lattice data,one obvious obstacle is the choice of prior distribution for the model parameters. All previousworks mentioned above are based on standard subjective, and at best, vague priors to accountfor the lack of subjective knowledge. Subjective specification of priors have the obviousdrawback of introducing bias in the estimation procedure, the extent of which may not beeasy to gauge in applications. A default, or non-informative prior, is therefore preferable forthe Bayesian approach. However, in the case of hierarchical generalized linear models as inthis paper, putting non-informative priors (i.e., improper priors) result in the posterior beingimproper. This led Natarajan and Kass (2000) to develop default priors in the context ofgeneralized linear mixed models (GLMM) for which the posterior can be established to beproper.A generalized linear mixed model (GLMM) with a multivariate Gaussian CAR model canbe viewed as a special case of a general random effects model with specific restrictions on thestructure of the covariance matrix. The Natarajan-Kass prior neither takes into account thespecial structure of the covariance matrix nor considers the dimension reduction capabilities2

of a CAR model in the spatial case. In fact, some simple modifications greatly reduce thenumber of parameters while maintaining flexibility in modeling (Sain and Cressie, 2007).This work is motivated from an application of joint mapping of lung cancer mortalityincidence and poverty for all counties in the state of Michigan. The questions we seek toaddress are two-fold: First, whether health and socio-economic disparities are correlated, andsecond, if so, are the correlations stronger in certain spatial locations compared to others.In this paper, we develop a new default prior for the multivariate spatial lattice data in thecontext of spatial multivariate GLMM. The standard analysis using subjective and vaguepriors indicates significant prior sensitivity compared to the proposed prior, which justifiesthe use of the latter in real applications. Our Bayesian computational scheme is differentfrom previous approaches. We adopt a new parametrization of the multivariate CAR modelbased on the Cholesky and spectral decompositions of matrices. The enormous advantagegained through this re-parametrization is the removal of constraints on the parameter spaceinduced by the positive definiteness requirement on the inverse covariance matrix. TheBayesian inferential procedure is illustrated through simulation and an application based onSEER data for the state of Michigan.The rest of the article is organized as follows: Section 2 develops the multivariate GLMMmodel in the spatial context. Section 3 discusses the motivation for developing the proposeddefault prior by taking the spatial information into account. The resulting posterior distribution is shown to be proper in this section for both complete and missing data cases. TheGibbs steps are outlined in Section 4 whereas Section 5 gives the numerical findings for bothsimulated and real data. This is followed by a brief discussion summarizing our findings inSection 6 and the Appendix.2Multivariate Generalized Linear Mixed ModelsIn what follows, we assume that there are n distinct sites on a spatial domain where observations on p variables are recorded. Let the multivariate data consist of the p-dimensionalrandom vector yj (y1j , y2j , · · · , ypj )0 for the j-th site, for j 1, 2, · · · , n. Correspondingto the response yij , denote by xij (xij1 , xij2 , xijqi )0 to be the qi 1 vector of explanatoryvariables. The following two stage hierarchical model is considered for the distribution ofthe np 1 vector of all observables y (y10 , y20 , · · · , yn0 )0 : In the first stage of the hierarchy,the variables yij are independent with densityfij (y ηij ) C(y) exp{ ηij y hi (ηij )},3(1)

where fij belongs to an exponential family of densities with canonical parameter ηij , andhi (ηij ) is the normalizing constant satisfyingZexp{ hi (ηij ) } C(y) exp{ ηij y } dy.It can be shown (see, for example, McCullagh and Nelder (1989)) that hi (·) is a differentiablefunction with inverse h 1i . In the second stage, the canonical parameter ηij is related to xijvia a link function in the usual generalized linear model set-up,ηij x0ij βi ²ij(2)where βi is a qi 1 vector of regression coefficients and ²ij are error random variables. Thehierarchical specification is completed by eliciting the distribution for the error component²ij s, namely, ² Nnp (0, D) where ²j (²1j , ²2j , · · · , ²pj )0 is the p 1 error vector at the j-thspatial site, ² (²01 , ²02 , · · · , ²0n )0 is the np 1 vector of all the error variables and D is thecovariance matrix (of dimension np np ) of ². Such models based on an unstructured Dare called a random-effects GLMs or generalized linear mixed models (GLMMs), and hasbeen the subject of many theoretical as well as practical studies in recent years.2.1Multivariate Gaussian CARIn the spatial context, the distribution of ², and hence D, can be given a more concretestructure based on neighboring dependencies. Following Mardia (1988), we define the multivariate Gaussian CAR model as follows: For some p p matrices Γj and Λjk , suppose thatthe vector ²j is p-variate Gaussian withE(²j ² j ) XΛjk ²k ,and V ar(²j ² j ) Γj ,for j 1, 2, · · · , n,(3)k Njwhere ² j { ²k : k Nj } denotes the rest of the ² at all locations k in the neighborhoodNj . The existence of the joint distribution for ² is guaranteed by the symmetry and positivedefiniteness of D which, respectively, translates toΛjk Γk Γj Λ0kj(4)for all pairs (j, k), and Block( Γ 1j Λjk ) is positive definite with Λjj I. Then, fromMardia (1988), ² is Nnp (0, D) with 1D {Block( Γ 1j Λjk )} .4(5)

We introduce some more notation here relevant to the spatial context. The weight matrixW ((wjk )) (of dimension n n ) consists of entries(1 if j and k are neighbors, andwjk (6)0, otherwise.Pwith wjj 0 for j, k 1, 2, · · · , n. For each site j, define wj k Nj wjk to be the sumthat represents the total number of neighbors of site j, and M to be the n n diagonalmatrixM diag(w1 , w2 , · · · , wn ).(7)It is convenient both computationally as well as for practical analysis to represent all theΓj and Λjk in terms of two common p p matrices Γ and H, respectively. The first stageof parametrization is to takeΓj Γwj and Λjk wjk· H.wj (8)The first part of equation (8) entails a common covariance matrix Γ for all sites j, rescaled 1by the weight factor wj . It follows that sites with a larger number of neighbors will havelower variability, which is reasonable to expect. The second part of equation (8) entails acommon set of regression coefficients H rescaled by the inverse of the number of neighborswj . The matrix Γ must be positive definite since Γj represents the covariance matrix of ²jgiven ² j . Substituting (8) in (4), the symmetry requirement of (4) is equivalent toHΓ ΓH 0(9)or, in other words, F Γ 1/2 HΓ1/2 should be symmetric, where Γ1/2 is the (unique) squareroot matrix of Γ and Γ 1/2 is its inverse.With the above re-parametrization, the distribution of ² is multivariate normal withmean 0 and covariance matrix D, which is now a function of Γ and F only. In fact, theinverse of D has the expressionD 1 (I n Γ 1/2 )(M I p W F )(I n Γ 1/2 )(10)where A B represents the Kronecker product of two matrices A and B. To ensure thatD 1 is positive definite, we state the following theorem:Theorem 1. Let λ1 , λ2 , · · · , λp denote the eigenvalues of F Γ 1/2 HΓ1/2 . The covariancematrix D is positive definite if 1 λk 1 for all k 1, 2, · · · , p.5

The reader is referred to the Appendix for the proof. The concatenated vector of all ηij s atthe j-th spatial location is denoted by η j (η1j , η2j , · · · , ηpj )0 . Further, η (η 01 , η 02 , · · · , η 0n )0represents the np 1 vector of all ηij variables. Since ² Nnp (0, D), it follows thatη Nnp (Xβ, D)(11)where β (β10 , β20 , · · · , βp0 )0 is the q 1 concatenated vector of all regression coefficients withPq pi 1 qi , and X is the p n q design matrix given by X1x01j 0 · · · 0 X2 0 x02j · · · 0 (12),withXj X . . . . . . Xn (np q)00 · · · x0pj (p q)denoting the design matrix at the j-th spatial location for j 1, 2, · · · , n.2.2Handling Partial and Missing ObservationsIn some applications, part of the y observations can be either missing or partially observed,thus requiring the development of spatial hierarchical GLMMs for partially observed datastructures. The sets M, P and C represent all pairs (i, j) where yij is either missing,partially observed or completely observed, respectively. Given η, the conditional likelihoodcontribution corresponding to the partially observed data, Dobs say, can be written as theproduct of two componentsYYFij (Pij ηij )(13)fij (yij ηij ) (Dobs η) (i,j) P(i,j) Ctaken over the sets C and P, withZFij (Pij ηij ) fij (yij ηij ) dyij(14)yij Pijdenoting the contribution arising from the partial information that yij belongs to the set Pij .To write down the (unconditional) likelihood for the hierarchical GLMM, we integrate overthe distribution of η in (11):Z (Dobs β, F , Γ) (Dobs η) f0 (η β, F , Γ) dη(15)ηwhere f0 is the distribution of η given in (11). Examples of partially observed data arecommon in rare diseases. For example, when mapping cancer incidences, certain counties donot report the exact number of incidences if the total number is less than a known threshold.In this case the partial information is Pij { yij τ } where τ is the known threshold.6

3Default Prior ElicitationThis section discusses the appropriate default priors on the model parameters β, H andΓ. Since each βi represents the regression effects to the mean of the observations yij , it isnatural to elicit a standard “flat” non-informative prior on each βi for i 1, 2, · · · , p:πN (β) 1(16)It is also natural to consider the Jeffrey’s type non-informative prior on Γ of the formπJ (Γ) (det(Γ)) (p 1)/2 . We stateTheorem 2. Let π0 (H) be a proper prior on H. The default prior specificationπ0 (β, H, Γ) πN (β) πJ (Γ) π0 (H)(17)gives a posterior distribution on the parameters that is improper.Proof: We refer the reader to a proof in the Appendix. The consequence of Theorem 2is that new motivation is required for the development of a default prior on Γ, which shouldmake the posterior proper. We discuss the development of such a prior in the subsequentparagraphs.Our justification for the default prior comes from looking at the conditional update of eachηij given the data yij and the rest of the η elements (excluding ηij ). In the normal-normalcase, one can explicitly derive an expression for the weights that represent the contributionof the data, yij , and the rest of the η to the conditional mean of ηij . However, in thecase of non-conjugate GLMMs, it is not possible to obtain a closed form expression for theweights. An approximate approach can be considered based on a quadratic expansion ofthe exponential family pdf to yield a similar analysis as in the normal-normal model. Weconsider the prediction of the vector η j given y j and η j (which are the rest of the ηij sexcluding the ones at site j). The GLM approach to estimating η j is iterative and expandsthe function hi s in (1) around the current estimate η j . From Taylor’s expansion, we have1(1)(2)hi (ηij ) hi (ηij ) (ηij ηij )hi (ηij ) (ηij ηij )2 hi (ηij ).(18)2Using the expansion above, the exponential family of densities can be expanded around η jsimilarly aspYi 1pexp{ ηij yij hi (ηij ) }¾1 2 (2) (ηij ηij ) hi (ηij ) (ηij exp ηij yij 2i 1½¾1(1)(2)0 0 exp (η j η j ) H j (η j η j ) η j (y j hj ) ,2Y½(1)ηij )hi (ηij )hi (ηij )7

where(1)hj³ (1) (1) h1 (η1j), h2 (η2j), · · · 0(1) , hp (ηpj)is the p 1 of first derivatives, and³ (2)(2) (2) H j diag h1 (η1j), h2 (η2j), · · · , h(2)(η)ppj(19)is the diagonal matrix of all second derivatives of hi , i 1, 2, · · · , p evaluated at η j . TheTaylor’s expansion above allows us to revert back to the normal-normal case where by completing squares, the observational part for η j can be derived as¾½1(2) 0 exp (η j η i (y i )) H j (η j η i (y i ))2η i (y i )³(2)Hj 1(1)(2)with (y j hj Hj η i ). The contribution from the multivariate Gaussian CAR prior in (11) (i.e., the rest of the η elements) ison wj 0 1CAR)exp )(Γ)(η η(η j η CARjjj2Pwijwhere ηjCAR H k Nj wj η k . Combining the likelihood and prior parts, the conditionalmean of η j (again by completion of squares) turns out to be(2)(2)(2)(H j Γ 1 wj ) 1 H j η j (y j ) (H j Γ 1 wj ) 1 Γ 1 wj η CARj(20)with (matrix) weights(2)(2)W 1j (H j Γ 1 wj ) 1 H jand W 2j I W 1j(21)corresponding to the direct estimate y j and the population mean. Our proposal is to inducea prior on Γ such that the prior onà (2)! 1(2)³ 1HjHj(2)(2) 1 1W 1j H j Γ wj Hj Γwj wj is uniform. A similar technique was used by Daniels (1998) and Natarajan and Kass (2000)in the univariate non-spatial context. Since W 1j varies with j, we first replace it by itsaverage across all the sites. Thus, we setà (2) !Hj1 Xw0 .tracenp jwj (2)Substitutingdefined byHjwj by its average w0 I p in the expression of W1j above, we get the matrix U¡ 1U w0 I p Γ 1w0 I p ( w0 Γ I p ) 1 w0 Γ8

Note that 0 U I p in terms of positive definiteness with det(U ) representing the volumeof the weight matrix U . We seek a prior on Γ that is induced by a default prior on det(U ).The class of multivariate Beta distribution given byf (U a, b) C(det(U ))a (p 1)/2 (det(I p U ))b (p 1)/2with a (p 1)/2 and b (p 1)/2 forms a class of prior for U . The uniform prior onthe weight matrix U is obtained by setting a b (p 1)/2. The resulting prior on Γcorresponding to uniform volume of U isπU V (Γ) det (I p w0 Γ) (p 1) .(22)This is also the prior developed in Natarajan and Kass (2000) leading to shrinkage-typeestimators in the non-spatial context. The uniform volume prior is proper from Theorem 2of Natarajan and Kass (2000).The prior on H is induced via F . The prior on F is taken to be independent of Γ andis elicited as follows: Writing the spectral decomposition of F asF QΛQ0 ,(23)we put a uniform prior on Q and in view of Theorem 1, we put a uniform prior U ( 1, 1)on the eigenvalues in Λ. The default prior on (β, F , Γ) is thusπ0 (β, F , Γ) πN (β) πU V (Γ) 1.2p(24)For each i 1, 2, · · · , p, the design matrix corresponding to the i-th response variable is then qi matrix X̃ i (xi1 , xi2 , · · · , xin )0 . The submatrix X̃ Ci is formed by taking all rows jof X̃ i for which (i, j) C. We stateTheorem 3. Assume that fij and Fij in (13) are bounded above by a constant independentof ηij for each pair (i, j). Using the default prior (24), the posterior is proper if there existsqi linearly independent row vectors in X̃ Ci for each i 1, 2, · · · , p such that!Z Z Z Z ÃYpqiYfij (yij ηij ) f0 (η β, F , Γ) dη π0 (β, F , Γ) dβ dF dΓ , (25)ΓFβηi 1 j 1where f0 and π0 is, respectively, the distribution of η and the prior, as given in (11) and(24).Remark 1: Under the assumptions of Theorem 3, 0 fij A and 0 Fij B, say. Inour applications, fij is taken to be either a Poisson pmf or a normal pdf. For the Poisson9

(or, generally for a discrete distribution), it easily follows that the bounds A B 1,independent of i and j. When fij is normal with mean ηij and fixed standard deviation σ0 , it follows that A 1/ 2πσ0 and B 1, thus independent of i and j again. Generally fordensities, the bound A needs to be established on a case-by-case basis.Remark 2: Theorem 3 shows that propriety can be achieved if there are at least qi sites onthe lattice for which the yij s are completely observed. The only further check that we haveto perform is to see if the design matrix corresponding to those sites X̃ Ci is non-singular.This is a requirement for each i, so the conditions of Theorem 3 can be checked separatelyfor each i 1, 2, · · · , p.4Bayesian InferenceThe Gibbs sampler is a natural method for posterior simulations in the case of GLMMs,and is also the method utilized for our spatial models. A slightly differ

Default Bayesian Analysis for Hierarchical Spatial Multivariate Models . display of spatial data at varying spatial resolutions. Sain and Cressie (2007) viewed the developments of spatial analysis in two main categories: models for geostatistical data (that is, the indices of data points belong in a continuous set) and models for lattice data .

Related Documents:

Bruksanvisning för bilstereo . Bruksanvisning for bilstereo . Instrukcja obsługi samochodowego odtwarzacza stereo . Operating Instructions for Car Stereo . 610-104 . SV . Bruksanvisning i original

example uses a hierarchical extension of a cognitive process model to examine individual differences in attention allocation of people who have eating disorders. We conclude by discussing Bayesian model comparison as a case of hierarchical modeling. Key Words: Bayesian statistics, Bayesian data a

Although hierarchical Bayesian models for spatio-temporal dynamical problems such as pop-ulation spread are relatively easy to specify, there are a number of complicating issues. First and foremost is the issue of computation. Hierarchical Bayesian models are most often implemented with Markov Chain Monte Carlo (MCMC) methods.

10 tips och tricks för att lyckas med ert sap-projekt 20 SAPSANYTT 2/2015 De flesta projektledare känner säkert till Cobb’s paradox. Martin Cobb verkade som CIO för sekretariatet för Treasury Board of Canada 1995 då han ställde frågan

service i Norge och Finland drivs inom ramen för ett enskilt företag (NRK. 1 och Yleisradio), fin ns det i Sverige tre: Ett för tv (Sveriges Television , SVT ), ett för radio (Sveriges Radio , SR ) och ett för utbildnings program (Sveriges Utbildningsradio, UR, vilket till följd av sin begränsade storlek inte återfinns bland de 25 största

Hotell För hotell anges de tre klasserna A/B, C och D. Det betyder att den "normala" standarden C är acceptabel men att motiven för en högre standard är starka. Ljudklass C motsvarar de tidigare normkraven för hotell, ljudklass A/B motsvarar kraven för moderna hotell med hög standard och ljudklass D kan användas vid

LÄS NOGGRANT FÖLJANDE VILLKOR FÖR APPLE DEVELOPER PROGRAM LICENCE . Apple Developer Program License Agreement Syfte Du vill använda Apple-mjukvara (enligt definitionen nedan) för att utveckla en eller flera Applikationer (enligt definitionen nedan) för Apple-märkta produkter. . Applikationer som utvecklas för iOS-produkter, Apple .

ARCHAEOLOGICAL ILLUSTRATION 13 HOME PAGE WHY DRAW? EQUIPMENT START HERE: TECHNIQUES HOW TO DRAW MORE ACTIVITIES LINKS Drawing pottery The general aim when drawing pottery is not only to produce an accurate, measured drawing but also to show the type of pot. Sh ape (or form) and decoration are therefore important. Many illustrators now include extra information to show how a pot was .