Nonparametric Bayesian Data Analysis

Peter Müller and Fernando A. Quintana†

Abstract. We review the current state of nonparametric Bayesian inference. The discussion follows a list of important statistical inference problems, including density estimation, regression, survival analysis, hierarchical models and model validation. For each inference problem we review relevant nonparametric Bayesian models and approaches, including Dirichlet process (DP) models and variations, Polya trees, wavelet based models, neural network models, spline regression, CART, dependent DP models, and model validation with DP and Polya tree extensions of parametric models.

1 INTRODUCTION

Nonparametric Bayesian inference is an oxymoron and misnomer. Bayesian inference by definition always requires a well defined probability model for observable data $y$ and any other unknown quantities $\theta$, i.e., parameters. Nonparametric Bayesian inference traditionally refers to Bayesian methods that result in inference comparable to classical nonparametric inference, like kernel density estimation, scatterplot smoothers, etc. Such flexible inference is typically achieved by models with massively many parameters. In fact, a commonly used technical definition of nonparametric Bayesian models is probability models with infinitely many parameters (Bernardo and Smith 1994). Equivalently, nonparametric Bayesian models are probability models on function spaces. Nonparametric Bayesian models are used to avoid critical dependence on parametric assumptions, to robustify parametric models, and to define model diagnostics and sensitivity analysis for parametric models by embedding them in a larger encompassing nonparametric model. The latter two applications are technically simplified by the fact that many nonparametric models allow centering the probability distribution at a given parametric model.

Department of Biostatistics, Box 447, University of Texas M. D. Anderson Cancer Center, Houston, TX 77030-4009, USA. e-mail: pm@odin.mdacc.tmc.edu
† Departamento de Estadística, Pontificia Universidad Católica de Chile, Casilla 306, Santiago 22, CHILE. e-mail: quintana@mat.puc.cl. Partially supported by grant FONDECYT 1020712. First author supported by NIH/NCI under grant NIH R01CA75981.

In this article we review the current state of Bayesian nonparametric inference. The discussion follows a list of important statistical inference problems, including density estimation, regression, survival analysis, hierarchical models and model validation. The list is not exhaustive. In particular, we will not discuss nonparametric Bayesian approaches in time series analysis, and in spatial and spatio-temporal inference.

Other recent surveys of nonparametric Bayesian models appear in Walker et al. (1999) and Dey et al. (1998). Nonparametric models based on Dirichlet process mixtures are reviewed in MacEachern and Müller (2000). A recent review of nonparametric Bayesian inference in survival analysis can be found in Sinha and Dey (1997).

2 DENSITY ESTIMATION

The density estimation problem starts with a random sample $x_i \overset{iid}{\sim} F$, $i = 1, \ldots, n$, generated from some unknown distribution $F$. A Bayesian approach to this problem requires a probability model for the unknown $F$. Traditional parametric inference considers models that can be indexed by a finite dimensional parameter, for example, the mean and covariance matrix of a multivariate normal distribution of the appropriate dimension. In many cases, however, constraining inference to a specific parametric form may limit the scope and type of inferences that can be drawn from such models. In contrast, under a nonparametric perspective we consider a prior probability model $p(F)$ for the unknown density $F$, for $F$ in some infinite dimensional function space. This requires the definition of probability measures on a collection of distribution functions. Such probability measures are generically referred to as random probability measures (RPM). Ferguson (1973) states two important desirable properties for this class of measures (see also Antoniak 1974): (I) their support should be large and (II) posterior inference should be "analytically manageable." In the parametric case, the development of MCMC methods (see, e.g., Gelfand and Smith 1990) makes it possible to largely overcome the restrictions posed by (II). In the nonparametric context, however, computational aspects are still the subject of much research.

We next describe some of the most common random probability measures adopted in the literature.

2.1 The Dirichlet Process

Motivated by properties (I) and (II), Ferguson (1973) introduced the Dirichlet process (DP) as an RPM. A random probability distribution $F$ is generated by a DP if for any partition $A_1, \ldots, A_k$ of the sample space the vector of random probabilities $F(A_i)$ follows a Dirichlet distribution: $(F(A_1), \ldots, F(A_k)) \sim D(M \cdot F_0(A_1), \ldots, M \cdot F_0(A_k))$. We denote this by $F \sim D(M, F_0)$. Two parameters need to be specified: the weight parameter $M$ and the base measure $F_0$. The base measure $F_0$ defines the expectation, $E\{F(B)\} = F_0(B)$, and $M$ is a precision parameter that defines the variance. For more discussion of the role of these parameters see Walker et al. (1999). A fundamental motivation for the DP construction is the simplicity of posterior updating. Assume
$$x_1, \ldots, x_n \mid F \overset{iid}{\sim} F, \quad \text{and} \quad F \sim D(M, F_0). \tag{1}$$
Let $\delta_x(\cdot)$ denote a point mass at $x$. The posterior distribution is $F \mid x_1, \ldots, x_n \sim D(M + n, F_1)$ with $F_1 \propto M F_0 + \sum_{i=1}^{n} \delta_{x_i}$.

More properties of the DP are discussed, among others, in Ferguson (1973), Korwar and Hollander (1973), Antoniak (1974), Diaconis and Freedman (1986), Rolin (1992), Diaconis and Kemperman (1996) and in Cifarelli and Melilli (2000). Of special relevance for computational purposes is the Polya urn representation by Blackwell and MacQueen (1973). Another very useful result is the construction by Sethuraman (1994): any $F \sim D(M, F_0)$ can be represented as
$$F(\cdot) = \sum_{h=1}^{\infty} w_h \, \delta_{\mu_h}(\cdot), \qquad \mu_h \overset{iid}{\sim} F_0 \quad \text{and} \quad w_h = U_h \prod_{j < h} (1 - U_j) \text{ with } U_h \overset{iid}{\sim} \text{Beta}(1, M). \tag{2}$$
In words, realizations of the DP can be represented as infinite mixtures of point masses. The locations $\mu_h$ of the point masses are a sample from $F_0$, and the random weights $w_h$ are generated by a "stick-breaking" procedure. In particular, the DP is an almost surely (a.s.) discrete RPM.

The DP is by far the most popular nonparametric model in the literature (for a recent review, see MacEachern and Müller 2000). However, the a.s. discreteness is in many applications inappropriate. A simple extension to remove the constraint to discrete measures is to introduce an additional convolution, representing the RPM $F$ as
$$F(x) = \int f(x \mid \theta) \, dG(\theta) \quad \text{with} \quad G \sim D(M, G_0). \tag{3}$$
Such models are known as DP mixtures (MDP) (Escobar 1988, MacEachern 1994, Escobar and West 1995). Using a Gaussian kernel, $f(x \mid \mu, S) = \phi_{\mu,S}(x) \propto \exp[-(x - \mu)^T S^{-1} (x - \mu)/2]$, and mixing with respect to $\theta = (\mu, S)$, we obtain density estimates resembling traditional kernel density estimation. Related models have been studied in Lo (1984), Escobar and West (1995) and in Gasparini (1996). Posterior consistency is discussed in Ghosal, Ghosh and Ramamoorthi (1999).
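To make construction (2) concrete, the following sketch (a minimal illustration, not from the original paper; it assumes only NumPy, and the truncation level $N$ and all parameter values are arbitrary choices) draws an approximate realization of $F \sim D(M, F_0)$ by truncating the stick-breaking representation at $N$ atoms.

```python
import numpy as np

def draw_dp_stick_breaking(M, F0_sampler, N=1000, rng=None):
    """Truncated stick-breaking approximation to F ~ D(M, F0).

    Returns atom locations mu_h and weights w_h, following Sethuraman's
    construction (2): w_h = U_h * prod_{j<h} (1 - U_j), U_h ~ Beta(1, M).
    """
    rng = np.random.default_rng(rng)
    U = rng.beta(1.0, M, size=N)
    # Stick-breaking weights: U_h times the length of the remaining stick.
    w = U * np.concatenate(([1.0], np.cumprod(1.0 - U)[:-1]))
    mu = F0_sampler(N, rng)          # iid draws from the base measure F0
    return mu, w

# Example: base measure F0 = N(0, 1), precision M = 5 (illustrative values).
mu, w = draw_dp_stick_breaking(M=5.0, F0_sampler=lambda n, g: g.standard_normal(n))
print(w.sum())   # close to 1 for large N; the truncation error is 1 - sum(w)
```

The leftover stick length $1 - \sum_{h \leq N} w_h$ quantifies the truncation error, which motivates the $\epsilon$-DP discussed in Section 2.2 below.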

Posterior inference in MDP models is based on MCMC posterior simulation. Most approaches proceed by breaking the mixture in (3) with the introduction of latent variables $\theta_i$, as $x_i \mid \theta_i \sim f(x \mid \theta_i)$ and $\theta_i \sim G$. Efficient MCMC simulation for general MDP models is discussed, among others, in Bush and MacEachern (1996), MacEachern and Müller (1998), Neal (2000) and West, Müller and Escobar (1994). For related algorithms in a more general setting, see Ishwaran and James (2001). As an alternative to MCMC simulation, sequential importance sampling based methods have been proposed for MDP models. Examples can be found in Liu (1996), Quintana (1998), MacEachern, Clyde and Liu (1999), Ishwaran and Takahara (2002) and references therein. A third class of methods for MDP models, called the predictive recursion, was proposed by Newton and Zhang (1999). Consider the posterior predictive distribution in model (3). Let $F_n(B) \overset{def}{=} E(F(B) \mid x_1, \ldots, x_n)$ denote the posterior mean of the RPM. The posterior mean is identical to the predictive distribution, $F_n(B) = P(\theta_{n+1} \in B \mid x_1, \ldots, x_n)$, for any Borel set $B$ in the appropriate space. The Polya urn representation implies
$$F_1(B) = \frac{M}{M+1} F_0(B) + \frac{1}{M+1} P(\theta_1 \in B \mid x_1).$$
Newton and Zhang (1999) extrapolate this representation to a recursion in the general case:
$$F_i(B) = (1 - w_i) F_{i-1}(B) + w_i P_{i-1}(\theta_i \in B \mid x_i), \tag{4}$$
where the probability in the second term on the right-hand side of (4) is computed under the current approximation $F_{i-1}$, and the nominal values for the weights are $w_i = 1/(M + i)$, $i \geq 1$. The approximation is exact for $i = 1$. In general $F_n(B)$ depends on the order in which $x_1, \ldots, x_n$ are processed, but this dependence is rather weak, and in practice it is recommended to average over a number of permutations of the data. The method is very fast to execute and produces very good approximations, although it tends to over-smooth the results. For a comparison of the computational strategies mentioned here, see Quintana and Newton (2000).
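To illustrate recursion (4), here is a minimal sketch (not the authors' implementation; it assumes NumPy/SciPy, a Gaussian kernel with fixed scale $s$, and a fixed grid for the mixing distribution, all illustrative choices) of the predictive recursion for a DP mixture of normals.

```python
import numpy as np
from scipy.stats import norm

def predictive_recursion(x, theta_grid, g0, M=1.0, s=0.5, rng=None):
    """Newton-Zhang recursion (4) for the mixing distribution G in a
    DP mixture of N(theta, s^2) kernels, evaluated on theta_grid.

    g0 is the base-measure density G0 on the grid; x is the data vector.
    """
    rng = np.random.default_rng(rng)
    x = rng.permutation(x)               # one random processing order
    d_theta = theta_grid[1] - theta_grid[0]
    g = np.asarray(g0, dtype=float)
    for i, xi in enumerate(x, start=1):
        w = 1.0 / (M + i)                # nominal weight w_i = 1/(M+i)
        like = norm.pdf(xi, loc=theta_grid, scale=s)
        post = like * g
        post /= post.sum() * d_theta     # P_{i-1}(theta in . | x_i)
        g = (1.0 - w) * g + w * post
    return g                             # approximate posterior mean of dG

# Usage: standard normal base measure, data from a two-component mixture.
grid = np.linspace(-5, 5, 401)
data = np.concatenate([np.random.normal(-2, .5, 50), np.random.normal(2, .5, 50)])
g_hat = predictive_recursion(data, grid, norm.pdf(grid), M=1.0)
```

In practice one would average the returned estimate over several random permutations of the data, as recommended above.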

Model (1) has the advantage of the conjugate form. However, exact draws from a DP are impossible because this would require the generation of an infinite mixture of point masses. Typical MCMC schemes are based on integrating out the DP via Blackwell and MacQueen's (1973) representation. This makes it difficult to produce inference on functionals of the posterior DP. A similar problem is found in the more general MDP models. Some authors propose MCMC strategies where, instead of integrating out the DP, an approximation to the DP is considered. This is usually done by drawing from $\sum_{h=1}^{N} w_h \delta_{\mu_h}(\cdot)$ for large enough $N$. Examples of this strategy can be found in Muliere and Tardella (1998), Ishwaran and James (2002), Kottas and Gelfand (2001), and Gelfand and Kottas (2002).

2.2 Other Discrete Random Probability Measures

An interesting extension of the DP that has been used in the context of density estimation is the invariant DP introduced by Dalal (1979). The idea is to define a prior process on the space of distribution functions that have a structure that can be characterized via invariance, for example, symmetry or exchangeability. Dalal's (1979) construction is based on invariance under a finite group, essentially by restricting Ferguson's (1973) definition to invariant centering measures and partitions. This guarantees that the posterior process is also invariant. Dalal (1979) uses this setup to estimate distribution functions that are symmetric with respect to a known value $\mu$, using $F_0$ such that $F_0(t) = 1 - F_0(2\mu - t)$ for all $t \leq \mu$ and the group $G = \{g_1, g_2\}$ where $g_1(x) = x$ and $g_2(x) = 2\mu - x$.

An alternative model to (1) or (3) is obtained by replacing the prior DP with a convenient approximation. Natural candidates follow from truncating Sethuraman's (1994) construction (2). In this setup, the prior $\sum_{h=1}^{\infty} w_h \delta_{\mu_h}(\cdot)$ is replaced by $\sum_{h=1}^{N} w_h \delta_{\mu_h}(\cdot)$ for some appropriately chosen value of $N$. An example of this procedure is the $\epsilon$-DP proposed by Muliere and Tardella (1998), where $N$ is chosen such that the total variation distance between the DP and the truncation is bounded by a given $\epsilon$. Another variation is the Dirichlet-multinomial process introduced by Muliere and Secchi (1995). Here the RPM is, for some finite $N$,
$$F(\cdot) = \sum_{h=1}^{N} w_h \delta_{\mu_h}(\cdot), \qquad (w_1, \ldots, w_N) \sim D(M \cdot N^{-1}, \ldots, M \cdot N^{-1}) \quad \text{and} \quad \mu_h \overset{iid}{\sim} F_0.$$
More generally, Pitman (1996) described a class of models
$$F(\cdot) = \sum_{h \geq 1} w_h \delta_{\mu_h}(\cdot) + \Big(1 - \sum_{h \geq 1} w_h\Big) F_0(\cdot), \tag{5}$$
where, for a continuous distribution $F_0$, we have $\mu_h \overset{iid}{\sim} F_0$, assumed independent of the non-negative random variables $w_h$. The weights $w_h$ are constrained by $\sum_{h \geq 1} w_h \leq 1$. The model is known as a species sampling model (SSM), with the interpretation of $w_h$ as the relative frequency of the $h$-th species in a list of species present in a certain population, and $\mu_h$ as the tag assigned to that species. If $\sum_{h \geq 1} w_h = 1$ the SSM is called proper and the corresponding prior RPM is discrete. The stick-breaking priors studied by Ishwaran and James (2001) are a special case of (5), adopting the form $\sum_{h=1}^{N} w_h \delta_{\mu_h}(\cdot)$, where $1 \leq N \leq \infty$. The weights are defined as $w_h = \prod_{j=1}^{h-1} (1 - U_j) \, U_h$ with $U_h \sim \text{Beta}(a_h, b_h)$, independently, for given sequences $(a_1, a_2, \ldots)$ and $(b_1, b_2, \ldots)$. Stick-breaking priors are quite general, including not only the Dirichlet-multinomial process and the DP as special cases, but also a two-parameter DP extension, known as the Pitman-Yor process (Pitman and Yor 1997), and the beta two-parameter process (Ishwaran and Zarepour 2000). Additional examples and MCMC implementation details for stick-breaking RPMs can be found in Ishwaran and James (2001). Further discussion of SSMs appears in Pitman (1996) and Ishwaran and James (2003).
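The general stick-breaking recipe differs from (2) only through the Beta parameters $(a_h, b_h)$. A small sketch (assuming NumPy; all parameter values are illustrative) recovers the DP with $a_h = 1$, $b_h = M$, and the Pitman-Yor process with $a_h = 1 - a$, $b_h = b + ha$ for discount $a$ and strength $b$:

```python
import numpy as np

def stick_breaking_weights(a, b, rng=None):
    """General stick-breaking weights w_h = U_h * prod_{j<h} (1 - U_j),
    with U_h ~ Beta(a_h, b_h) independently; a, b are length-N sequences."""
    rng = np.random.default_rng(rng)
    U = rng.beta(a, b)
    return U * np.concatenate(([1.0], np.cumprod(1.0 - U)[:-1]))

N, M = 1000, 5.0
h = np.arange(1, N + 1)
w_dp = stick_breaking_weights(np.ones(N), M * np.ones(N))          # DP(M, F0)
# Pitman-Yor process with discount a = 0.3 and strength b = 1 (illustrative):
a_disc, b_str = 0.3, 1.0
w_py = stick_breaking_weights((1 - a_disc) * np.ones(N), b_str + h * a_disc)
```

Compared to the DP, the Pitman-Yor weights decay more slowly (a power law rather than geometrically), which is the practical reason for its popularity in clustering applications.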

An interesting property of MDP models is that any exchangeable sequence of random variables can be well approximated, in the sense of the Prokhorov metric, by a certain sequence of mixtures of DPs (Regazzini 1999). In practice, however, this result has limited use. We next review some methods for defining RPMs supported on the set of continuous distributions that have been used in density estimation problems.

2.3 Polya Trees

Polya trees (PT) are proposed in Lavine (1992, 1994) as a generalization of the DP. Like the DP, the PT model satisfies conditions (I) and (II). The PT includes DP models as a special case. But in contrast to the DP, an appropriate choice of the PT parameters allows generating continuous distributions with probability 1. The definition requires a nested sequence $\Pi = \{\pi_m, m = 1, 2, \ldots\}$ of partitions of the sample space $\Omega$. Without loss of generality, we assume the partitions are binary. We start with a partition $\pi_1 = \{B_0, B_1\}$ of the sample space, $\Omega = B_0 \cup B_1$, and continue with nested partitions defined by $B_0 = B_{00} \cup B_{01}$, $B_1 = B_{10} \cup B_{11}$, etc. Thus the partition at level $m$ is $\pi_m = \{B_\epsilon, \; \epsilon = \epsilon_1 \cdots \epsilon_m\}$, where $\epsilon$ ranges over all binary sequences of length $m$. We say that $F$ has a PT (prior) distribution, denoted by $F \sim PT(\Pi, \mathcal{A})$, if there is a sequence of nonnegative constants $\mathcal{A} = \{\alpha_\epsilon\}$ and independent random variables $Y = \{Y_\epsilon\}$ such that $Y_\epsilon \sim \text{Beta}(\alpha_{\epsilon 0}, \alpha_{\epsilon 1})$ and for every $\epsilon = (\epsilon_1, \ldots, \epsilon_m)$ and $m \geq 1$
$$F(B_{\epsilon_1 \cdots \epsilon_m}) = \prod_{\substack{j = 1, \ldots, m \\ \epsilon_j = 0}} Y_{\epsilon_1 \cdots \epsilon_{j-1}} \; \prod_{\substack{j = 1, \ldots, m \\ \epsilon_j = 1}} \big(1 - Y_{\epsilon_1 \cdots \epsilon_{j-1}}\big).$$
The type of models used for density estimation now replaces the DP in (1) and (3) by the $PT(\Pi, \mathcal{A})$ prior. For a description of samples from a PT prior, see Walker et al. (1999). Posterior consistency issues for density estimation using PT priors have been discussed in Barron, Schervish and Wasserman (1999).

Polya trees have some practical limitations. First, the resulting RPM depends on the specific partition adopted. Second, the fixed partitioning scheme results in discontinuities in the predictive distributions. Third, implementations for higher dimensional distributions require extensive housekeeping and are impractical. To mitigate problems related to the discontinuities, Paddock et al. (2003) and Hanson and Johnson (2002) introduced randomized Polya trees. The idea is based on dyadic rational partitions, but instead of taking the nominal half-point, Paddock et al. (2003) randomly choose a "close" cutoff. This construction is shown to reduce the effect of the binary tree partition on the first two points noted above. Hanson and Johnson (2002) instead consider a mixture with respect to a hyperparameter that defines the partitioning tree. The problem concerning high dimensions persists, though.
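To make the construction concrete, the following sketch (a minimal illustration assuming NumPy; the dyadic partition of $(0, 1)$ and the common parameter choice $\alpha_\epsilon = c m^2$ at level $m$, which yields a.s. continuous $F$, are illustrative) draws the cell probabilities $F(B_\epsilon)$ of one PT realization down to a fixed level.

```python
import numpy as np

def draw_polya_tree_cells(levels, c=1.0, rng=None):
    """Draw cell probabilities F(B_eps) for all 2**levels cells at the
    deepest level of a Polya tree on (0,1) with dyadic partitions.

    Uses alpha_eps = c * m**2 at level m (an illustrative standard choice).
    """
    rng = np.random.default_rng(rng)
    probs = np.array([1.0])                  # level 0: the whole interval
    for m in range(1, levels + 1):
        alpha = c * m**2                     # alpha_{eps0} = alpha_{eps1} = c m^2
        Y = rng.beta(alpha, alpha, size=probs.size)
        # Each cell B_eps splits into B_{eps0} (mass Y) and B_{eps1} (mass 1-Y).
        probs = np.column_stack((probs * Y, probs * (1 - Y))).ravel()
    return probs

cells = draw_polya_tree_cells(levels=8)      # 256 cells of width 1/256
density = cells * 2**8                       # piecewise-constant density draw
```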

2.4 Bernstein Polynomials

For a distribution function $F$ on the unit interval, the corresponding Bernstein polynomial is defined as
$$B(x, k, F) = \sum_{j=0}^{k} F(j/k) \binom{k}{j} x^j (1 - x)^{k-j}.$$
A remarkable property of $B(x, k, F)$ is that it converges uniformly to $F$ as $k \to \infty$. The definition of $B(x, k, F)$ takes the form of a mixture of Beta densities. Petrone (1999a, 1999b) exploits this property to propose a class of prior distributions on the set of densities defined on $(0, 1]$. Petrone and Wasserman (2002) consider the following model. Assume $x_1, \ldots, x_n$ are conditionally i.i.d. given $k$ and $w_k$ with common density
$$f(x \mid k, w_k) = \sum_{j=1}^{k} w_{jk} \, \frac{k!}{(j-1)! \, (k-j)!} \, x^{j-1} (1 - x)^{k-j},$$
where $k$ is the number of components in the mixture of Beta densities and the weights $w_k = (w_{1k}, \ldots, w_{kk})$ satisfy $w_{jk} \geq 0$ and $\sum_{j=1}^{k} w_{jk} = 1$. We call $f$ a Bernstein polynomial density (BPD). The model is completed by assuming a prior distribution $p(k)$ for $k$ and a distribution $H_k(\cdot)$, given $k$, on the $(k-1)$-dimensional simplex. Petrone (1999a) showed that if $p(k) > 0$ for all $k \geq 1$ then every distribution on $(0, 1]$ is the (weak) limit of some sequence of BPDs, and every continuous density on $(0, 1]$ can be well approximated in the Kolmogorov-Smirnov distance by a BPD. Petrone and Wasserman (2002) discuss MCMC strategies for fitting the above model and prove consistency of posterior density estimation under mild conditions. Rates of such convergence are given in Ghosal (2001).
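Since a BPD is simply a discrete mixture of $\text{Beta}(j, k - j + 1)$ densities, it is straightforward to evaluate. The sketch below (assuming NumPy/SciPy; the target $F$ and the value of $k$ are illustrative) also shows the approximation property, using the weights $w_{jk} = F(j/k) - F((j-1)/k)$, which recover the derivative of $B(x, k, F)$.

```python
import numpy as np
from scipy.stats import beta

def bernstein_density(x, w):
    """Evaluate the Bernstein polynomial density
    f(x | k, w) = sum_j w_j * Beta(x; j, k-j+1), with k = len(w)."""
    k = len(w)
    j = np.arange(1, k + 1)
    # Rows: mixture components Beta(j, k-j+1); columns: evaluation points.
    comps = beta.pdf(x[None, :], j[:, None], k - j[:, None] + 1)
    return np.asarray(w) @ comps

# Target F = Beta(2, 5) cdf, with k = 20 components (illustrative choices).
k = 20
grid = np.linspace(0.001, 0.999, 200)
w = np.diff(beta.cdf(np.arange(k + 1) / k, 2, 5))   # w_jk = F(j/k) - F((j-1)/k)
f_hat = bernstein_density(grid, w)   # close to the Beta(2,5) pdf for large k
```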

2.5 Other Random Distributions

Lenk (1988) introduces the logistic normal process. The construction starts with a Gaussian process $Z(x)$ with mean function $\mu(x)$ and covariance function $\sigma(x, y)$. The transformed process $W = \exp(Z)$ is a lognormal process. Stopping the construction here, and defining a random density $f(x) \propto W(x)$, would be impractical: the lognormal process is not closed under prior to posterior updating, i.e., the posterior on $f$, conditional on observing $y_i \sim f$, $i = 1, \ldots, n$, is not proportional to a lognormal process. Instead, Lenk (1988) proceeds by defining the generalized lognormal process $LN_X(\mu, \sigma, \zeta)$, defined essentially by weighting realizations under the lognormal process with the random integral $(\int W \, d\lambda)^{\zeta}$. Let $f(x) \propto V(x)$ for $V \sim LN_X(\mu, \sigma, \zeta)$. The density $f$ is said to be a logistic normal process $LNS_X(\mu, \sigma, \zeta)$. The posterior on $f$, conditional on an observation $y \sim f$, is again a logistic normal process $LNS_X(\mu^*, \sigma, \zeta^*)$. The updated parameters are $\mu^*(s) = \mu(s) + \sigma(s, y)$ and $\zeta^* = \zeta - 1$.

3 REGRESSION

The generic regression problem seeks to estimate an unknown mean function $g(x)$ based on data with i.i.d. measurement errors: $y_i = g(x_i) + \epsilon_i$, $i = 1, \ldots, n$. Bayesian inference on $g$ starts with a prior probability model for the unknown function $g$. If restrictive parametric assumptions for $g$ are inappropriate, we are led to consider nonparametric Bayesian models. Many approaches proceed by considering some basis $\mathcal{B} = \{f_1, f_2, f_3, \ldots\}$ for an appropriate function space, like the space of square integrable functions. Typical examples are the Fourier basis, wavelet bases, and spline bases. Given a chosen basis $\mathcal{B}$, any function $g$ can be represented as $g(\cdot) = \sum_h b_h f_h(\cdot)$. A random function $g$ is parametrized by the sequence $\theta = (b_1, b_2, \ldots)$ of basis coefficients. Assuming a prior probability model for $\theta$, we implicitly put a prior probability model on the random function.

3.1 Spline Models

A commonly used class of basis functions are splines, for example cubic regression splines $\mathcal{B} = \{1, x, x^2, x^3, (x - \xi_1)_+^3, \ldots, (x - \xi_T)_+^3\}$, where $(x)_+ = \max(x, 0)$ and $\xi = (\xi_1, \ldots, \xi_T)$ is a set of knots. Together with a normal measurement error $\epsilon_i \sim N(0, \sigma)$ this defines a nonparametric regression model
$$y_i = \sum_h b_h f_h(x_i) + \epsilon_i. \tag{6}$$
The model is completed with a prior $p(\xi, b, \sigma)$ on the set of knots and corresponding coefficients. Smith and Kohn (1996), Denison, Mallick and Smith (1998b), and DiMatteo, Genovese and Kass (2001) are typical examples of such models. Approaches differ mainly in the choice of priors and the implementation. Typically the prior is assumed to factor as $p(\xi, b, \sigma) = p(\xi) \, p(\sigma) \, p(b \mid \sigma)$. Smith and Kohn (1996) use the Zellner g-prior (Zellner 1986) for $p(b)$. The prior covariance matrix $\text{Var}(b \mid \sigma)$ is assumed to be proportional to $(B'B)^{-1}$, where $B$ is the design matrix for the given data set. Assuming a conjugate normal prior $b \sim N(0, c\sigma(B'B)^{-1})$, the conditional posterior mean $E(b \mid \xi, \sigma, y)$ is a simple linear shrinkage of the least squares estimate $\hat{b}$. DiMatteo, Genovese and Kass (2001) use a unit-information prior, which is defined as a Zellner g-prior with the scalar $c$ chosen such that the prior variance is equivalent to one observation. Denison et al. (1998b) prefer a ridge prior $p(b) = N(0, V)$ with $V = \text{diag}(\infty, v, \ldots, v)$.
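To fix ideas, here is a minimal sketch (assuming NumPy; the knot placement, the simulated data and the value of $c$ are illustrative) of the cubic regression spline design matrix and of the g-prior shrinkage of the least squares estimate.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Design matrix for the cubic regression spline basis
    {1, x, x^2, x^3, (x - xi_1)_+^3, ..., (x - xi_T)_+^3}."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - xi, 0.0) ** 3 for xi in knots]
    return np.column_stack(cols)

# Illustrative data with equally spaced interior knots.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 100)
B = cubic_spline_basis(x, knots=np.linspace(0.1, 0.9, 7))
b_ls = np.linalg.lstsq(B, y, rcond=None)[0]    # least squares estimate b-hat
# Under the g-prior b ~ N(0, c*sigma*(B'B)^{-1}), the conditional posterior
# mean shrinks b-hat linearly by the factor c/(1+c).
b_post = (10.0 / 11.0) * b_ls                  # e.g., c = 10 (illustrative)
```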

Posterior simulation in (6) is straightforward except for the computational challenge of updating $\xi$, the number and location of knots. This typically involves reversible jump MCMC (Green 1995). Denison et al. (1998a) propose "birth," "death" and "move" proposals to add, delete and change knots in the currently imputed set $\xi$ of knots. In the implementation of these moves it is important to marginalize with respect to the coefficients $b_h$. In the conditionally conjugate setup with a normal prior $p(b \mid \sigma)$, the marginal posterior $p(\xi \mid \sigma, y)$ can be evaluated analytically. DiMatteo et al. (2001) propose an approximate evaluation of the relevant Bayes factors based on the BIC (Bayesian information criterion). An interesting alternative, called focused sampling, is discussed in Smith and Kohn (1998).

3.2 Multivariate Regression

Extensions of spline regression to multiple covariates are complicated by the curse of dimensionality. Smith and Kohn (1997) define a spline based bivariate regression model. General, higher dimensional regression models require some simplifying assumptions about the nature of interactions to allow a practical implementation. One approach is to assume additive effects,
$$y_i = \sum_j g_j(x_{ij}) + \epsilon_i,$$
and proceed with each $g_j$ as before. Shively, Kohn and Wood (1999) and Denison, Mallick and Smith (1998b) propose such implementations. Denison, Mallick and Smith (1998c) explore an alternative extension of univariate splines, following the idea of MARS (multivariate adaptive regression splines, Friedman 1991). MARS uses basis functions that are constructed as products of univariate functions. Let $x_i = (x_{i1}, \ldots, x_{ip})$ denote the multivariate covariate vector. MARS assumes
$$g(x_i) = b_0 + \sum_{h=1}^{k} b_h f_h(x_i) \quad \text{with} \quad f_h(x) = \prod_{j=1}^{J_h} \big[ s_{hj} \, (x_{w_{hj}} - t_{hj}) \big]_+ .$$
Here we used linear spline terms $(x - t_{hj})_+$ to construct the basis functions $f_h$. Each basis function defines an interaction of $J_h$ covariates. The indices $w_{hj}$ specify the covariates, and $t_{hj}$ gives the corresponding knots.
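A sketch of one MARS basis function (assuming NumPy; the signs $s_{hj}$, covariate indices $w_{hj}$ and knots $t_{hj}$ below are illustrative):

```python
import numpy as np

def mars_basis(X, s, w, t):
    """Evaluate f_h(x) = prod_j [ s_j * (x_{w_j} - t_j) ]_+ row-wise on X.

    X: (n, p) covariate matrix; s: signs +/- 1; w: covariate indices;
    t: knots. Each triple (s_j, w_j, t_j) contributes one hinge factor.
    """
    factors = np.maximum(np.asarray(s) * (X[:, w] - np.asarray(t)), 0.0)
    return factors.prod(axis=1)

# Example: an interaction of covariates 0 and 2, i.e., J_h = 2.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(100, 5))
f_h = mars_basis(X, s=[1, -1], w=[0, 2], t=[0.3, 0.7])
g = 1.5 + 2.0 * f_h        # contribution b_0 + b_h f_h(x_i) to the MARS fit
```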

Another intuitively appealing multivariate extension is given by CART (classification and regression tree) models. Chipman, George and McCulloch (1998) and Denison, Mallick and Smith (1998a) discuss Bayesian inference in CART models. A regression tree is parametrized by a pair $(T, \theta)$ describing a binary tree $T$ with $b$ terminal nodes, and a parameter vector $\theta = (\theta_1, \ldots, \theta_b)$ with $\theta_i$ defining the sampling distribution for observations that are assigned to terminal node $i$. Let $y_{ik}$, $k = 1, \ldots, n_i$, denote the observations assigned to the $i$-th node. In the simplest case the sampling distribution for the $i$-th node might be i.i.d. sampling, $y_{ik} \sim N(\theta_i, \sigma)$, $k = 1, \ldots, n_i$, with a node-specific mean. The tree $T$ describes a set of rules that decide how observations are assigned to terminal nodes. Each internal node of the tree has an associated splitting rule that decides whether an observation is assigned to the right or to the left branch. Let $x_j$, $j = 1, \ldots, p$, denote the covariates of the regression. The splitting rule is of the form $(x_j \leq s)$ for some threshold $s$. Thus each splitting node is defined by a covariate index and a threshold. The leaves of the tree are the terminal nodes.

Chipman, George and McCulloch (1998) and Denison, Mallick and Smith (1998a) propose Bayesian inference in regression trees by defining a prior probability model for $(\theta, T)$ and implementing posterior MCMC. The MCMC scheme includes the following types of moves: (a) splitting a current terminal node ("grow"); (b) removing a pair of terminal nodes and making the parent into a terminal node ("prune"); (c) changing a splitting variable or threshold ("change"). Chipman, George and McCulloch (1998) use an additional swap move to propose a swap of splitting rules among internal nodes. The complex nature of the parameter space makes it difficult to achieve a well mixing Markov chain simulation. Chipman, George and McCulloch (1998) caution against using one long run, and instead advise frequent restarts. MCMC posterior simulation in CART models should be seen as a stochastic search for high posterior probability trees. Achieving practical convergence in the MCMC simulation is typically not possible.
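The parametrization by $(T, \theta)$ is easy to make concrete. The following sketch (plain Python; the tree layout and all values are illustrative, not from the cited papers) shows how a regression tree routes an observation through the splitting rules $(x_j \leq s)$ to a terminal node mean $\theta_i$.

```python
# A splitting node stores (covariate index j, threshold s, left, right);
# a terminal node stores its mean theta_i.
class Split:
    def __init__(self, j, s, left, right):
        self.j, self.s, self.left, self.right = j, s, left, right

class Leaf:
    def __init__(self, theta):
        self.theta = theta

def assign(node, x):
    """Follow the splitting rules (x_j <= s) down to a terminal node."""
    while isinstance(node, Split):
        node = node.left if x[node.j] <= node.s else node.right
    return node.theta    # mean of the sampling model y ~ N(theta_i, sigma)

# A tree with b = 3 terminal nodes: split on x_0, then on x_1.
T = Split(0, 0.5,
          Leaf(theta=-1.0),
          Split(1, 0.3, Leaf(theta=0.5), Leaf(theta=2.0)))
print(assign(T, x=[0.7, 0.9]))   # -> 2.0
```

The "grow," "prune" and "change" moves of the MCMC schemes described above correspond to replacing a Leaf by a Split, a Split by a Leaf, and modifying a Split's $(j, s)$ pair, respectively.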

An interesting special case of multivariate regression arises in spatial inference problems. The spatial coordinates $(x_{i1}, x_{i2})$ are the covariates for a response surface $g(x_i)$. Wolpert and Ickstadt (1998a) propose a nonparametric model for a spatial point process. At the top level of a hierarchical model they assume a Poisson process as the sampling model for the observed data. Let $x_i$ denote the coordinates of an observed event. For example, $x_i$ could be the recorded occurrence of a species in a species sampling problem. The model assumes a Poisson process $x_i \sim Po(\Lambda(x))$ with intensity function $\Lambda(x)$. The intensity function in turn is modeled as a convolution of a normal kernel $k(x, s)$ and a Gamma process, $\Lambda(x) = \int k(x, s) \, \Gamma(ds)$ with $\Gamma(ds) \sim \text{Gamma}(\alpha(ds), \beta(ds))$. With constant $\beta(s) \equiv \beta$ and rescaling the Gamma process to total mass one, the model for $\Lambda(x)$ reduces to a Dirichlet process mixture of normals.

Arjas and Heikkinen (1997) propose an alternative approach to inference for a spatial Poisson process. The prior probability model is based on Voronoi tessellations with a random number and location of knots.

3.3 Wavelet Based Modeling

Wavelets provide an orthonormal basis in $L^2$, representing $g \in L^2$ as $g(x) = \sum_j \sum_k d_{jk} \psi_{jk}(x)$, with basis functions $\psi_{jk}(x) = 2^{j/2} \psi(2^j x - k)$ that can be expressed as shifted and scaled versions of one underlying function $\psi$. The practical attraction of wavelet bases is the availability of super-fast algorithms to compute the coefficients $d_{jk}$ given a function, and vice versa. Assuming a prior probability model for the coefficients $d_{jk}$ implicitly puts a prior probability model on the random function $g$. Typical prior probability models for wavelet coefficients include positive probability mass at zero. Usually this prior probability mass depends on the "level of detail" $j$, $\Pr(d_{jk} = 0) = \pi_j$. Given a non-zero coefficient, an independent prior with level dependent variances is assumed, for example, $p(d_{jk} \mid d_{jk} \neq 0) = N(0, \tau_j^2)$. Appropriate choices of $\pi_j$ and $\tau_j$ yield posterior rules for the wavelet coefficients $d_{jk}$ that closely mimic the usual wavelet thresholding and shrinkage rules (Chipman et al. 1997, Vidakovic 1998). Clyde and George (2000) discuss the use of empirical Bayes estimates for the hyperparameters in such models.

Posterior inference is greatly simplified by the orthonormality of the wavelet basis. Consider a regression model $y_i = g(x_i) + \epsilon_i$, $i = 1, \ldots, n$, with equally spaced data $x_i$, for example, $x_i = i/n$. Substitute the wavelet basis representation $g(\cdot) = \sum_j \sum_k d_{jk} \psi_{jk}(\cdot)$, and let $y$, $d$ and $\epsilon$ denote the data vector, the vector of all wavelet coefficients and the residual vector, respectively. Also, let $B = [\psi_{jk}(x_i)]$ denote the design matrix of the wavelet basis functions evaluated at the $x_i$. Then we can write the regression in matrix notation as $y = Bd + \epsilon$. The discrete wavelet transform of the data finds, with a computationally highly efficient algorithm, $\hat{d} = B^{-1} y$. Assuming independent normal errors, $\epsilon_i \sim N(0, \sigma^2)$, orthogonality of the design matrix $B$ implies $\hat{d}_{jk} \sim N(d_{jk}, \sigma^2)$, independently across $(j, k)$. Assuming a priori independent $d_{jk}$ leads to a posteriori independence of the wavelet coefficients $d_{jk}$. In other words, we can consider one univariate inference problem $p(d_{jk} \mid y)$ at a time. Even if the prior probability model $p(d)$ is not marginally independent across $d_{jk}$, it typically assumes independence conditional on hyperparameters, still leaving a considerable simplification of posterior simulation.

The above detailed explanation serves to highlight two critical assumptions. Posterior independence, conditional on hyperparameters or marginally, only holds for equally spaced data and under a priori independence over $d_{jk}$. In most applications prior independence is a technically convenient assumption, but does not reflect genuine prior knowledge. However, incorporating assumptions about prior dependence is not excessively difficult either. Starting with an assumption about the dependence of $g(x_i)$, $i = 1, \ldots, n$, Vannucci and Corradi (1999) show that a straightforward two dimensional wavelet transform can be used to derive the corresponding covariance matrix of the wavelet coefficients $d_{jk}$.
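The following sketch (it assumes the PyWavelets package for the discrete wavelet transform; the spike-and-slab hyperparameters $\pi_j$ and $\tau_j$ and their level dependence are illustrative choices, and boundary handling makes the transform only approximately orthogonal) computes $\hat{d} = B^{-1} y$ and applies the resulting univariate posterior mean rule coefficient by coefficient. Under the prior above, the posterior mean multiplies each $\hat{d}_{jk}$ by the posterior probability of a nonzero coefficient times the linear shrinkage factor $\tau_j^2 / (\tau_j^2 + \sigma^2)$.

```python
import numpy as np
import pywt                       # PyWavelets
from scipy.stats import norm

def spike_slab_shrink(d_hat, pi_j, tau_j, sigma):
    """Posterior mean of d_jk given d_hat ~ N(d_jk, sigma^2), under the
    prior Pr(d_jk = 0) = pi_j and d_jk ~ N(0, tau_j^2) otherwise."""
    slab = norm.pdf(d_hat, scale=np.sqrt(tau_j**2 + sigma**2))
    spike = norm.pdf(d_hat, scale=sigma)
    p_nonzero = (1 - pi_j) * slab / ((1 - pi_j) * slab + pi_j * spike)
    return p_nonzero * tau_j**2 / (tau_j**2 + sigma**2) * d_hat

# Equally spaced noisy data y_i = g(i/n) + eps_i.
n, sigma = 1024, 0.1
x = np.arange(n) / n
y = np.sin(4 * np.pi * x) + np.random.normal(0, sigma, n)

coeffs = pywt.wavedec(y, 'db4')               # d-hat = B^{-1} y, by level
for j, d in enumerate(coeffs[1:], start=1):   # detail levels, coarse to fine
    # More prior mass at zero and smaller tau at finer levels (illustrative).
    coeffs[j] = spike_slab_shrink(d, pi_j=1 - 2.0**-j, tau_j=2.0**-j, sigma=sigma)
g_hat = pywt.waverec(coeffs, 'db4')           # posterior mean estimate of g
```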

In the absence of equally spaced data the convenient mapping of the raw data $y_i$ to the empirical wavelet coefficients $\hat{d}_{jk}$ is lost. The same is true for inference problems other than regression where a wavelet decomposition is used to model random functions. Typical examples are the unknown density in density estimation (Müller and Vidakovic 1998), or the spectrum in spectral density estimation (Müller and Vidakovic 1999). In either case evaluation of the likelihood $p(y \mid d)$ requires reconstruction of the random function $g(\cdot)$. Although a technical inconvenience, this does not hinder the practical use of a wavelet basis. The super-fast wavelet decomposition and reconstruction algorithms still allow computationally efficient likelihood evaluation even with the original raw data.

3.4 Neural Networks

Neural networks are another popular approach following the general theme of defining random functions by probability models for coefficients with respect to an appropriate basis. Now the basis functions are rescaled versions of logistic functions. Let $\Psi(\eta) = \exp(\eta)/(1 + \exp(\eta))$; then $g(x) = \sum_{j=1}^{M} \beta_j \Psi(x'\gamma_j)$ can be used to represent a random function $g$. The random function is parametrized by $\theta = (\beta_1, \gamma_1, \ldots, \beta_M, \gamma_M)$. Bayesian inference proceeds by assuming an appropriate prior probability model and considering posterior updating.
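A minimal sketch of this representation (assuming NumPy; the number of nodes $M$, the prior and all parameter values are illustrative) evaluates $g(x) = \sum_j \beta_j \Psi(x'\gamma_j)$ for one draw of $\theta$ from a simple normal prior.

```python
import numpy as np

def logistic(eta):
    """Psi(eta) = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

def g(X, beta, Gamma):
    """g(x) = sum_j beta_j * Psi(x' gamma_j); X is (n, p), Gamma is (M, p)."""
    return logistic(X @ Gamma.T) @ beta

# One draw of theta = (beta_1, gamma_1, ..., beta_M, gamma_M) from a
# simple N(0, 1) prior on all coefficients (an illustrative choice).
rng = np.random.default_rng(2)
M, p, n = 10, 2, 100
beta = rng.standard_normal(M)
Gamma = rng.standard_normal((M, p))
X = np.column_stack([np.ones(n), np.linspace(-3, 3, n)])  # intercept + covariate
g_draw = g(X, beta, Gamma)     # one random function drawn from the prior
```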
