Semiparametric Regression Analysis via Infer.NET


Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue II.  http://www.jstatsoft.org/

Semiparametric Regression Analysis via Infer.NET

Jan Luts (TheSearchParty.com Pty Ltd.), Shen S.J. Wang (University of Queensland), John T. Ormerod (University of Sydney), Matt P. Wand (University of Technology, Sydney)

Abstract

We provide several examples of Bayesian semiparametric regression analysis via the Infer.NET package for approximate deterministic inference in graphical models. The examples are chosen to encompass a wide range of semiparametric regression situations. Infer.NET is shown to produce accurate inference in comparison with Markov chain Monte Carlo via the BUGS package, but is considerably faster. Potentially, this contribution represents the start of a new era for semiparametric regression, where large and complex analyses are performed via fast graphical models methodology and software, mainly being developed within Machine Learning.

Keywords: additive mixed models, expectation propagation, generalized additive models, measurement error models, mean field variational Bayes, missing data models, penalized splines, variational message passing.

Date of this draft: 17 JAN 2014

1. Introduction

Infer.NET (Minka et al. 2013) is a relatively new software package for performing approximate inference in large graphical models, using fast deterministic algorithms such as expectation propagation (Minka 2001) and variational message passing (Winn and Bishop 2005). We demonstrate its application to Bayesian semiparametric regression. A variety of situations are covered: non-Gaussian response, longitudinal data, bivariate functional effects, robustness, sparse signal penalties, missingness and measurement error.

The Infer.NET project is still in its early years and, at the time of this writing, has not progressed beyond beta releases. We anticipate continual updating and enhancement for many

years to come. In this article we are necessarily restricted to the capabilities of Infer.NET at the time of preparation. All of our examples use Infer.NET 2.5, Beta 2, which was released in April 2013.

The graphical models viewpoint of semiparametric regression (Wand 2009), and other advanced statistical techniques, has the attraction that various embellishments of the standard models can be accommodated by enlarging the underlying directed acyclic graph. For example, nonparametric regression with a missing predictor data model involves the addition of nodes and edges to the graph, corresponding to the missing data mechanism. Figure 4 of Faes et al. (2011) provides some specific illustrations. Marley and Wand (2010) exploited the graphical models viewpoint of semiparametric regression to handle a wide range of non-standard models via the Markov chain Monte Carlo inference engine implemented by BUGS (Lunn et al. 2000). However, many of their examples take between several minutes and hours to run. The inference engines provided by Infer.NET allow much faster fitting for many common semiparametric regression models. On the other hand, BUGS is the more versatile package and not all models that are treated in Marley and Wand (2010) are supported by Infer.NET.

Semiparametric regression, summarized by Ruppert et al. (2003, 2009), is a large branch of Statistics that includes nonparametric regression, generalized additive models, generalized additive mixed models, curve-by-factor interaction models, wavelet regression and geoadditive models. Parametric models such as generalized linear mixed models are special cases of semiparametric regression. Antecedent research for special cases such as nonparametric regression was conducted in the second half of the 20th Century, at a time when data sets were smaller, computing power was lower and the Internet was either non-existent or in its infancy.
Now, as we approach the mid-2010s, semiparametric regression is continually being challenged by the size, complexity and, in some applications, arrival speed of data sets requiring analysis. Implementation and computational speed are major limiting factors. This contribution represents the potential for a new era for semiparametric regression, tapping into 21st Century Machine Learning research on fast approximate inference in large graphical models (e.g. Minka 2001; Winn and Bishop 2005; Minka 2005; Minka and Winn 2008; Knowles and Minka 2011) and the ensuing software development.

Section 2 lays down the definitions and notation needed to describe the models given in later sections. Each of Sections 3–10 illustrates a different type of semiparametric regression analysis via Infer.NET. All code is available as a web supplement to this article. For most of the examples, we also compare the Infer.NET results with those produced by BUGS. Section 11 compares BUGS with Infer.NET in terms of versatility and computational speed. A summary of our findings on semiparametric analysis via Infer.NET is given in Section 12. An appendix gives a detailed description of using Infer.NET to fit the simple semiparametric regression model of Section 3.

2. Preparatory infrastructure

The semiparametric regression examples rely on mathematical infrastructure and notation, which we describe in this section.

2.1. Distributional notation

The density function of a random vector x is denoted by p(x). The notation for a set of independent random variables y_i (1 ≤ i ≤ n) with distribution D_i is y_i ~ind. D_i. Table 1 lists the distributions used in the examples and the parametrization of their density functions.

    distribution    density function in x                                                      abbreviation
    Normal          (2πσ²)^{-1/2} exp{-(x - µ)²/(2σ²)};  σ > 0                                 N(µ, σ²)
    Laplace         (2σ)^{-1} exp(-|x - µ|/σ);  σ > 0                                          Laplace(µ, σ²)
    t               Γ((ν+1)/2) / [√(πνσ²) Γ(ν/2)] {1 + (x - µ)²/(νσ²)}^{-(ν+1)/2};  σ, ν > 0   t(µ, σ², ν)
    Gamma           B^A x^{A-1} e^{-Bx} / Γ(A);  x > 0;  A, B > 0                              Gamma(A, B)
    Half-Cauchy     2σ / {π(x² + σ²)};  x > 0;  σ > 0                                          Half-Cauchy(σ)

Table 1: Distributions used in the examples. The density function argument x and parameters range over R unless otherwise specified.

2.2. Standardization and default priors

In real data examples, the measurements are often recorded on several different scales. Therefore, all continuous variables are standardized prior to analysis using Infer.NET or Markov chain Monte Carlo (MCMC) in BUGS. This transformation makes the analyses scale-invariant and can also lead to better behavior of the MCMC.

Since we do not have prior knowledge about the model parameters in each of the examples, non-informative priors are used. The prior distributions for a fixed effects parameter vector β and a standard deviation parameter σ are

    β ~ N(0, τβ^{-1} I)   and   σ ~ Half-Cauchy(A)

with default hyperparameters

    τβ = 10^{-10}   and   A = 10^5.

This is consistent with the recommendations given in Gelman (2006) for achieving non-informativeness for variance parameters. Infer.NET and BUGS do not offer direct specification of Half-Cauchy distributions and therefore we use the result:

    if x | a ~ Gamma(1/2, a) and a ~ Gamma(1/2, 1/A²), then x^{-1/2} ~ Half-Cauchy(A).   (1)
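Result (1) can be verified numerically. The following sketch (illustrative only; not code from the article) integrates the Gamma mixture over a by midpoint quadrature and transforms to σ = x^{-1/2}, recovering the Half-Cauchy(A) density 2A/{π(A² + σ²)} of Table 1. A modest value of A is used purely so the quadrature is well behaved.

```python
# Numerical check of result (1): a scale mixture of Gammas yields a Half-Cauchy.
import math

def gamma_pdf(t, shape, rate):
    # Gamma(shape, rate) density, parametrized as in Table 1
    return rate ** shape * t ** (shape - 1) * math.exp(-rate * t) / math.gamma(shape)

def half_cauchy_density(s, A):
    return 2 * A / (math.pi * (A ** 2 + s ** 2))

def sigma_density(s, A, upper=100.0, steps=100000):
    # marginal density of sigma = x^{-1/2}: integrate over a by the midpoint
    # rule, then apply the change of variables x = sigma^{-2}, |dx/dsigma| = 2 sigma^{-3}
    x = s ** (-2.0)
    h = upper / steps
    total = sum(gamma_pdf(x, 0.5, (i + 0.5) * h) * gamma_pdf((i + 0.5) * h, 0.5, A ** (-2.0))
                for i in range(steps))
    return total * h * 2.0 * s ** (-3.0)

approx = sigma_density(1.0, 1.5)
exact = half_cauchy_density(1.0, 1.5)
```

The agreement is to quadrature accuracy, which is why the Gamma-only representation in (1) can stand in for the Half-Cauchy prior in both Infer.NET and BUGS.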

This result allows for the imposition of a Half-Cauchy prior using only Gamma distribution specifications. Both Infer.NET and BUGS support the Gamma distribution.

2.3. Variational message passing and expectation propagation

Infer.NET has two inference engines, variational message passing (Winn and Bishop 2005) and expectation propagation (Minka 2001), for performing fast deterministic approximate inference in graphical models. Succinct summaries of variational message passing and expectation propagation are provided in Appendices A and B of Minka and Winn (2008).

Generally speaking, variational message passing is more amenable to semiparametric regression than expectation propagation. It is a special case of mean field variational Bayes (e.g. Wainwright and Jordan 2008). The essential idea of mean field variational Bayes is to approximate joint posterior density functions such as p(θ1, θ2, θ3 | D), where D denotes the observed data, by product density forms such as

    q_{θ1}(θ1) q_{θ2}(θ2) q_{θ3}(θ3),   q_{θ1,θ3}(θ1, θ3) q_{θ2}(θ2)   or   q_{θ1}(θ1) q_{θ2,θ3}(θ2, θ3).   (2)

The choice of the product density form is usually made by trading off tractability against minimal imposition of product structure. Once this choice is made, the optimal q-density functions are chosen to minimize the Kullback-Leibler divergence from the exact joint posterior density function. For example, if the second product form in (2) is chosen then the optimal density functions q_{θ1,θ3}(θ1, θ3) and q_{θ2}(θ2) are those that minimize

    ∫∫∫ q_{θ1,θ3}(θ1, θ3) q_{θ2}(θ2) log[ q_{θ1,θ3}(θ1, θ3) q_{θ2}(θ2) / p(θ1, θ2, θ3 | D) ] dθ1 dθ2 dθ3,   (3)

where the integrals range over the parameter spaces of θ1, θ2 and θ3. This minimization problem gives rise to an iterative scheme which, typically, has closed form updates and good convergence properties.
A by-product of the iterations is a lower-bound approximation to the marginal likelihood p(D), which we denote by p̲(D).

Further details on, and several examples of, mean field variational Bayes are provided by Section 2.2 of Ormerod and Wand (2010). Each of these examples can also be expressed, equivalently, in terms of variational message passing.

In typical semiparametric regression models the subscripting on the q-density functions is quite cumbersome. Hence, it is suppressed for the remainder of the article. For example, q(θ1, θ3) is taken to mean q_{θ1,θ3}(θ1, θ3).

Expectation propagation is also based on product density restrictions such as (2), but differs in its method of obtaining the optimal density functions. It works with a different version of the Kullback-Leibler divergence than that given in (3) and, therefore, leads to different iterative algorithms and approximating density functions.

2.4. Mixed model-based penalized splines

Mixed model-based penalized splines are a convenient way to model nonparametric functional relationships in semiparametric regression models, and are amenable to the hierarchical Bayesian structures supported by Infer.NET. The penalized spline of a regression function

f, with mixed model representation, takes the generic form

    f(x) = β0 + β1 x + Σ_{k=1}^{K} u_k z_k(x),   u_k ~ind. N(0, τu^{-1}).   (4)

Here z_1(·), ..., z_K(·) is a set of spline basis functions and τu controls the amount of penalization of the spline coefficients u_1, ..., u_K. Throughout this article, we use O'Sullivan splines for the z_k(·). Wand and Ormerod (2008) provides details on their construction. O'Sullivan splines lead to (4) being a low-rank version of smoothing splines, which is also used by the R function smooth.spline().

The simplest semiparametric regression model is the Gaussian response nonparametric regression model

    y_i ~ind. N(f(x_i), σε²),   (5)

where (x_i, y_i), 1 ≤ i ≤ n, are pairs of measurements on continuous predictor and response variables. Mixed model-based penalized splines give rise to hierarchical Bayesian models for (5) such as

    y_i | β0, β1, u_1, ..., u_K, τε ~ind. N(β0 + β1 x_i + Σ_{k=1}^{K} u_k z_k(x_i), τε^{-1}),
    u_k | τu ~ind. N(0, τu^{-1}),   β0, β1 ~ind. N(0, τβ^{-1}),
    τu^{-1/2} ~ Half-Cauchy(Au),   τε^{-1/2} ~ Half-Cauchy(Aε).   (6)

Such models allow nonparametric regression to be performed using Bayesian inference engines such as BUGS and Infer.NET. Our decision to work with precision parameters, rather than the variance parameters (which are common in the semiparametric regression literature), is driven by the former being the standard parametrization in BUGS and Infer.NET.

It is convenient to express (6) using matrix notation. This entails putting

    y = [y_1, ..., y_n]^T,   X = [1  x_i]_{1 ≤ i ≤ n},   Z = [z_1(x_i) ··· z_K(x_i)]_{1 ≤ i ≤ n}   (7)

and

    β = [β0, β1]^T,   u = [u_1, ..., u_K]^T.   (8)

We then re-write (6) as

    y | β, u, τε ~ N(Xβ + Zu, τε^{-1} I),   u | τu ~ N(0, τu^{-1} I),   β ~ N(0, τβ^{-1} I),
    τu^{-1/2} ~ Half-Cauchy(Au),   τε^{-1/2} ~ Half-Cauchy(Aε).
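To make the mechanics of (4) concrete, here is a small self-contained sketch (illustrative only; the article uses O'Sullivan splines, whereas this toy uses a simpler truncated-line basis z_k(x) = max(0, x - κ_k), and a fixed ridge penalty λ in place of the Bayesian precision ratio τu/τε). The penalized fit is the ridge solution θ̂ = (CᵀC + λD)^{-1}Cᵀy with D penalizing only the spline coefficients.

```python
# Toy penalized spline fit in the spirit of (4) (not the article's basis or code).
import math, random

def design_row(x, knots):
    # [1, x, (x - kappa_1)_+, ..., (x - kappa_K)_+]
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def solve(A, b):
    # Gaussian elimination with partial pivoting for the small normal equations
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def penalized_spline_fit(xs, ys, knots, lam):
    C = [design_row(x, knots) for x in xs]
    p = 2 + len(knots)
    CtC = [[sum(row[a] * row[b] for row in C) for b in range(p)] for a in range(p)]
    Cty = [sum(C[i][a] * ys[i] for i in range(len(xs))) for a in range(p)]
    for a in range(2, p):              # penalize only the spline coefficients u_k
        CtC[a][a] += lam
    theta = solve(CtC, Cty)
    return lambda x: sum(t * c for t, c in zip(theta, design_row(x, knots)))

random.seed(1)
xs = [i / 99 for i in range(100)]
ys = [math.sin(2 * math.pi * x) + 0.1 * random.gauss(0, 1) for x in xs]
f_hat = penalized_spline_fit(xs, ys, [i / 10 for i in range(1, 10)], lam=0.1)
peak, mid, trough = f_hat(0.25), f_hat(0.5), f_hat(0.75)
```

Increasing `lam` (the analogue of a larger τu) shrinks the u_k toward zero and flattens the fit toward a straight line, which is exactly the behavior the precision τu controls in (4).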

The effective degrees of freedom (edf) corresponding to (6) is defined to be

    edf(τu, τε) = tr( [CᵀC + blockdiag{τβ I₂, (τu/τε) I}]^{-1} CᵀC ),   (9)

where C = [X Z], and is a scale-free measure of the amount of fitting being performed by the penalized splines. Further details on effective degrees of freedom for penalized splines are given in Section 3.13 of Ruppert et al. (2003).

Extension to bivariate predictors

The bivariate predictor extension of (5) is

    y_i ~ind. N(f(x_i), σε²),   (10)

where the predictors x_i, 1 ≤ i ≤ n, are each 2 × 1 vectors. In many applications, the x_i correspond to geographical location but they could be measurements on any pair of predictors for which a bivariate mean function might be entertained. Mixed model-based penalized splines can handle this bivariate predictor case by extending (4) to

    f(x) = β0 + β₁ᵀ x + Σ_{k=1}^{K} u_k z_k(x),   u_k ~ind. N(0, τu^{-1})   (11)

and setting the z_k to be appropriate bivariate spline functions. There are several options for doing this (e.g. Ruppert et al. 2009, Section 2.2). A relatively simple choice is described here and used in the examples. It is based on thin plate spline theory, and corresponds to Section 13.5 of Ruppert et al. (2003). The first step is to choose the number K of bivariate knots and their locations. We denote these 2 × 1 vectors by κ_1, ..., κ_K. Our default rule for choosing knot locations involves feeding the x_i and K into the clustering algorithm known as CLARA (Kaufman and Rousseeuw 1990) and setting the κ_k to be the cluster centers. Next, form the matrices

    X = [1  x_iᵀ]_{1 ≤ i ≤ n},

    Z_K = [ ||x_i − κ_k||² log ||x_i − κ_k|| ]_{1 ≤ i ≤ n, 1 ≤ k ≤ K}

    and   Ω = [ ||κ_k − κ_k'||² log ||κ_k − κ_k'|| ]_{1 ≤ k, k' ≤ K}.

Based on the singular value decomposition Ω = U diag(d) Vᵀ, compute Ω^{1/2} = U diag(√d) Vᵀ and then set Z = Z_K Ω^{-1/2}. Then

    z_k(x_i) = the (i, k) entry of Z,
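The trace in (9) can be seen at work in a deliberately simple setting (an illustrative sketch, not the article's code): if the columns of C are mutually orthogonal then CᵀC is diagonal and the trace reduces to a sum of scalar shrinkage factors d_j/(d_j + penalty_j), one per column.

```python
# edf of (9) for a toy design with orthogonal columns, so C'C = diag(d_1,...,d_p)
# and edf = sum_j d_j / (d_j + penalty_j); the first n_fixed columns carry the
# (tiny) fixed-effect penalty tau_beta, the rest the ratio tau_u / tau_eps.
def edf_diagonal(col_norms_sq, tau_beta, tau_u, tau_eps, n_fixed=2):
    total = 0.0
    for j, d in enumerate(col_norms_sq):
        pen = tau_beta if j < n_fixed else tau_u / tau_eps
        total += d / (d + pen)
    return total

# 2 fixed-effect columns plus K = 8 spline columns, each with ||c_j||^2 = 50
norms = [50.0] * 10
light = edf_diagonal(norms, 1e-10, tau_u=0.1, tau_eps=1.0)   # weak penalty
heavy = edf_diagonal(norms, 1e-10, tau_u=1e6, tau_eps=1.0)   # strong penalty
```

With a weak penalty the edf approaches 2 + K (an unpenalized fit with all basis functions active); with a strong penalty it collapses to 2, the linear fit, matching the "scale-free measure of the amount of fitting" interpretation in the text.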

with u_k ~ind. N(0, τu^{-1}).

3. Simple semiparametric model

The first example involves the simple semiparametric regression model

    y_i = β0 + β1 x_{1i} + β2 x_{2i} + Σ_{k=1}^{K} u_k z_k(x_{2i}) + ε_i,   ε_i ~ind. N(0, τε^{-1}),   1 ≤ i ≤ n,   (12)

where the z_k(·) are a set of spline basis functions as described in Section 2.4. The corresponding Bayesian mixed model can then be represented by

    y | β, u, τε ~ N(Xβ + Zu, τε^{-1} I),   u | τu ~ N(0, τu^{-1} I),   β ~ N(0, τβ^{-1} I),
    τu^{-1/2} ~ Half-Cauchy(Au),   τε^{-1/2} ~ Half-Cauchy(Aε),   (13)

where τβ, Au and Aε are user-specified hyperparameters and

    y = [y_1, ..., y_n]^T,   β = [β0, β1, β2]^T,   u = [u_1, ..., u_K]^T,
    X = [1  x_{1i}  x_{2i}]_{1 ≤ i ≤ n},   Z = [z_1(x_{2i}) ··· z_K(x_{2i})]_{1 ≤ i ≤ n}.

Infer.NET 2.5, Beta 2 does not support direct fitting of model (13) under the product restriction

    q(β, u, τu, τε) = q(β, u) q(τu, τε).   (14)

We get around this by employing the same trick as that described in Section 3.2 of Wang and Wand (2011). It entails the introduction of an auxiliary data vector a. By setting the observed data for a equal to 0 and by assuming a very small number for κ, fitting the model in (15) provides essentially the same result as fitting the model in (13). The actual model implemented in Infer.NET is

    y | β, u, τε ~ N(Xβ + Zu, τε^{-1} I),
    a | β, u ~ N([βᵀ, uᵀ]ᵀ, κ^{-1} I),
    [βᵀ, uᵀ]ᵀ | τu ~ N(0, blockdiag(τβ^{-1} I, τu^{-1} I)),
    τu | bu ~ Gamma(1/2, bu),   bu ~ Gamma(1/2, 1/Au²),
    τε | bε ~ Gamma(1/2, bε),   bε ~ Gamma(1/2, 1/Aε²),   (15)

with the inputted a vector containing zeroes. While the full Infer.NET code is included in the Appendix, we will confine discussion here to the key parts of the code that specify (15).

Specification of the prior distributions for τu and τε can be achieved using the following Infer.NET code:

    Variable<double> tauu = Variable.GammaFromShapeAndRate(0.5,
       Variable.GammaFromShapeAndRate(0.5, Math.Pow(Au,-2))).Named("tauu");
    Variable<double> tauEps = Variable.GammaFromShapeAndRate(0.5,
       Variable.GammaFromShapeAndRate(0.5, Math.Pow(Aeps,-2))).Named("tauEps");    (16)

while the likelihood in (15) is specified by the following line:

    y[index] = Variable.GaussianFromMeanAndPrecision(
       Variable.InnerProduct(betauWork, cvec[index]), tauEps);    (17)

The variable index represents a range of integers from 1 to n and enables loop-type structures. Finally, the inference engine is specified to be variational message passing via the code:

    InferenceEngine engine = new InferenceEngine();
    engine.Algorithm = new VariationalMessagePassing();
    engine.NumberOfIterations = nIterVB;    (18)

with nIterVB denoting the number of mean field variational Bayes iterations. Note that this Infer.NET code treats the coefficient vector [βᵀ, uᵀ]ᵀ (betauWork) as a single entity, in keeping with product restriction (14).

Figures 1 and 2 summarize the results from Infer.NET fitting of (15) to a data set on the yields (g/plant) of 84 white Spanish onions crops in two locations: Purnong Landing and Virginia, South Australia. The response variable, y, is the logarithm of yield, whilst the predictors are the indicator of location being Virginia (x1) and areal density of the plants (plants/m²) (x2). The hyperparameters were set at τβ = 10^{-10}, Aε = 10^5, Au = 10^5 and κ = 10^{-10}, while the number of mean field variational Bayes iterations was fixed at 100.

As a benchmark, Bayesian inference via MCMC was performed. To this end, the following BUGS program was used:

    for(i in 1:n)
    {
       mu[i] <- beta0 + beta1*x1[i] + beta2*x2[i] + inprod(u[],Z[i,])
       y[i] ~ dnorm(mu[i],tauEps)
    }
    for (k in 1:K)
    {
       u[k] ~ dnorm(0,tauu)
    }
    beta0 ~ dnorm(0,1.0E-10) ; beta1 ~ dnorm(0,1.0E-10)
    beta2 ~ dnorm(0,1.0E-10)
    bu ~ dgamma(0.5,1.0E-10) ; tauu ~ dgamma(0.5,bu)
    bEps ~ dgamma(0.5,1.0E-10) ; tauEps ~ dgamma(0.5,bEps)    (19)
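The coordinate-ascent scheme that variational message passing carries out can be seen in miniature on a stripped-down model. The sketch below (illustrative only; not the article's algorithm, and a far simpler model than (15)) runs mean field variational Bayes for Bayesian simple linear regression through the origin, y_i ~ N(βx_i, τ^{-1}), β ~ N(0, τβ^{-1}), τ ~ Gamma(A, B), under the restriction q(β, τ) = q(β) q(τ). Each factor has a closed-form update, exactly the structure the text describes.

```python
# Minimal mean field variational Bayes loop for a one-coefficient Gaussian model.
import random

def mfvb_fit(x, y, tau_beta=1e-10, A=0.01, B=0.01, n_iter=50):
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    E_tau = 1.0
    for _ in range(n_iter):
        # q(beta) = N(mu_q, sig2_q)
        sig2_q = 1.0 / (E_tau * sxx + tau_beta)
        mu_q = E_tau * sig2_q * sxy
        # q(tau) = Gamma(A + n/2, B + E_q ||y - x*beta||^2 / 2)
        E_resid = syy - 2 * mu_q * sxy + (mu_q ** 2 + sig2_q) * sxx
        E_tau = (A + n / 2) / (B + E_resid / 2)
    return mu_q, sig2_q, E_tau

random.seed(2)
x = [i / 50 for i in range(1, 101)]
y = [2.0 * xi + random.gauss(0, 0.5) for xi in x]
mu_q, sig2_q, E_tau = mfvb_fit(x, y)
```

After a handful of iterations the updates stabilize, with `mu_q` near the least-squares slope and `E_tau` near the inverse noise variance; the nIterVB iterations in code chunk (18) play the same role for the full model.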

A burn-in length of 50000 was used and 1 million samples were obtained from the posterior distributions, with a thinning factor of 5. The posterior densities for MCMC were produced based on kernel density estimation with plug-in bandwidth selection via the R package KernSmooth (Wand and Ripley 2010). Figure 1 displays the fitted regression lines and pointwise 95% credible intervals. The estimated regression lines and credible intervals from Infer.NET and MCMC fitting are highly similar. Figure 2 visualizes the approximate posterior density functions for β1, τε^{-1} and the effective degrees of freedom of the fit. The variational Bayes approximations for β1 and τε^{-1} are rather accurate. The approximate posterior density function for the effective degrees of freedom for variational Bayesian inference was obtained based on Monte Carlo samples of size 1 million from the approximate posterior distributions of τε^{-1} and τu^{-1}.

Figure 1: Onions data. Fitted regression line and pointwise 95% credible intervals for variational Bayesian inference by Infer.NET and MCMC.

4. Generalized additive model

The next example illustrates binary response variable regression through the model

    y_i | β, u ~ind. Bernoulli(F((Xβ + Zu)_i)),   1 ≤ i ≤ n,
    β ~ N(0, τβ^{-1} I),   u | τu ~ N(0, τu^{-1} I),   τu^{-1/2} ~ Half-Cauchy(Au),   (20)

where F(·) denotes an inverse link function and X, Z, τβ and Au are defined as in the previous section. Typical choices of F(·) correspond to logistic regression and probit regression.

Figure 2: Variational Bayes approximate posterior density functions produced by Infer.NET and MCMC posterior density functions of key model parameters for fitting a simple semiparametric model to the onions data set.

As before, introduction of the auxiliary variables a = 0 enables Infer.NET fitting of the Bayesian mixed model

    y_i | β, u ~ind. Bernoulli(F((Xβ + Zu)_i)),   1 ≤ i ≤ n,
    a | β, u ~ N([βᵀ, uᵀ]ᵀ, κ^{-1} I),
    [βᵀ, uᵀ]ᵀ | τu ~ N(0, blockdiag(τβ^{-1} I, τu^{-1} I)),
    τu | bu ~ Gamma(1/2, bu),   bu ~ Gamma(1/2, 1/Au²).   (21)

The likelihood for the logistic regression case is specified as follows:

    VariableArray<bool> y = Variable.Array<bool>(index).Named("y");
    y[index] = Variable.BernoulliFromLogOdds(
       Variable.InnerProduct(betauWork, cvec[index]));    (22)

while variational message passing is used for fitting purposes. Setting up the prior for τu and specifying the inference engine is done as in code chunks (16) and (18), respectively. For probit regression, the last line of (22) is replaced by

    y[index] = Variable.IsPositive(Variable.GaussianFromMeanAndVariance(
       Variable.InnerProduct(betauWork, cvec[index]), 1));    (23)

and expectation propagation is used:

    engine.Algorithm = new ExpectationPropagation();    (24)

Instead of the half-Cauchy prior in (20), a gamma prior with shape and rate equal to 2 is used for τu in the probit regression model:

    Variable<double> tauU = Variable.GammaFromShapeAndRate(2,2);    (25)

Figure 3 visualizes the results for Infer.NET and MCMC fitting of a straightforward extension of model (21), with inverse-logit and probit link functions, to a breast cancer data set (Haberman 1976). This data set contains 306 cases from a study that was conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer. The binary response variable represents whether the patient died within 5 years after operation (died), while 3 predictor variables are used: age of the patient at the time of operation (age), year of operation (year) and number of positive axillary nodes detected (nodes). The actual model is

    died_i | β, u ~ind. Bernoulli(F(β0 + f1(age_i) + f2(year_i) + f3(nodes_i))),   1 ≤ i ≤ 306.

Note that all predictor variables were standardized prior to model fitting and the following hyperparameters were used: τβ = 10^{-10}, Au = 10^5, κ = 10^{-10}, and the number of variational Bayes iterations was 100. Figure 3 illustrates that there is good agreement between the results from Infer.NET and MCMC.

Figure 3: Breast cancer data. Fitted regression lines and pointwise 95% credible intervals for logit (a) and probit (b) regression via variational Bayesian inference by Infer.NET and MCMC.

5. Robust nonparametric regression based on the t-distribution

A popular model-based approach for robust regression is to model the response variable as having a t-distribution. Outliers occur with moderate probability for low values of the t-distribution's degrees of freedom parameter (Lange et al. 1989). More recently, Staudenmayer et al. (2009) proposed a penalized spline mixed model approach to nonparametric regression using the t-distribution.

The robust nonparametric regression model that we consider here is a Bayesian variant of that treated by Staudenmayer et al. (2009):

    y_i | β0, β1, u, τε, ν ~ind. t(β0 + β1 x_i + Σ_{k=1}^{K} u_k z_k(x_i), τε^{-1}, ν),   1 ≤ i ≤ n,
    u | τu ~ N(0, τu^{-1} I),   β ~ N(0, τβ^{-1} I),
    τu^{-1/2} ~ Half-Cauchy(Au),   τε^{-1/2} ~ Half-Cauchy(Aε),
    p(ν) discrete on a finite set N.   (26)

Restricting the prior distribution of ν to be that of a discrete random variable allows Infer.NET fitting of the model using structured mean field variational Bayes (Saul and Jordan 1996). This extension of ordinary mean field variational Bayes is described in Section 3.1 of Wand et al. (2011). Since variational message passing is a special case of mean field variational Bayes, this extension also applies. Model (26) can be fitted through calls to Infer.NET with ν fixed at each value in N. The results of each of these fits are combined afterwards. Details are given below.

Another challenge concerning (26) is that Infer.NET does not support t-distribution specifications. We get around this by appealing to the result:

    if x | g ~ N(µ, (gτ)^{-1}) and g ~ Gamma(ν/2, ν/2), then x ~ t(µ, τ^{-1}, ν).   (27)

As before, we use the a = 0 auxiliary data trick described in Section 3 and the Half-Cauchy representation (1).
These lead to the following suite of models, for each fixed ν ∈ N, needing to be run in Infer.NET:

    y | β, u, g, τε ~ N(Xβ + Zu, τε^{-1} diag(1/g)),
    a | β, u ~ N([βᵀ, uᵀ]ᵀ, κ^{-1} I),
    [βᵀ, uᵀ]ᵀ | τu ~ N(0, blockdiag(τβ^{-1} I, τu^{-1} I)),
    g_i | ν ~ind. Gamma(ν/2, ν/2),
    τu | bu ~ Gamma(1/2, bu),   bu ~ Gamma(1/2, 1/Au²),
    τε | bε ~ Gamma(1/2, bε),   bε ~ Gamma(1/2, 1/Aε²),   (28)

where the matrix notation of (7) and (8) is being used, g = [g_1, ..., g_n]ᵀ and τβ, Au and Aε are user-specified hyperparameters with default values τβ = 10^{-10}, Au = Aε = 10^5. The prior for ν was set to be a uniform distribution over the atom set N, set to be 30 equally-spaced numbers between 0.05 and 10 inclusive.
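The scale-mixture representation (27) that makes (28) possible can be checked numerically. The sketch below (illustrative; not code from the article) integrates the Normal density against the Gamma(ν/2, ν/2) mixing distribution by midpoint quadrature and compares the result with the t(µ, τ^{-1}, ν) density of Table 1.

```python
# Numerical check of result (27): Normal-Gamma scale mixture equals a t density.
import math

def t_density(x, mu, tau, nu):
    s2 = 1.0 / tau
    return (math.gamma((nu + 1) / 2)
            / (math.sqrt(math.pi * nu * s2) * math.gamma(nu / 2))
            * (1 + (x - mu) ** 2 / (nu * s2)) ** (-(nu + 1) / 2))

def mixture_density(x, mu, tau, nu, upper=40.0, steps=40000):
    # midpoint quadrature over the mixing variable g
    h = upper / steps
    total = 0.0
    for i in range(steps):
        g = (i + 0.5) * h
        norm = math.sqrt(g * tau / (2 * math.pi)) * math.exp(-g * tau * (x - mu) ** 2 / 2)
        gam = ((nu / 2) ** (nu / 2) * g ** (nu / 2 - 1)
               * math.exp(-g * nu / 2) / math.gamma(nu / 2))
        total += norm * gam
    return total * h

approx = mixture_density(1.5, 0.0, 1.0, 4.0)
exact = t_density(1.5, 0.0, 1.0, 4.0)
```

This is precisely why conditioning on the g_i in (28) turns the t-likelihood into the Gaussian likelihood that Infer.NET can handle.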

After obtaining fits of (28) for each ν ∈ N, the approximate posterior densities are obtained from

    q(β, u) = Σ_{ν∈N} q(ν) q(β, u | ν),   q(τu) = Σ_{ν∈N} q(ν) q(τu | ν),
    q(τε) = Σ_{ν∈N} q(ν) q(τε | ν),

with

    q(ν) ≡ p(ν | y) = p(ν) p̲(y | ν) / Σ_{ν'∈N} p(ν') p̲(y | ν'),

where p̲(y | ν) denotes the variational lower bound on the conditional likelihood p(y | ν).

Specification of the prior distribution for g is achieved using the following Infer.NET code:

    VariableArray<double> g = Variable.Array<double>(index);
    g[index] = Variable.GammaFromShapeAndRate(nu/2, nu/2).ForEach(index);    (29)

while the likelihood in (28) is specified by:

    y[index] = Variable.GaussianFromMeanAndPrecision(
       Variable.InnerProduct(betauWork, cvec[index]), tauEps*g[index]);    (30)

The lower bound log p̲(y | ν) can be obtained by creating a mixture of the current model with an empty model in Infer.NET. The learned mixing weight is then equal to the marginal log-likelihood. Therefore, an auxiliary Bernoulli variable is set up:

    Variable<bool> auxML = Variable.Bernoulli(0.5).Named("auxML");
    IfBlock model = Variable.If(auxML);    (31)

The normal code for fitting the model in (28) is then enclosed with:

    model.CloseBlock();    (33)

Finally, the lower bound log p̲(y | ν) is obtained from:

    double marginalLogLikelihood = engine.Infer<Bernoulli>(auxML).LogOdds;    (34)

Figure 4 presents the results of the structured mean field variational Bayes analysis using Infer.NET fitting of model (28) to a data set from a respiratory experiment conducted by Professor Russ Hauser at Harvard School of Public Health, Boston, USA. The data correspond to 60 measurements on one subject during two separate respiratory experiments. The response variable y_i represents the log of the adjusted time of exhalation for x_i equal to the time in seconds since exposure to air containing particulate matter. The adjusted time of exhalation is obtained by subtracting the average time of exhalation at baseline, prior to exposure to filtered air. Interest centers upon the mean response as a function of time. The predictor and response variable were both standardized prior to Infer.NET analysis.
The following hyperparameters were chosen: τβ = 10^{-10}, Aε = 10^5, Au = 10^5 and κ = 10^{-10}, while the number of variational Bayes iterations equaled 100. Bayesian inference via MCMC was performed as a benchmark.
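The post-fit combination step over ν amounts to normalizing p(ν) p̲(y | ν) across the atom set, which is best done on the log scale. The sketch below (illustrative only; the lower-bound values are made-up placeholders, not results from the article) uses the log-sum-exp trick so that large negative log-likelihoods do not underflow.

```python
# Combining per-nu variational lower bounds into q(nu) with log-sum-exp.
import math

def combine_log_evidence(log_ml, prior=None):
    n = len(log_ml)
    prior = prior or [1.0 / n] * n            # uniform p(nu) by default, as in the text
    logw = [math.log(p) + l for p, l in zip(prior, log_ml)]
    m = max(logw)                             # subtract the max before exponentiating
    w = [math.exp(l - m) for l in logw]
    s = sum(w)
    return [wi / s for wi in w]               # q(nu), summing to 1

# placeholder lower bounds log p_(y|nu) for 5 candidate values of nu
q_nu = combine_log_evidence([-1052.3, -1050.1, -1049.8, -1050.9, -1053.0])
```

The resulting weights q(ν) are exactly what multiplies the conditional fits q(β, u | ν), q(τu | ν) and q(τε | ν) in the mixture displayed above.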

Figure 4: Structured mean field variational Bayesian, based on Infer.NET, and MCMC fitting of the robust nonparametric regression model (26) to the respiratory experiment data. Left: posterior mean and pointwise 95% credible sets for the regression function. Right: approximate posterior function for the degrees of freedom parameter ν.

Figure 4 shows that the variational Bayes fit and pointwise 95% credible sets are close to the ones obtained using MCMC. Finally, the approximate posterior probability function and the posterior from MCMC for the degrees of freedom ν are compared. The Infer.NET and MCMC results coincide quite closely.

6. Semiparametric mixed model

Since semiparametric regression models based on penalized splines fit in the mixed model framework, semiparametric longitudinal data analysis can be performed by fusion with classical mixed models (Ruppert et al. 2003). In this section we illustrate the use of Infer.NET for fitting the class of semiparametric mixed models having the form:

    y_ij | β, u, U_i, τε ~ind. N(β0 + βᵀ x_ij + U_i + Σ_{k=1}^{K} u_k z_k(s_ij), τε^{-1}),   1 ≤ j ≤ n_i,   1 ≤ i ≤ m,
    U_i | τU ~ind. N(0, τU^{-1}),   (35)

for analysis of longitudinal data sets such as that described in Bachrach et al. (1999). Here x_ij is a vector of predictors that enter the model linearly and s_ij is another predictor that enters the model non-linearly via penalized splines. For each 1 ≤ i ≤ m, U_i denotes the random

intercept for the ith subject. The corresponding Bayesian mixed model is represented by

    y | β, u, τε ~ N(Xβ + Zu, τε^{-1} I),
    u | τU, τu ~ N(0, blockdiag(τU^{-1} I, τu^{-1} I)),   β ~ N(0, τβ^{-1} I),
    τu^{-1/2} ~ Half-Cauchy(Au),   τε^{-1/2} ~ Half-Cauchy(Aε),   τU^{-1/2} ~ Half-Cauchy(AU),   (36)

where y, β and X are defined in a similar manner as in the previous sections and τβ, Au, Aε and AU are user-specified hyperparameters. Introduction of the random intercepts results in

    u = [U_1, ..., U_m, u_1, ..., u_K]ᵀ

and

    Z = [ 1 ··· 0  z_1(s_11)   ···  z_K(s_11)
          ⋮     ⋮      ⋮              ⋮
          1 ··· 0  z_1(s_1n1)  ···  z_K(s_1n1)
          ⋮     ⋮      ⋮              ⋮
          0 ··· 1  z_1(s_m1)   ···  z_K(s_m1)
          ⋮     ⋮      ⋮              ⋮
          0 ··· 1  z_1(s_mnm)  ···  z_K(s_mnm) ],

in which the first m columns of Z are the random intercept indicators and the last K columns hold the spline basis evaluations. We again use the auxiliary data vector a = 0 to allow direct Infer.NET fitting of the Bayesian longitudinal penalized spline model

    y | β, u, τε ~ N(Xβ + Zu, τε^{-1} I),
    a | β, u ~ N([βᵀ, uᵀ]ᵀ, κ^{-1} I),
    [βᵀ, uᵀ]ᵀ | τU, τu ~ N(0, blockdiag(τβ^{-1} I, τU^{-1} I, τu^{-1} I)),
    τu | bu ~ Gamma(1/2, bu),   bu ~ Gamma(1/2, 1/Au²),
    τε | bε ~ Gamma(1/2, bε),   bε ~ Gamma(1/2, 1/Aε²),
    τU | bU ~ Gamma(1/2, bU),   bU ~ Gamma(1/2, 1/AU²).   (37)

Specification of the prior distribution for τU can be achieved in Infer.NET in a similar manner as the prior specification in code chunk (16).

Figure 5 shows the Infer.NET fits of (37) to the spinal bone mineral density data (Bachrach et al. 1999). A population of 230 female subjects aged between 8 and 27 was followed over time and each subject contributed either one, two, three, or four spinal bone mineral density measurements. Age enters the model non-linearly and corresponds to s_ij in (35). Data on ethnicity are available and the entries of x_ij correspond to the indicator variables for Black (x_1ij), Hispanic (x_2ij) and White (x_3ij), with Asian ethnicity corresponding to the baseline. The following hyperparameters were chosen: τβ = 10^{-10}, Aε = 10^5, Au = 10^5, AU = 10^5 and κ = 10^{-10}.
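The block structure of the random-effects design matrix above is mechanical to assemble. The following sketch (an assumed helper for illustration, not code from the article) binds an m-column subject indicator block to the K-column spline-basis block, row by row in the same (i, j) ordering.

```python
# Building the Z matrix of (36): [subject indicators | spline basis evaluations].
def build_Z(subject_ids, spline_rows, m):
    # subject_ids[r] in {0, ..., m-1}; spline_rows[r] = [z_1(s_ij), ..., z_K(s_ij)]
    Z = []
    for sid, zrow in zip(subject_ids, spline_rows):
        indic = [1.0 if c == sid else 0.0 for c in range(m)]   # random intercept column
        Z.append(indic + list(zrow))
    return Z

# two subjects (m = 2), K = 2 spline basis functions, three measurements in total
Z = build_Z([0, 0, 1], [[0.2, 0.5], [0.1, 0.4], [0.3, 0.6]], m=2)
```

Pairing this Z with u = [U_1, ..., U_m, u_1, ..., u_K]ᵀ reproduces the random intercept plus spline terms of (35) as the single matrix product Zu.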

