3y ago

72 Views

4 Downloads

795.64 KB

19 Pages

Transcription

GLMs for Count DataOutlineMichael Friendly1Generalized linear models2GLMs for count dataExample: phdpubs3Model diagnosticsInteractionsNonlinearityOutliers, leverage and influence4OverdispersionQuasi-poisson modelsNegative-binomial models5Excess zerosZero-inflated modelsHurdle modelsExample6WrapupPsych 6136 1.5 1.401.8 1.71.61.5 1.4101marriedNumber of articlesNumber of articlesfemale1.81.71.61.51.41.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0Number of articles1.91.81.71.6Number of articlesNumber of articlesNovember 28, 01020phdprestige3040506070mentor2 / 74Generalized linear modelsGeneralized linear modelsGeneralized linear modelsGeneralized linear modelsLogistic regressionWe have used generalized linear models (glm()) in two contexts so far:the outcome variable is a categorical response y, with predictors XThe model is expressed as a linear model for the log odds that y 1 vs.y 0. Pr(y 1)logit(y) log XβPr(y 0)Loglinear modelsthe outcome variable is the vector of frequencies y in a tablecross-classified by factors in a design matrix XThe model is expressed as a linear model for log yThe random (or unexplained) variation is expressed as a Binomialdistribution for E(y X )log(y) X βHey, aren’t these both very like the familiar, classical linear model,The random (or unexplained) variation is expressed as a Poissondistribution for E(y X )y X β , N (0, σ 2 I)?Yes, for some transformation, g(y), and with different distributions!3 / 744 / 74

Generalized linear modelsGeneralized linear modelsGeneralized linear modelsMean functionsNelder & Wedderburn (1972) said, “Let there be light!”, a generalized linearmodel, encompassing them all, and many more. This has 3 components:A random component, specifying the conditional distribution of y giventhe explanatory variables in X , with mean E(yi xi ) µiStandard GLM link functions and their inverses:The normal (Gaussian), binomial, and Poisson are already familiarBut, these are all members of an exponential familyGLMs now include an even wider family: negative-binomial and othersThe systematic component, a linear function of the predictors called thelinear predictorη Xβor ηi β0 β1 Xi1 · · · βp XipAn invertible link function, g(µi ) ηi xiT β that transforms the expectedvalue of the response to the linear predictorThe link function is invertable, so we can go back to the mean functiong 1 (ηi ) µiThe top section recognizes standard transformations often used withtraditional linear modelsThe bottom section is for binomial data, where yi represents an observedproportion in ni trials5 / 74Generalized linear models6 / 74Generalized linear modelsCanonical links and variance functionsVariance functions and over-dispersionIn the classical Gaussian linear model, the conditional variance isconstant, φ σ 2 .For every distribution family, there is a default, canonical link functionEach one also specifies the expected relationship between mean andvarianceFor binomial data, the variance function is V(µi ) µi (1 µi )/ni , with φfixed at 1In the Poisson family, V(µi ) µi and the dispersion parameter is fixed atφ 1.In practice, it is common for count data to exhibit overdispersion, meaningthat V(µi ) µi .One way to correct for this is to allow the dispersion parameter to beestimated from the data, giving what is called the quasi-Poisson family,b i.with V(µi ) φµ7 / 748 / 74

Generalized linear modelsGeneralized linear modelsVariance functions and over-dispersionML EstimationGLMs are fit by the method of maximum likelihood.For the Poisson distribution with mean µ, the probability that the randomvariable Y takes values y 0, 1, 2, . . . isPr(Y y ) Overdispersion often results from failures of the assumptions of the model:e µ µyy!In the GLM with a log link, the mean, µi depends on the predictors in xthroughloge (µi ) xiT βsupposedly independent observations may be correlatedthe probability of an event may not be constant, orit may vary with unmeasured or unmodeled variablesThe log-likelihood function (ignoring a constant) for n independentobservations has the formloge L(β) nX{yi loge (µi ) µi }It can be shown that the maximum likelihood estimators are solutions tothe estimating equations,X Ty X TµThe solutions are found by iteratively re-weighted least squares.9 / 74Generalized linear models10 / 74GLMs for count dataGoodness of fitGLMs for count dataThe residual deviance defined as twice the difference between themaximum log-likelihood for the saturated model that fits perfectly andmaximized log-likelihood for the fitted model.Typicaly, these are fit using: glm( yx1 x2 x3,family poisson, data mydata)As in other linear models, the predictors xj can be discrete factors,quantitative variables, and so forth.This fixes the dispersion parameter φ to 1, assuming that the countvariable y conditional on x1, x2, . . . is Poisson distributed.It is possible to fit a quasi Poisson model, allowing φ to be estimated fromthe data. Specify: family quasipoisson. This allows the variance tobe proportional to the mean,b 2[loge L(y; y) loge L(y; µ)]b .D(y, µ)For classical (Gaussian) linear models, this is just the residual sum ofsquaresFor Poisson models with a log link giving µ exp(x T β), the deviancetakes the form n Xyib 2D(y, µ)yi loge (yi µbi ) .µbiV(yi ηi ) φµii 1Another possibility is the negative-binomial model, which hasFor a GLM with p parameters, both the Pearson and residual deviancestatistics follow approximate χ2n p distributions with n p degrees offreedom.V(yi ηi ) µi µ2i /θ11 / 7412 / 74

GLMs for count dataExample: phdpubsGLMs for count dataExample: Publications of PhD CandidatesExample: phdpubsExample: Publications of PhD CandidatesInitially, ignore the predictors.Example 3.24 in DDAR gives data on the number of publications by PhDcandidates in biochemistry in the last 3 years of studyFor the Poisson, equivalent to an intercept-only model:glm(articles 1, data PhdPubs, family "poisson")data("PhdPubs", package "vcdExtra")table(PhdPubs articles)As a quick check on the Poisson assumption:####012## 275 246 1783844675276177128192101111122161191with(PhdPubs, c(mean mean(articles),var var(articles),ratio var(articles)/mean(articles)))Predictors are: gender, marital status, number of young children, prestigeof the doctoral department, and number of publications by the student’smentor.##meanvar ratio## 1.6929 3.7097 2.1914The assumption that mean variance could be met when we add predictors.13 / 74GLMs for count data14 / 74Example: phdpubsGLMs for count dataExample: Publications of PhD CandidatesExample: phdpubsFitting the Poisson modelFirst, look at rootograms:Fit the model with all main effects:plot(goodfit(PhdPubs articles), xlab "Number of Articles",main "Poisson")plot(goodfit(PhdPubs articles, type "nbinomial"),xlab "Number of Articles", main "Negative binomial")# predictors: female, married, kid5, phdprestige, mentorphd.pois - glm(articles ., data PhdPubs, family sis of Deviance Table (Type II tests)Response: articlesLR Chisq Df Pr( Chisq)female17.1 13.6e-05 ***married6.6 10.01 *kid522.1 12.6e-06 ***phdprestige1.0 10.32mentor126.8 1 2e-16 ***--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1Only phdprestige is NS; it does no harm to keep it, for now.One reason the Poisson doesn’t fit: excess 0s (some never published?)15 / 7416 / 74

Example: phdpubsGLMs for count dataInterpreting coefficientsAs usual, we can understand the fitted model from predicted values for themodel effects:round(cbind(beta coef(phd.pois),expbeta exp(coef(phd.pois)),pct 100 * (exp(coef(phd.pois)) - 1)), 3)1.91.81.71.6 1.5 1.401.8 1.71.61.5 1.4101females publish -0.224 fewer log (articles), or 0.8 that of malesmarried publish 0.157 more log (articles); or 1.17 unmarried (17%increase)each additional young child decreases this by 0.185; or 0.831 articles(16.9% decrease)each mentor pub multiplies student pub by 1.026, a 2.6% dNumber of articlesfemaleThus:Number of articlesbeta expbetapct(Intercept) 0.2661.304 30.425female1-0.2240.799 -20.102married10.1571.170 17.037kid5-0.1850.831 -16.882phdprestige 0.0251.0262.570mentor0.0251.0262.555Number of articleslibrary(effects); plot(allEffects(phd.pois))Number of articles##############Example: phdpubsEffect plotsβj is the increment in log (articles) for a 1 unit change in xj ; exp(βj ) is themultiple of articles:Number of articlesGLMs for count data1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 6070mentorThese are better visual summaries for a model than a table of coefficients.17 / 7418 / 74Model diagnosticsModel diagnosticsModel diagnosticsInteractionsTesting for interactionsAs a quick check for interactions, fit the model with all two-way termsphd.pois1 - update(phd.pois, . .ˆ2)Anova(phd.pois1)Diagnostic tests for count data GLMs are similar to those used for classicallinear modelsTest for presence of ####Fit model(s) with some or all two-way interactionsNon-linear effects of quantitative predictors?Component-plus-residual plots— car::crPlot() are useful hereOutliers? Influential observations?car::influencePlot() is your friendFor count data models, we should also check for over-dispersion. This issimilar to homogeneity of variance checks in lm()19 / 74Analysis of Deviance Table (Type II tests)Response: articlesLR Chisq Df Pr( Chisq)female14.5 10.00014 ***married6.2 10.01277 *kid519.5 19.8e-06 ***phdprestige1.0 10.32655mentor128.1 1 2e-16 ***female:married0.3 10.60995female:kid50.1 10.72929female:phdprestige0.2 10.63574female:mentor0.0 10.91260married:kid50married:phdprestige1.7 10.19153married:mentor1.2 10.28203kid5:phdprestige0.2 10.68523kid5:mentor2.8 10.09290 .phdprestige:mentor3.8 10.05094 .--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 120 / 74

Model diagnosticsInteractionsModel diagnosticsCompare models ICompare models IICompare models: LR tests for nested models (anova()), and AIC/BIC(LRstats())LRstats(phd.pois, phd.pois1)Interactionsanova(phd.pois, phd.pois1, test "Chisq")########################## Likelihood summary table:##AIC BIC LR Chisq Df Pr( Chisq)Analysis of Deviance Table## phd.pois 3313 33421634 909 2e-16 ***## phd.pois1 3316 33881618 900 2e-16 ***Model 1: articles female married kid5 phdprestige mentor## --Model 2: articles female married kid5 phdprestige mentor female:married ## Signif. codes:0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1female:kid5 female:phdprestige female:mentor married:kid5 married:phdprestige married:mentor kid5:phdprestige kid5:mentor phdprestige:mentorThere seems to be no reason to include interactions in the modelResid. Df Resid. Dev Df Deviance Pr( Chi)19091634We might want to re-visit this, after examining other models for the29001618 915.20.086 .count distribution (quasi-poisson, negative-binomial)--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1basic21 / 74Model diagnosticsInteractions22 / 74Model diagnosticsBasic model plotsNonlinearityNonlinearity diagnosticsOnly two of the standard model plots are informative for count data modelsplot(phd.pois, which c(1,5))Non-linear relations are difficult to assess in marginal plots, because theydon’t control (or adjust) for other predictorsComponent-plus-residual plots (also called partial residual plots) canshow non-linear relations for numeric predictorsResiduals vs Leverage6Residuals vs Fitted8 914 20246 915 Std. Pearson resid.20 2914 0.00.51.01.5Predicted values2.00.000.05These graph the value of β̂i xi residuali vs. the predictor, xi .In this plot, the slope of the points is the coefficient, β̂i in the full modelThe residual is yi ŷi in the full model10.5A non-parametric (e.g., loess()) smooth makes it easy to detectnon-linearity Cook's distance 4 4Residuals4 913 9110.100.15328 0.50.20Leverage23 / 7424 / 74

Model diagnosticsNonlinearityModel diagnosticsOutliers, leverage and influenceNonlinearity diagnostics: car::crPlot()Residuals IIs the relationship between articles published by the student and the mentoradequately represented as linear?Several types of residuals can be defined based on the Pearson and deviancegoodness-of-fit measuresthe Pearson residual is the case-wise contribution to Pearson χ2crPlot(phd.pois, "mentor", pch 16, lwd 4, id.n 2)yi µbiriP qb i)V(ythe deviance residual is the signed square root of the contribution to thedeviance G2priD sign(yi µbi ) diBoth of these have standardized forms that correct for conditionalvariance and leverage, and have approx. N (0, 1) distributions.eriPeriD riPqb hi )φ(1riDqb hi )φ(125 / 74Model diagnosticsOutliers, leverage and influence26 / 74Model diagnosticsResiduals IIOutliers, leverage and influenceOutliers, leverage and influenceinfluencePlot(phd.pois)The most useful is the studentized residual (or deletion residual),rstudent() in R. This estimates the standardized residual resultingfrom omitting each observation in turn. An approximation is:Several observations (913–915)stand out with large residualsOne observation (328) has a largeleverageqeriS sign(yi µbi ) (1 hi )(eriD )2 hi (eriP )2 .Why are they unusual? Do theyaffect our conclusions?Look back at data & decide whatto do!27 / 7428 / 74

Model diagnosticsOutliers, leverage and influenceModel diagnosticsOutliers, leverage and influenceOutliers, leverage and influenceOutlier testAt the very least, we should look at these observations in the data:A formal test for outliers can be based on the studentized residuals,rstudent(model), using the standard normal distribution for p-valuesA Bonferroni correction should be applied, because interest focuses onthe largest n absolute residuals.For the Poisson model, 4 observations are nominated as large outliers:PhdPubs[c(328, 913:915),]##########328913914915articles female married kid5 phdprestige d.pois, cutoff 0.001)##########case 328: Mentor published 77 papers! Student, only 1913–915: all published predicted914913911915rstudent unadjusted p-value Bonferonni 9e-0429 / 74Overdispersion30 / 74OverdispersionOverdispersionTesting overdispersionThe Poisson model for counts assumes V(µi ) µi , i.e., the dispersionparameter φ 1But often, the counts exhibit greater variance than the Poissondistribution allows, V(µi ) µi or φ 1Statistical tests for overdispersion are described in DDAR §11.3.4.They test H0 : V(y ) µ, vs. H1 that variance depends on the meanaccording to some function f (µ)The observations (counts) may not be independent (clustering)The probability of an “event” may not be constantThere may be unmeasured influences, not accounted for in the modelThese effects are sometimes called “unmodeled heterogeneity”V(y) µ α f (µ)This is implemented in dispersiontest() in the AER package.If significant, overdispersion should not be ignoredAlternatively, you can try fitting a more general model to see what differenceit makes.The consequences are:Standard errors of the coefficients, se(βbj ) are optimistically smallWald tests, zj βbj /se(βbj ), are too large, and thus overly liberal.31 / 7432 / 74

OverdispersionQuasi-poisson modelsOverdispersionOverdispersion: Quasi-poisson modelsQuasi-poisson modelsOverdispersion: Quasi-poisson modelsOne estimate of the dispersion parameter is the residual deviance dividedby degrees of freedom φb D(y, µb)/df2The Pearson χ statistic has better statistical properties and is morecommonly usedInstead, we can fit another version of the model in which the dispersion φis a free parameter, estimated along with the other coefficients. That is,the conditional variance is allowed to benX (yi µXP2bi )2φb /(n p) .n pµbiV(yi ηi ) φµii 1For the PhdPubs data, these estimates are quite similar: about 80%overdispersionThis model is fit with glm() using family quasipoissonthe estimated coefficients βb are unchangedb1/2the standard errors are multiplied by φpeace, order, and good governance is restored!with(phd.pois, deviance / df.residual)## [1] 1.7971sum(residuals(phd.pois, type "pearson")ˆ2) / phd.pois df.residual## [1] 1.830434 / 7433 / 74OverdispersionCoefficients unchanged; std. errors multiplied by φb1/2 Quasi-poisson modelsFitting the quasi-poisson model 1.83 1.35.summary(phd.qpois)The quasi-Poisson model is can be fit using glm() as:phd.qpois - glm(articles ., data PhdPubs, family quasipoisson)The dispersion parameter estimate φb can be obtained as follows:(phi - summary(phd.qpois) dispersion)## [1] 1.8304This is much better than variance/mean ratio of 2.91 calculated for themarginal distribution ignoring the predictors.35 / ####Call:glm(formula articles ., family quasipoisson, data PhdPubs)Deviance Residuals:Min1Q Median-3.488 -1.538 -0.3653Q0.577Max5.483Coefficients:Estimate Std. Error t value Pr( t )(Intercept) 0.265620.134781.97 0.04906 *female1-0.224420.07384-3.04 0.00244 **married10.157320.082871.90 0.05795 .kid5-0.184910.05427-3.41 0.00069 ***phdprestige 0.025380.034190.74 0.45815mentor0.025230.002759.19 2e-16 ***--Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1(Dispersion parameter for quasipoisson family taken to be 1.8304)Null deviance: 1817.4Residual deviance: 1633.6AIC: NAon 914on 909degrees of freedomdegrees of freedomNumber of Fisher Scoring iterations: 5

OverdispersionNegative-binomial modelsOverdispersionThe negative-binomial modelNegative-binomial modelsThe negative-binomial modelThe negative-binomial model is a different generalization of the Poissonthat allows for over-dispersionMathematically, it allows the mean µ xi to vary across observations as agamma distribution with a shape parameter θ.The variance function, V(yi ) µi µ2i /θ, allows the variance of y toincrease more rapidly than the mean.Overdispersion decreases as θincreasesAnother parameterization uses α 1/θV(yi ) µi µ2i /θ µi αµ2i ,As α 0, V(yi ) µi and the negative-binomial converges to thePoisson.37 / 74Overdispersion38 / 74Negative-binomial modelsOverdispersionThe negative-binomial model: FittingNegative-binomial modelsVisualizing the mean variance relationOne way to see the difference among models is to plot the variance vs. meanfor grouped values of the fitted linear predictor.The smoothed (loess)curve gives the empiricalmean–variancerelationshipAlso plot the theoreticalmean–variance fromdifferent modelsFor PhdPubs, the data ismost similar to thenegative-binomialThe models differ most forMean number of articlesthose with 3 articlesFor fixed θ, the negative-binomial is another special case of the GLMThis is handled in the MASS package, withfamily negative.binomial(theta)10 8lowessnegbin6VarianceBut most often, θ is unknown, and must be estimated from the dataThis is implemented in glm.nb() in the MASS package. quasi Poisson library(MASS)phd.nbin - glm.nb(articles ., data PhdPubs)4 2 Poisson 0.539 / 741.01.52.02.53.03.540 / 74

OverdispersionNegative-binomial modelsOverdispersionNegative-binomial modelsVisualizing

Quasi-poisson models Negative-binomial models 5 Excess zeros Zero-inﬂated models Hurdle models Example 6 Wrapup 2/74 Generalized linear models Generalized linear models We have used generalized linear models (glm()) in two contexts so far: Loglinear models the outcome variable is thevector of frequencies y in a table

Related Documents: