Generalized Linear Models in RMarkus Gesmann21 November 20171
AgendaGeneralized linear models (GLMs) are a flexible generalizationof linear models, with applications in many disciplines.This talk will give an introduction to GLMs from adistribution-centric point of view.Using a small toy data set we will discuss how differentassumptions about the data generating process lead todifferent assumptions about the distribution of the data(response variable) and therefore different GLMs.2
The problem GLMs solved Linear regression of data generated from distributionsother than the Gaussian First published in 1972 by John Nelder and RobertWedderburn, as a unifying approach for many differentdistributions3
Original AbstractFigure 1: Nelder, John; Wedderburn, Robert (1972). “GeneralizedLinear Models”. Journal of the Royal Statistical Society. Series A(General). Blackwell Publishing. 135 (3): 370–384.4
The original authors of GLMFigure 2: John Nelder (1924 - 2010) and Robert Wedderburn(1947 - 1975). Photos courtesy of Wikipedia and CambridgeStatistical Laboratory5
Example problem: Selling ice creamFigure 3: How many units of ice cream should I stock?6
The challenge Create a model that predicts the number of ice creamssold for different temperatures Ensure model behaves well when the temperature dropsto 0ºC and for a very hot summer’s day at 35ºC.7
Available dataicecream - data.frame(temp c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1,18.5, 19.4, 22.1, 22.6, 23.4, 25.1),units c(185L, 215L, 332L, 325L, 408L, 421L,406L, 412L, 522L, 445L, 544L, 614L))8
Plot of ice cream data400300200Units sold500600Number of ice creams sold12141618202224Temperature (Celsius)9
Use a ruler and pencilNumber of ice creams sold400300200Units sold500600observationlinear least square12141618202224Temperature (Celsius)10
Model 1: Gaussian GLM11
Linear regressionLet’s start with a probability distribution centric description ofthe data.I believe the observation yi was drawn from a Normaldistribution with a mean µi , depending on the temperature xiand a constant variance σ 2 across all temperatures.yi N (µi , σ 2 ),E[yi ] µi α βxi for all i(1)(2)The first equation describes the distribution of the responseand the second equation how the distribution parameter µi islinked in a linear way.12
Model 1: Gaussian GLMlibrary(arm) # for 'display' function onlylin.mod - glm(units temp, data icecream,family gaussian(link "identity"))display(lin.mod)glm(formula units temp, family gaussian(link "identity"),data icecream)coef.est coef.se(Intercept) -159.4754.64temp30.092.87--n 12, k 2residual deviance 14536.3, null deviance 174754.9 (difference 160218.6)overdispersion parameter 1453.6residual sd is sqrt(overdispersion) 38.1313
Model 1: Gaussian GLM PlotFigure 4: yi N (µi , σ 2 ) with µi 159.5 30.1 xi and σ 38.114
Model 1: CritiqueAlthough the linear model looks fine in the range oftemperatures observed, it doesn’t make much sense at 0ºC.The intercept is at -159, which would mean that customersreturn on average 159 units of ice cream on a freezing day.Well, I don’t think so.15
Model 2: Log-transformedGaussian GLM16
Model 2: Log-transformed Gaussian GLMOk, perhaps I can transform the data first. Ideally I would likeensure that the transformed data has only positive values. So,let’s model the ice cream sales on a logarithmic scale.log(yi ) N (µi , σ 2 )(3)E[log(yi )] µi α βxi(4)This model implies that I believe the sales follow a log-normaldistribution: yi log N (µi , σ 2 ).17
Model 2: Lognormal distributionThe log-normal distribution is skewed to the right, whichmeans that I regard higher sales figures more likely than lowersales figures.Although the model is still linear on a log-scale, I have toremember to transform the predictions back to the originalscale because E[log(yi )] ̸ log(E[yi ]).yi log N (µi , σ 2 )2(5)2E[yi ] exp(µi σ /2) exp(α βxi σ /2)(6)18
Model 2: Log-transformed Gaussian GLMlog.lin.mod - glm(log(units) temp, data icecream,family gaussian(link "identity"))display(log.lin.mod)glm(formula log(units) temp, family gaussian(link "identity"),data icecream)coef.est coef.se(Intercept) 4.400.20temp0.080.01--n 12, k 2residual deviance 0.2, null deviance 1.4 (difference 1.2)overdispersion parameter 0.0residual sd is sqrt(overdispersion) 0.1419
Model 2: Log-transformed Gaussian GLMlog.lin.sig - summary(log.lin.mod) dispersionlog.lin.pred - exp(predict(log.lin.mod) 0.5*log.lin.sig)600Number of ice creams sold400300200Units sold500observationlog transformed LM12141618202224Temperature (Celsius)20
Model 2: Log-transformed Gaussian GLMFigure 5: yi log N (µi , σ 2 ), µi 4.4 0.08 xi and E[yi ] exp(4.4 0.08 xi 12 0.142 )21
Model 2: CritqueThis plot looks a little better than the previous linear modeland it predicts that I would sell, on average, 82 ice creamswhen the temperature is 0ºC:exp(coef(log.lin.mod)[1] 0.5 * log.lin.sig)(Intercept)82.3945But the assumed model distribution generates real numbers,while ice cream sales only occur in whole numbers.22
Model 3: Poisson GLM23
Model 3: Poisson GLMThe classic approach for count data is the Poisson distribution.The Poisson distribution has only one parameter, here µi ,which is also its expected value. The canonical link functionfor µi is the logarithm, i.e. the logarithm of the expected valueis regarded a linear function of the predictors.This is distinctively different to the log-transformed linearmodel above, where the original data was transformed, not theexpected value of the data.24
Model 3: Poisson GLMAlthough the expected value of my observation is a realnumber, the Poisson distribution will generate only wholenumbers, in line with the actual sales.yi Poisson(µi )E[yi ] µi exp(α βxi ) exp(α) exp(βxi )log(µi ) α βxi(7)(8)(9)Note, the exponential function turns the additive scale into amultiplicative one.25
Model 3: Poisson GLMpois.mod - glm(units temp, data icecream,family poisson(link "log"))display(pois.mod)glm(formula units temp, family poisson(link "log"), data icecream)coef.est coef.se(Intercept) 4.540.08temp0.080.00--n 12, k 2residual deviance 60.0, null deviance 460.1 (difference 400.1)26
Model 3: Poisson GLMpois.pred - predict(pois.mod, type "response")600Number of ice creams sold400300200Units sold500observationPoisson (log) GLM12141618202224Temperature (Celsius)27
Model 3: Poisson GLMFigure 6: yi Poisson(µi ), log(µi ) 4.54 0.08 xi and E[yi ] µi28
Model 3: CritqueSo far, so good. My model is in line with my observations.Additionally, it will not predict negative sales and if I wouldsimulate from a Poisson distribution I will always only getwhole numbers back.However, my model will also predict that I should expect tosell over 1000 ice creams when the temperature reaches 32ºC:predict(pois.mod, newdata data.frame(temp 32), type "response")11056.65129
Model 4: Binomial GLM30
Model 4: Binomial GLMOk, let’s me think about the problem this way: If I have 800potential sales then I’d like to understand the proportion soldat a given temperature.This suggests a binomial distribution for the number ofsuccessful sales out of 800.The key parameter for the binomial distribution is theprobability of success, the probability that someone will buyice cream as a function of temperature.31
Model 4: Binomial GLMDividing my sales statistics by 800 would give me a first proxyfor the probability of selling all ice cream.Therefore I need an S-shape curve that maps the salesstatistics into probabilities between 0 and 100%.A canonical choice is the logistic function:logit(u) eu1 ue 11 e u32
Model 4: Binomial GLMThe Binomial GLM can be described asyi Binom(n, µi )logit(µi ) α βxiE[yi ] µi logit 1 (α βxi )(10)(11)(12)(13)33
Model 4: Binomial GLMmarket.size - 800icecream opportunity - market.size - icecream unitsbin.glm - glm(cbind(units, opportunity) temp, data icecream,family binomial(link "logit"))display(bin.glm)glm(formula cbind(units, opportunity) temp, family binomial(link "logit"data icecream)coef.est coef.se(Intercept) -2.970.11temp0.160.01--n 12, k 2residual deviance 84.4, null deviance 909.4 (difference 825.0)34
Model 4: Binomial GLMbin.pred - predict(bin.glm, type "response")*market.size600Number of ice creams sold400300200Units sold500observationBinomial (logit) GLM12141618202224Temperature (Celsius)35
Model 4: Binomial GLMFigure 7: yi Binom(n, µi ), logit(µi ) 2.97 0.16 xi and E[yi ] µi logit 1 (µi )36
Model 4: CritqueThis model doesn’t look too bad at all!I can predict sales at 0ºC and 35ºC using the inverse of thelogistic function, which is given in R as ept)39.09618plogis(coef(bin.glm)[1] 937
SummaryLet’s bring all the models together into one graph, withtemperatures ranging from 0 to 35ºC.800Number of ice creams sold4002000Units sold600observationlinear modellog transformed LMPoisson (log) GLMBinomial (logit) GLM05101520253035Temperature (Celsius)38
ConclusionsFitting a model to data requires more than just applying analgorithm. In particular it is worth to think about: the range of expected values: are they bounded or rangefrom to ? the type of observations: do I expect real numbers, wholenumbers or proportions? how to link the distribution parameters to theobservations39
Further observationsThere are many aspects of GLMs which I haven’t touched onhere, such as: all the above models incorporate a fixed level of volatility.However, in practice, the variability of making a sale atlow temperatures might be significantly different than athigh temperatures. Check the residual plots and consideran over-dispersed model. I used so called canonical link functions in my models,which have nice theoretical properties, but other choicesare possible as well. the distribution has to be chosen from the exponentialfamily, e.g. Normal, Gamma, Poisson, binomial, Tweedie,etc.40
Further readings and more detailsBooks: Generalized Linear Models, P. McCullagh and J.A.Nelder An Introduction to Generalized Linear Models, Annette J.Dobson and Adrian Barnett Data Analysis Using Regression and Multilevel/Hierarchical Models, Andrew Gelman and Jennifer HillOn my blog: Generalized Linear Models in R Visualising theoretical distributions of GLMs Bayesian regression models using Stan in R41
The End. Thanks.42
Nov 21, 2017 · Generalized Linear Models in R Markus Gesmann 21 November 2017 1. Agenda Generalized linear models (GLMs) are a flexible generalization of linear models, with applications in many disciplines. This talk will give an introd
Quasi-poisson models Negative-binomial models 5 Excess zeros Zero-inflated models Hurdle models Example 6 Wrapup 2/74 Generalized linear models Generalized linear models We have used generalized linear models (glm()) in two contexts so far: Loglinear models the outcome variable is thevector of frequencies y in a table
Overview of Generalized Nonlinear Models in R Linear and generalized linear models Generalized linear models Problems with linear models in many applications: I range ofy is restricted (e.g.,y is a count, or is binary, or is a duration) I e ects are not additive I va
of software for implementing generalized linear mixed models, we have found researchers increasingly interested in using these models, but it is "easier said than done." Our goal is to help those who have worked with linear mixed models to begin moving toward generalized linear mixed models. The benefits and chal-lenges are discussed from a .
All of these models extend straightforwardly to generalized nonparametric regression, much as linear models extend to generalized linear models (discussed in Chapter 5 of the text). The random and link components are as in generalized linear models, but the linear predictor of the GLM 0 1x 1 2x 2 px p
The data exhibit an exponential pattern, which means that a log linear model can be appropriate. SPSS commands To t a Loglinear Model to the data: 1. Analyze Generalized Linear Models Generalized Linear Models 2. Type of Model Tab:
Semiparametric Generalized Linear Models with the gldrm Package . The regression . or large support for the nonparametric response distribution. The algorithm is implemented in a new R package called gldrm. Introduction Rathouz and Gao(2009) introduced the generalized linear density ratio model (gldrm), which is a .
2 5 : Generalized linear models The logistic regression, on the other hand, can be written as a Bernoulli distribution rst, p(yjx) (x)y(1 (x))1 y where (x) 1 1 e Tx We can used the brute-force gradient method as in LR, but we can also apply generic laws by observing the p(yjx) is an exponential family function. It’s a generalized linear .
manufacturers in the automotive industry. The ‘Xtra’ range of products are also more resistant to humidity and thermal cycling (rapid changes in heating and cooling) than the standard range. The following graph shows the effects of humidity (168 hours, 25 C, 90% RH) and thermal cycling (25 cycles between -25 C and 65 C) on HTC and HTCX. The results show that the rheology of HTC changes .