2y ago

23 Views

3 Downloads

1,017.24 KB

42 Pages

Transcription

Generalized Linear Models in RMarkus Gesmann21 November 20171

AgendaGeneralized linear models (GLMs) are a flexible generalizationof linear models, with applications in many disciplines.This talk will give an introduction to GLMs from adistribution-centric point of view.Using a small toy data set we will discuss how differentassumptions about the data generating process lead todifferent assumptions about the distribution of the data(response variable) and therefore different GLMs.2

The problem GLMs solved Linear regression of data generated from distributionsother than the Gaussian First published in 1972 by John Nelder and RobertWedderburn, as a unifying approach for many differentdistributions3

Original AbstractFigure 1: Nelder, John; Wedderburn, Robert (1972). “GeneralizedLinear Models”. Journal of the Royal Statistical Society. Series A(General). Blackwell Publishing. 135 (3): 370–384.4

The original authors of GLMFigure 2: John Nelder (1924 - 2010) and Robert Wedderburn(1947 - 1975). Photos courtesy of Wikipedia and CambridgeStatistical Laboratory5

Example problem: Selling ice creamFigure 3: How many units of ice cream should I stock?6

The challenge Create a model that predicts the number of ice creamssold for different temperatures Ensure model behaves well when the temperature dropsto 0ºC and for a very hot summer’s day at 35ºC.7

Available dataicecream - data.frame(temp c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1,18.5, 19.4, 22.1, 22.6, 23.4, 25.1),units c(185L, 215L, 332L, 325L, 408L, 421L,406L, 412L, 522L, 445L, 544L, 614L))8

Plot of ice cream data400300200Units sold500600Number of ice creams sold12141618202224Temperature (Celsius)9

Use a ruler and pencilNumber of ice creams sold400300200Units sold500600observationlinear least square12141618202224Temperature (Celsius)10

Model 1: Gaussian GLM11

Linear regressionLet’s start with a probability distribution centric description ofthe data.I believe the observation yi was drawn from a Normaldistribution with a mean µi , depending on the temperature xiand a constant variance σ 2 across all temperatures.yi N (µi , σ 2 ),E[yi ] µi α βxi for all i(1)(2)The first equation describes the distribution of the responseand the second equation how the distribution parameter µi islinked in a linear way.12

Model 1: Gaussian GLMlibrary(arm) # for 'display' function onlylin.mod - glm(units temp, data icecream,family gaussian(link "identity"))display(lin.mod)glm(formula units temp, family gaussian(link "identity"),data icecream)coef.est coef.se(Intercept) -159.4754.64temp30.092.87--n 12, k 2residual deviance 14536.3, null deviance 174754.9 (difference 160218.6)overdispersion parameter 1453.6residual sd is sqrt(overdispersion) 38.1313

Model 1: Gaussian GLM PlotFigure 4: yi N (µi , σ 2 ) with µi 159.5 30.1 xi and σ 38.114

Model 1: CritiqueAlthough the linear model looks fine in the range oftemperatures observed, it doesn’t make much sense at 0ºC.The intercept is at -159, which would mean that customersreturn on average 159 units of ice cream on a freezing day.Well, I don’t think so.15

Model 2: Log-transformedGaussian GLM16

Model 2: Log-transformed Gaussian GLMOk, perhaps I can transform the data first. Ideally I would likeensure that the transformed data has only positive values. So,let’s model the ice cream sales on a logarithmic scale.log(yi ) N (µi , σ 2 )(3)E[log(yi )] µi α βxi(4)This model implies that I believe the sales follow a log-normaldistribution: yi log N (µi , σ 2 ).17

Model 2: Lognormal distributionThe log-normal distribution is skewed to the right, whichmeans that I regard higher sales figures more likely than lowersales figures.Although the model is still linear on a log-scale, I have toremember to transform the predictions back to the originalscale because E[log(yi )] ̸ log(E[yi ]).yi log N (µi , σ 2 )2(5)2E[yi ] exp(µi σ /2) exp(α βxi σ /2)(6)18

Model 2: Log-transformed Gaussian GLMlog.lin.mod - glm(log(units) temp, data icecream,family gaussian(link "identity"))display(log.lin.mod)glm(formula log(units) temp, family gaussian(link "identity"),data icecream)coef.est coef.se(Intercept) 4.400.20temp0.080.01--n 12, k 2residual deviance 0.2, null deviance 1.4 (difference 1.2)overdispersion parameter 0.0residual sd is sqrt(overdispersion) 0.1419

Model 2: Log-transformed Gaussian GLMlog.lin.sig - summary(log.lin.mod) dispersionlog.lin.pred - exp(predict(log.lin.mod) 0.5*log.lin.sig)600Number of ice creams sold400300200Units sold500observationlog transformed LM12141618202224Temperature (Celsius)20

Model 2: Log-transformed Gaussian GLMFigure 5: yi log N (µi , σ 2 ), µi 4.4 0.08 xi and E[yi ] exp(4.4 0.08 xi 12 0.142 )21

Model 2: CritqueThis plot looks a little better than the previous linear modeland it predicts that I would sell, on average, 82 ice creamswhen the temperature is 0ºC:exp(coef(log.lin.mod)[1] 0.5 * log.lin.sig)(Intercept)82.3945But the assumed model distribution generates real numbers,while ice cream sales only occur in whole numbers.22

Model 3: Poisson GLM23

Model 3: Poisson GLMThe classic approach for count data is the Poisson distribution.The Poisson distribution has only one parameter, here µi ,which is also its expected value. The canonical link functionfor µi is the logarithm, i.e. the logarithm of the expected valueis regarded a linear function of the predictors.This is distinctively different to the log-transformed linearmodel above, where the original data was transformed, not theexpected value of the data.24

Model 3: Poisson GLMAlthough the expected value of my observation is a realnumber, the Poisson distribution will generate only wholenumbers, in line with the actual sales.yi Poisson(µi )E[yi ] µi exp(α βxi ) exp(α) exp(βxi )log(µi ) α βxi(7)(8)(9)Note, the exponential function turns the additive scale into amultiplicative one.25

Model 3: Poisson GLMpois.mod - glm(units temp, data icecream,family poisson(link "log"))display(pois.mod)glm(formula units temp, family poisson(link "log"), data icecream)coef.est coef.se(Intercept) 4.540.08temp0.080.00--n 12, k 2residual deviance 60.0, null deviance 460.1 (difference 400.1)26

Model 3: Poisson GLMpois.pred - predict(pois.mod, type "response")600Number of ice creams sold400300200Units sold500observationPoisson (log) GLM12141618202224Temperature (Celsius)27

Model 3: Poisson GLMFigure 6: yi Poisson(µi ), log(µi ) 4.54 0.08 xi and E[yi ] µi28

Model 3: CritqueSo far, so good. My model is in line with my observations.Additionally, it will not predict negative sales and if I wouldsimulate from a Poisson distribution I will always only getwhole numbers back.However, my model will also predict that I should expect tosell over 1000 ice creams when the temperature reaches 32ºC:predict(pois.mod, newdata data.frame(temp 32), type "response")11056.65129

Model 4: Binomial GLM30

Model 4: Binomial GLMOk, let’s me think about the problem this way: If I have 800potential sales then I’d like to understand the proportion soldat a given temperature.This suggests a binomial distribution for the number ofsuccessful sales out of 800.The key parameter for the binomial distribution is theprobability of success, the probability that someone will buyice cream as a function of temperature.31

Model 4: Binomial GLMDividing my sales statistics by 800 would give me a first proxyfor the probability of selling all ice cream.Therefore I need an S-shape curve that maps the salesstatistics into probabilities between 0 and 100%.A canonical choice is the logistic function:logit(u) eu1 ue 11 e u32

Model 4: Binomial GLMThe Binomial GLM can be described asyi Binom(n, µi )logit(µi ) α βxiE[yi ] µi logit 1 (α βxi )(10)(11)(12)(13)33

Model 4: Binomial GLMmarket.size - 800icecream opportunity - market.size - icecream unitsbin.glm - glm(cbind(units, opportunity) temp, data icecream,family binomial(link "logit"))display(bin.glm)glm(formula cbind(units, opportunity) temp, family binomial(link "logit"data icecream)coef.est coef.se(Intercept) -2.970.11temp0.160.01--n 12, k 2residual deviance 84.4, null deviance 909.4 (difference 825.0)34

Model 4: Binomial GLMbin.pred - predict(bin.glm, type "response")*market.size600Number of ice creams sold400300200Units sold500observationBinomial (logit) GLM12141618202224Temperature (Celsius)35

Model 4: Binomial GLMFigure 7: yi Binom(n, µi ), logit(µi ) 2.97 0.16 xi and E[yi ] µi logit 1 (µi )36

Model 4: CritqueThis model doesn’t look too bad at all!I can predict sales at 0ºC and 35ºC using the inverse of thelogistic function, which is given in R as ept)39.09618plogis(coef(bin.glm)[1] 937

SummaryLet’s bring all the models together into one graph, withtemperatures ranging from 0 to 35ºC.800Number of ice creams sold4002000Units sold600observationlinear modellog transformed LMPoisson (log) GLMBinomial (logit) GLM05101520253035Temperature (Celsius)38

ConclusionsFitting a model to data requires more than just applying analgorithm. In particular it is worth to think about: the range of expected values: are they bounded or rangefrom to ? the type of observations: do I expect real numbers, wholenumbers or proportions? how to link the distribution parameters to theobservations39

Further observationsThere are many aspects of GLMs which I haven’t touched onhere, such as: all the above models incorporate a fixed level of volatility.However, in practice, the variability of making a sale atlow temperatures might be significantly different than athigh temperatures. Check the residual plots and consideran over-dispersed model. I used so called canonical link functions in my models,which have nice theoretical properties, but other choicesare possible as well. the distribution has to be chosen from the exponentialfamily, e.g. Normal, Gamma, Poisson, binomial, Tweedie,etc.40

Further readings and more detailsBooks: Generalized Linear Models, P. McCullagh and J.A.Nelder An Introduction to Generalized Linear Models, Annette J.Dobson and Adrian Barnett Data Analysis Using Regression and Multilevel/Hierarchical Models, Andrew Gelman and Jennifer HillOn my blog: Generalized Linear Models in R Visualising theoretical distributions of GLMs Bayesian regression models using Stan in R41

The End. Thanks.42

Nov 21, 2017 · Generalized Linear Models in R Markus Gesmann 21 November 2017 1. Agenda Generalized linear models (GLMs) are a flexible generalization of linear models, with applications in many disciplines. This talk will give an introd

Related Documents: