Lecture 2 Linear Regression: A Model For The Mean


Lecture 2
Linear Regression: A Model for the Mean
Sharyn O'Halloran

Closer look at:
- Linear regression model
- Least squares procedure
- Inferential tools
- Confidence and prediction intervals
- Assumptions
- Robustness
- Model checking
- Log transformation (of Y, X, or both)

Linear Regression: Introduction
- Data: (Yᵢ, Xᵢ) for i = 1, ..., n
- Interest is in the probability distribution of Y as a function of X
- Linear regression model:
  - The mean of Y is a straight-line function of X, plus an error term or residual
  - The goal is to find the best-fit line, the one that minimizes the sum of squared errors

Estimated regression line
Steer example (see Display 7.3, p. 177):
- Intercept = 6.987
- Equation for the estimated regression line: fitted pH = 6.98 - .73 ltime
[Figure: scatterplot of PH vs. ltime with the fitted line; the vertical distance between a point and the line is the error term]

Create a new variable: ltime = log(time)
Regression analysis
[Stata regression output]
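A minimal Stata sketch of these two steps; the variable names time and ph are assumptions based on the steer example's labels:

    * create the log-time variable and regress pH on it
    gen ltime = log(time)
    regress ph ltime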

Regression Terminology
- Regression: the mean of a response variable as a function of one or more explanatory variables: µ{Y|X}, read "mean of Y given X" or "regression of Y on X"
- Regression model: an ideal formula to approximate the regression
- Simple linear regression model: µ{Y|X} = β₀ + β₁X, where β₀ is the intercept and β₁ is the slope (both unknown parameters)

Regression Terminology
- Y: dependent variable, explained variable, response variable
- X: independent variable, explanatory variable, control variable
- Y's probability distribution is to be explained by X
- β₀ and β₁ are the regression coefficients (see Display 7.5, p. 180)
- Note: Y = β₀ + β₁X is NOT simple regression; the model describes the mean µ{Y|X}, not Y itself

Regression Terminology: Estimated coefficients
[Figure: the true regression line µ{Y|X} = β₀ + β₁X and the estimated line µ̂{Y|X} = β̂₀ + β̂₁X, with the parameters β₀, β₁ and their estimates β̂₀, β̂₁ labeled]
Choose β̂₀ and β̂₁ to make the residuals small.

Regression Terminology
- Fitted value for obs. i is its estimated mean: fitᵢ = Ŷᵢ = µ̂{Y|Xᵢ} = β̂₀ + β̂₁Xᵢ
- Residual for obs. i: resᵢ = eᵢ = Yᵢ - fitᵢ = Yᵢ - Ŷᵢ
- Least squares: the statistical estimation method that finds the estimates minimizing the sum of squared residuals
  Σᵢ₌₁ⁿ (yᵢ - (β̂₀ + β̂₁xᵢ))² = Σᵢ₌₁ⁿ (yᵢ - ŷᵢ)²
- Solution (from calculus) on p. 182 of Sleuth

Least Squares Procedure
- The least-squares procedure obtains estimates of the linear equation coefficients β₀ and β₁ in the model ŷᵢ = β₀ + β₁xᵢ by minimizing the sum of the squared residuals or errors (eᵢ):
  SSE = Σeᵢ² = Σ(yᵢ - ŷᵢ)²
- This results in the minimization problem
  SSE = Σ(yᵢ - (β₀ + β₁xᵢ))²
  Choose β₀ and β₁ so that this quantity is minimized.

Least Squares Procedure
- The slope coefficient estimator is
  β̂₁ = Σᵢ₌₁ⁿ (xᵢ - x̄)(yᵢ - ȳ) / Σᵢ₌₁ⁿ (xᵢ - x̄)² = r_xy · (s_Y / s_X)
  i.e., the correlation between X and Y times the standard deviation of Y over the standard deviation of X.
- And the constant or intercept estimator is
  β̂₀ = ȳ - β̂₁x̄
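A quick numeric check of the slope identity in Stata; the variable names y and x are placeholders:

    * compare the r*(sY/sX) formula with the regress coefficient
    quietly summarize y
    scalar sy = r(sd)
    quietly summarize x
    scalar sx = r(sd)
    quietly correlate y x
    display "slope via correlation: " r(rho)*sy/sx
    regress y x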

Least Squares Procedure (cont.)
- Note that the regression line always goes through the mean point (X̄, Ȳ).
- Think of the regression line as the expected value of Y for a given value of X: for any value of the independent variable there is a single most likely value for the dependent variable.
[Figure: "Relation Between Yield and Fertilizer", Yield (Bushel/Acre) vs. Fertilizer (lb/Acre), with trend line]

Tests and Confidence Intervals for β₀, β₁
- Degrees of freedom: (n-2) = sample size - number of coefficients
- Variance of {Y|X}: σ̂² = (sum of squared residuals)/(n-2)
- Standard errors (p. 184)
- Ideal normal model: the sampling distributions of β̂₀ and β̂₁ have the shape of a t-distribution on (n-2) d.f.
- Do t-tests and CIs as usual (df = n-2)

[Stata regression output, annotated: p-values for H₀: β = 0 and confidence intervals for the coefficients]

Inference Tools
- Hypothesis test and confidence interval for the mean of Y at some X: estimate the mean of Y at X = X₀ by
  µ̂{Y|X₀} = β̂₀ + β̂₁X₀
- Standard error of the estimated mean:
  SE[µ̂{Y|X₀}] = σ̂ √( 1/n + (X₀ - X̄)² / ((n-1)sₓ²) )
- Conduct the t-test and confidence interval in the usual way (df = n-2)
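In Stata, this standard error is available after regress through predict's stdp option; a one-line sketch (the name semean is a placeholder):

    * SE of the estimated mean µ̂{Y|X} at each observed X
    predict semean, stdp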

Confidence bands for conditional means
- Confidence bands in simple regression have an hourglass shape, narrowest at the mean of X
- The lfitci command automatically calculates and graphs the confidence bands
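A minimal sketch with placeholder variable names y and x:

    * fitted line with 95% confidence band for the conditional mean
    twoway (lfitci y x) (scatter y x)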

Prediction
- Prediction of a future Y at X = X₀:
  Pred(Y|X₀) = µ̂{Y|X₀}
- Standard error of prediction:
  SE[Pred(Y|X₀)] = √( σ̂² + (SE[µ̂{Y|X₀}])² )
  (variability of Y about its mean, plus uncertainty in the estimated mean)
- 95% prediction interval:
  Pred(Y|X₀) ± t_df(.975) × SE[Pred(Y|X₀)]
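A hedged Stata sketch of this interval using predict's stdf option, assuming a model was just fit with regress and fitted values are stored in yhat; the names sepred, lower, and upper are placeholders:

    * 95% prediction interval at each observed X (df = n - 2)
    predict sepred, stdf
    gen lower = yhat - invttail(e(N)-2, .025)*sepred
    gen upper = yhat + invttail(e(N)-2, .025)*sepred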

Residuals vs. predicted values plot
After any regression analysis we can automatically draw a residual-versus-fitted plot just by typing rvfplot.

Predicted values (yhat)
After any regression, the predict command can create a new variable yhat containing the predicted Y values.
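A one-line sketch, run immediately after regress (the name yhat matches the slide):

    * store the fitted values in a new variable named yhat
    predict yhat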

Residuals (e)
After any regression, predict with the resid option can create a new variable e containing the residuals.
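A matching one-line sketch (the name e matches the slide):

    * store the residuals in a new variable named e
    predict e, resid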

The residual-versus-predicted-values plot could be drawn "by hand" using these commands (sketched below).
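A sketch of that "by hand" recipe, reusing the yhat and e variables created on the previous slides:

    * scatter the residuals against the fitted values, with a zero reference line
    twoway scatter e yhat, yline(0)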

Second type of confidence interval for regression prediction: "prediction band"
This expresses our uncertainty in estimating the unknown value of Y for an individual observation with known X value.
Command: lfitci with the stdf option
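A minimal sketch with placeholder variable names y and x:

    * fitted line with 95% prediction band for individual observations
    twoway (lfitci y x, stdf) (scatter y x)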

Additional note: predict can generate two kinds of standard errors for the predicted y value, which have two different applications:
- stdf: standard error of the forecast, for confidence bands for individual-case predictions
- stdp: standard error of the predicted mean, for confidence bands for conditional means
[Figure: panels of Distance vs. VELOCITY illustrating the two kinds of bands]

[Figure: Distance vs. VELOCITY, two panels.
Top, confidence bands for conditional means (stdp): the 95% confidence interval for µ{Y|X₀ = 1000}; a confidence band is a set of confidence intervals for µ{Y|X₀}.
Bottom, confidence bands for individual-case predictions (stdf): the 95% prediction interval for Y at X = 1000; a calibration interval gives the values of X for which Y₀ is in a prediction interval.]

Notes about confidence and prediction bands
- Both are narrowest at the mean of X
- Beware of extrapolation
- The width of the confidence interval shrinks toward zero as n grows large; this is not true of the prediction interval.

Review of simple linear regression
1. Model: µ{Y|X} = β₀ + β₁X with constant variance var{Y|X} = σ².
2. Least squares: choose estimators β̂₀ and β̂₁ to minimize the sum of squared residuals.
   β̂₁ = Σᵢ₌₁ⁿ (Xᵢ - X̄)(Yᵢ - Ȳ) / Σᵢ₌₁ⁿ (Xᵢ - X̄)²
   β̂₀ = Ȳ - β̂₁X̄
   resᵢ = Yᵢ - β̂₀ - β̂₁Xᵢ  (i = 1, ..., n)
3. Properties of estimators:
   σ̂² = Σᵢ₌₁ⁿ resᵢ² / (n-2)
   SE(β̂₁) = σ̂ / √((n-1)sₓ²)
   SE(β̂₀) = σ̂ √( 1/n + X̄² / ((n-1)sₓ²) )

Assumptions of Linear Regression
A linear regression model assumes:
- Linearity: µ{Y|X} = β₀ + β₁X
- Constant variance: var{Y|X} = σ²
- Normality: the distribution of Y at any X is normal
- Independence: given the Xᵢ's, the Yᵢ's are independent

Examples of Violations: Non-Linearity
- The true relation between the independent and dependent variables may not be linear.
- For example, consider campaign fundraising and the probability of winning an election: the probability of winning increases with each additional dollar spent and then levels off after $50,000.
[Figure: P(w), the probability of winning an election, vs. spending; the curve rises and then flattens after $50,000]

Consequences of violation of linearity:
If "linearity" is violated, misleading conclusions may occur (however, the degree of the problem depends on the degree of non-linearity).

Examples of Violations: Constant Variance
- Constant variance or homoskedasticity: the homoskedasticity assumption implies that, on average, we do not expect to get larger errors in some cases than in others.
- Of course, due to the luck of the draw, some errors will turn out to be larger than others; but homoskedasticity is violated only when this happens in a predictable manner.
- Example: income and spending on certain goods. People with higher incomes have more choices about what to buy. We would expect that their consumption of certain goods is more variable than for families with lower incomes.

[Figure: "Relation between Income and Spending violates homoskedasticity", Spending vs. income; as income increases, so do the errors (the vertical distances from the predicted line), e.g. ε₆ = Y₆ - (a + bX₆) and ε₉ = Y₉ - (a + bX₉)]

Consequences of non-constant variance
If "constant variance" is violated, LS estimates are still unbiased, but SEs, tests, confidence intervals, and prediction intervals are incorrect. However, the degree of the problem depends on the degree of the violation.

Violation of Normality: Non-Normality
Nicotine use is characterized by a large number of people not smoking at all and another large number of people who smoke every day.
[Figure: frequency of nicotine use, an example of a bimodal distribution]

Consequence of non-Normality
If "normality" is violated:
- LS estimates are still unbiased
- tests and CIs are quite robust
- PIs are not
Of all the assumptions, this is the one that we need to be least worried about violating. Why?

Violation of Non-independence
- The independence assumption means that the error terms of two observations do not influence one another. Technically, the RESIDUALS or error terms are uncorrelated.
- The most common violation occurs with data that are collected over time, i.e. time series analysis.
- Example: high tariff rates in one period are often associated with very high tariff rates in the next period.
- Example: nominal GNP and consumption.
[Figure: residuals of GNP and Consumption over time, highly correlated]

Consequence of non-independence
If "independence" is violated:
- LS estimates are still unbiased
- everything else can be misleading
[Figure: log height vs. log weight, with plotting code by litter (5 mice from each of 5 litters); note that mice from litters 4 and 5 have higher weight and height]

Robustness of least squares
- The "constant variance" assumption is important.
- Normality is not too important for confidence intervals and p-values, but is important for prediction intervals.
- Long-tailed distributions and/or outliers can heavily influence the results.
- Non-independence problems: serial correlation (Ch. 15) and cluster effects (we deal with this in Ch. 9-14).
Strategy for dealing with these potential problems:
- Plots; residual plots; consider outliers (more in Ch. 11)
- Log transformations (Display 8.6)

Tools for model checking
- Scatterplot of Y vs. X (see Display 8.6, p. 213)*
- Scatterplot of residuals vs. fitted values*
  * Look for curvature, non-constant variance, and outliers
- Normal probability plot (p. 224): sometimes useful for checking whether the distribution is symmetric or normal (i.e. for PIs)
- Lack-of-fit F-test when there are replicates (Section 8.5)

Scatterplot of Y vs. X
Command: graph twoway scatter Y X
Case study: 7.01, page 175

Scatterplot of residuals vs. fitted values
Command: rvfplot, yline(0)
Case study: 7.01, page 175

Normal probability plot (p. 224)
Quantile-normal plots compare quantiles of a variable's distribution with quantiles of a normal distribution having the same mean and standard deviation. They allow visual inspection for departures from normality in every part of the distribution.
Command: qnorm variable, grid
Case study: 7.01, page 175

Diagnostic plots of residuals
- Plot residuals versus fitted values, almost always:
  - For simple regression this is about the same as residuals vs. X
  - Look for outliers, curvature, increasing spread (funnel or horn shape); then take appropriate action
- If data were collected over time, plot residuals versus time:
  - Check for a time trend and serial correlation
- If normality is important, use a normal probability plot:
  - A straight line is expected if the distribution is normal

Voltage Example (Case Study 8.1.2)
- Goal: to describe the distribution of breakdown time of an insulating fluid as a function of the voltage applied to it.
  - Y = breakdown time
  - X = voltage
- Statistical illustrations:
  - Recognizing the need for a log transformation of the response from the scatterplot and the residual plot
  - Checking the simple linear regression fit with a lack-of-fit F-test
- Stata (follows; a workflow sketch is given below)
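A hedged sketch of the Stata workflow for the next two slides, assuming the variables are named time and voltage:

    * fit on the raw response and inspect the residuals
    regress time voltage
    rvfplot, yline(0)
    * spread increases with the fitted values, so log the response and refit
    gen logtime = log(time)
    regress logtime voltage
    rvfplot, yline(0)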

Simple regression
The residuals-vs-fitted-values plot shows increasing spread with increasing fitted values.
Next step: we try with log(Y) = log(time).

Simple regression with Y logged
The residuals-vs-fitted-values plot does not show any obvious curvature or trend in spread.

Interpretation after log transformations

Model         Dependent variable   Independent variable   Interpretation of β₁
Level-level   Y                    X                      Δy = β₁ Δx
Level-log     Y                    log(X)                 Δy = (β₁/100) %Δx
Log-level     log(Y)               X                      %Δy = (100 β₁) Δx
Log-log       log(Y)               log(X)                 %Δy = β₁ %Δx

Dependent variable logged
µ{log(Y)|X} = β₀ + β₁X is the same as (if the distribution of log(Y), given X, is symmetric):
Median{Y|X} = e^(β₀ + β₁X)
As X increases by 1, what happens?
Median{Y|X = x+1} / Median{Y|X = x} = e^(β₀ + β₁(x+1)) / e^(β₀ + β₁x) = e^(β₁)
So Median{Y|X = x+1} = e^(β₁) · Median{Y|X = x}

Interpretation of Y logged
"As X increases by 1, the median of Y changes by the multiplicative factor e^(β₁)." Or, better:
- If β₁ > 0: "As X increases by 1, the median of Y increases by (e^(β₁) - 1) × 100%"
- If β₁ < 0: "As X increases by 1, the median of Y decreases by (1 - e^(β₁)) × 100%"

Example: µ{log(time)|voltage} = β₀ + β₁ voltage
1 - e^(-0.5) ≈ .4
[Stata regression output]

µ{log(time)|voltage} = 18.96 - .507 voltage
1 - e^(-0.5) ≈ .4
It is estimated that the median breakdown time decreases by 40% with each 1 kV increase in voltage.
[Figure: two panels, breakdown time (minutes) vs. VOLTAGE and log of breakdown time vs. VOLTAGE, each with fitted values]
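Checking the arithmetic in Stata, with display used as a calculator:

    * percent decrease in the median per 1 kV, from the fitted slope -.507
    display (1 - exp(-.507))*100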

If the explanatory variable (X) is logged
If µ{Y|log(X)} = β₀ + β₁log(X), then: "Associated with each two-fold increase (i.e. doubling) of X is a β₁log(2) change in the mean of Y."
An example follows.

Example with X logged (Display 7.3 - Case 7.1):
- Y = pH, X = time after slaughter (hrs.)
- Estimated model: µ̂{Y|log(X)} = 6.98 - .73 log(X)
- -.73 × log(2) ≈ -.5, so "it is estimated that for each doubling of time after slaughter (between 0 and 8 hours) the mean pH decreases by .5."
[Figure: two panels, pH vs. ltime and pH vs. TIME, each with fitted values]
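The same computation in Stata (log() is the natural log):

    * change in mean pH per doubling of time, from the fitted slope
    display -.73*log(2)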

Both Y and X logged
µ{log(Y)|log(X)} = β₀ + β₁log(X) is the same as Median{Y|X} = e^(β₀) X^(β₁) (if the distribution of log(Y), given X, is symmetric).
As X doubles, what happens?
- If β₁ > 0: "As X doubles, the median of Y increases by (e^(log(2)β₁) - 1) × 100%"
- If β₁ < 0: "As X doubles, the median of Y decreases by (1 - e^(log(2)β₁)) × 100%"

Example with Y and X logged (Display 8.1, page 207)
- Y: number of species on an island
- X: island area
- µ{log(Y)|log(X)} = β₀ + β₁ log(X)

Y and X logged
µ̂{log(Y)|log(X)} = 1.94 + .25 log(X)
Since e^(.25 log(2)) ≈ 1.19:
"Associated with each doubling of island area is a 19% increase in the median number of bird species."
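The multiplier checked in Stata:

    * multiplicative change in the median species count per doubling of area
    display exp(.25*log(2))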

Example: Log-Log
In order to graph the log-log plot, we need to generate two new variables (natural logarithms).
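A sketch of those steps for the species-area example; the variable names species and area are assumptions:

    * natural logs of both variables, then the log-log scatterplot with fit
    gen logspecies = log(species)
    gen logarea = log(area)
    twoway (scatter logspecies logarea) (lfit logspecies logarea)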

