STAT 511 - Lecture: Simple Linear Regression (Devore)

STAT 511
Lecture: Simple linear regression
Devore: Sections 12.1-12.4
Prof. Michael Levine
April 26, 2020

- A simple linear regression investigates a relationship between two variables that is not deterministic. The variable whose value is fixed by the experimenter is called the independent, predictor, or explanatory variable. For fixed $x$, the second variable is random; it is referred to as the dependent or response variable.
- The data are usually given as $n$ pairs $(x_1, y_1), \ldots, (x_n, y_n)$. A scatter plot gives a good indication of the nature of the relationship between the two variables.

A Linear Model
- The usual linear regression model is $Y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon \sim N(0, \sigma^2)$.
- $\varepsilon$ is the random error term or random deviation, while the line $y = \beta_0 + \beta_1 x$ is called the true (or population) regression line.
- This model assumes that $E(Y) = \beta_0 + \beta_1 x$, while the deterministic model assumes that $y = \beta_0 + \beta_1 x$.

Illustration (figure not reproduced)

Appropriateness of the linear regression model
- Sometimes, such a model is suggested by physical considerations.
- More commonly, it is simply suggested by inspection of a scatter plot.

Implications of the linear regression model
- Let $x^*$ be a specific value of $x$; the corresponding mean and variance are $E(Y \mid x^*)$ and $V(Y \mid x^*)$.
- E.g., $x$ is the age of a child and $Y$ is the size of the child's vocabulary.
- The meaning: $E(Y \mid x^*) = \beta_0 + \beta_1 x^*$ and $V(Y \mid x^*) = \sigma^2$.

Implications of the linear regression model
- Thus, the line $y = \beta_0 + \beta_1 x$ is the line of mean values.
- The slope $\beta_1$ is the expected change in $E(Y)$ for a one-unit change in $x$.
- $\sigma^2$ does not depend on $x$, so the amount of variability in $Y$ stays the same for all $x$.

Illustration (figure not reproduced)

Example
- The relationship between applied stress $x$ and time-to-failure $y$ is described by the simple linear regression model with true regression line $y = 65 - 1.2x$ and $\sigma = 8$.
- For any fixed value $x^*$ of stress, time-to-failure has a normal distribution with mean value $65 - 1.2x^*$ and standard deviation 8.
- For $x = 20$, $E(Y \mid x = 20) = 65 - 1.2(20) = 41$.
- Thus, e.g., $P(Y > 50 \mid x = 20) = P\left(Z > \frac{50 - 41}{8}\right) = 1 - \Phi(1.13) = .1292$.
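The tail probability in this example can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and do not come from the lecture. Because it uses $z = 1.125$ without rounding to 1.13, it gives roughly .130 rather than the table-based value.

```python
from statistics import NormalDist

# True regression line from the example: y = 65 - 1.2x, with sigma = 8
beta0, beta1, sigma = 65.0, -1.2, 8.0

x_star = 20.0
mean_at_x = beta0 + beta1 * x_star  # E(Y | x = 20)

# P(Y > 50 | x = 20) when Y ~ N(mean_at_x, sigma^2)
p_exceed = 1.0 - NormalDist(mu=mean_at_x, sigma=sigma).cdf(50.0)

print(mean_at_x)
print(round(p_exceed, 4))
```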

Illustration (figure not reproduced)

Example
- Suppose that $Y_1$ denotes an observation on time-to-failure made with $x = 25$ and $Y_2$ denotes an independent observation made with $x = 24$.
- $Y_1 - Y_2$ is normally distributed with $E(Y_1 - Y_2) = \beta_1 = -1.2$ and $V(Y_1 - Y_2) = 2\sigma^2 = 128$.
- Thus, $P(Y_1 - Y_2 > 0) = P\left(Z > \frac{0 - (-1.2)}{11.314}\right) = P(Z > .11) = .4562$.
- Even though the slope is negative, it is not inconceivable that $Y_1 > Y_2$.
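The same calculation can be sketched in a few lines of Python, again with the standard library only. The names `mean_diff` and `sd_diff` are illustrative. Since the code does not round $z$ to .11, its answer differs from .4562 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

beta1, sigma = -1.2, 8.0

# Y1 is observed at x = 25 and Y2 at x = 24, independently of each other
mean_diff = beta1 * (25 - 24)   # E(Y1 - Y2) = beta1
sd_diff = sqrt(2 * sigma**2)    # sqrt(128)

# P(Y1 - Y2 > 0) for a normally distributed difference
p_positive = 1.0 - NormalDist(mu=mean_diff, sigma=sd_diff).cdf(0.0)

print(round(sd_diff, 3), round(p_positive, 4))
```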

Estimating Model Parameters
- The usual way to estimate the parameters of a linear regression model is the least squares approach suggested by Gauss.
- The vertical deviation of the point $(x_i, y_i)$ from the line $y = b_0 + b_1 x$ is $y_i - (b_0 + b_1 x_i)$.
- The sum of squared vertical deviations from the data points $(x_1, y_1), \ldots, (x_n, y_n)$ to the line is $f(b_0, b_1) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2$.

Estimating Model Parameters
- The point estimates of the true model coefficients $\beta_0$ and $\beta_1$ are denoted $\hat\beta_0$ and $\hat\beta_1$. They are the values that minimize $f(b_0, b_1)$.
- In other words, they are such that $f(\hat\beta_0, \hat\beta_1) \le f(b_0, b_1)$ for any $b_0$ and $b_1$.
- $\hat\beta_0$ and $\hat\beta_1$ are called the least squares estimates.
- The estimated regression line is $y = \hat\beta_0 + \hat\beta_1 x$.

System of normal equations
- $n b_0 + b_1 \sum x_i = \sum y_i$
  $b_0 \sum x_i + b_1 \sum x_i^2 = \sum x_i y_i$
- If not all $x_i$ are identical, there is a unique solution: the least squares estimates.

Solutions
- $b_1 = \hat\beta_1 = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \frac{S_{xy}}{S_{xx}}$
- $b_0 = \hat\beta_0 = \bar y - \hat\beta_1 \bar x$
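The closed-form solution can be sketched as a short Python function. This is an illustration under stated assumptions, not code from the lecture: the function name `least_squares` and the five data points are made up. The last lines check that the computed pair satisfies both normal equations up to rounding.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared vertical deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        # All x values identical: the normal equations have no unique solution
        raise ValueError("all x values are identical")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)

# Residuals of the two normal equations; both should be (numerically) zero
n = len(xs)
eq1 = n * b0 + b1 * sum(xs) - sum(ys)
eq2 = (b0 * sum(xs) + b1 * sum(x * x for x in xs)
       - sum(x * y for x, y in zip(xs, ys)))

print(b0, b1, eq1, eq2)
```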

Example
- The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine. Determination of this number for a biodiesel fuel is expensive and time-consuming.
- The iodine value is the amount of iodine necessary to saturate a sample of 100 g of oil.
- $x$ = iodine value (g) and $y$ = cetane number for a sample of 14 biofuels.

Example
- $\hat\beta_1 = -.20938742$
- Thus, the expected change in true average cetane number associated with a 1 g decrease in iodine value is about .209.
- The estimated intercept is $\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 75.212432$.

Example (figure not reproduced)

Estimating $\sigma^2$
- Estimating $\sigma^2$ is needed to obtain confidence intervals and/or test hypotheses about the coefficients of the regression model.
- The fitted values are $\hat y_1 = \hat\beta_0 + \hat\beta_1 x_1, \ldots, \hat y_n = \hat\beta_0 + \hat\beta_1 x_n$.
- The residuals are $y_1 - \hat y_1, \ldots, y_n - \hat y_n$.
- The residuals are needed to estimate the variance of the errors; specifically, $\hat\sigma^2 = \frac{SSE}{n - 2}$, where the error sum of squares is $SSE = \sum_{i=1}^{n} (y_i - \hat y_i)^2$.

Computational formula for SSE
- The direct computation of SSE is rather involved.
- A better option is to use the computational formula $SSE = S_{yy} - \hat\beta_1 S_{xy}$.
- This formula does not involve computation of predicted values and residuals.
- It is, however, very sensitive to rounding in $\hat\beta_0$ and $\hat\beta_1$.
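To see that the two routes to SSE agree, here is a minimal sketch on made-up data (the five points and all variable names are assumptions for illustration). It computes SSE once from the residuals and once from $S_{yy} - \hat\beta_1 S_{xy}$, then forms $\hat\sigma^2 = SSE/(n-2)$.

```python
# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Direct route: fitted values, residuals, then the sum of squared residuals
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse_direct = sum(r ** 2 for r in residuals)

# Computational formula: no fitted values or residuals needed
sse_shortcut = s_yy - b1 * s_xy

sigma2_hat = sse_direct / (n - 2)

print(sse_direct, sse_shortcut, sigma2_hat)
```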

Variation in the data (figure not reproduced)

$R^2$ - coefficient of determination I
- How much of the total variation can the linear regression model explain? The total variation is described by the total sum of squares $SST = \sum_{i=1}^{n} (y_i - \bar y)^2$.
- It is always true that $SSE \le SST$, so we can define $R^2 = 1 - \frac{SSE}{SST}$, a number between 0 and 1 that indicates how much of the total variation is explained by the regression model.

$R^2$ - coefficient of determination II
- Its alternative form is $R^2 = \frac{SSR}{SST}$, where $SSR = \sum_{i=1}^{n} (\hat y_i - \bar y)^2$ is the regression sum of squares.
- The same identity as before in ANOVA holds: $SST = SSR + SSE$.
- Cetane number-iodine value example: high value of $R^2$.
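The decomposition $SST = SSR + SSE$ and the two forms of $R^2$ can be verified numerically. The sketch below uses made-up data and illustrative variable names; it is not the cetane data, which the transcript does not include.

```python
# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained variation

r2_from_sse = 1.0 - sse / sst
r2_from_ssr = ssr / sst

print(sst, ssr + sse, r2_from_sse, r2_from_ssr)
```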

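Parameter estimators
- An estimator of $\beta_1$ is $\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(Y_i - \bar Y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$.
- An estimator of $\beta_0$ is $\hat\beta_0 = \bar Y - \hat\beta_1 \bar x$.
- An estimator of $\sigma^2$ is $\hat\sigma^2 = S^2 = \frac{\sum_{i=1}^{n} Y_i^2 - \hat\beta_0 \sum_{i=1}^{n} Y_i - \hat\beta_1 \sum_{i=1}^{n} x_i Y_i}{n - 2}$.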

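$\hat\beta_1$ as a linear estimator of the slope
- Verify that $\hat\beta_1 = \sum_{i=1}^{n} c_i Y_i$, where $c_i = \frac{x_i - \bar x}{S_{xx}}$.
- Consequently, $\hat\beta_1 \sim N\left(\beta_1, \frac{\sigma^2}{S_{xx}}\right)$.
- The variance of $\hat\beta_1$ is estimated by $\frac{s^2}{S_{xx}}$, so its estimated standard error is $s_{\hat\beta_1} = \frac{s}{\sqrt{S_{xx}}}$.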
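A short derivation may help with the verification requested on this slide. It is a sketch that relies only on the model assumptions stated earlier (independent $Y_i$ with common variance $\sigma^2$) and on the fact that $\sum (x_i - \bar x) = 0$.

```latex
\begin{align*}
\sum_{i=1}^{n} c_i Y_i
  &= \frac{\sum_{i=1}^{n} (x_i - \bar x)\, Y_i}{S_{xx}}
   = \frac{\sum_{i=1}^{n} (x_i - \bar x)(Y_i - \bar Y)}{S_{xx}}
   = \hat\beta_1
   && \text{since } \textstyle\sum_{i=1}^{n} (x_i - \bar x)\,\bar Y = 0, \\[4pt]
V(\hat\beta_1)
  &= \sum_{i=1}^{n} c_i^2\, V(Y_i)
   = \sigma^2\, \frac{\sum_{i=1}^{n} (x_i - \bar x)^2}{S_{xx}^2}
   = \frac{\sigma^2}{S_{xx}}
   && \text{by independence of the } Y_i .
\end{align*}
```

Normality of $\hat\beta_1$ then follows because a linear combination of independent normal variables is itself normal.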

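A confidence interval for $\beta_1$
- Note that $T = \frac{\hat\beta_1 - \beta_1}{S / \sqrt{S_{xx}}} \sim t_{n-2}$.
- Thus, the $100(1 - \alpha)\%$ CI for $\beta_1$ is $\hat\beta_1 \pm t_{\alpha/2,\, n-2} \cdot \frac{s}{\sqrt{S_{xx}}}$.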

Example
- When damage to a timber structure occurs, it may be more economical to repair the damaged area rather than replace the entire structure.
- The dependent variable is $y$ = rupture load (N) and the independent variable is $x$ = anchorage length (the additional length of material used to bond at the junction), in mm.

Example
- Main quantities are $S_{xx} = 18{,}000$, error df $= 10 - 2 = 8$, and $s = 2661.33$. The estimated standard error is $s_{\hat\beta_1} = \frac{s}{\sqrt{S_{xx}}} = 19.836$.
- The 95% confidence interval is $123.64 \pm (2.306)(19.836) = (77.90, 169.38)$.
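The interval can be reproduced from the summary quantities on the slide. In this sketch the critical value 2.306 is copied from the slide rather than computed, since the Python standard library has no $t$ quantile function; the variable names are illustrative.

```python
from math import sqrt

# Summary quantities quoted in the example
beta1_hat = 123.64
s = 2661.33
s_xx = 18000.0
t_crit = 2.306  # t critical value for 8 df at the 95% level, taken from the slide

# Estimated standard error of the slope: s / sqrt(Sxx)
se_beta1 = s / sqrt(s_xx)

half_width = t_crit * se_beta1
ci_low = beta1_hat - half_width
ci_high = beta1_hat + half_width

print(round(se_beta1, 3), round(ci_low, 2), round(ci_high, 2))
```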

Hypothesis testing
- The most common is the model utility test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$.
- For a general null value $\beta_{10}$, the test statistic value is $t = \frac{\hat\beta_1 - \beta_{10}}{s_{\hat\beta_1}}$.
- Then, if e.g. $H_a: \beta_1 > \beta_{10}$, the P-value is the area under the $t_{n-2}$ curve to the right of $t$.

Example
- Mopeds are very popular in Europe because of cost and ease of operation.
- They can be dangerous if performance characteristics are modified. One of the features commonly manipulated is the maximum speed.
- Consider a simple linear regression analysis of the variables $x$ = test track speed (km/h) and $y$ = rolling test speed.

Regression and ANOVA
- Note that $t^2 = f$ for the test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \ne 0$.
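The identity $t^2 = f$ can be checked numerically. The sketch below assumes the usual definitions, which the transcript does not spell out: $t = \hat\beta_1 / (s/\sqrt{S_{xx}})$ and $f = \frac{SSR/1}{SSE/(n-2)}$. The data are made up for illustration.

```python
from math import sqrt

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ssr = sum((f - y_bar) ** 2 for f in fitted)

s = sqrt(sse / (n - 2))             # estimate of sigma
t_stat = b1 / (s / sqrt(s_xx))      # model utility t statistic (null value 0)
f_stat = (ssr / 1) / (sse / (n - 2))  # ANOVA F statistic: MSR / MSE

print(t_stat ** 2, f_stat)
```

The agreement is exact in theory because $SSR = \hat\beta_1^2 S_{xx}$, so both quantities equal $\hat\beta_1^2 S_{xx} / s^2$.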

Inference about the mean $\mu_{Y \cdot x^*}$
- For a given value $x^*$, the estimated average value of $Y$ is $\hat\beta_0 + \hat\beta_1 x^*$.
- It can also be viewed as the prediction at the given point $x^*$.
- It is possible to represent the estimated average value of $Y$ as $\hat\beta_0 + \hat\beta_1 x^* = \sum_{i=1}^{n} d_i Y_i$, where $d_i = \frac{1}{n} + \frac{(x^* - \bar x)(x_i - \bar x)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$.

Summary
- $E(\hat Y) = \beta_0 + \beta_1 x^*$, and the variance is $V(\hat Y) = \sigma^2 \left[\frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}\right]$.
- The estimated variance results from the above by replacing $\sigma^2$ with $s^2$; $\hat Y$ is also normally distributed.
- To construct a confidence interval or to test a hypothesis, just note that $T = \frac{\hat Y - (\beta_0 + \beta_1 x^*)}{S_{\hat Y}} \sim t_{n-2}$.

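Inferences concerning $\mu_{Y \cdot x^*}$
- The variable $T = \frac{\hat\beta_0 + \hat\beta_1 x^* - (\beta_0 + \beta_1 x^*)}{S_{\hat\beta_0 + \hat\beta_1 x^*}} = \frac{\hat Y - (\beta_0 + \beta_1 x^*)}{S_{\hat Y}}$ has a $t$ distribution with $n - 2$ df.
- The $100(1 - \alpha)\%$ CI for $E(Y \mid x^*) = \mu_{Y \cdot x^*}$ is $\hat\beta_0 + \hat\beta_1 x^* \pm t_{\alpha/2,\, n-2} \cdot s_{\hat\beta_0 + \hat\beta_1 x^*} = \hat y \pm t_{\alpha/2,\, n-2} \cdot s_{\hat Y}$.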

Example
- Corrosion of steel reinforcing bars is the most important durability problem for reinforced concrete structures.
- Representative data on $x$ = carbonation depth (mm) and $y$ = strength (MPa) are available for a sample of core specimens from a building in Singapore.
- The scatter plot supports the use of simple linear regression; thus, let us obtain a 95% CI for $\beta_0 + \beta_1 (45)$, the mean strength at $x = 45$ mm.

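Example
- First, $\hat\beta_1 = -.297561$ and $\hat\beta_0 = 27.182936$, so $\hat y = 27.182936 - .297561(45) = 13.79$.
- The estimated standard deviation is $s_{\hat Y} = 2.8640 \sqrt{\frac{1}{18} + \frac{(45 - 36.6111)^2}{4840.7778}} = .7582$.
- The 16 df $t$ critical value is 2.120, and so the interval is $13.79 \pm (2.120)(.7582) = (12.18, 15.40)$.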
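The steps of this example can be sketched in Python using only the quantities quoted on the slide. The sample size $n = 18$ is inferred from the 16 error df, the critical value 2.120 is copied from the slide, and the variable names are illustrative. Small differences in the last digit arise because the slide rounds $\hat y$ to 13.79 before forming the interval.

```python
from math import sqrt

# Quantities quoted in the example
b0_hat, b1_hat = 27.182936, -0.297561
n = 18            # inferred from 16 error df (n - 2 = 16)
x_bar = 36.6111
s_xx = 4840.7778
s = 2.8640
t_crit = 2.120    # t critical value for 16 df at the 95% level, from the slide

x_star = 45.0
y_hat = b0_hat + b1_hat * x_star

# Estimated standard deviation of the estimated mean response at x_star
se_mean = s * sqrt(1.0 / n + (x_star - x_bar) ** 2 / s_xx)

ci_low = y_hat - t_crit * se_mean
ci_high = y_hat + t_crit * se_mean

print(round(y_hat, 2), round(se_mean, 4), round(ci_low, 2), round(ci_high, 2))
```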

Example
- The following output results from a request to fit the simple linear regression model and calculate confidence intervals for the mean value of strength at depths of 45 mm and 35 mm (output not reproduced).

CI's for multiple values of $x$
- In some situations, a CI is desired not just for a single $x$ value but for two or more $x$ values.
- Suppose an investigator wishes a CI both for $\mu_{Y \cdot v}$ and for $\mu_{Y \cdot w}$, where $v$ and $w$ are two different values of the independent variable.
- The intervals are not independent because the same $\hat\beta_0$, $\hat\beta_1$ and $S$ are used in each. We therefore cannot assert that the joint confidence level for the two intervals is exactly 90% even if we select $\alpha = 0.05$.
- It can be shown, though, that if the $100(1 - \alpha)\%$ CI is computed both for $x = v$ and $x = w$ to obtain joint CIs for $\mu_{Y \cdot v}$ and for $\mu_{Y \cdot w}$, then the joint confidence level on the resulting pair of intervals is at least $100(1 - 2\alpha)\%$.

A prediction interval for a future value of $Y$
- Sometimes, an investigator may wish to obtain an interval of plausible values for the value of $Y$ associated with some future observation when the independent variable has value $x^*$.
- We may want to relate vocabulary size $y$ to the age of a child $x$. The CI with $x^* = 6$ would provide an estimate of true average vocabulary size for all 6-year-old children.
- Alternatively, we might wish an interval of plausible values for the vocabulary size of a particular 6-year-old child.

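The error of prediction
- The error of prediction is $Y - (\hat\beta_0 + \hat\beta_1 x^*)$.
- The variance of the prediction error is $V(Y - (\hat\beta_0 + \hat\beta_1 x^*)) = \sigma^2 \left[1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}\right]$.
- The expected value of the prediction error is $E(Y - (\hat\beta_0 + \hat\beta_1 x^*)) = 0$, and $T = \frac{Y - (\hat\beta_0 + \hat\beta_1 x^*)}{S \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}}} \sim t_{n-2}$.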
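The mean and variance of the prediction error follow in two lines. This sketch assumes, as is standard for a prediction interval, that the future observation $Y$ is independent of the sample $Y_1, \ldots, Y_n$ used to compute $\hat\beta_0$ and $\hat\beta_1$.

```latex
\begin{align*}
E\bigl(Y - (\hat\beta_0 + \hat\beta_1 x^*)\bigr)
  &= (\beta_0 + \beta_1 x^*) - (\beta_0 + \beta_1 x^*) = 0, \\[4pt]
V\bigl(Y - (\hat\beta_0 + \hat\beta_1 x^*)\bigr)
  &= V(Y) + V(\hat\beta_0 + \hat\beta_1 x^*) \\
  &= \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}\right]
   = \sigma^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}\right].
\end{align*}
```

The extra $\sigma^2$ term is the variability of the single future observation itself, which is why a prediction interval is wider than the confidence interval for the mean at the same $x^*$.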

Prediction interval
- The prediction interval is $\hat\beta_0 + \hat\beta_1 x^* \pm t_{\alpha/2,\, n-2} \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}}$.
- This interval is always wider than the corresponding confidence interval.

Example
- Let's return to the carbonation depth-strength data and calculate a 95% PI for a strength value that would result from selecting a single core specimen whose carbonation depth is 45 mm.
- The relevant quantities are $\hat y = 13.79$, $s_{\hat Y} = .7582$, $s = 2.8640$.
- For a prediction level of 95% based on $n - 2 = 16$ df, the critical value is 2.120.
- The prediction interval is then $13.79 \pm 2.120 \sqrt{(2.8640)^2 + (.7582)^2} = (7.51, 20.07)$.
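The prediction interval can be reproduced from the three quantities on the slide. The sketch uses the identity $s^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{S_{xx}}\right] = s^2 + s_{\hat Y}^2$; the critical value is copied from the slide and the variable names are illustrative.

```python
from math import sqrt

# Quantities quoted in the example
y_hat = 13.79
se_mean = 0.7582   # s_Yhat, estimated sd of the estimated mean at x* = 45
s = 2.8640
t_crit = 2.120     # t critical value for 16 df at the 95% level, from the slide

# Standard error of prediction: combines error variance and estimation variance
se_pred = sqrt(s ** 2 + se_mean ** 2)

pi_low = y_hat - t_crit * se_pred
pi_high = y_hat + t_crit * se_pred

# Confidence interval at the same x*, for comparison of widths
ci_low = y_hat - t_crit * se_mean
ci_high = y_hat + t_crit * se_mean

print(round(pi_low, 2), round(pi_high, 2))
print((pi_high - pi_low) > (ci_high - ci_low))
```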
