Chapter 7 Simple Linear Regression And Correlation


Chapter 7: Simple linear regression and correlation
Department of Statistics and Operations Research
November 24, 2019

Plan
1. Correlation
2. Simple linear regression


Definition
The measure of linear association $\rho$ between two variables $X$ and $Y$ is estimated by the sample correlation coefficient $r$, where
$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$
with $S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$ and $S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2$.

Example
Let us consider the following grades of 6 students selected at random:
Mathematics grade: 70 92 80 74 65 83
English grade:     74 84 63 87 78 90
We have $n = 6$, $S_{xy} = 115.33$, $S_{xx} = 471.33$ and $S_{yy} = 491.33$. Hence
$$r = \frac{115.33}{\sqrt{(471.33)(491.33)}} \approx 0.24.$$
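As a numerical check, here is a minimal Python sketch (assuming numpy is available) that reproduces the sums and the correlation above:

```python
import numpy as np

# Grades of the 6 students, paired as in the table above
x = np.array([70, 92, 80, 74, 65, 83])  # Mathematics grade
y = np.array([74, 84, 63, 87, 78, 90])  # English grade

# Centered sums of squares and cross-products
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)
Syy = np.sum((y - y.mean()) ** 2)

r = Sxy / np.sqrt(Sxx * Syy)
print(Sxy, Sxx, Syy)   # 115.33..., 471.33..., 491.33...
print(round(r, 2))     # 0.24
```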

Properties of r
1. $r = 1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with positive slope;
2. $r = -1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with negative slope.

Plan
1. Correlation
2. Simple linear regression

The form of a relationship between the response Y (the dependent or response variable) and the regressor X (the independent variable) is, mathematically, the linear relationship
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
where $\beta_0$ is the intercept, $\beta_1$ the slope, and $\varepsilon$, the error term in the model, is a random variable with mean 0 and constant variance.
An important aspect of regression analysis is to estimate the parameters $\beta_0$ and $\beta_1$ (i.e., estimate the so-called regression coefficients). The method of estimation will be discussed in the next section. Suppose we denote the estimates by $b_0$ for $\beta_0$ and $b_1$ for $\beta_1$. Then the estimated or fitted regression line is given by
$$\hat{y} = b_0 + b_1 x$$
where $\hat{y}$ is the predicted or fitted value.

Least Squares and the Fitted Model
Definition
Given a set of regression data $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ and a fitted model $\hat{y}_i = b_0 + b_1 x_i$, the $i$th residual $e_i$ is given by
$$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n.$$

We shall find $b_0$ and $b_1$, the estimates of $\beta_0$ and $\beta_1$, so that the sum of the squares of the residuals is a minimum. This minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find $b_0$ and $b_1$ so as to minimize
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2.$$
$SSE$ is called the error sum of squares.

Theorem
Given the sample $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the least squares estimates $b_0$ and $b_1$ of the regression coefficients $\beta_0$ and $\beta_1$ are computed from the formulas
$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$$
and
$$b_0 = \bar{y} - b_1 \bar{x}.$$
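A minimal sketch of these formulas in Python (numpy assumed; the helper name least_squares_fit is illustrative, not from the text):

```python
import numpy as np

def least_squares_fit(x, y):
    """Least squares estimates (b0, b1) for the line y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # b1 = Sxy / Sxx and b0 = ybar - b1 * xbar, as in the theorem above
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```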

Example
Consider the experimental data in the table below, which were obtained from 33 samples of chemically treated waste in a study conducted at Virginia Tech. Readings on x, the percent reduction in total solids, and y, the percent reduction in chemical oxygen demand, were recorded. We denote by
x: Solids Reduction
y: Oxygen Demand

x (%): 3, 7, 11, 15, 18, 27, 29, 30, 30, 31, 31, 32, 33, 33, 34, 36, 36, 36, 37, 38, 39, 39, 39, 40, 41, 42, 42, 43, 44, 45, 46, 47, 50
y (%): 5, 11, 21, 16, 16, 28, 27, 25, 35, 30, 40, 32, 34, 32, 34, 37, 38, 34, 36, 38, 37, 36, 45, 39, 41, 40, 44, 37, 44, 46, 46, 49, 51

The estimated regression line is given by
$$\hat{y} = 3.8296 + 0.9036\,x.$$
Using the regression line, we would predict a 31% reduction in the chemical oxygen demand when the reduction in the total solids is 30%. The 31% reduction in the chemical oxygen demand may be interpreted as an estimate of the population mean $\mu_{Y|30}$ or as an estimate of a new observation when the reduction in total solids is 30%.
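Feeding the 33 data pairs into the least_squares_fit sketch above should reproduce the fitted line and the prediction at x = 30, up to rounding:

```python
x = [3, 7, 11, 15, 18, 27, 29, 30, 30, 31, 31, 32, 33, 33, 34, 36, 36,
     36, 37, 38, 39, 39, 39, 40, 41, 42, 42, 43, 44, 45, 46, 47, 50]
y = [5, 11, 21, 16, 16, 28, 27, 25, 35, 30, 40, 32, 34, 32, 34, 37, 38,
     34, 36, 38, 37, 36, 45, 39, 41, 40, 44, 37, 44, 46, 46, 49, 51]

b0, b1 = least_squares_fit(x, y)
print(round(b0, 4), round(b1, 4))  # 3.8296 0.9036
print(round(b0 + b1 * 30, 1))      # about 30.9, i.e. roughly a 31% reduction
```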

Properties of the Least Squares Estimators
Theorem
We have
1. $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$;
2. $V(b_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \dfrac{\sigma^2}{S_{xx}}$.
Theorem
An unbiased estimate of $\sigma^2$, named the mean squared error, is
$$\hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}.$$

Inferences Concerning the Regression Coefficients
Theorem
Assume now that the errors $\varepsilon_i$ are normally distributed. A $100(1-\alpha)\%$ confidence interval for the parameter $\beta_1$ in the regression line is
$$b_1 - t_{\alpha/2}\,\frac{\hat{\sigma}}{\sqrt{S_{xx}}} < \beta_1 < b_1 + t_{\alpha/2}\,\frac{\hat{\sigma}}{\sqrt{S_{xx}}}$$
where $t_{\alpha/2}$ is a value of the t-distribution with $n-2$ degrees of freedom.

Example
Find a 95% confidence interval for $\beta_1$ in the regression line, based on the pollution data of Example 10.
Solution
We show that
$$\hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2} = 10.4299.$$
Therefore, taking the square root, we obtain $\hat{\sigma} = 3.2295$. Also,
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = 4152.18.$$

Using the table of the t-distribution, we find that $t_{0.025} \approx 2.045$ for 31 degrees of freedom. Therefore, a 95% confidence interval for $\beta_1$ is
$$0.903643 - (2.045)\,\frac{3.2295}{\sqrt{4152.18}} < \beta_1 < 0.903643 + (2.045)\,\frac{3.2295}{\sqrt{4152.18}},$$
which simplifies to
$$0.8012 < \beta_1 < 1.0061.$$
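The interval can be checked with scipy's t quantile, as in the sketch below (scipy assumed; note that t.ppf(0.975, 31) is about 2.0395 rather than the tabled 2.045, so the endpoints differ slightly in the third decimal):

```python
from math import sqrt
from scipy.stats import t

b1, sigma_hat, Sxx, n = 0.903643, 3.2295, 4152.18, 33
half_width = t.ppf(0.975, n - 2) * sigma_hat / sqrt(Sxx)
print(round(b1 - half_width, 4), round(b1 + half_width, 4))  # ~0.8014 ~1.0059
```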

Hypothesis Testing on the Slope
To test the null hypothesis $H_0$ that $\beta_1 = \beta_{10}$, we again use the t-distribution with $n-2$ degrees of freedom to establish a critical region and then base our decision on the value of
$$t = \frac{b_1 - \beta_{10}}{\hat{\sigma}/\sqrt{S_{xx}}},$$
which follows a t-distribution with $n-2$ degrees of freedom.

Example
Using the estimated value $b_1 = 0.903643$ of Example 10, test the hypothesis that $\beta_1 = 1$ against the alternative that $\beta_1 < 1$.
Solution
The hypotheses are $H_0\colon \beta_1 = 1$ and $H_1\colon \beta_1 < 1$. So
$$t = \frac{0.903643 - 1}{3.2295/\sqrt{4152.18}} = -1.92,$$
with $n - 2 = 31$ degrees of freedom ($P \approx 0.03$).
Decision: the P-value is below 0.05, so we reject $H_0$; there is strong evidence that $\beta_1 < 1$.
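A sketch of the same one-sided test with scipy (values taken from the example above):

```python
from math import sqrt
from scipy.stats import t

b1, beta10, sigma_hat, Sxx, n = 0.903643, 1.0, 3.2295, 4152.18, 33
t_stat = (b1 - beta10) / (sigma_hat / sqrt(Sxx))
p_value = t.cdf(t_stat, n - 2)  # lower-tail p-value for H1: beta1 < 1
print(round(t_stat, 2), round(p_value, 3))  # -1.92, about 0.032
```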

One important t-test on the slope is the test of the hypothesis $H_0\colon \beta_1 = 0$ versus $H_1\colon \beta_1 \neq 0$. When the null hypothesis is not rejected, the conclusion is that there is no significant linear relationship between $E(Y)$ and the independent variable $x$. Rejection of $H_0$ implies that a significant linear regression exists.

Measuring Goodness-of-Fit: the Coefficient of Determination
A goodness-of-fit statistic is a quantity that measures how well a model explains a given set of data. A linear model fits well if there is a strong linear relationship between $x$ and $y$.

Definition
The coefficient of determination, $R^2$, is given by
$$R^2 = 1 - \frac{SSE}{SST}$$
where $SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ and $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$.

Note that if the fit is perfect, all residuals $y_i - \hat{y}_i$ are zero, and thus $R^2 = 1$. But if $SSE$ is only slightly smaller than $SST$, then $R^2 \approx 0$. In the example of Table 10, the coefficient of determination $R^2 = 0.913$ suggests that the model fit to the data explains 91.3% of the variability observed in the response, the reduction in chemical oxygen demand.
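Continuing the waste-data sketch above (reusing x, y, b0 and b1 from that block), $R^2$ follows directly from the residuals:

```python
import numpy as np

y_arr = np.asarray(y, dtype=float)             # oxygen demand data from the earlier sketch
y_hat = b0 + b1 * np.asarray(x, dtype=float)   # fitted values from the fitted line
SSE = np.sum((y_arr - y_hat) ** 2)             # error sum of squares
SST = np.sum((y_arr - y_arr.mean()) ** 2)      # total sum of squares
print(round(1 - SSE / SST, 3))                 # about 0.913
```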

