Residual Analysis And Outliers - People.hsc.edu


Residual Analysis and Outliers
Lecture 48
Sections 13.4 - 13.5

Robb T. Koether
Hampden-Sydney College
Wed, Apr 11, 2012

Outline
1. Introduction
2. Residual Analysis
3. Nonlinear Regression
4. Outliers and Influential Points
5. Assignment

Introduction
How do we know that a linear regression model is the best choice?
What other types of regression are there?
There are many other types.
How many would you like?
The linear model is by far the simplest, but it is not the only choice.

TI-83 - Nonlinear Regression
The TI-83 will do a variety of nonlinear regressions.
Press STAT CALC.
The list includes
LinReg - Linear regression: $\hat{y} = a + bx$.
QuadReg - Quadratic regression: $\hat{y} = ax^2 + bx + c$.
CubicReg - Cubic regression: $\hat{y} = ax^3 + bx^2 + cx + d$.
QuartReg - Quartic regression: $\hat{y} = ax^4 + bx^3 + cx^2 + dx + e$.
LnReg - Logarithmic regression: $\hat{y} = a + b \ln x$.
ExpReg - Exponential regression: $\hat{y} = ab^x$.
PwrReg - Power regression: $\hat{y} = ax^b$.
Logistic - Logistic regression: $\hat{y} = \dfrac{c}{1 + ae^{-bx}}$.
SinReg - Sinusoidal regression: $\hat{y} = a \sin(bx + c) + d$.
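As a rough illustration of how one of these nonlinear forms can be fitted off the calculator, the sketch below fits the exponential form $\hat{y} = ab^x$ by taking logarithms, so that $\ln y = \ln a + (\ln b)x$ becomes an ordinary linear regression. The data are made up for illustration, and this is not a claim about how the TI-83 computes ExpReg internally.

```python
import numpy as np

# Illustrative data generated exactly from y = 2 * 1.5**x
# (an assumption for this sketch, not data from the lecture).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * 1.5 ** x

# Taking logs turns y = a * b**x into ln y = ln a + (ln b) * x,
# which is a straight line in x. polyfit returns the slope first.
ln_b, ln_a = np.polyfit(x, np.log(y), 1)

a = np.exp(ln_a)
b = np.exp(ln_b)

print(f"a = {a:.4f}, b = {b:.4f}")
```

Because the fit is done on $\ln y$ rather than on $y$, it requires all $y$ values to be positive, and with noisy data it generally gives slightly different coefficients than minimizing the squared residuals of $y$ directly.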

The Appropriateness of the Linear Model
We can learn a bit about the nature of the model by examining the residuals.
This is called residual analysis.
First, we need to find the residuals $e_i = y_i - \hat{y}_i$.
Then we draw a scatterplot of x versus e and see whether there is a pattern.

The Appropriateness of the Linear Model
To do this on the TI-83, first find the predicted values ŷ and store them in L3:
    Y1(L1) → L3
Then find the residuals and store them in L4:
    L2 - L3 → L4
Then draw a scatterplot of L1 (x) versus L4 (e).
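The same two steps (predicted values, then residuals) can be sketched in Python with NumPy. The data below are illustrative values chosen for this sketch, not data from the lecture; the arrays `x` and `y` play the roles of the calculator lists L1 and L2.

```python
import numpy as np

# Illustrative data (an assumption, not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # plays the role of L1
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # plays the role of L2

# Least-squares line y-hat = a + b*x.
# polyfit returns the coefficient of the highest power first.
b, a = np.polyfit(x, y, 1)

y_hat = a + b * x         # analogous to Y1(L1) -> L3
residuals = y - y_hat     # analogous to L2 - L3 -> L4

for xi, ei in zip(x, residuals):
    print(f"x = {xi:.0f}, residual = {ei:+.3f}")
```

Plotting `x` against `residuals` (for example with a scatterplot) gives the residual plot described above.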

The Residual Plot
Example (Residual Plots)
[Figure: free lunch rate vs. graduation rate (scatterplot of Graduation Rate against Free Lunch Rate)]
[Figure: the residual plot (Residuals against Free Lunch Rate)]

The Appropriateness of the Linear Model
If the residual plot shows no clear pattern, but just a big blob of points, then the linear model is appropriate.
On the other hand, if the residual plot shows a distinct curvature, or any other distinct pattern, then the linear model may not be appropriate.

A Nonlinear Model
Example (A Nonlinear Model)
Consider the following data.

x    y        x    y
1    2        5    12
2    2        6    9
2    4        6    12
2    4        7    7
2    5        7    9
3    7        7    11
3    8        8    9
4    9        8    10
4    10

A Nonlinear Model
Example (A Nonlinear Model)
[Figure: the scatterplot]
[Figure: the regression line]
[Figure: the residual plot]
[Figure: quadratic regression]
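The comparison in this example can be sketched numerically by fitting both a line and a quadratic and comparing the sums of squared residuals. The data values below are read from the flattened table on the slide, so treat the exact pairing of x and y values as an assumption; the helper name `sse` is introduced here for illustration.

```python
import numpy as np

# Data as read from the slide's table (the pairing is an assumption,
# since the table was flattened in extraction).
x = np.array([1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 7, 8, 8], dtype=float)
y = np.array([2, 2, 4, 4, 5, 7, 8, 9, 10, 12, 9, 12, 7, 9, 11, 9, 10], dtype=float)

def sse(degree):
    """Sum of squared residuals for a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))

sse_linear = sse(1)
sse_quadratic = sse(2)

print(f"SSE, linear fit:    {sse_linear:.3f}")
print(f"SSE, quadratic fit: {sse_quadratic:.3f}")
```

A quadratic fit can never have a larger sum of squared residuals than the linear fit on the same data, so a smaller value alone does not settle the question; the curvature in the linear model's residual plot is what suggests that the quadratic model is the better choice here.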

Outliers
Definition (Outlier)
An outlier is a point with an unusually large residual (e.g., at least 2.5 standard deviations from the mean).
Definition (Influential Point)
An influential point is a point that exerts an inordinate influence on the regression line.

Outliers
An outlier may or may not be influential.
An influential point may or may not be an outlier.

Outliers and Influential Points
Example (Outliers and Influential Points)
Consider the following data.

x    y
1    6
2    5
3    5
4    6
4    4
4    10
5    3
5    4
6    3

Outliers and Influential Points
Example (Outliers and Influential Points)
[Figure: the scatterplot]

Outliers and Influential Points
Example (Outliers and Influential Points)
The regression line is $\hat{y} = 7.0 - 0.5x$.

x    y     ŷ      y - ŷ
1    6     6.5    -0.5
2    5     6.0    -1.0
3    5     5.5    -0.5
4    6     5.0     1.0
4    4     5.0    -1.0
4    10    5.0     5.0
5    3     4.5    -1.5
5    4     4.5    -0.5
6    3     4.0    -1.0

Outliers and Influential Points
The mean residual is 0.0 (always) and the standard deviation of these residuals is 2.0.
Thus, the residual 5.0 is 2.5 standard deviations above the mean, which makes the point (4, 10) an outlier.
But is the point (4, 10) influential?
Remove it and see what the effect is.
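The outlier check can be reproduced with a short NumPy sketch using the nine data points from this example. It assumes the sample standard deviation (divisor n - 1), which is the choice that reproduces the value 2.0 quoted on the slide, and it uses the 2.5 standard deviation cutoff from the definition; the small tolerance on the cutoff is an addition to guard against floating-point rounding.

```python
import numpy as np

# Data from the example.
x = np.array([1, 2, 3, 4, 4, 4, 5, 5, 6], dtype=float)
y = np.array([6, 5, 5, 6, 4, 10, 3, 4, 3], dtype=float)

# Least-squares line y-hat = intercept + slope * x.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Sample standard deviation of the residuals (divisor n - 1, an assumption
# that matches the 2.0 on the slide).
sd = residuals.std(ddof=1)
z = (residuals - residuals.mean()) / sd

# Flag points at least 2.5 standard deviations from the mean.
# The tolerance covers the case where z is 2.5 up to rounding error.
cutoff = 2.5
is_outlier = np.abs(z) >= cutoff - 1e-6
outliers = [(float(a), float(b)) for a, b in zip(x[is_outlier], y[is_outlier])]

print(f"line: y-hat = {intercept:.3f} + ({slope:.3f})x")
print(f"sd of residuals = {sd:.3f}")
print("outliers:", outliers)
```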

Outliers and Influential Points
Example (Outliers and Influential Points)
[Figure: including the point (4, 10)]
[Figure: excluding the point (4, 10)]

Outliers and Influential Points
The regression line of the remaining points is $\hat{y} = 6.615 - 0.564x$.
This is nearly the same as $\hat{y} = 7.0 - 0.5x$.
So (4, 10) is an outlier, but it is not influential.

Outliers and Influential Points
Now change the point (4, 10) to the point (12, 12).

x     y
1     6
2     5
3     5
4     6
4     4
5     3
5     4
6     3
12    12

Outliers and Influential Points
[Figure: scatterplot of the data including the point (12, 12)]
Is (12, 12) an outlier?

Outliers and Influential Points
The regression line including (12, 12) is $\hat{y} = 2.767 + 0.55x$.
Removing (12, 12) changes it to $\hat{y} = 6.615 - 0.564x$.

Outliers and Influential Points
Example (Outliers and Influential Points)
[Figure: including the point (12, 12)]
[Figure: excluding the point (12, 12)]

Outliers and Influential Points
Yet the residual of (12, 12) is only 2.63.
The standard deviation of the set of residuals is 2.12.
(12, 12) is only 1.24 standard deviations above the mean.
Therefore, (12, 12) is not an outlier, but it is influential.
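The "remove it and see" test for influence can be sketched the same way: fit the line with and without the point and compare the coefficients. The helper name `fit_line` is introduced here for illustration, and the sample standard deviation (divisor n - 1) is again assumed because it reproduces the 2.12 on the slide.

```python
import numpy as np

# Data from the example, with (4, 10) replaced by (12, 12).
x = np.array([1, 2, 3, 4, 4, 5, 5, 6, 12], dtype=float)
y = np.array([6, 5, 5, 6, 4, 3, 4, 3, 12], dtype=float)

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return intercept, slope

# Fit with every point, then again with (12, 12) left out.
a_all, b_all = fit_line(x, y)
keep = ~((x == 12) & (y == 12))
a_rest, b_rest = fit_line(x[keep], y[keep])

# Standardized residual of (12, 12) under the full fit.
residuals = y - (a_all + b_all * x)
z = residuals / residuals.std(ddof=1)
z_point = float(z[~keep][0])

print(f"with (12, 12):    y-hat = {a_all:.3f} + ({b_all:.3f})x")
print(f"without (12, 12): y-hat = {a_rest:.3f} + ({b_rest:.3f})x")
print(f"standardized residual of (12, 12): {z_point:.2f}")
```

The slope changes sign when the point is removed, even though its standardized residual is well under the 2.5 cutoff, which is the sense in which the point is influential without being an outlier.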

Assignment
Homework
Read Sections 13.4, 13.5, pages 823 - 834.
Let’s Do It! 13.5, 13.6.
Exercises 8, 9, 10, page 835.
