Lecture 17 Outliers & Influential Observations


Lecture 17: Outliers & Influential Observations
STAT 512, Spring 2011
Background reading: KNNL Sections 10.2-10.4

Topic Overview
- Statistical methods for identifying outliers / influential observations
- CDI/Physicians case study
- Remedial measures

Outlier Detection in MLR
- We can have both X and Y outliers.
- In SLR, outliers were relatively easy to detect via scatterplots or residual plots.
- In MLR, it becomes more difficult to detect outliers via simple plots:
  - Univariate outliers may not be as extreme in MLR.
  - Some multivariate outliers may not be detectable in single-variable analyses.

Using Residuals: Detecting Outliers in the Response (Y)
- We have seen how residuals can identify problems with normality, constancy of variance, and linearity.
- Residuals can also identify outlying values in Y (large magnitude implies an extreme value).
- But residuals don't really have a "scale", so what defines a large magnitude? We need something more standardized.

Semi-studentized Residuals
- Recall that e_i ~ iid N(0, sigma^2), so (e_i - 0)/sigma ~ N(0, 1); these are "standardized errors".
- However, we don't know the true errors or sigma, so we use the residuals e_i and sqrt(MSE).
- Dividing the residuals by sqrt(MSE) gives the semi-studentized residuals e_i* = e_i / sqrt(MSE).
- Slightly better than regular residuals; they can be used in the same ways we used residuals.

Studentized Residuals
- The previous version is a "quick fix", because the standard deviation of a residual is actually s{e_i} = sqrt(MSE * (1 - h_ii)),
  where h_ii is the ith element on the main diagonal of the hat matrix, between 0 and 1.
- The goal is to consider the magnitude of each residual relative to its standard deviation.
- Studentized residuals are r_i = e_i / sqrt(MSE * (1 - h_ii)) ~ t(n - p).
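The two scalings above (semi-studentized and studentized) can be computed directly from the hat matrix. A minimal NumPy sketch on made-up data; the course itself does this in SAS, and all variable names here are illustrative:

```python
import numpy as np

# Hypothetical regression data, just to illustrate the formulas (not the CDI data).
rng = np.random.default_rng(1)
n, p = 30, 3                                     # n cases, p parameters (incl. intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                                 # ordinary residuals
MSE = e @ e / (n - p)                            # SSE / (n - p)

H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
h = np.diag(H)                                   # leverages h_ii, each in (0, 1)

semi_stud = e / np.sqrt(MSE)                     # semi-studentized residuals
r = e / np.sqrt(MSE * (1 - h))                   # studentized residuals
```

Since 1 - h_ii < 1, each studentized residual is at least as large in magnitude as its semi-studentized counterpart.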

Studentized Deleted Residuals
- Another refinement: each residual is obtained by regressing using all of the data except the point in question.
- Similar to what is done to compute the PRESS statistic: d_i = Y_i - Yhat_i(i).
- Note: a formula is available to avoid recomputing the entire regression over and over: d_i = e_i / (1 - h_ii).

Studentized Deleted Residuals (2)
- The standard deviation of this residual is s{d_i} = sqrt(MSE_(i) / (1 - h_ii)).
- t_i = d_i / s{d_i} = e_i / sqrt(MSE_(i) * (1 - h_ii)) is called the studentized deleted residual (SDR).
- It follows a t-distribution with n - p - 1 degrees of freedom, allowing us to know what constitutes an "extreme value".

Studentized Deleted Residuals (3)
- Alternative formula to calculate these without rerunning the regression n times:
  t_i = e_i * sqrt[(n - p - 1) / (SSE * (1 - h_ii) - e_i^2)]
- SAS of course uses this, and matrices, to do all of the arithmetic quickly.
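The shortcut can be verified numerically against the brute-force approach of actually deleting each case and refitting. A NumPy sketch with made-up data (not the course's SAS workflow):

```python
import numpy as np

# Hypothetical data for checking the no-refit SDR formula.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
SSE = e @ e
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Shortcut: t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2))
t_short = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))

# Brute force: refit without case i, then t_i = e_i / sqrt(MSE_(i) * (1 - h_ii))
t_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    resid_i = y[keep] - X[keep] @ b_i
    mse_i = resid_i @ resid_i / (n - 1 - p)      # MSE from the fit without case i
    t_brute[i] = e[i] / np.sqrt(mse_i * (1 - h[i]))
```

The two vectors agree to machine precision, which is why software never needs to run n separate regressions.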

Using Studentized Residuals
- Both studentized and studentized deleted residuals can be quite useful for identifying outliers.
- Since we know they have a t-distribution, for a reasonable sample size n an SDR of magnitude 3 or more (in absolute value) will be considered an outlier. Any with magnitude between 2 and 3 may be close, depending on the significance level used (see tables).
- Many high SDRs indicate an inadequate model.

Regular vs. "Deleted"
- Both generally give similar information.
- The "deleted" version is perhaps preferred: each data point is not used in computing its own residual, and the known t-distribution gives us a reference for judging what counts as an "extreme value".

Formal Test for Outliers in Y
- Test each of the n residuals to determine whether it is an outlier.
- Bonferroni adjustment for the n tests: the significance level becomes 0.05 / n.
- Compare the studentized deleted residuals (in absolute value) to a t critical value using the above alpha and n - p - 1 degrees of freedom.
- SDRs that are larger in magnitude than the critical value identify outliers.
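The Bonferroni test can be sketched in a few lines of NumPy/SciPy. The data are made up, with one gross Y outlier planted deliberately so the test has something to find; this is an illustration of the procedure, not the course's SAS output:

```python
import numpy as np
from scipy import stats

# Made-up data with one planted Y outlier (case 0).
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)
y[0] += 8.0                                      # gross outlier in the response

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
SSE = e @ e
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
sdr = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))   # studentized deleted residuals

alpha = 0.05
crit = stats.t.ppf(1 - alpha / (2 * n), df=n - p - 1)     # Bonferroni-adjusted critical value
outliers = np.flatnonzero(np.abs(sdr) > crit)             # cases flagged as Y outliers
```

Note how the Bonferroni adjustment inflates the critical value well above the usual two-sided cutoff, so only genuinely extreme cases are flagged.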

CDI / Physicians Example (cdi_outliers.sas)
- Note: we leave LA and Chicago in the model this time.
- More "options" for the MODEL statement:
  - /r produces an analysis of the residuals
  - /influence produces influence statistics
- Work with the 5-variable model from last time (tot_income, beds, crimes, hsgrad, unemploy).

Example (2)

proc reg data=cdi outest=fits;
  model lphys = beds tot_income hsgrad
                crimes unemploy / r;
run;

- Produces several pages of output, since residual information is given for each of the 440 data points.
- We'll look at only a small part of this output, for illustration.

Output (first 10 observations)

Obs   Student Residual   Cook's D
  1       -9.380          12.186
  2       -5.535           1.130
  3       -1.627           0.029
  4        0.974           0.006
  5        0.773           0.008
  6        3.676           6.541
  7        0.611           0.001
  8       -0.676           0.004
  9        0.711           0.005
 10        0.633           0.002

Note: Obs 1 = LA, Obs 2 = Cook, Obs 6 = Kings.

Leverage Values
- Outliers in X can be identified because they will have large leverage values. The leverage is just h_ii from the hat matrix.
- In general, 0 <= h_ii <= 1 and sum(h_ii) = p.
- Large leverage values indicate that the ith case is distant from the center of all X observations.
- Leverage is considered large if it is bigger than twice the mean leverage value, 2p / n.
- Leverages can also be used to identify hidden extrapolation (page 400 of KNNL).
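These rules are easy to check numerically. A NumPy sketch with a made-up design matrix containing one deliberately extreme X point (the course itself gets leverages from SAS's /influence option):

```python
import numpy as np

# Made-up design matrix with one planted X outlier (case 0), for illustration only.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X[0, 1:] = [6.0, -6.0]                           # far from the cloud of other X points

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverages h_ii
cutoff = 2 * p / n                               # twice the mean leverage p/n
high_leverage = np.flatnonzero(h > cutoff)       # cases flagged as X outliers
```

The leverages sum to p (so the mean is p/n), and the planted point is flagged by the 2p/n rule.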

Physicians Example
- /influence in the MODEL statement gives the leverage values (labeled "Hat Diag H" in the output).
- These statistics can also be saved to a dataset with an OUTPUT statement:

proc reg data=cdi;
  model lphys = beds tot_income hsgrad crimes
                unemploy / influence;
  output out=diag student=studresids h=leverage
                  rstudent=studdelresid;
proc sort data=diag; by studdelresid;
proc print data=diag;
  var county studresids leverage studdelresid;
run;

Output (sorted by studentized deleted residual)
Remember we can compare leverage to 2p/n = 0.03.

Obs   County     studresids
  1   Los Ange     -9.380
  2   Cook         -5.535
  3   Sarpy        -3.378
  4   Livingst     -2.174
 ...
437   San Fran      1.935
438-440: the largest studentized deleted residuals run from 1.941 and 2.055 up through 2.338 and 3.730.

Other Influence Statistics
- Not all outliers have a strong influence on the fitted model. Some measures to detect the influence of each observation:
  - Cook's distance measures the influence of an observation on all fitted values.
  - DFFITS measures the influence of an observation on its own fitted value.
  - DFBETAS measures the influence of an observation on a particular regression coefficient.

Cook's Distance
- Assesses the influence of a data point on ALL predicted values.
- Obtain from SAS using /r.
- Large values suggest that an observation has a lot of influence (can compare to an F(p, n - p) distribution).
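Cook's distance has the closed form D_i = e_i^2 h_ii / (p * MSE * (1 - h_ii)^2), which can be checked against its definition: the (scaled) total shift in all n fitted values when case i is deleted. A NumPy sketch on made-up data, not the CDI example:

```python
import numpy as np

# Made-up data for illustrating Cook's distance.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
MSE = e @ e / (n - p)
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Closed form: D_i = e_i^2 * h_ii / (p * MSE * (1 - h_ii)^2)
D = e**2 * h / (p * MSE * (1 - h)**2)

# Definition check for case 0: total squared shift in ALL fitted values
# when case 0 is deleted, scaled by p * MSE.
keep = np.arange(n) != 0
b_0, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
D0_direct = ((X @ beta - X @ b_0)**2).sum() / (p * MSE)
```

The closed form agrees with the delete-and-refit definition, so no refitting is needed in practice.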

DFFITS
- Assesses the influence of a data point on ITS OWN prediction only.
- Obtain from SAS using /influence.
- Essentially measures the difference between the prediction of that point with and without the observation in the computation.
- Large absolute values (bigger than 1, or bigger than 2*sqrt(p/n)) suggest that an observation has a lot of influence on its own prediction.
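DFFITS also has a no-refit form, DFFITS_i = t_i * sqrt(h_ii / (1 - h_ii)) with t_i the studentized deleted residual, which can be checked against the with/without-the-point definition. A NumPy sketch on made-up data:

```python
import numpy as np

# Made-up data for illustrating DFFITS.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
SSE = e @ e
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

sdr = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))  # studentized deleted residuals
dffits = sdr * np.sqrt(h / (1 - h))                      # DFFITS, no refitting needed

# Definition check for case 0: (Yhat_0 - Yhat_0(0)) / sqrt(MSE_(0) * h_00)
keep = np.arange(n) != 0
b_0, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
r_0 = y[keep] - X[keep] @ b_0
mse_0 = r_0 @ r_0 / (n - 1 - p)
dffits0 = (X[0] @ beta - X[0] @ b_0) / np.sqrt(mse_0 * h[0])
```

Cases with |DFFITS| above 1 (or above 2*sqrt(p/n) for larger datasets) would then be flagged.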

DFBETAS
- One per parameter per observation.
- Obtained using /influence in PROC REG.
- Assesses the influence of each observation on each parameter individually.
- Absolute values bigger than 1 (or bigger than 2/sqrt(n) for larger datasets) are considered large.
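DFBETAS_{k(i)} = (b_k - b_k(i)) / sqrt(MSE_(i) * c_kk), where c_kk is the kth diagonal element of (X'X)^{-1}. A NumPy sketch on made-up data, including a sanity check of the leave-one-out coefficient shift (the course obtains these from SAS's /influence):

```python
import numpy as np

# Made-up data for illustrating DFBETAS.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)
c = np.diag(XtX_inv)                      # c_kk = kth diagonal of (X'X)^{-1}
h = np.diag(X @ XtX_inv @ X.T)

# One DFBETAS value per case, per parameter.
dfbetas = np.empty((n, p))
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    r_i = y[keep] - X[keep] @ b_i
    mse_i = r_i @ r_i / (n - 1 - p)
    dfbetas[i] = (beta - b_i) / np.sqrt(mse_i * c)
    if i == 0:
        # Leave-one-out identity: beta - b_(i) = (X'X)^{-1} x_i e_i / (1 - h_ii)
        delta_direct = beta - b_i
        delta_formula = XtX_inv @ X[0] * e[0] / (1 - h[0])

flagged = np.abs(dfbetas) > 2 / np.sqrt(n)    # rule-of-thumb cutoff for larger n
```

The leave-one-out identity means software can compute all n*p DFBETAS values without ever refitting the model.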

Example

proc reg data=cdi;
  model lphys = beds tot_income hsgrad
                crimes unemploy / r influence;
  output out=diag dffits=dffit cookd=cooksd;
proc sort data=diag; by descending cooksd;
proc print data=diag;
  var county dffit cooksd;
run;

Output (top 8 observations, sorted by descending Cook's distance)
Los Angeles heads the list; after the few highly influential counties, Cook's distance falls off quickly (to roughly 0.0253, 0.0218, 0.0217).

Conclusions
- Compare DFFITS to 2*sqrt(p/n) = 0.23.
- Could assess Cook's distance using the F distribution.
- Los Angeles, Kings, and Cook counties have an overwhelming amount of influence, both on their own fitted values and on the regression line itself.
- Looking at the DFBETAS (the only way to do this is to view the output from /influence) shows similar influence on the parameters; compare to 2/sqrt(n) = 2/sqrt(440), about 0.095.

Influential Observations
- The big question now: once we identify an outlier or influential observation, what do we do with it?
- For a good understanding of the regression model, this analysis IS needed. In our example, we now know that we have three cases holding a lot of influence. We may want to:
  - See what happens when we exclude these from the model.
  - Investigate these cases separately.

What Not to Do
- Never simply exclude or ignore a data point just because you don't like what it does to the results.
- Never ignore the fact that you have one or two overly influential observations.

Some Remedial Measures
- See Section 11.3.
- Robust regression procedures decrease the emphasis on outlying observations.
- Doing this is slightly beyond the scope of the class, but it doesn't hurt to be aware that such methods exist.

Upcoming in Lecture 18
- Miscellaneous topics in MLR (Chapter 8, Section 10.1)

