Lecture 2 Linear Regression: A Model For The Mean


Lecture 2
Linear Regression: A Model for the Mean
Sharyn O'Halloran

Closer look at:
- Linear regression model
- Least squares procedure
- Inferential tools
- Confidence and prediction intervals
- Assumptions
- Robustness
- Model checking
- Log transformation (of Y, X, or both)

Linear Regression: Introduction
- Data: (Yᵢ, Xᵢ) for i = 1, ..., n
- Interest is in the probability distribution of Y as a function of X
- Linear regression model:
  - The mean of Y is a straight-line function of X, plus an error term or residual
  - The goal is to find the best-fit line, the one that minimizes the sum of squared errors

Estimated regression line
Steer example (see Display 7.3, p. 177):
- Intercept = 6.987
- Equation for the estimated regression line: fitted pH = 6.98 - .73 ltime
[Figure: scatterplot of PH vs. ltime with the fitted line; the vertical distance between a point and the line is the error term]

Create a new variable: ltime = log(time)
Regression analysis
[Stata regression output]
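A minimal Stata sketch of these two steps; the variable names time and ph are assumptions based on the steer example's labels:

    * create the log-time variable and regress pH on it
    gen ltime = log(time)
    regress ph ltime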

Regression Terminology
- Regression: the mean of a response variable as a function of one or more explanatory variables: µ{Y|X}, read "mean of Y given X" or "regression of Y on X"
- Regression model: an ideal formula to approximate the regression
- Simple linear regression model: µ{Y|X} = β₀ + β₁X, where β₀ is the intercept and β₁ is the slope (both unknown parameters)

Regression Terminology
- Y: dependent variable, explained variable, response variable
- X: independent variable, explanatory variable, control variable
- Y's probability distribution is to be explained by X
- β₀ and β₁ are the regression coefficients (see Display 7.5, p. 180)
- Note: Y = β₀ + β₁X is NOT simple regression; the model describes the mean µ{Y|X}, not Y itself

Regression Terminology: Estimated coefficients
[Figure: the true regression line µ{Y|X} = β₀ + β₁X and the estimated line µ̂{Y|X} = β̂₀ + β̂₁X, with the parameters β₀, β₁ and their estimates β̂₀, β̂₁ labeled]
Choose β̂₀ and β̂₁ to make the residuals small.

Regression Terminology
- Fitted value for obs. i is its estimated mean: fitᵢ = Ŷᵢ = µ̂{Y|Xᵢ} = β̂₀ + β̂₁Xᵢ
- Residual for obs. i: resᵢ = eᵢ = Yᵢ - fitᵢ = Yᵢ - Ŷᵢ
- Least squares: the statistical estimation method that finds the estimates minimizing the sum of squared residuals
  Σᵢ₌₁ⁿ (yᵢ - (β̂₀ + β̂₁xᵢ))² = Σᵢ₌₁ⁿ (yᵢ - ŷᵢ)²
- Solution (from calculus) on p. 182 of Sleuth

Least Squares Procedure
- The least-squares procedure obtains estimates of the linear equation coefficients β₀ and β₁ in the model ŷᵢ = β₀ + β₁xᵢ by minimizing the sum of the squared residuals or errors (eᵢ):
  SSE = Σeᵢ² = Σ(yᵢ - ŷᵢ)²
- This results in the minimization problem
  SSE = Σ(yᵢ - (β₀ + β₁xᵢ))²
  Choose β₀ and β₁ so that this quantity is minimized.

Least Squares Procedure
- The slope coefficient estimator is
  β̂₁ = Σᵢ₌₁ⁿ (xᵢ - x̄)(yᵢ - ȳ) / Σᵢ₌₁ⁿ (xᵢ - x̄)² = r_xy · (s_Y / s_X)
  i.e., the correlation between X and Y times the standard deviation of Y over the standard deviation of X.
- And the constant or intercept estimator is
  β̂₀ = ȳ - β̂₁x̄
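A quick numeric check of the slope identity in Stata; the variable names y and x are placeholders:

    * compare the r*(sY/sX) formula with the regress coefficient
    quietly summarize y
    scalar sy = r(sd)
    quietly summarize x
    scalar sx = r(sd)
    quietly correlate y x
    display "slope via correlation: " r(rho)*sy/sx
    regress y x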

Least Squares Procedure (cont.)
- Note that the regression line always goes through the mean point (X̄, Ȳ).
- Think of the regression line as the expected value of Y for a given value of X: for any value of the independent variable there is a single most likely value for the dependent variable.
[Figure: "Relation Between Yield and Fertilizer", Yield (Bushel/Acre) vs. Fertilizer (lb/Acre), with trend line]

Tests and Confidence Intervals for β₀, β₁
- Degrees of freedom: (n-2) = sample size - number of coefficients
- Variance of {Y|X}: σ̂² = (sum of squared residuals)/(n-2)
- Standard errors (p. 184)
- Ideal normal model: the sampling distributions of β̂₀ and β̂₁ have the shape of a t-distribution on (n-2) d.f.
- Do t-tests and CIs as usual (df = n-2)

[Stata regression output, annotated: p-values for H₀: β = 0 and confidence intervals for the coefficients]

Inference Tools
- Hypothesis test and confidence interval for the mean of Y at some X: estimate the mean of Y at X = X₀ by
  µ̂{Y|X₀} = β̂₀ + β̂₁X₀
- Standard error of the estimated mean:
  SE[µ̂{Y|X₀}] = σ̂ √( 1/n + (X₀ - X̄)² / ((n-1)sₓ²) )
- Conduct the t-test and confidence interval in the usual way (df = n-2)
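In Stata, this standard error is available after regress through predict's stdp option; a one-line sketch (the name semean is a placeholder):

    * SE of the estimated mean µ̂{Y|X} at each observed X
    predict semean, stdp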

Confidence bands for conditional means
- Confidence bands in simple regression have an hourglass shape, narrowest at the mean of X
- The lfitci command automatically calculates and graphs the confidence bands
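A minimal sketch with placeholder variable names y and x:

    * fitted line with 95% confidence band for the conditional mean
    twoway (lfitci y x) (scatter y x)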

Prediction
- Prediction of a future Y at X = X₀:
  Pred(Y|X₀) = µ̂{Y|X₀}
- Standard error of prediction:
  SE[Pred(Y|X₀)] = √( σ̂² + (SE[µ̂{Y|X₀}])² )
  (variability of Y about its mean, plus uncertainty in the estimated mean)
- 95% prediction interval:
  Pred(Y|X₀) ± t_df(.975) × SE[Pred(Y|X₀)]
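A hedged Stata sketch of this interval using predict's stdf option, assuming a model was just fit with regress and fitted values are stored in yhat; the names sepred, lower, and upper are placeholders:

    * 95% prediction interval at each observed X (df = n - 2)
    predict sepred, stdf
    gen lower = yhat - invttail(e(N)-2, .025)*sepred
    gen upper = yhat + invttail(e(N)-2, .025)*sepred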

Residuals vs. predicted values plot
After any regression analysis we can automatically draw a residual-versus-fitted plot just by typing rvfplot.

Predicted values (yhat)
After any regression, the predict command can create a new variable yhat containing the predicted Y values.
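A one-line sketch, run immediately after regress (the name yhat matches the slide):

    * store the fitted values in a new variable named yhat
    predict yhat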

Residuals (e)
After any regression, predict with the resid option can create a new variable e containing the residuals.
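A matching one-line sketch (the name e matches the slide):

    * store the residuals in a new variable named e
    predict e, resid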

The residual-versus-predicted-values plot could be drawn "by hand" using these commands (sketched below).
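A sketch of that "by hand" recipe, reusing the yhat and e variables created on the previous slides:

    * scatter the residuals against the fitted values, with a zero reference line
    twoway scatter e yhat, yline(0)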

Second type of confidence interval for regression prediction: "prediction band"
This expresses our uncertainty in estimating the unknown value of Y for an individual observation with known X value.
Command: lfitci with the stdf option
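A minimal sketch with placeholder variable names y and x:

    * fitted line with 95% prediction band for individual observations
    twoway (lfitci y x, stdf) (scatter y x)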

Additional note: predict can generate two kinds of standard errors for the predicted y value, which have two different applications:
- stdf: standard error of the forecast, for confidence bands for individual-case predictions
- stdp: standard error of the predicted mean, for confidence bands for conditional means
[Figure: panels of Distance vs. VELOCITY illustrating the two kinds of bands]

[Figure: Distance vs. VELOCITY, two panels.
Top, confidence bands for conditional means (stdp): the 95% confidence interval for µ{Y|X₀ = 1000}; a confidence band is a set of confidence intervals for µ{Y|X₀}.
Bottom, confidence bands for individual-case predictions (stdf): the 95% prediction interval for Y at X = 1000; a calibration interval gives the values of X for which Y₀ is in a prediction interval.]

Notes about confidence and prediction bands
- Both are narrowest at the mean of X
- Beware of extrapolation
- The width of the confidence interval shrinks toward zero as n grows large; this is not true of the prediction interval.

Review of simple linear regression
1. Model: µ{Y|X} = β₀ + β₁X with constant variance var{Y|X} = σ².
2. Least squares: choose estimators β̂₀ and β̂₁ to minimize the sum of squared residuals.
   β̂₁ = Σᵢ₌₁ⁿ (Xᵢ - X̄)(Yᵢ - Ȳ) / Σᵢ₌₁ⁿ (Xᵢ - X̄)²
   β̂₀ = Ȳ - β̂₁X̄
   resᵢ = Yᵢ - β̂₀ - β̂₁Xᵢ  (i = 1, ..., n)
3. Properties of estimators:
   σ̂² = Σᵢ₌₁ⁿ resᵢ² / (n-2)
   SE(β̂₁) = σ̂ / √((n-1)sₓ²)
   SE(β̂₀) = σ̂ √( 1/n + X̄² / ((n-1)sₓ²) )

Assumptions of Linear Regression
A linear regression model assumes:
- Linearity: µ{Y|X} = β₀ + β₁X
- Constant variance: var{Y|X} = σ²
- Normality: the distribution of Y at any X is normal
- Independence: given the Xᵢ's, the Yᵢ's are independent

Examples of Violations: Non-Linearity
- The true relation between the independent and dependent variables may not be linear.
- For example, consider campaign fundraising and the probability of winning an election: the probability of winning increases with each additional dollar spent and then levels off after $50,000.
[Figure: P(w), the probability of winning an election, vs. spending; the curve rises and then flattens after $50,000]

Consequences of violation of linearity:
If "linearity" is violated, misleading conclusions may occur (however, the degree of the problem depends on the degree of non-linearity).

Examples of Violations: Constant Variance
- Constant variance or homoskedasticity: the homoskedasticity assumption implies that, on average, we do not expect to get larger errors in some cases than in others.
- Of course, due to the luck of the draw, some errors will turn out to be larger than others; but homoskedasticity is violated only when this happens in a predictable manner.
- Example: income and spending on certain goods. People with higher incomes have more choices about what to buy. We would expect that their consumption of certain goods is more variable than for families with lower incomes.

[Figure: "Relation between Income and Spending violates homoskedasticity", Spending vs. income; as income increases, so do the errors (the vertical distances from the predicted line), e.g. ε₆ = Y₆ - (a + bX₆) and ε₉ = Y₉ - (a + bX₉)]

Consequences of non-constant variance
If "constant variance" is violated, LS estimates are still unbiased, but SEs, tests, confidence intervals, and prediction intervals are incorrect. However, the degree of the problem depends on the degree of the violation.

Violation of Normality: Non-Normality
Nicotine use is characterized by a large number of people not smoking at all and another large number of people who smoke every day.
[Figure: frequency of nicotine use, an example of a bimodal distribution]

Consequence of non-Normality
If "normality" is violated:
- LS estimates are still unbiased
- tests and CIs are quite robust
- PIs are not
Of all the assumptions, this is the one that we need to be least worried about violating. Why?

Violation of Non-independence
- The independence assumption means that the error terms of two observations do not influence one another. Technically, the RESIDUALS or error terms are uncorrelated.
- The most common violation occurs with data that are collected over time, i.e. time series analysis.
- Example: high tariff rates in one period are often associated with very high tariff rates in the next period.
- Example: nominal GNP and consumption.
[Figure: residuals of GNP and Consumption over time, highly correlated]

Consequence of non-independence
If "independence" is violated:
- LS estimates are still unbiased
- everything else can be misleading
[Figure: log height vs. log weight, with plotting code by litter (5 mice from each of 5 litters); note that mice from litters 4 and 5 have higher weight and height]

Robustness of least squares
- The "constant variance" assumption is important.
- Normality is not too important for confidence intervals and p-values, but is important for prediction intervals.
- Long-tailed distributions and/or outliers can heavily influence the results.
- Non-independence problems: serial correlation (Ch. 15) and cluster effects (we deal with this in Ch. 9-14).
Strategy for dealing with these potential problems:
- Plots; residual plots; consider outliers (more in Ch. 11)
- Log transformations (Display 8.6)

Tools for model checking
- Scatterplot of Y vs. X (see Display 8.6, p. 213)*
- Scatterplot of residuals vs. fitted values*
  * Look for curvature, non-constant variance, and outliers
- Normal probability plot (p. 224): sometimes useful for checking whether the distribution is symmetric or normal (i.e. for PIs)
- Lack-of-fit F-test when there are replicates (Section 8.5)

Scatterplot of Y vs. X
Command: graph twoway scatter Y X
Case study: 7.01, page 175

Scatterplot of residuals vs. fitted values
Command: rvfplot, yline(0)
Case study: 7.01, page 175

Normal probability plot (p. 224)
Quantile-normal plots compare quantiles of a variable's distribution with quantiles of a normal distribution having the same mean and standard deviation. They allow visual inspection for departures from normality in every part of the distribution.
Command: qnorm variable, grid
Case study: 7.01, page 175

Diagnostic plots of residuals
- Plot residuals versus fitted values, almost always:
  - For simple regression this is about the same as residuals vs. X
  - Look for outliers, curvature, increasing spread (funnel or horn shape); then take appropriate action
- If data were collected over time, plot residuals versus time:
  - Check for a time trend and serial correlation
- If normality is important, use a normal probability plot:
  - A straight line is expected if the distribution is normal

Voltage Example (Case Study 8.1.2)
- Goal: to describe the distribution of breakdown time of an insulating fluid as a function of the voltage applied to it.
  - Y = breakdown time
  - X = voltage
- Statistical illustrations:
  - Recognizing the need for a log transformation of the response from the scatterplot and the residual plot
  - Checking the simple linear regression fit with a lack-of-fit F-test
- Stata (follows; a workflow sketch is given below)
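A hedged sketch of the Stata workflow for the next two slides, assuming the variables are named time and voltage:

    * fit on the raw response and inspect the residuals
    regress time voltage
    rvfplot, yline(0)
    * spread increases with the fitted values, so log the response and refit
    gen logtime = log(time)
    regress logtime voltage
    rvfplot, yline(0)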

Simple regression
The residuals-vs-fitted-values plot shows increasing spread with increasing fitted values.
Next step: we try with log(Y) = log(time).

Simple regression with Y logged
The residuals-vs-fitted-values plot does not show any obvious curvature or trend in spread.

Interpretation after log transformations

Model         Dependent variable   Independent variable   Interpretation of β₁
Level-level   Y                    X                      Δy = β₁ Δx
Level-log     Y                    log(X)                 Δy = (β₁/100) %Δx
Log-level     log(Y)               X                      %Δy = (100 β₁) Δx
Log-log       log(Y)               log(X)                 %Δy = β₁ %Δx

Dependent variable logged
µ{log(Y)|X} = β₀ + β₁X is the same as (if the distribution of log(Y), given X, is symmetric):
Median{Y|X} = e^(β₀ + β₁X)
As X increases by 1, what happens?
Median{Y|X = x+1} / Median{Y|X = x} = e^(β₀ + β₁(x+1)) / e^(β₀ + β₁x) = e^(β₁)
So Median{Y|X = x+1} = e^(β₁) · Median{Y|X = x}

Interpretation of Y logged
"As X increases by 1, the median of Y changes by the multiplicative factor e^(β₁)." Or, better:
- If β₁ > 0: "As X increases by 1, the median of Y increases by (e^(β₁) - 1) × 100%"
- If β₁ < 0: "As X increases by 1, the median of Y decreases by (1 - e^(β₁)) × 100%"

Example: µ{log(time)|voltage} = β₀ + β₁ voltage
1 - e^(-0.5) ≈ .4
[Stata regression output]

µ{log(time)|voltage} = 18.96 - .507 voltage
1 - e^(-0.5) ≈ .4
It is estimated that the median breakdown time decreases by 40% with each 1 kV increase in voltage.
[Figure: two panels, breakdown time (minutes) vs. VOLTAGE and log of breakdown time vs. VOLTAGE, each with fitted values]
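Checking the arithmetic in Stata, with display used as a calculator:

    * percent decrease in the median per 1 kV, from the fitted slope -.507
    display (1 - exp(-.507))*100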

If the explanatory variable (X) is logged
If µ{Y|log(X)} = β₀ + β₁log(X), then: "Associated with each two-fold increase (i.e. doubling) of X is a β₁log(2) change in the mean of Y."
An example follows.

Example with X logged (Display 7.3 - Case 7.1):
- Y = pH, X = time after slaughter (hrs.)
- Estimated model: µ̂{Y|log(X)} = 6.98 - .73 log(X)
- -.73 × log(2) ≈ -.5, so "it is estimated that for each doubling of time after slaughter (between 0 and 8 hours) the mean pH decreases by .5."
[Figure: two panels, pH vs. ltime and pH vs. TIME, each with fitted values]
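The same computation in Stata (log() is the natural log):

    * change in mean pH per doubling of time, from the fitted slope
    display -.73*log(2)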

Both Y and X logged
µ{log(Y)|log(X)} = β₀ + β₁log(X) is the same as Median{Y|X} = e^(β₀) X^(β₁) (if the distribution of log(Y), given X, is symmetric).
As X doubles, what happens?
- If β₁ > 0: "As X doubles, the median of Y increases by (e^(log(2)β₁) - 1) × 100%"
- If β₁ < 0: "As X doubles, the median of Y decreases by (1 - e^(log(2)β₁)) × 100%"

Example with Y and X logged (Display 8.1, page 207)
- Y: number of species on an island
- X: island area
- µ{log(Y)|log(X)} = β₀ + β₁ log(X)

Y and X logged
µ̂{log(Y)|log(X)} = 1.94 + .25 log(X)
Since e^(.25 log(2)) ≈ 1.19:
"Associated with each doubling of island area is a 19% increase in the median number of bird species."
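The multiplier checked in Stata:

    * multiplicative change in the median species count per doubling of area
    display exp(.25*log(2))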

Example: Log-Log
In order to graph the log-log plot, we need to generate two new variables (natural logarithms).
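A sketch of those steps for the species-area example; the variable names species and area are assumptions:

    * natural logs of both variables, then the log-log scatterplot with fit
    gen logspecies = log(species)
    gen logarea = log(area)
    twoway (scatter logspecies logarea) (lfit logspecies logarea)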

