Lecture 18(a): Linear Regression: OLS, Ridge, LASSO Setup and Practical Considerations


Lecture 18(a): Linear Regression: OLS, Ridge, LASSO
Setup and Practical Considerations
Foundations of Data Science: Algorithms and Mathematical Foundations
Mihai Cucuringu (mihai.cucuringu@stats.ox.ac.uk)
CDT in Mathematics of Random Systems, University of Oxford
October 2, 2020

Advertising data set
- sales of a product in 200 different markets
- budgets for the product in each of those markets for three different media: TV, radio, and newspaper
- goal: predict sales given the three media budgets
- input variables (denoted by X1, X2, ...): X1 = TV budget, X2 = radio budget, X3 = newspaper budget
- inputs are also known as predictors, independent variables, features, variables, or covariates
- the output variable (sales) is the response or dependent variable (denoted by Y)
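To make the setup concrete, here is a minimal sketch of loading such a data set with pandas; the file name Advertising.csv and the column names are assumptions about how the data is stored locally, not part of the lecture.

```python
# A minimal sketch of loading the Advertising data; "Advertising.csv" and the
# column names are assumed here, so adjust them to your local copy of the data.
import pandas as pd

ads = pd.read_csv("Advertising.csv")           # columns: TV, radio, newspaper, sales
X = ads[["TV", "radio", "newspaper"]]          # media budgets (inputs X1, X2, X3)
y = ads["sales"]                               # response Y
print(X.shape, y.shape)                        # expected: (200, 3) (200,)
```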

Advertising data set (figure)

Linear Regression
- Is there a relationship between advertising budget and sales?
- How strong is the relationship between advertising budget and sales?
- Which media contribute to sales?
- How accurately can we estimate the effect of each medium on sales?
- How accurately can we predict future sales?
- Is the relationship linear?
- Is there synergy among the advertising media? E.g., is spending $50k on TV plus $50k on radio better than spending $100k on either one alone? (interaction effect)

Errors
Model: Y ≈ β0 + β1 X
Example: sales ≈ β0 + β1 × radio
Define the residual sum of squares (RSS):
RSS = e1² + e2² + ... + en² = Σ_{i=1}^n ei²   (1)
where ei = yi − β̂0 − β̂1 xi,  i = 1, ..., n   (2)

Figure: For the Advertising data, the least squares fit for the regression of sales onto TV. The fit is found by minimizing the sum of squared errors. Each grey line segment represents an error, and the fit makes a compromise by averaging their squares. In this case a linear fit captures the essence of the relationship, although it is somewhat deficient in the left of the plot.

Figure: A simulated data set. Left: The red line represents the true relationship, f(X) = 2 + 3X, which is known as the population regression line. The blue line is the least squares line; it is the least squares estimate for f(X) based on the observed data, shown in black. Right: The population regression line is again shown in red, and the least squares line in dark blue. In light blue, ten least squares lines are shown, each computed on the basis of a separate random set of observations. Each least squares line is different, but on average, the least squares lines are quite close to the population regression line.

Recall the OLS estimators
The least squares coefficient estimates for simple linear regression are
β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²   (3)
β̂0 = ȳ − β̂1 x̄   (4)
where ȳ = (1/n) Σ_i yi and x̄ = (1/n) Σ_i xi denote the sample means.
The corresponding standard errors are given by
SE(β̂0)² = σ² [ 1/n + x̄² / Σ_{i=1}^n (xi − x̄)² ]   (5)
SE(β̂1)² = σ² / Σ_{i=1}^n (xi − x̄)²   (6)
with σ² = Var(ε).
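These closed-form estimates are easy to compute directly; below is a minimal numpy sketch on synthetic data (the data-generating parameters are assumptions chosen for illustration). Since σ² = Var(ε) is unknown in practice, it is replaced by the residual-based estimate RSS/(n − 2).

```python
# A minimal numpy sketch of equations (3)-(6): closed-form OLS estimates and
# their standard errors for simple linear regression, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=n)        # true beta0 = 2, beta1 = 3

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)

beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx      # eq. (3)
beta0_hat = y_bar - beta1_hat * x_bar                    # eq. (4)

residuals = y - (beta0_hat + beta1_hat * x)
sigma2_hat = np.sum(residuals ** 2) / (n - 2)            # plug-in estimate of Var(eps)

se_beta0 = np.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / sxx))   # eq. (5)
se_beta1 = np.sqrt(sigma2_hat / sxx)                            # eq. (6)

print(beta0_hat, beta1_hat, se_beta0, se_beta1)
```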

Confidence intervals
- 95% confidence interval for β1: β̂1 ± 2 · SE(β̂1), i.e., with roughly 95% probability the true β1 lies in [β̂1 − 2 · SE(β̂1), β̂1 + 2 · SE(β̂1)]
- similarly for β0
For the Advertising data, the 95% confidence intervals are:
- β0 ∈ [6.130, 7.935]: without any advertising, sales will on average fall between about 6,130 and 7,940 units
- β1 ∈ [0.042, 0.053]: each $1,000 increase in TV advertising is associated with an average increase in sales of between 42 and 53 units
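The factor 2 is a rule of thumb for the exact t quantile; a small sketch of the exact interval follows, where the coefficient and standard error are illustrative placeholders consistent with the TV interval quoted above.

```python
# A small sketch of the 95% confidence interval beta1_hat +/- t_crit * SE(beta1_hat).
from scipy import stats

n = 200                                   # the Advertising data has 200 markets
beta1_hat, se_beta1 = 0.0475, 0.0027      # illustrative placeholder values for the TV slope
t_crit = stats.t.ppf(0.975, df=n - 2)     # ~1.97 here, hence the rule-of-thumb factor of 2
print(beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)
```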

Hypothesis testing: the null hypothesis
H0: There is no relationship between X and Y (β1 = 0)
H1: There is some relationship between X and Y (β1 ≠ 0)
For the model Y = β0 + β1 X + ε, compute the t-statistic
t = (β̂1 − 0) / SE(β̂1),
i.e., the number of standard deviations that β̂1 is away from 0.
- if there is no relationship between X and Y, then t follows a t-distribution with n − 2 degrees of freedom
- for n ≳ 30, the t-distribution is very similar to the Gaussian

Hypothesis testing: the p-value
- p-value: the probability of observing any value equal to |t| or larger, assuming β1 = 0
- small p-value: it is unlikely to observe such a substantial association between X and Y due to chance (i.e., if X and Y were truly unrelated)
- typical p-value thresholds for rejecting the null hypothesis: 5% or 1%
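A minimal, self-contained sketch of this t-test on synthetic data follows (the data-generating model is an assumption for illustration); the same numbers can also be obtained from scipy.stats.linregress or statsmodels.

```python
# A minimal sketch of the t-test for H0: beta1 = 0 in simple linear regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)                 # a genuine but modest relationship

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / sxx
beta0_hat = y_bar - beta1_hat * x_bar
rss = np.sum((y - beta0_hat - beta1_hat * x) ** 2)
se_beta1 = np.sqrt(rss / (n - 2) / sxx)

t_stat = beta1_hat / se_beta1                          # number of SEs away from 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)        # two-sided p-value
print(t_stat, p_value)
```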

Quality metrics
- RSE: measures the lack of fit of the model to the data,
  RSE = sqrt( RSS / (n − 2) ) = sqrt( (1/(n − 2)) Σ_{i=1}^n (yi − ŷi)² )   (7)
- R²: measures the proportion of variance explained,
  R² = (TSS − RSS) / TSS = 1 − RSS / TSS
- TSS = Σ (yi − ȳ)², the total variance in the response Y
- RSS = Σ (yi − ŷi)², the amount of variability that is left unexplained after the regression
- for simple linear regression: R² = ρ², where ρ is the usual Pearson correlation
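A small sketch of computing RSE and R² for a simple linear fit, again on synthetic data; it also checks numerically that R² equals the squared Pearson correlation in the simple-regression case.

```python
# RSE and R^2 for a simple least squares fit on synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n = 80
x = rng.uniform(0, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.7, size=n)

slope, intercept = np.polyfit(x, y, deg=1)   # degree-1 fit returns [slope, intercept]
y_hat = intercept + slope * x

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

rse = np.sqrt(rss / (n - 2))                 # residual standard error, eq. (7)
r2 = 1 - rss / tss                           # proportion of variance explained
rho = np.corrcoef(x, y)[0, 1]
print(rse, r2, rho ** 2)                     # r2 and rho**2 agree for simple regression
```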

From Simple to Multiple Linear Regression
Figure: A $1,000 increase in radio spending is associated with an average increase in sales of 203 units. A $1,000 increase in newspaper spending is associated with an average increase in sales of around 55 units.

Multiple Linear Regression
Y = β0 + β1 X1 + β2 X2 + ... + βp Xp + ε
ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂p xp
ŷi = β̂0 + β̂1 xi,1 + β̂2 xi,2 + ... + β̂p xi,p,  i = 1, ..., n
sales = β0 + β1 × TV + β2 × radio + β3 × newspaper + ε
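Below is a minimal sketch of fitting a multiple linear regression with np.linalg.lstsq; the three synthetic predictors stand in for the TV, radio, and newspaper budgets, and the coefficient values are assumptions for illustration.

```python
# Multiple linear regression via least squares on a design matrix with an intercept column.
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = rng.uniform(0, 100, size=(n, p))                   # synthetic "budgets"
beta_true = np.array([3.0, 0.045, 0.18, 0.0])          # intercept + 3 slopes (assumed)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(scale=1.0, size=n)

X_design = np.column_stack([np.ones(n), X])            # prepend the intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)                                        # should be close to beta_true
```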

Errors being minimized
Figure: In a three-dimensional setting, with two predictors and one response, the least squares regression line becomes a plane. The plane is chosen to minimize the sum of the squared vertical distances between each observation (shown in red) and the plane.
The values β̂0, β̂1, ..., β̂p that minimize the RSS are the multiple least squares coefficient estimates.

Multiple Linear Regression
Fixing TV and newspaper advertising, spending an additional $1,000 on radio is associated with an increase in sales of approximately 189 units.
Note that β̂_newspaper is now very close to zero, with a small t-statistic and a large (non-significant) p-value.

- corr(radio, newspaper) ≈ 0.35, so newspaper gets "credit" for the effect of radio on sales
- analogy: shark attacks vs. ice cream sales at a given beach show a positive relationship
- higher temperatures → more people visit the beach → more ice cream sales and more shark attacks
- ice cream sales are no longer significant after adjusting for temperature

Hypothesis testing: the null hypothesis
H0: β1 = β2 = ... = βp = 0
H1: at least one of the βj's is non-zero
Compute the F-statistic
F = [ (TSS − RSS) / p ] / [ RSS / (n − p − 1) ]
- TSS = Σ (yi − ȳ)², the total variance in the response Y
- RSS = Σ (yi − ŷi)², the amount of variability that is left unexplained after the regression
- under the linear model assumptions, one can show E[RSS / (n − p − 1)] = σ²
- if H0 is true: E[(TSS − RSS) / p] = σ², so F ≈ 1
- if H1 is true: E[(TSS − RSS) / p] > σ², so F > 1
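A minimal sketch of computing this F-statistic and its p-value on synthetic data (the data-generating coefficients are assumptions); statsmodels reports the same quantity in its regression summary.

```python
# F-test for H0: beta1 = ... = betap = 0 in a multiple linear regression.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = 2.0 + X @ np.array([0.5, 0.0, -0.3]) + rng.normal(size=n)

X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_hat = X_design @ beta_hat

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

f_stat = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(f_stat, dfn=p, dfd=n - p - 1)     # upper-tail probability
print(f_stat, p_value)
```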

Figure: Least squares model for the regression of number of units sold on TV, newspaper, and radio advertising budgets in the Advertising data.

Variable selection
Which predictors are associated with the response? (in order to fit a single model involving only those d predictors)
- Note: R² always increases as you add more variables to the model
- adjusted R²: 1 − [RSS / (n − p − 1)] / [TSS / (n − 1)] = 1 − (1 − R²) (n − 1) / (n − p − 1)
- Mallow's Cp: (1/n) (RSS + 2 p σ̂²)
- Akaike information criterion: AIC = (1 / (n σ̂²)) (RSS + 2 p σ̂²)
We cannot consider all 2^p models. Instead:
- Best subset selection: fit a separate least squares regression for each possible k-combination of the p predictors, and select the best one
- Forward selection: start with the null model and keep adding predictors one by one (a rough sketch follows below)
- Backward selection: start with all variables in the model, and remove the variable with the largest p-value
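The sketch below implements forward selection with adjusted R² as the scoring criterion; the criterion, the stopping rule (stop when the score no longer improves), and the helper names are our choices for illustration, not prescribed by the lecture.

```python
# A rough sketch of forward selection scored by adjusted R^2, on synthetic data.
import numpy as np

def adjusted_r2(X_cols, y):
    """Fit OLS on the given columns (plus intercept) and return adjusted R^2."""
    n, p = X_cols.shape
    X_design = np.column_stack([np.ones(n), X_cols])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    rss = np.sum((y - X_design @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))

rng = np.random.default_rng(5)
n, p = 150, 6
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)   # only 2 relevant predictors

selected, remaining = [], list(range(p))
best_score = -np.inf
while remaining:
    scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
    score, j = max(scores)
    if score <= best_score:        # stop when adding a variable no longer helps
        break
    selected.append(j)
    remaining.remove(j)
    best_score = score

print(selected, best_score)        # typically picks columns 0 and 3 first
```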

Other considerations (see the textbook)
- prediction intervals
- extensions of the linear model, e.g. interaction terms:
  Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + ε
  sales = β0 + β1 × TV + β2 × radio + β3 × (radio × TV) + ε
        = β0 + (β1 + β3 × radio) × TV + β2 × radio + ε
- R² for this model is 96.8%, vs. 89.7% for the model that uses TV and radio without an interaction term
- the hierarchical principle: if we include the interaction X1 X2, we should also include the main effects X1 and X2 (even if their p-values are not significant)
- non-linear relationships, e.g. Y = β0 + β1 X + β2 X²
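A small sketch of fitting the interaction model and comparing its R² with the main-effects model; the synthetic data and its coefficients are assumptions standing in for TV and radio budgets.

```python
# Main-effects model vs. model with an interaction term, compared by R^2.
import numpy as np

rng = np.random.default_rng(6)
n = 200
tv = rng.uniform(0, 300, size=n)
radio = rng.uniform(0, 50, size=n)
sales = 6.0 + 0.02 * tv + 0.03 * radio + 0.001 * tv * radio + rng.normal(size=n)

X_main = np.column_stack([np.ones(n), tv, radio])              # main effects only
X_int = np.column_stack([np.ones(n), tv, radio, tv * radio])   # with interaction term

def r2(X_design, y):
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    rss = np.sum((y - X_design @ beta) ** 2)
    return 1 - rss / np.sum((y - y.mean()) ** 2)

print(r2(X_main, sales), r2(X_int, sales))   # the interaction model fits better here
```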

Potential Problems with Linear Regression
- Non-linearity of the response-predictor relationships
- Correlation of error terms
- Non-constant variance of error terms
- Outliers
- High-leverage points
- Collinearity

(1) Non-linearity of the Data
Figure: Residuals vs. predicted (or fitted) values for the Auto data set. In each plot, the red line is a smooth fit to the residuals. Left: a linear fit of Y on X. Right: a fit that also includes a quadratic term X².

(2) Time series of residuals
Figure: Plots of residuals from simulated time series data sets generated with differing levels of correlation between error terms for adjacent time points.

(3) Residual plots
Figure: Red line: smooth fit to the residuals. Blue lines: track the outer quantiles of the residuals. Left: The funnel shape indicates heteroscedasticity (the variance of the errors is not constant). Right: The response has been log-transformed, and there is now no evidence of heteroscedasticity.
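A rough sketch of reproducing this kind of diagnostic: plot residuals against fitted values before and after log-transforming the response, using synthetic heteroscedastic data (the data-generating model is an assumption for illustration).

```python
# Residuals-vs-fitted plots for Y and log(Y) on synthetic heteroscedastic data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(1, 10, size=n)
y = np.exp(0.3 + 0.25 * x + rng.normal(scale=0.3, size=n))   # noise grows with the mean

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, response, title in [(axes[0], y, "Y"), (axes[1], np.log(y), "log(Y)")]:
    slope, intercept = np.polyfit(x, response, deg=1)
    fitted = intercept + slope * x
    ax.scatter(fitted, response - fitted, s=10)
    ax.axhline(0.0, color="red")
    ax.set_xlabel("fitted values")
    ax.set_ylabel("residuals")
    ax.set_title(title)
plt.tight_layout()
plt.show()   # left panel shows a funnel shape; right panel does not
```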

Make sure you read the entire Chapter 3 of the textbook.
