San José State University Math 261A: Regression Theory & Methods


San José State University
Math 261A: Regression Theory & Methods
Multiple Linear Regression
Dr. Guangliang Chen

This lecture is based on the following textbook sections: Chapter 3: 3.1-3.5, 3.8-3.10.

Outline of this presentation:
- The multiple linear regression problem
- Least-squares estimation
- Inference
- Some issues

The multiple linear regression problem

Consider the body data again. To construct a more accurate model for predicting the weight of an individual ($y$), we may want to add other body measurements, such as head and waist circumferences, as additional predictors besides height ($x_1$), leading to multiple linear regression:
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \qquad (1) $$
where
- $y$: response
- $x_1, \dots, x_k$: predictors
- $\beta_0, \beta_1, \dots, \beta_k$: coefficients
- $\varepsilon$: error term

[Figure: an example of a regression model with $k = 2$ predictors]

Remark. Some of the new predictors in the model could be powers of the original ones,
$$ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k + \varepsilon, $$
or interactions of them,
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon, $$
or even a mixture of powers and interactions of them,
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon. $$
These are still linear models (in terms of the regression coefficients).
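
To make the remark concrete, here is a minimal R sketch (with a hypothetical data frame `dat` containing columns `y`, `x1`, `x2`) showing how such models are specified; all three are fit by ordinary least squares precisely because they are linear in the coefficients:

```r
# Hypothetical data frame `dat` with columns y, x1, x2.
# Powers are wrapped in I() so ^ is treated as arithmetic, not formula syntax.
fit_poly  <- lm(y ~ x1 + I(x1^2), data = dat)                          # polynomial terms
fit_inter <- lm(y ~ x1 + x2 + x1:x2, data = dat)                       # interaction term
fit_quad  <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)   # full quadratic
```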

[Figure: an example of a full quadratic model]

The sample version of (1) is
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad 1 \le i \le n, \qquad (2) $$
where the $\varepsilon_i$ are assumed for now to be uncorrelated,
$$ \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, \quad i \ne j, $$
and to have the same mean zero and variance $\sigma^2$:
$$ \mathrm{E}(\varepsilon_i) = 0, \quad \mathrm{Var}(\varepsilon_i) = \sigma^2, \quad \text{for all } i. $$
(As in simple linear regression, we will add the normality and independence assumptions when we get to the inference part.)

Letting
$$ y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, $$
we can rewrite the sample regression model in matrix form:
$$ \underbrace{y}_{n \times 1} = \underbrace{X}_{n \times p} \cdot \underbrace{\beta}_{p \times 1} + \underbrace{\varepsilon}_{n \times 1}, \qquad (3) $$
where $p = k + 1$ represents the number of regression parameters (note that $k$ is the number of predictors in the model).
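
As an aside, the design matrix $X$ in (3) is exactly what R's `model.matrix()` builds from a formula; a small sketch with the same hypothetical data frame `dat` as before:

```r
# Hypothetical data frame `dat` with response y and predictors x1, x2
X <- model.matrix(~ x1 + x2, data = dat)  # n x p design matrix; first column is all 1s
y <- dat$y                                # n x 1 response vector
p <- ncol(X)                              # p = k + 1 regression parameters
```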

Least squares (LS) estimation

The LS criterion can still be used to fit a multiple regression model
$$ \hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k $$
to the data, as follows:
$$ \min_{\hat\beta} S(\hat\beta) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n e_i^2, $$
where for each $1 \le i \le n$,
$$ \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}. $$

Let $e = (e_i) \in \mathbb{R}^n$ and $\hat{y} = (\hat{y}_i) = X\hat\beta \in \mathbb{R}^n$. Then $e = y - \hat{y}$. Correspondingly, the above problem becomes
$$ \min_{\hat\beta} S(\hat\beta) = \|e\|^2 = \|y - X\hat\beta\|^2. $$

Theorem 0.1. If $X'X$ is nonsingular, then the LS estimator of $\beta$ is
$$ \hat\beta = (X'X)^{-1} X' y. $$

Remark. The nonsingularity condition holds if and only if all the columns of $X$ are linearly independent (i.e., $X$ has full column rank).

Remark. This is the same formula as for $\hat\beta = (\hat\beta_0, \hat\beta_1)'$ in simple linear regression. To demonstrate it, consider the toy data set of 3 points $(0, 1), (1, 0), (2, 2)$ used before. The new formula gives
$$ \hat\beta = (X'X)^{-1} X' y = \left( \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix} \right)^{-1} \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}^{-1} \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}. $$
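
The toy computation can be checked numerically in R; `solve(A, b)` solves the normal equations directly (a sketch, not part of the original slides):

```r
x <- c(0, 1, 2)
y <- c(1, 0, 2)
X <- cbind(1, x)                           # 3 x 2 design matrix

beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # solves X'X b = X'y
beta_hat                                   # (0.5, 0.5), matching the hand computation
coef(lm(y ~ x))                            # cross-check with R's built-in fitter
```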

Proof. We first need some formulas for the gradient of a function of multiple variables:
$$ \frac{\partial}{\partial x} x'a = \frac{\partial}{\partial x} a'x = a $$
$$ \frac{\partial}{\partial x} \|x\|^2 = \frac{\partial}{\partial x} x'x = 2x $$
$$ \frac{\partial}{\partial x} x'Ax = 2Ax \quad (A \text{ symmetric}) $$
$$ \frac{\partial}{\partial x} \|Bx\|^2 = \frac{\partial}{\partial x} x'B'Bx = 2B'Bx $$

Using the identity $\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2u'v$, we write
$$ S(\hat\beta) = \|y\|^2 + \|X\hat\beta\|^2 - 2(X\hat\beta)'y = y'y + \hat\beta' X'X \hat\beta - 2\hat\beta' X'y. $$
Applying the formulas on the preceding slide, we obtain
$$ \frac{\partial S}{\partial \hat\beta} = 2X'X\hat\beta - 2X'y. $$
Setting the gradient equal to zero,
$$ X'X\hat\beta = X'y \quad \text{(least squares normal equations)}, $$
and solving for $\hat\beta$ completes the proof.

Remark. The very first normal equation in the system $X'X\hat\beta = X'y$ is
$$ n\hat\beta_0 + \hat\beta_1 \sum x_{i1} + \hat\beta_2 \sum x_{i2} + \cdots + \hat\beta_k \sum x_{ik} = \sum y_i, $$
which simplifies to
$$ \hat\beta_0 + \hat\beta_1 \bar{x}_1 + \hat\beta_2 \bar{x}_2 + \cdots + \hat\beta_k \bar{x}_k = \bar{y}. $$
This indicates that the centroid of the data, i.e., $(\bar{x}_1, \dots, \bar{x}_k, \bar{y})$, is on the least squares regression plane.

Remark. The fitted values of the least squares model are
$$ \hat{y} = X\hat\beta = \underbrace{X(X'X)^{-1}X'}_{H}\, y = Hy, $$
and the residuals are
$$ e = y - \hat{y} = (I - H)y. $$
The matrix $H \in \mathbb{R}^{n \times n}$ is called the hat matrix, satisfying
$$ H' = H \ \text{(symmetric)}, \quad H^2 = H \ \text{(idempotent)}, \quad H(I - H) = O. $$
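
A quick numerical illustration of these properties, continuing the toy R example above (a sketch):

```r
H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix for the toy design above

all.equal(H, t(H))        # TRUE: symmetric
all.equal(H, H %*% H)     # TRUE: idempotent
y_hat <- H %*% y          # fitted values
e     <- y - y_hat        # residuals, equal to (diag(3) - H) %*% y
```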

Geometrically, $H$ is the orthogonal projection matrix onto the column space of $X$ (the subspace spanned by the columns of $X$):
$$ \hat{y} = Hy = X\underbrace{(X'X)^{-1}X'y}_{\hat\beta} \in \mathrm{Col}(X), $$
$$ \hat{y}'(y - \hat{y}) = (Hy)'(I - H)y = y'\underbrace{H(I - H)}_{O}\, y = 0. $$
[Figure: $y$ decomposed into $\hat{y} = Hy \in \mathrm{Col}(X)$ and the orthogonal residual $e = (I - H)y$]

Example 0.1 (body dimensions data). Besides the predictor Height, we include Waist Girth as a second predictor to perform multiple linear regression for predicting Weight. (R demonstration in .html)

Inference in multiple linear regression

- Model parameters: $\beta = (\beta_0, \beta_1, \dots, \beta_k)'$ (intercept and slopes), $\sigma^2$ (noise variance)
- Inference tasks (for the parameters above): point estimation, interval estimation*, hypothesis testing*
- Inference for the mean response at $x_0 = (1, x_{01}, \dots, x_{0k})'$:
$$ \mathrm{E}(y \mid x_0) = \beta_0 + \beta_1 x_{01} + \cdots + \beta_k x_{0k} = x_0'\beta $$

*To perform these two inference tasks, we will additionally assume that the model errors $\varepsilon_i$ are normally and independently distributed with mean 0 and variance $\sigma^2$, i.e., $\varepsilon_1, \dots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \sigma^2)$.

Expectation and variance of a vector-valued random variable

Let $X = (X_1, \dots, X_n)' \in \mathbb{R}^n$ be a vector-valued random variable. Define:
- Expectation: $\mathrm{E}(X) = (\mathrm{E}(X_1), \dots, \mathrm{E}(X_n))'$
- Variance (also called covariance matrix):
$$ \mathrm{Var}(X) = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n) \end{pmatrix} $$

Point estimation in multiple linear regression

First, as in simple linear regression, the least squares estimator $\hat\beta$ is an unbiased linear estimator of $\beta$.

Theorem 0.2. Under the assumptions of multiple linear regression,
$$ \mathrm{E}(\hat\beta) = \beta. $$
That is, $\hat\beta$ is a (componentwise) unbiased estimator of $\beta$:
$$ \mathrm{E}(\hat\beta_i) = \beta_i, \quad \text{for all } i = 0, 1, \dots, k. $$

Proof. We have
$$ \hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon = \beta + (X'X)^{-1}X'\varepsilon. $$
It follows that
$$ \mathrm{E}(\hat\beta) = \beta + (X'X)^{-1}X'\underbrace{\mathrm{E}(\varepsilon)}_{0} = \beta. $$

Next, we derive the variance of $\hat\beta$:
$$ \mathrm{Var}(\hat\beta) = \big(\mathrm{Cov}(\hat\beta_i, \hat\beta_j)\big)_{0 \le i,j \le k}. $$

Theorem 0.3. Let $C = (X'X)^{-1} = (C_{ij})_{0 \le i,j \le k}$. Then
$$ \mathrm{Var}(\hat\beta) = \sigma^2 C. $$
That is,
$$ \mathrm{Var}(\hat\beta_i) = \sigma^2 C_{ii} \quad \text{and} \quad \mathrm{Cov}(\hat\beta_i, \hat\beta_j) = \sigma^2 C_{ij}. $$

Proof. Using the formula $\mathrm{Var}(Ay) = A \cdot \mathrm{Var}(y) \cdot A'$, we have
$$ \mathrm{Var}(\hat\beta) = \mathrm{Var}\big(\underbrace{(X'X)^{-1}X'}_{A}\, y\big) = (X'X)^{-1}X' \cdot \underbrace{\mathrm{Var}(y)}_{\sigma^2 I} \cdot X(X'X)^{-1} = \sigma^2 (X'X)^{-1}. $$

Lastly, we can derive an estimator of $\sigma^2$ from the residual sum of squares
$$ SS_{Res} = \sum e_i^2 = \|e\|^2 = \|y - X\hat\beta\|^2. $$

Theorem 0.4. We have
$$ \mathrm{E}(SS_{Res}) = (n - p)\sigma^2. $$
This implies that
$$ MS_{Res} = \frac{SS_{Res}}{n - p} $$
is an unbiased estimator of $\sigma^2$.
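
Theorems 0.2-0.4 can be illustrated by simulation. The following Monte Carlo sketch (with an arbitrary made-up design and parameters, not values from the lecture) checks that the sample averages of $\hat\beta$ and $MS_{Res}$ are close to $\beta$ and $\sigma^2$, and that the empirical covariance of $\hat\beta$ is close to $\sigma^2(X'X)^{-1}$:

```r
set.seed(1)
n <- 50; sigma <- 2
X <- cbind(1, runif(n), runif(n))       # fixed design with p = 3 columns
beta <- c(1, 2, -1)
C <- solve(t(X) %*% X)

sims <- replicate(5000, {
  y  <- X %*% beta + rnorm(n, sd = sigma)
  b  <- C %*% t(X) %*% y                # LS estimate
  ms <- sum((y - X %*% b)^2) / (n - 3)  # MS_Res with n - p degrees of freedom
  c(b, ms)
})
rowMeans(sims)       # first 3 entries near beta; last entry near sigma^2 = 4
var(t(sims[1:3, ]))  # empirical Var(beta_hat), near sigma^2 * C
sigma^2 * C          # theoretical covariance, for comparison
```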

Remark. The total and regression sums of squares are defined in the same way as before:
$$ SS_R = \sum (\hat{y}_i - \bar{y})^2 = \sum \hat{y}_i^2 - n\bar{y}^2 = \|\hat{y}\|^2 - n\bar{y}^2, $$
$$ SS_T = \sum (y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2 = \|y\|^2 - n\bar{y}^2. $$
They can be used to assess the adequacy of the model through the coefficient of determination
$$ R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}. $$
The larger $R^2$ (i.e., the smaller $SS_{Res}$), the better the model.
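
In R, these sums of squares and $R^2$ can be computed directly from any fitted model (a sketch; `fit` is a hypothetical `lm` object with an intercept):

```r
y      <- model.response(model.frame(fit))
SS_T   <- sum((y - mean(y))^2)
SS_R   <- sum((fitted(fit) - mean(y))^2)
SS_Res <- sum(resid(fit)^2)
c(R2 = SS_R / SS_T, R2_check = 1 - SS_Res / SS_T)  # the two forms of R^2 agree
```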

Example 0.2 (Weight ~ Height + Waist Girth). For this model,
$$ MS_{Res} = 4.529^2 \approx 20.512. $$
In contrast, for the simple linear regression model (Weight ~ Height),
$$ MS_{Res} = 9.308^2 \approx 86.639. $$
Therefore, the multiple linear regression model has a smaller total fitting error $SS_{Res} = (n - p)\,MS_{Res}$. The coefficient of determination of this model is $R^2 = 0.8853$, which is much higher than that of the smaller model.

Adjusted $R^2$

$R^2$ measures the goodness of fit of a single model and is not a fair criterion for comparing models of different sizes $k$ (e.g., nested models). The adjusted $R^2$ criterion is more suitable for such comparisons:
$$ R^2_{Adj} = 1 - \frac{SS_{Res}/(n - p)}{SS_T/(n - 1)}. $$
The larger the $R^2_{Adj}$, the better the model.
[Figure: $R^2$ and $R^2_{Adj}$ plotted against $k$ (number of predictors)]

Remark. As $p$ (i.e., $k$) increases, $SS_{Res}$ will either decrease or stay the same:
- If $SS_{Res}$ does not change (or decreases by very little), then $R^2_{Adj}$ will decrease: the smaller model is better.
- If $SS_{Res}$ decreases relatively more than $n - p$ does, then $R^2_{Adj}$ will increase: the larger model is better.

We can write instead
$$ R^2_{Adj} = 1 - \frac{n - 1}{n - p}(1 - R^2). $$
This implies that $R^2_{Adj} \le R^2$.
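
For instance, with the $R^2 = 0.8853$ reported in Example 0.2 and assumed $n = 507$, $p = 3$ (the sample size here is an assumption for illustration, not a figure from the lecture):

```r
n <- 507; p <- 3; R2 <- 0.8853
R2_adj <- 1 - (n - 1) / (n - p) * (1 - R2)
R2_adj   # about 0.8848, slightly below R2 as the inequality requires
```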

Summary: Point estimation in multiple linear regression

$$ \begin{array}{llll}
\text{Parameter} & \text{Point estimator} & \text{Bias} & \text{Variance} \\
\beta & \hat\beta = (X'X)^{-1}X'y & \text{unbiased} & \sigma^2 (X'X)^{-1} \\
\sigma^2 & MS_{Res} = SS_{Res}/(n - p) & \text{unbiased} & \\
\end{array} $$

Remark. For the mean response at $x_0 = (1, x_{01}, \dots, x_{0k})'$,
$$ \mathrm{E}(y \mid x_0) = \beta_0 + \beta_1 x_{01} + \cdots + \beta_k x_{0k} = x_0'\beta, $$
an unbiased point estimator is
$$ \hat\beta_0 + \hat\beta_1 x_{01} + \cdots + \hat\beta_k x_{0k} = x_0'\hat\beta. $$

Next

We consider the following inference tasks in multiple linear regression:
- Hypothesis testing
- Interval estimation

For both tasks, we need to additionally assume that the model errors $\varepsilon_i$ are iid $N(0, \sigma^2)$.

Hypothesis testing in multiple linear regression

Depending on how many regression coefficients are being tested together, we have:
- ANOVA F tests for significance of regression on all regression coefficients
- Partial F tests on subsets of regression coefficients
- Marginal t tests on individual regression coefficients

ANOVA for Testing Significance of Regression

In multiple linear regression, the significance-of-regression test is
$$ H_0: \beta_1 = \cdots = \beta_k = 0 \quad \text{vs} \quad H_1: \beta_j \ne 0 \ \text{for at least one } j. $$
The ANOVA test works very similarly: the test statistic is
$$ F_0 = \frac{MS_R}{MS_{Res}} = \frac{SS_R/k}{SS_{Res}/(n - p)} \overset{H_0}{\sim} F_{k,\, n-p}, $$
and we reject $H_0$ if
$$ F_0 > F_{\alpha,\, k,\, n-p}. $$
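
A sketch of this test computed by hand in R (`fit` is a hypothetical `lm` object); the result matches the F statistic reported by `summary(fit)`:

```r
k <- length(coef(fit)) - 1             # number of predictors
n <- nobs(fit)
SS_R   <- sum((fitted(fit) - mean(fitted(fit)))^2)
SS_Res <- sum(resid(fit)^2)
F0 <- (SS_R / k) / (SS_Res / (n - k - 1))
pf(F0, k, n - k - 1, lower.tail = FALSE)   # p-value of the significance test
```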

Example 0.3 (Weight ~ Height + Waist Girth). For this multiple linear regression model, regression is significant because the ANOVA F statistic is $F_0 = 1945$ and the p-value is less than 2.2e-16. Note that the p-values of the individual coefficients can no longer be used for conducting the significance-of-regression test.

Marginal Tests on Individual Regression Coefficients

The hypothesis for testing the significance of any individual predictor $x_j$, given all the other predictors in the model, is
$$ H_0: \beta_j = 0 \quad \text{vs} \quad H_1: \beta_j \ne 0. $$
If $H_0$ is not rejected, then the regressor $x_j$ is insignificant and can be deleted from the model (while preserving all other regressors).

To conduct the test, we use the point estimator $\hat\beta_j$ (which is linear and unbiased) and determine its distribution when $H_0$ is true:
$$ \hat\beta_j \sim N(\beta_j, \sigma^2 C_{jj}), \quad j = 0, 1, \dots, k. $$

The test statistic is
$$ t_0 = \frac{\hat\beta_j - 0}{\mathrm{se}(\hat\beta_j)} = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2 C_{jj}}} \overset{H_0}{\sim} t_{n-p} \qquad (\hat\sigma^2 = MS_{Res}), $$
and we reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-p}$.

Example 0.4 (Weight ~ Height + Waist Girth). Based on the previous R output, both predictors are significant when the other is already included in the model:
- Height: $t_0 = 17.30$, p-value < 2e-16
- Waist Girth: $t_0 = 40.36$, p-value < 2e-16
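
The marginal t statistics can be reproduced by hand as follows (a sketch; `fit` is a hypothetical `lm` object). The results match the `t value` and `Pr(>|t|)` columns of `summary(fit)`:

```r
X      <- model.matrix(fit)
C      <- solve(t(X) %*% X)                 # C = (X'X)^{-1}
MS_Res <- sum(resid(fit)^2) / df.residual(fit)
se     <- sqrt(MS_Res * diag(C))            # se(beta_hat_j) = sqrt(sigma_hat^2 * C_jj)
t0     <- coef(fit) / se
2 * pt(abs(t0), df.residual(fit), lower.tail = FALSE)  # two-sided p-values
```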

Partial F Tests on Subsets of Regression Coefficients

Consider the full regression model with $k$ regressors,
$$ y = X\beta + \varepsilon. $$
Suppose the regression coefficients in $\beta$ are partitioned into two groups (the last $r$ and the preceding ones):
$$ \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \in \mathbb{R}^p, \quad \beta_1 = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k-r} \end{pmatrix} \in \mathbb{R}^{p-r}, \quad \beta_2 = \begin{pmatrix} \beta_{k-r+1} \\ \vdots \\ \beta_k \end{pmatrix} \in \mathbb{R}^r. $$

We wish to test
$$ H_0: \beta_2 = 0 \ (\beta_{k-r+1} = \cdots = \beta_k = 0) \quad \text{vs} \quad H_1: \beta_2 \ne 0 $$
to determine whether the last $r$ predictors may be deleted from the model. Corresponding to the partition of $\beta$, we partition $X$ in a conformal way:
$$ X = [X_1 \ X_2], \quad X_1 \in \mathbb{R}^{n \times (p-r)}, \quad X_2 \in \mathbb{R}^{n \times r}, $$
such that
$$ y = X\beta + \varepsilon = [X_1 \ X_2] \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon. $$

We compare two contrasting models:
$$ \text{(Full model)} \quad y = X\beta + \varepsilon $$
$$ \text{(Reduced model)} \quad y = X_1\beta_1 + \varepsilon $$
The corresponding regression sums of squares are
$$ SS_R(\beta) = \|X\hat\beta\|^2 - n\bar{y}^2, \quad \hat\beta = (X'X)^{-1}X'y \qquad (df = k), $$
$$ SS_R(\beta_1) = \|X_1\hat\beta_1\|^2 - n\bar{y}^2, \quad \hat\beta_1 = (X_1'X_1)^{-1}X_1'y \qquad (df = k - r). $$
Thus, the regression sum of squares due to $\beta_2$ given that $\beta_1$ is already in the model, called the extra sum of squares, is
$$ SS_R(\beta_2 \mid \beta_1) = SS_R(\beta) - SS_R(\beta_1) \qquad (df = r). $$

Note that with the residual sums of squares
$$ SS_{Res}(\beta) = \|y - X\hat\beta\|^2, \quad SS_{Res}(\beta_1) = \|y - X_1\hat\beta_1\|^2, $$
we also have
$$ SS_R(\beta_2 \mid \beta_1) = SS_{Res}(\beta_1) - SS_{Res}(\beta). $$
Finally, the (partial F) test statistic is
$$ F_0 = \frac{SS_R(\beta_2 \mid \beta_1)/r}{SS_{Res}(\beta)/(n - p)} \overset{H_0}{\sim} F_{r,\, n-p}, $$
and we reject $H_0$ if
$$ F_0 > F_{\alpha,\, r,\, n-p}. $$

Example 0.5 (Weight ~ Height + Waist Girth). We use the extra-sum-of-squares method to compare it with the reduced model (Weight ~ Height); see the R sketch below.
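
In R, the partial F test is carried out by `anova()` on the nested fits; a sketch with hypothetical column names for the body data:

```r
fit_reduced <- lm(Weight ~ Height, data = body)               # hypothetical names
fit_full    <- lm(Weight ~ Height + WaistGirth, data = body)
anova(fit_reduced, fit_full)  # extra SS = SS_R(beta2 | beta1), F0, and p-value
```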

Remark. The partial F test on a single predictor $x_j$, with $\beta = [\beta_{(j)}; \beta_j]$, based on the extra sum of squares
$$ SS_R(\beta_j \mid \beta_{(j)}) = SS_R(\beta) - SS_R(\beta_{(j)}), $$
can be shown to be equivalent to the marginal t test for $\beta_j$. For example, for Waist Girth:
- marginal t test: $t_0 = 40.36$
- partial F test: $F_0 = 1629.2$

Note that $F_0 = t_0^2$ (thus the same test).

Remark. There is a decomposition of the regression sum of squares $SS_R = SS_R(\beta_1, \dots, \beta_k \mid \beta_0)$ into a sequence of marginal extra sums of squares, each corresponding to a single predictor:
$$ SS_R(\beta_1, \dots, \beta_k \mid \beta_0) = SS_R(\beta_1 \mid \beta_0) + SS_R(\beta_2 \mid \beta_1, \beta_0) + \cdots + SS_R(\beta_k \mid \beta_{k-1}, \dots, \beta_1, \beta_0). $$
From the above output:
- $SS_R(\beta_1 \mid \beta_0) = 46370$: the predictor Height is significant
- $SS_R(\beta_2 \mid \beta_1, \beta_0) = 33416$: Waist Girth is significant given that Height is already in the model
- $SS_R(\beta_1, \beta_2 \mid \beta_0) = 79786$
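
These sequential (Type I) sums of squares are what `anova()` reports when applied to a single fitted model; a sketch with the same hypothetical names as before:

```r
anova(lm(Weight ~ Height + WaistGirth, data = body))
# Row "Height":     SS_R(beta1 | beta0)          -- Height alone
# Row "WaistGirth": SS_R(beta2 | beta1, beta0)   -- Waist Girth given Height
```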

Summary: hypothesis testing in regression

- ANOVA F test: $H_0: \beta_1 = \cdots = \beta_k = 0$. Reject $H_0$ if
$$ F_0 = \frac{MS_R}{MS_{Res}} = \frac{SS_R/k}{SS_{Res}/(n - p)} > F_{\alpha,\, k,\, n-p}. $$
- Marginal t tests: $H_0: \beta_j = 0$. Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-p}$, where
$$ t_0 = \frac{\hat\beta_j - 0}{\mathrm{se}(\hat\beta_j)} = \frac{\hat\beta_j}{\sqrt{\hat\sigma^2 C_{jj}}}. $$
- Partial F test: $H_0: \beta_2 = 0$. Reject $H_0$ if
$$ F_0 = \frac{SS_R(\beta_2 \mid \beta_1)/r}{SS_{Res}(\beta)/(n - p)} > F_{\alpha,\, r,\, n-p}. $$

Interval estimation in multiple linear regression

We construct the following:
- Confidence intervals for individual regression coefficients $\beta_j$
- Confidence interval for the mean response
- Prediction interval

under the additional assumption that the errors $\varepsilon_i$ are independently and normally distributed with zero mean and constant variance $\sigma^2$.

Confidence intervals for individual regression coefficients

Theorem 0.5. Under the normality assumption, a $1 - \alpha$ confidence interval for the regression coefficient $\beta_j$, $0 \le j \le k$, is
$$ \hat\beta_j \pm t_{\alpha/2,\, n-p} \sqrt{\hat\sigma^2 C_{jj}}. $$
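
In R, `confint()` implements exactly this interval; shown below alongside the formula computed by hand (a sketch; `fit` is a hypothetical `lm` object and `j` a coefficient index):

```r
confint(fit, level = 0.95)                 # all coefficients at once

j  <- 2                                    # e.g., the first slope
C  <- solve(crossprod(model.matrix(fit)))  # (X'X)^{-1}
se <- sqrt(sum(resid(fit)^2) / df.residual(fit) * C[j, j])
coef(fit)[j] + c(-1, 1) * qt(0.975, df.residual(fit)) * se
```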

Confidence interval for the mean response

In the setting of multiple linear regression, the mean response at a given point $x_0 = (1, x_{01}, \dots, x_{0k})'$ is
$$ \mathrm{E}(y \mid x_0) = x_0'\beta = \beta_0 + \beta_1 x_{01} + \cdots + \beta_k x_{0k}. $$
A natural point estimator of $\mathrm{E}(y \mid x_0)$ is
$$ \hat{y}_0 = x_0'\hat\beta = \hat\beta_0 + \hat\beta_1 x_{01} + \cdots + \hat\beta_k x_{0k}. $$
Furthermore, we can construct a confidence interval for $\mathrm{E}(y \mid x_0)$.

Since $\hat{y}_0$ is a linear combination of the responses, it is normally distributed with
$$ \mathrm{E}(\hat{y}_0) = x_0'\mathrm{E}(\hat\beta) = x_0'\beta $$
and
$$ \mathrm{Var}(\hat{y}_0) = x_0'\mathrm{Var}(\hat\beta)x_0 = \sigma^2 x_0'(X'X)^{-1}x_0. $$
We can thus obtain the following result.

Theorem 0.6. Under the normality assumption on the model errors, a $1 - \alpha$ confidence interval on the mean response $\mathrm{E}(y \mid x_0)$ is
$$ \hat{y}_0 \pm t_{\alpha/2,\, n-p} \sqrt{\hat\sigma^2\, x_0'(X'X)^{-1}x_0}. $$

Prediction intervals for new observations

Given a new location $x_0$, we would like to form a prediction interval for the future observation of the response at that location,
$$ y_0 = x_0'\beta + \varepsilon_0, $$
where $\varepsilon_0 \sim N(0, \sigma^2)$ is the error. We have the following result.

Theorem 0.7. Under the normality assumption on the model errors, a $1 - \alpha$ prediction interval for the future observation $y_0$ at the point $x_0$ is
$$ \hat{y}_0 \pm t_{\alpha/2,\, n-p} \sqrt{\hat\sigma^2 \left(1 + x_0'(X'X)^{-1}x_0\right)}. $$
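
Both intervals are available from `predict()`; a sketch using the hypothetical body-data fit from earlier, with made-up values for the new point $x_0$:

```r
x0 <- data.frame(Height = 175, WaistGirth = 80)  # assumed values, for illustration
predict(fit_full, newdata = x0, interval = "confidence")  # Theorem 0.6: CI for E(y | x0)
predict(fit_full, newdata = x0, interval = "prediction")  # Theorem 0.7: wider, adds the extra sigma^2
```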

Proof. First, note that the mean of the response $y_0$ at $x_0$, i.e., $x_0'\beta$, is estimated by $\hat{y}_0 = x_0'\hat\beta$. Let $\Psi = y_0 - \hat{y}_0$ be the difference between the true response and the point estimator for its mean.
