Lecture 14: Multiple Linear Regression and Logistic Regression

Georgia Tech

Outline
- Multiple regression
- Logistic regression

Simple linear regression

Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to X by the following simple linear regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2), \qquad i = 1, 2, \ldots, n$$

Here $Y_i$ is the response, $X_i$ is the regressor or predictor, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon_i$ is the random error. The slope and intercept of the line are called regression coefficients. Simple linear regression considers a single regressor or predictor x and a dependent or response variable Y.

Multiple linear regression

- Simple linear regression: one predictor variable x
- Multiple linear regression: multiple predictor variables x1, x2, ..., xk
- Example:
  simple linear regression: property tax = a * house price + b
  multiple linear regression: property tax = a1 * house price + a2 * house size + b
- Question: how to fit a multiple linear regression model?

Multiple linear regression model

Many applications of regression analysis involve situations in which there is more than one regressor or predictor variable. A regression model that contains more than one regressor variable is called a multiple regression model.

As an example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A multiple regression model that might describe this relationship is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \tag{12-1}$$

where Y represents the tool life, $x_1$ represents the cutting speed, $x_2$ represents the tool angle, and $\epsilon$ is a random error term. This is a multiple linear regression model with two regressors. The term linear is used because Equation 12-1 is a linear function of the unknown parameters $\beta_0$, $\beta_1$, and $\beta_2$.

The regression model in Equation 12-1 describes a plane in the three-dimensional space of Y, $x_1$, and $x_2$. Figure 12-1(a) shows this plane for the regression model

$$E(Y) = 50 + 10x_1 + 7x_2$$

where we have assumed that the expected value of the error term is zero; that is, $E(\epsilon) = 0$. The parameter $\beta_0$ is the intercept of the plane. We sometimes call $\beta_1$ and $\beta_2$ partial regression coefficients, because $\beta_1$ measures the expected change in Y per unit change in $x_1$ when $x_2$ is held constant, and $\beta_2$ measures the expected change in Y per unit change in $x_2$ when $x_1$ is held constant. Figure 12-1(b) shows a contour plot of the regression model, that is, lines of constant E(Y) as a function of $x_1$ and $x_2$. Notice that the contour lines in this plot are straight lines.

[Figure 12-1: (a) The regression plane for the model E(Y) = 50 + 10x1 + 7x2. (b) The contour plot.]

In general, the dependent variable or response Y may be related to k independent or regressor variables. The model

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon \tag{12-2}$$

is called a multiple linear regression model with k regressor variables. The parameters $\beta_j$, $j = 0, 1, \ldots, k$, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables $\{x_j\}$. The parameter $\beta_j$ represents the expected change in response Y per unit change in $x_j$ when all the remaining regressors $x_i$ ($i \neq j$) are held constant.

Multiple linear regression models are often used as approximating functions. That is, the true functional relationship between Y and $x_1, x_2, \ldots, x_k$ is unknown, but over certain ranges of the independent variables the linear regression model is an adequate approximation.

Models that are more complex in structure than Equation 12-2 may often still be analyzed by multiple linear regression techniques.
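The meaning of a partial regression coefficient can be checked numerically. The sketch below uses the example plane E(Y) = 50 + 10x1 + 7x2 from the text; the function names and the evaluation points are illustrative assumptions, not part of the lecture.

```python
# Mean response for the example plane E(Y) = 50 + 10*x1 + 7*x2 from the text.
def expected_y(x1, x2):
    return 50 + 10 * x1 + 7 * x2


# Change in E(Y) for a one-unit increase in x1 with x2 held constant.
def unit_change_in_x1(x1, x2):
    return expected_y(x1 + 1, x2) - expected_y(x1, x2)


# Change in E(Y) for a one-unit increase in x2 with x1 held constant.
def unit_change_in_x2(x1, x2):
    return expected_y(x1, x2 + 1) - expected_y(x1, x2)


# Arbitrary evaluation points (assumed for illustration only).
for x1, x2 in [(0, 0), (2, 5), (8, 10)]:
    print(x1, x2, unit_change_in_x1(x1, x2), unit_change_in_x2(x1, x2))
```

Whatever point is chosen, the two changes equal the coefficients of x1 and x2, which is why the contour lines of a plane are straight and parallel.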

More complex models can still be analyzed using multiple linear regression

Cubic polynomial

For example, consider the cubic polynomial model in one regressor variable:

$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon \tag{12-3}$$

If we let $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$, Equation 12-3 can be written as

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon \tag{12-4}$$

which is a multiple linear regression model with three regressor variables.

Interaction effect

Models that include interaction effects may also be analyzed by multiple linear regression methods. An interaction between two variables can be represented by a cross-product term in the model, such as

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon \tag{12-5}$$

If we let $x_3 = x_1 x_2$ and $\beta_3 = \beta_{12}$, Equation 12-5 can be written as

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$$

which is a linear regression model.

Figure 12-2(a) and (b) show the three-dimensional plot of the regression model and the corresponding two-dimensional contour plot.
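The substitutions behind Equations 12-4 and 12-5 amount to building new regressor columns before fitting. The following sketch shows the idea; the helper names and the coefficient values are hypothetical and chosen only for illustration.

```python
def cubic_features(x):
    """Map one regressor x to (x1, x2, x3) = (x, x^2, x^3), as in Equation 12-4."""
    return [x, x ** 2, x ** 3]


def interaction_features(x1, x2):
    """Map (x1, x2) to (x1, x2, x3) with x3 = x1 * x2, as in Equation 12-5."""
    return [x1, x2, x1 * x2]


def linear_predictor(beta, features):
    """Evaluate beta_0 + sum_j beta_j * feature_j (mean of Y, error term omitted)."""
    return beta[0] + sum(b * f for b, f in zip(beta[1:], features))


# Hypothetical coefficients beta_0..beta_3, not taken from the lecture.
beta = [1.0, 2.0, -0.5, 0.25]
x = 2.0

# The cubic evaluated directly and through the three derived regressors.
direct = beta[0] + beta[1] * x + beta[2] * x ** 2 + beta[3] * x ** 3
via_features = linear_predictor(beta, cubic_features(x))
print(direct, via_features)
```

Both routes give the same value because the model stays linear in the coefficients; only the regressors are nonlinear functions of x.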

Notice that, although the interaction model is a linear regression model, the shape of the surface that it generates is not linear. In general, any regression model that is linear in the parameters (the $\beta$'s) is a linear regression model, regardless of the shape of the surface that it generates.

Figure 12-2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable ($x_1$, say) depends on the level of the other variable ($x_2$). For example, Fig. 12-2 shows that changing $x_1$ from 2 to 8 produces a much smaller change in E(Y) when $x_2 = 2$ than when $x_2 = 10$. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.

As a final example, consider the second-order model with interaction:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \epsilon \tag{12-6}$$

If we let $x_3 = x_1^2$, $x_4 = x_2^2$, $x_5 = x_1 x_2$, $\beta_3 = \beta_{11}$, $\beta_4 = \beta_{22}$, and $\beta_5 = \beta_{12}$, Equation 12-6 can be written as a multiple linear regression model as follows:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \epsilon$$

Figure 12-3(a) and (b) show the three-dimensional plot and the corresponding contour plot for

$$E(Y) = 800 + 10x_1 + 7x_2 - 8.5x_1^2 - 5x_2^2 + 4x_1 x_2$$

These plots indicate that the expected change in Y when $x_1$ is changed by one unit (say) is a function of both $x_1$ and $x_2$.

[Figure 12-3: (a) Three-dimensional plot of the regression model E(Y) = 800 + 10x1 + 7x2 - 8.5x1^2 - 5x2^2 + 4x1x2. (b) The contour plot.]
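For the Figure 12-3 model, a short sketch can show how the effect of a one-unit step in x1 varies with the starting point. The function names and the chosen points are assumptions for illustration; the coefficients are the ones quoted for Figure 12-3.

```python
# Mean response for the second-order model shown in Figure 12-3.
def expected_y(x1, x2):
    return 800 + 10 * x1 + 7 * x2 - 8.5 * x1 ** 2 - 5 * x2 ** 2 + 4 * x1 * x2


# Change in E(Y) when x1 increases by `step` while x2 stays fixed.
def change_when_x1_increases(x1, x2, step=1):
    return expected_y(x1 + step, x2) - expected_y(x1, x2)


# Same step in x1, different levels of x2 (points assumed for illustration).
low = change_when_x1_increases(1, 0)
high = change_when_x1_increases(1, 5)
print(low, high)
```

The two changes differ in sign as well as size. This matches the partial derivative of E(Y) with respect to x1, which is 10 - 17*x1 + 4*x2 and so depends on both regressors.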

Data for multiple regression

Suppose that $n > k$ observations are available, and let $x_{ij}$ denote the ith observation or level of variable $x_j$. The observations are

$$(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i), \qquad i = 1, 2, \ldots, n \quad \text{and} \quad n > k$$

It is customary to present the data for multiple regression in a table such as Table 12-1.

Table 12-1 Data for Multiple Linear Regression

y     x1    x2    ...   xk
y1    x11   x12   ...   x1k
y2    x21   x22   ...   x2k
...   ...   ...   ...   ...
yn    xn1   xn2   ...   xnk

Each observation $(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)$ satisfies the model in Equation 12-2, or

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \epsilon_i, \qquad i = 1, 2, \ldots, n \tag{12-7}$$

Least squares estimate of coefficients

The least squares function is

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2 \tag{12-8}$$

We want to minimize L with respect to $\beta_0, \beta_1, \ldots, \beta_k$. Setting the derivatives to zero, the least squares estimates of $\beta_0, \beta_1, \ldots, \beta_k$ must satisfy

$$\left. \frac{\partial L}{\partial \beta_0} \right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) = 0 \tag{12-9a}$$

and

$$\left. \frac{\partial L}{\partial \beta_j} \right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) x_{ij} = 0, \qquad j = 1, 2, \ldots, k \tag{12-9b}$$

Normal equations

Simplifying Equation 12-9, we obtain the least squares normal equations

$$\begin{aligned}
n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} &= \sum_{i=1}^{n} y_i \\
\hat\beta_0 \sum_{i=1}^{n} x_{i1} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}^2 + \hat\beta_2 \sum_{i=1}^{n} x_{i1} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i1} x_{ik} &= \sum_{i=1}^{n} x_{i1} y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^{n} x_{ik} + \hat\beta_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{ik} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}^2 &= \sum_{i=1}^{n} x_{ik} y_i
\end{aligned} \tag{12-10}$$

Note that there are $p = k + 1$ normal equations, one for each of the unknown regression coefficients. The solution to the normal equations will be the least squares estimators of the regression coefficients, $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.
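The scalar normal equations in Equations 12-10 can be assembled directly from sums of products and then solved as a small linear system. The sketch below does this in plain Python with Gaussian elimination; the function names and the data are made up for illustration (the responses are generated without noise from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients).

```python
def normal_equations(xs, ys):
    """Build the coefficient matrix and right-hand side of Equations 12-10.

    xs: list of n rows, each holding k regressor values.
    ys: list of n responses.
    A leading 1 is prepended to every row so index 0 corresponds to beta_0.
    """
    rows = [[1.0] + [float(v) for v in r] for r in xs]
    p = len(rows[0])
    A = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    rhs = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p)]
    return A, rhs


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


# Made-up data: n = 6 observations, k = 2 regressors, no noise.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

A, rhs = normal_equations(xs, ys)
beta_hat = solve(A, rhs)
print(beta_hat)
```

The top-left entry of the coefficient matrix is n and the first right-hand-side entry is the sum of the responses, matching the first line of Equations 12-10. In practice a statistics package would be used instead of hand-written elimination.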

Matrix form for multiple linear regression

In fitting a multiple regression model, it is much more convenient to express the mathematical operations using matrix notation. Suppose that there are k regressor variables and n observations, $(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)$, $i = 1, 2, \ldots, n$, and that the model relating the regressors to the response is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \qquad i = 1, 2, \ldots, n$$

This model is a system of n equations that can be expressed in matrix notation as

$$y = X\beta + \epsilon \tag{12-11}$$

where

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

In general, y is an $(n \times 1)$ vector of the observations, X is an $(n \times p)$ matrix of the levels of the independent variables (assuming that the intercept is always multiplied by a constant value, unity), $\beta$ is a $(p \times 1)$ vector of the regression coefficients, and $\epsilon$ is an $(n \times 1)$ vector of random errors. The X matrix is often called the model matrix.
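To make the shapes in Equation 12-11 concrete, the sketch below builds a model matrix with a leading column of ones and evaluates y = X*beta + epsilon row by row. The helper names, the regressor values, and the error values are hypothetical; the coefficients reuse the example plane E(Y) = 50 + 10x1 + 7x2.

```python
def model_matrix(xs):
    """Prepend the constant 1 to each row of regressor values, giving an n x p matrix."""
    return [[1.0] + [float(v) for v in row] for row in xs]


def mat_vec(X, beta):
    """Multiply an n x p matrix by a length-p coefficient vector."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Hypothetical regressor levels (n = 3 observations, k = 2 regressors).
xs = [(2, 5), (8, 5), (2, 10)]
beta = [50.0, 10.0, 7.0]      # beta_0, beta_1, beta_2 from the example plane
eps = [0.5, -1.0, 0.25]       # assumed error values, for illustration only

X = model_matrix(xs)
mean_response = mat_vec(X, beta)
y = [m + e for m, e in zip(mean_response, eps)]
print(X)
print(y)
```

Here X has n = 3 rows and p = k + 1 = 3 columns, and each entry of y is the corresponding row of X times beta plus its error term.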

Matrix normal equations and least squares coefficients

We wish to find the vector of least squares estimators, $\hat\beta$, that minimizes

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \epsilon'\epsilon = (y - X\beta)'(y - X\beta)$$

The least squares estimator $\hat\beta$ is the solution for $\beta$ in the equations

$$\frac{\partial L}{\partial \beta} = 0$$

We will not give the details of taking the derivatives above; however, the resulting equations that must be solved are

Normal equations

$$X'X\hat\beta = X'y \tag{12-12}$$

Equations 12-12 are the least squares normal equations in matrix form. They are identical to the scalar form of the normal equations given earlier in Equations 12-10. To solve the normal equations, multiply both sides of Equations 12-12 by the inverse of $X'X$. Therefore, the least squares estimate of $\beta$ is

Least squares estimate of $\beta$

$$\hat\beta = (X'X)^{-1} X'y \tag{12-13}$$

Note that there are $p = k + 1$ normal equations in $p = k + 1$ unknowns (the values of $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$). Furthermore, the matrix $X'X$ is nonsingular whenever the columns of X are linearly independent, as was assumed above, so the methods described in textbooks on determinants and matrices for inverting these matrices can be used to find $(X'X)^{-1}$. In practice, multiple regression calculations are almost always performed using a computer.

It is easy to see that the matrix form of the normal equations is identical to the scalar form. Writing out Equation 12-12 in detail, we obtain

$$\begin{bmatrix}
n & \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik} \\
\sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1} x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1} x_{ik} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{ik} & \sum_{i=1}^{n} x_{ik} x_{i1} & \sum_{i=1}^{n} x_{ik} x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik}^2
\end{bmatrix}
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i1} y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik} y_i \end{bmatrix}$$

If the indicated matrix multiplication is performed, the scalar form of the normal equations (Equation 12-10) results.
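The derivative step behind Equation 12-12 can be sketched in a few lines. This is a standard matrix-calculus argument, added here as a hedged outline; it uses only the symbols already defined (y, X, beta) and the fact that the scalar $\beta'X'y$ equals its own transpose $y'X\beta$.

```latex
\begin{aligned}
L &= (y - X\beta)'(y - X\beta) \\
  &= y'y - \beta'X'y - y'X\beta + \beta'X'X\beta \\
  &= y'y - 2\beta'X'y + \beta'X'X\beta \\[4pt]
\frac{\partial L}{\partial \beta} &= -2X'y + 2X'X\beta \\[4pt]
\left.\frac{\partial L}{\partial \beta}\right|_{\hat\beta} = 0
  &\;\Longrightarrow\; X'X\hat\beta = X'y
\end{aligned}
```

The last line is Equation 12-12. Multiplying both sides by $(X'X)^{-1}$, when that inverse exists, gives Equation 12-13.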

Fitted model, residuals, and estimator of variance

Fitted model

The fitted regression model is

$$\hat y_i = \hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_{ij}, \qquad i = 1, 2, \ldots, n \tag{12-14}$$

In matrix notation, the fitted model is

$$\hat y = X\hat\beta$$

Residual

The difference between the observation $y_i$ and the fitted value $\hat y_i$ is a residual, say, $e_i = y_i - \hat y_i$. The $(n \times 1)$ vector of residuals is denoted by

$$e = y - \hat y \tag{12-15}$$

Computers are almost always used in fitting multiple regression models. Table 12-4 presents some annotated computer output from Minitab for the least squares regression model for the wire bond pull strength data. The upper part of the table contains the numerical estimates of the regression coefficients. The computer also calculates several other quantities that reflect important information about the regression model. In subsequent sections, we will define and explain the quantities in this output.

Estimator of variance

Just as in simple linear regression, it is important to estimate $\sigma^2$, the variance of the error term $\epsilon$, in a multiple regression model. Recall that in simple linear regression the estimate of $\sigma^2$ was obtained by dividing the sum of the squared residuals by $n - 2$. Now there are two parameters in the simple linear regression model, so in multiple linear regression with p parameters a logical estimator for $\sigma^2$ is

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_E}{n - p} \tag{12-16}$$

This is an unbiased estimator of $\sigma^2$. Just as in simple linear regression, the estimate of $\sigma^2$ is usually ...
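Equations 12-15 and 12-16 translate into a few lines of arithmetic. The sketch below computes the residuals and the variance estimate SSE / (n - p); the function names, the observations, and the fitted values are made up for illustration and do not come from the wire bond data.

```python
def residuals(y, y_hat):
    """Residuals e_i = y_i - y_hat_i, as in Equation 12-15."""
    return [yi - fi for yi, fi in zip(y, y_hat)]


def sigma2_hat(y, y_hat, p):
    """Estimate sigma^2 as SSE / (n - p), as in Equation 12-16.

    p is the number of estimated coefficients, including the intercept.
    """
    n = len(y)
    if n <= p:
        raise ValueError("need more observations than parameters (n > p)")
    sse = sum(e * e for e in residuals(y, y_hat))
    return sse / (n - p)


# Made-up observations and fitted values (n = 5), with p = 3 parameters.
y = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [11.0, 12.0, 14.0, 20.0, 23.0]

e = residuals(y, y_hat)
variance_estimate = sigma2_hat(y, y_hat, p=3)
print(e, variance_estimate)
```

Dividing by n - p instead of n accounts for the p coefficients that were estimated from the same data.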

Use R for multiple linear regression

In multiple linear regression, the response is now modeled as a function of several explanatory variables. The function lm can be used to perform multiple linear regression in R, and much of the syntax is the same as that used for fitting simple linear regression models. To perform multiple linear regression with p explanatory variables, use the command:

    lm(response ~ explanatory_1 + explanatory_2 + ... + explanatory_p)

Here the terms response and explanatory_i in the function should be replaced by the names of the response and explanatory variables, respectively, used in the analysis.

Example

Ex. Data was collected on 100 houses recently sold in a city. It consisted of the sales price (in $), house size (in square feet), the number of ...

... LINEAR REGRESSION
  12-2.1 Test for Significance of Regression
  12-2.2 Tests on Individual Regression Coefficients and Subsets of Coefficients
12-3 CONFIDENCE INTERVALS IN MULTIPLE LINEAR REGRESSION
  12-3.1 Confidence Intervals on Individual Regression Coefficients
  12-3.2 Confidence Interval ...
