ECON4150 - Introductory Econometrics
Lecture 14: Panel data
Monique de Haan (moniqued@econ.uio.no)
Stock and Watson, Chapter 10

OLS: The Least Squares Assumptions

$Y_i = \beta_0 + \beta_1 X_i + u_i$

- Assumption 1: conditional mean zero assumption: $E[u_i \mid X_i] = 0$
- Assumption 2: $(X_i, Y_i)$ are i.i.d. draws from their joint distribution
- Assumption 3: large outliers are unlikely

Under these three assumptions the OLS estimators are unbiased, consistent and normally distributed in large samples.

- Last week we discussed threats to internal validity.
- In this lecture we discuss a method we can use in case of omitted variables, where:
  - the omitted variable is a determinant of the outcome $Y_i$
  - the omitted variable is correlated with the regressor of interest $X_i$

Omitted variables

- The multiple regression model was introduced to mitigate the omitted variables problem of simple regression:

  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$

- Even with multiple regression there is a threat of omitted variables:
  - some factors are difficult to measure
  - sometimes we are simply ignorant about relevant factors
- Multiple regression based on panel data may mitigate the detrimental effect of omitted variables without actually observing them.

Panel data

- Cross-sectional data: a sample of individuals observed in 1 time period (for example 2010)
- Panel data: the same sample of individuals observed in multiple time periods (for example 2010, 2011, 2012)

Panel data; notation

- Panel data consist of observations on $n$ entities (cross-sectional units) and $T$ time periods
- A particular observation is denoted with two subscripts ($i$ and $t$):

  $Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$

- $Y_{it}$: outcome variable for individual $i$ in year $t$
- For a balanced panel this results in $nT$ observations

Advantages of panel data

- More control over omitted variables.
- More observations.
- Many research questions typically involve a time component.

The effect of alcohol taxes on traffic deaths

- About 40,000 traffic fatalities each year in the U.S.
- Approximately 25% of fatal crashes involve a driver who had been drinking alcohol.
- The government wants to reduce traffic fatalities.
- One potential policy: increase the tax on alcoholic beverages.
- We have data on the traffic fatality rate and the tax on beer for 48 U.S. states in 1982 and 1988.
- What is the effect of increasing the tax on beer on the traffic fatality rate?

Data from 1982

[Figure: Traffic deaths and alcohol taxes in 1982. Vertical axis: traffic fatality rate; horizontal axis: tax on beer (in real dollars).]

$\widehat{FatalityRate}_{i,1982} = \underset{(0.14)}{2.01} + \underset{(0.18)}{0.15}\, BeerTax_{i,1982}$

Data from 1988

[Figure: Traffic deaths and alcohol taxes in 1988. Vertical axis: traffic fatality rate; horizontal axis: tax on beer (in real dollars).]

$\widehat{FatalityRate}_{i,1988} = \underset{(0.11)}{1.86} + \underset{(0.16)}{0.44}\, BeerTax_{i,1988}$

Panel data: before-after analysis

- Both regressions using data from 1982 and 1988 likely suffer from omitted variable bias
- We can use the data from 1982 and 1988 together as panel data
- Panel data with $T = 2$
- Observed are $Y_{i1}, Y_{i2}$ and $X_{i1}, X_{i2}$
- Suppose the model is

  $Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$

  and we assume $E(u_{it} \mid X_{i1}, X_{i2}, Z_i) = 0$
- $Z_i$ are (unobserved) variables that vary between states but not over time (such as local cultural attitudes towards drinking and driving)
- The parameter of interest is $\beta_1$

Panel data: before

- Consider the cross-sectional regression for the first period ($t = 1$):

  $Y_{i1} = \beta_0 + \beta_1 X_{i1} + \beta_2 Z_i + u_{i1}$, with $E[u_{i1} \mid X_{i1}, Z_i] = 0$

- $Z_i$ observed: multiple regression of $Y_{i1}$ on a constant, $X_{i1}$ and $Z_i$ leads to an unbiased and consistent estimator of $\beta_1$
- $Z_i$ not observed: regression of $Y_{i1}$ on a constant and $X_{i1}$ results in an unbiased estimator of $\beta_1$ only when $Cov(X_{i1}, Z_i) = 0$
- What can we do if we don't observe $Z_i$?

Panel data: after

- We also observe $Y_{i2}$ and $X_{i2}$, hence the model for the second period is:

  $Y_{i2} = \beta_0 + \beta_1 X_{i2} + \beta_2 Z_i + u_{i2}$

- By the same argument as before, cross-sectional analysis for period 2 might fail
- The problem is again the unobserved heterogeneity embodied in $Z_i$

Before-after analysis (first differences)

- We have

  $Y_{i1} = \beta_0 + \beta_1 X_{i1} + \beta_2 Z_i + u_{i1}$

  and

  $Y_{i2} = \beta_0 + \beta_1 X_{i2} + \beta_2 Z_i + u_{i2}$

- Subtracting period 1 from period 2 gives

  $Y_{i2} - Y_{i1} = (\beta_0 + \beta_1 X_{i2} + \beta_2 Z_i + u_{i2}) - (\beta_0 + \beta_1 X_{i1} + \beta_2 Z_i + u_{i1})$

- Applying OLS to

  $Y_{i2} - Y_{i1} = \beta_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$

  will produce an unbiased and consistent estimator of $\beta_1$
- The advantage of this regression is that we do not need data on $Z$
- By analyzing changes in the dependent variable we automatically control for time-invariant unobserved factors

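The differencing argument can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not part of the lecture: the data are simulated, and the sample size, the coefficient values and the helper `ols_slope` are all hypothetical. The unobserved $Z_i$ is built to be correlated with $X_{it}$, so the period-1 cross-section is biased while the first-difference regression is not.

```python
import random


def ols_slope(x, y, constant=True):
    """Slope from a one-regressor OLS fit, with or without an intercept."""
    if constant:
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        x = [v - mean_x for v in x]
        y = [v - mean_y for v in y]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


random.seed(0)
n, beta1, beta2 = 2000, -1.0, 2.0  # hypothetical values, chosen for illustration

# Z_i: unobserved, time-invariant, and correlated with X in both periods.
z = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * zi + random.gauss(0, 1) for zi in z]
x2 = [0.8 * zi + random.gauss(0, 1) for zi in z]
y1 = [1.0 + beta1 * xv + beta2 * zi + random.gauss(0, 1) for xv, zi in zip(x1, z)]
y2 = [1.0 + beta1 * xv + beta2 * zi + random.gauss(0, 1) for xv, zi in zip(x2, z)]

# Cross-section for period 1 without Z: suffers from omitted variable bias.
cross_section = ols_slope(x1, y1)

# Before-after regression: changes on changes, no constant, Z drops out.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
first_diff = ols_slope(dx, dy, constant=False)

print("cross-section slope:", round(cross_section, 3))
print("first-difference slope:", round(first_diff, 3))
```

In this setup the cross-sectional slope is pulled far away from $\beta_1$ by the omitted $Z_i$, while the first-difference slope stays close to it.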
Data from 1982 and 1988

[Figure: Traffic deaths and alcohol taxes: before-after. Vertical axis: fatality rate 1988 - fatality rate 1982; horizontal axis: beer tax 1988 - beer tax 1982.]

$\widehat{Fatality_{i,1988} - Fatality_{i,1982}} = \underset{(0.06)}{-0.07} - \underset{(0.42)}{1.04}\,(BeerTax_{i,1988} - BeerTax_{i,1982})$

Panel data with more than 2 time periods

- Panel data with $T > 2$:

  $Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \quad i = 1, \dots, n; \; t = 1, \dots, T$

- $Y_{it}$ is the dependent variable; $X_{it}$ is the explanatory variable; $Z_i$ are state-specific, time-invariant variables
- The equation can be interpreted as a model with $n$ specific intercepts (one for each state):

  $Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$, with $\alpha_i = \beta_0 + \beta_2 Z_i$

- $\alpha_i$, $i = 1, \dots, n$ are called entity fixed effects
- $\alpha_i$ models the impact of omitted time-invariant variables on $Y_{it}$

[Figure: State specific intercepts. Vertical axis: predicted fatality rate; horizontal axis: beer tax. Lines shown for Alabama, Arkansas, Arizona and California.]

Fixed effects regression model
Least squares with dummy variables

Given data on $Y_{it}$ and $X_{it}$, how do we determine $\beta_1$?

- Population regression model: $Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
- In order to estimate the model we have to quantify $\alpha_i$
- Solution: create $n$ dummy variables $D1_i, \dots, Dn_i$
  - with $D1_i = 1$ if $i = 1$ and 0 otherwise,
  - with $D2_i = 1$ if $i = 2$ and 0 otherwise, and so on
- The population regression model can be written as:

  $Y_{it} = \beta_1 X_{it} + \alpha_1 D1_i + \alpha_2 D2_i + \dots + \alpha_n Dn_i + u_{it}$

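Fixed effects regression model
Least squares with dummy variables

- Alternatively, the population regression model can be written as:

  $Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + u_{it}$

  with $\beta_0 = \alpha_1$ and $\gamma_i = \alpha_i - \beta_0$ for $i > 1$
- The interpretation of $\beta_1$ is identical for both representations
- Ordinary Least Squares (OLS): choose $\hat\beta_0, \hat\beta_1, \hat\gamma_2, \dots, \hat\gamma_n$ to minimize the sum of squared prediction mistakes (SSR):

  $\sum_{i=1}^{n} \sum_{t=1}^{T} \left( Y_{it} - \hat\beta_0 - \hat\beta_1 X_{it} - \hat\gamma_2 D2_i - \dots - \hat\gamma_n Dn_i \right)^2$

- SSR is a function of $\hat\beta_0, \hat\beta_1, \hat\gamma_2, \dots, \hat\gamma_n$
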
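Fixed effects regression model
Least squares with dummy variables

$\sum_{i=1}^{n} \sum_{t=1}^{T} \left( Y_{it} - \hat\beta_0 - \hat\beta_1 X_{it} - \hat\gamma_2 D2_i - \dots - \hat\gamma_n Dn_i \right)^2$

OLS procedure:
- Take partial derivatives of SSR with respect to $\hat\beta_0, \hat\beta_1, \hat\gamma_2, \dots, \hat\gamma_n$
- Set the partial derivatives equal to zero, resulting in $n + 1$ equations with $n + 1$ unknown coefficients
- The solutions are the OLS estimators $\hat\beta_0, \hat\beta_1, \hat\gamma_2, \dots, \hat\gamma_n$
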
Fixed effects regression model
Least squares with dummy variables

- Analytical formulas require matrix algebra
- Algebraic properties of the OLS estimators (normal equations, linearity) are the same as for the simple regression model
- Extension to multiple $X$'s is straightforward: $n + k$ normal equations
- The OLS procedure is also labeled the Least Squares Dummy Variables (LSDV) method
- Dummy variable trap: never include all $n$ dummy variables and the constant term!

Fixed effects regression model
Within estimation

- Typically $n$ is large in panel data applications
- With large $n$ the computer will face numerical problems when solving a system of $n + 1$ equations
- The OLS estimator can be calculated in two steps:
  - First step: demean $Y_{it}$ and $X_{it}$
  - Second step: use OLS on the demeaned variables

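Fixed effects regression model
Within estimation

- We have

  $Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$

  $\bar Y_i = \beta_1 \bar X_i + \alpha_i + \bar u_i$

  where $\bar Y_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it}$, etc. is the entity mean
- Subtracting both expressions leads to

  $Y_{it} - \bar Y_i = (\beta_1 X_{it} + \alpha_i + u_{it}) - (\beta_1 \bar X_i + \alpha_i + \bar u_i)$

  $\tilde Y_{it} = \beta_1 \tilde X_{it} + \tilde u_{it}$

- $\tilde Y_{it} = Y_{it} - \bar Y_i$, etc. is the entity-demeaned variable
- $\alpha_i$ has disappeared; OLS on the demeaned variables involves solving one normal equation only!
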
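The two steps (demean, then OLS) can be sketched on a tiny hand-made panel. This is an illustrative sketch only: the numbers, the value $\beta_1 = 0.5$ and the function names are assumptions, and the data are noise-free so that the within estimator recovers $\beta_1$ exactly while pooled OLS, which ignores $\alpha_i$, does not.

```python
def demean_by_entity(panel):
    """Subtract each entity's own time average from its observations."""
    demeaned = []
    for series in panel:
        mean = sum(series) / len(series)
        demeaned.append([value - mean for value in series])
    return demeaned


def within_slope(x_panel, y_panel):
    """OLS slope on entity-demeaned data: the single normal equation."""
    x_tilde = demean_by_entity(x_panel)
    y_tilde = demean_by_entity(y_panel)
    num = sum(a * b for xs, ys in zip(x_tilde, y_tilde) for a, b in zip(xs, ys))
    den = sum(a * a for xs in x_tilde for a in xs)
    return num / den


def pooled_slope(x_panel, y_panel):
    """OLS slope with a common intercept, ignoring the entity effects."""
    x = [v for xs in x_panel for v in xs]
    y = [v for ys in y_panel for v in ys]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sum((a - mean_x) ** 2 for a in x)
    return num / den


# Hypothetical panel: n = 2 entities, T = 3 periods, Y_it = 0.5 * X_it + alpha_i.
beta1 = 0.5
alpha = [10.0, -7.0]
x_panel = [[1.0, 2.0, 4.0], [3.0, 5.0, 10.0]]
y_panel = [[beta1 * x + a for x in xs] for xs, a in zip(x_panel, alpha)]

slope = within_slope(x_panel, y_panel)
pooled = pooled_slope(x_panel, y_panel)
print("within:", slope, "pooled:", pooled)
```

The same functions work for any $T$ and for unbalanced panels, since each entity is demeaned with its own number of observations.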
Fixed effects regression model
Within estimation

- Entity demeaning is often called the Within transformation
- The Within transformation is a generalization of "before-after" analysis to more than $T = 2$ periods
  - Before-after: $Y_{i2} - Y_{i1} = \beta_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$
  - Within: $Y_{it} - \bar Y_i = \beta_1 (X_{it} - \bar X_i) + (u_{it} - \bar u_i)$
- LSDV and Within estimators are identical:

  $\widehat{FatalityRate}_{it} = \underset{(0.19)}{-0.66}\, BeerTax_{it} + \text{State dummies}$

  $\widehat{(FatalityRate_{it} - \overline{FatalityRate}_i)} = \underset{(0.19)}{-0.66}\,(BeerTax_{it} - \overline{BeerTax}_i)$

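For $T = 2$ the within estimator and the before-after estimator (without a constant) give the same number, because each demeaned value is just plus or minus half the change. The sketch below checks this numerically; the simulated data, the seed, the coefficient $-1.0$ and the function names are assumptions made for illustration.

```python
import random


def within_slope(x_panel, y_panel):
    """OLS slope on entity-demeaned data (one regressor, no constant)."""
    num = den = 0.0
    for xs, ys in zip(x_panel, y_panel):
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


def first_difference_slope(x_panel, y_panel):
    """OLS slope of (Y_i2 - Y_i1) on (X_i2 - X_i1) without a constant."""
    num = den = 0.0
    for xs, ys in zip(x_panel, y_panel):
        dx, dy = xs[1] - xs[0], ys[1] - ys[0]
        num += dx * dy
        den += dx * dx
    return num / den


rng = random.Random(42)
n = 48  # hypothetical number of entities, two periods each
x_panel = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y_panel = []
for xs in x_panel:
    alpha_i = rng.gauss(0, 2)  # entity fixed effect, removed by both methods
    y_panel.append([-1.0 * x + alpha_i + rng.gauss(0, 1) for x in xs])

within = within_slope(x_panel, y_panel)
first_diff = first_difference_slope(x_panel, y_panel)
print(within, first_diff)
```

This is consistent with the cigarette example in this lecture, where the before-after regression and the within regression report the same lperinc coefficient (-1.011625) for the two-period panel.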
Fixed effects regression model
Time fixed effects

- In addition to entity effects we can also include time effects in the model
- Time effects control for omitted variables that are common to all entities but vary over time
- Typical example of time effects: macroeconomic conditions or federal policy measures are common to all entities (e.g. states) but vary over time
- Panel data model with entity and time effects:

  $Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}$

Fixed effects regression model
Time fixed effects

- OLS estimation is a straightforward extension of the LSDV/Within estimators of the model with only entity fixed effects
- LSDV: create $T$ dummy variables $B1_t, \dots, BT_t$:

  $Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + \delta_2 B2_t + \delta_3 B3_t + \dots + \delta_T BT_t + u_{it}$

- Within estimation: deviating $Y_{it}$ and $X_{it}$ from their entity and time-period means
- The effect of the tax on beer on the traffic fatality rate:

  $\widehat{FatalityRate}_{it} = \underset{(0.20)}{-0.64}\, BeerTax_{it} + \text{State dummies} + \text{Time dummies}$

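For a balanced panel, deviating from entity and time-period means can be written as $\tilde X_{it} = X_{it} - \bar X_i - \bar X_t + \bar X$, where $\bar X$ is the overall mean. The following sketch applies this to a small hand-made panel; the numbers, the value $\beta_1 = -0.5$ and the function names are hypothetical, the data are noise-free, and the formula as written assumes a balanced panel.

```python
def two_way_demean(panel):
    """Remove entity means and time means (adding back the grand mean).

    `panel[i][t]` holds the value for entity i in period t; balanced panel assumed.
    """
    n, T = len(panel), len(panel[0])
    entity_mean = [sum(row) / T for row in panel]
    time_mean = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    grand_mean = sum(entity_mean) / n
    return [
        [panel[i][t] - entity_mean[i] - time_mean[t] + grand_mean for t in range(T)]
        for i in range(n)
    ]


def slope_no_constant(x_panel, y_panel):
    """OLS slope through the origin on already-transformed data."""
    num = sum(a * b for xs, ys in zip(x_panel, y_panel) for a, b in zip(xs, ys))
    den = sum(a * a for xs in x_panel for a in xs)
    return num / den


# Hypothetical balanced panel: Y_it = beta1 * X_it + alpha_i + lambda_t.
beta1 = -0.5
alpha = [10.0, -7.0, 3.0]      # entity effects
lam = [0.0, 2.0, -1.0]         # time effects
x_panel = [[1.0, 2.0, 4.0], [3.0, 5.0, 10.0], [0.0, 7.0, 1.0]]
y_panel = [
    [beta1 * x_panel[i][t] + alpha[i] + lam[t] for t in range(3)]
    for i in range(3)
]

two_way_slope = slope_no_constant(two_way_demean(x_panel), two_way_demean(y_panel))
print(two_way_slope)
```

Both $\alpha_i$ and $\lambda_t$ are wiped out by the transformation, so the slope on the transformed data equals $\beta_1$ in this noise-free example.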
Fixed effects regression model
Statistical properties of OLS

$Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}$

The statistical assumptions are:
- ASS #1: $E(u_{it} \mid X_{i1}, \dots, X_{iT}, \alpha_i, \lambda_t) = 0$
- ASS #2: $(X_{i1}, \dots, X_{iT}, Y_{i1}, \dots, Y_{iT})$ are i.i.d. over the cross-section
- ASS #3: large outliers are unlikely
- ASS #4: no perfect multicollinearity
- ASS #5: $cov(u_{it}, u_{is} \mid X_{i1}, \dots, X_{iT}, \alpha_i, \lambda_t) = 0$ for $t \neq s$

Fixed effects regression model
Statistical properties of OLS

ASS #1 to ASS #5 imply that:
- The OLS estimator $\hat\beta_1$ is an unbiased and consistent estimator of $\beta_1$
- The OLS estimators approximately have a normal distribution

Remarks:
- ASS #1 is most important
- Extension to multiple $X$'s is straightforward:

  $Y_{it} = \beta_1 X_{1it} + \beta_2 X_{2it} + \dots + \beta_k X_{kit} + \alpha_i + \lambda_t + u_{it}$

- The additional assumption ASS #5 implies that the error terms are uncorrelated over time (no autocorrelation)

Fixed effects regression model
Clustered standard errors

- Violation of assumption #5: error terms are correlated over time ($Cov(u_{it}, u_{is}) \neq 0$)
- $u_{it}$ contains time-varying factors that affect the traffic fatality rate (but that are uncorrelated with the beer tax)
- These omitted factors might, for a given entity, be correlated over time
- Examples: downturn in the local economy, road improvement project
- Not correcting for autocorrelation leads to standard errors which are often too low

Fixed effects regression model
Clustered standard errors

- Solution: compute HAC standard errors (clustered se's)
  - robust to arbitrary correlation within clusters (entities)
  - robust to heteroskedasticity
  - assume no correlation across entities
- Clustered standard errors are valid whether or not there is heteroskedasticity and/or autocorrelation
- Use of clustered standard errors is problematic when the number of entities is below 50 (or 42)
- In Stata: command, cluster(entity)

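To see why autocorrelation matters, the sketch below compares a conventional standard error with a cluster-robust one for the within estimator with a single regressor. It is a simplified illustration under stated assumptions: the data are simulated with AR(1) regressors and errors, all parameter values are hypothetical, and the clustered variance $\sum_i (\sum_t \tilde X_{it} \hat u_{it})^2 / (\sum_i \sum_t \tilde X_{it}^2)^2$ is computed without the small-sample scaling factors that statistical packages typically apply, so the numbers would not match Stata's exactly.

```python
import math
import random


def ar1_series(T, rho, rng):
    """Stationary AR(1) draws with unit variance."""
    value = rng.gauss(0, 1)
    series = [value]
    for _ in range(T - 1):
        value = rho * value + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        series.append(value)
    return series


def demean(series):
    mean = sum(series) / len(series)
    return [v - mean for v in series]


rng = random.Random(1)
n, T, beta1, rho = 300, 7, -0.5, 0.9  # hypothetical design

x_tilde, y_tilde = [], []
for _ in range(n):
    alpha_i = rng.gauss(0, 1)
    x = ar1_series(T, rho, rng)          # persistent regressor
    u = ar1_series(T, rho, rng)          # errors correlated over time (ASS #5 fails)
    y = [beta1 * xv + alpha_i + uv for xv, uv in zip(x, u)]
    x_tilde.append(demean(x))
    y_tilde.append(demean(y))

# Within estimator.
sxx = sum(v * v for row in x_tilde for v in row)
sxy = sum(a * b for xr, yr in zip(x_tilde, y_tilde) for a, b in zip(xr, yr))
beta_hat = sxy / sxx

resid = [
    [yv - beta_hat * xv for xv, yv in zip(xr, yr)]
    for xr, yr in zip(x_tilde, y_tilde)
]

# Conventional SE: assumes homoskedastic, serially uncorrelated errors.
ssr = sum(e * e for row in resid for e in row)
conventional_se = math.sqrt(ssr / (n * T - n - 1) / sxx)

# Clustered SE: sum the scores within each entity before squaring.
meat = sum(sum(xv * e for xv, e in zip(xr, er)) ** 2 for xr, er in zip(x_tilde, resid))
clustered_se = math.sqrt(meat) / sxx

print("beta_hat:", round(beta_hat, 3))
print("conventional se:", round(conventional_se, 4))
print("clustered se:", round(clustered_se, 4))
```

With persistent regressors and errors, the conventional standard error understates the sampling uncertainty, which is the pattern described above.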
The effect of a tax on beer on traffic fatalities

Dependent variable: traffic fatality rate (number of deaths per 10 000)

Table rows: Beer tax; State fixed effects; Time fixed effects; Additional control variables; Clustered standard errors; number of observations (336).

Note: * significant at 10% level, ** significant at 5% level, *** significant at 1% level. Control variables: unemployment rate, per capita income, minimum legal drinking age.

Panel data: an example
Returns to schooling

$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$

- $Y_{it}$ is the logarithm of individual earnings; $X_{it}$ is years of completed education
- $\alpha_i$ is unobserved ability
- There is likely to be cross-sectional correlation between $X_{it}$ and $\alpha_i$, hence standard cross-sectional analysis with OLS fails
- However, in this case panel data do not solve the problem, because $X_{it}$ typically lacks time series variation ($X_{it} = X_i$)
- We have to resort to cross-sectional methods (instrumental variables) to identify the returns to schooling

Panel data: Cigarette taxes and smoking

- Is there an effect of cigarette taxes on smoking behavior?

  $Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$

- $Y_{it}$ is the number of packages per capita in state $i$ in year $t$; $X_{it}$ is the real tax on cigarettes in state $i$ in year $t$
- $\alpha_i$ is a state-specific effect which includes state characteristics that are constant over time
- Data for 48 U.S. states in 2 time periods: 1985 and 1995

Panel data: Cigarette taxes and smoking
Multiple regression

- lpackpc: log number of packages per capita in state i in year t
- rtax: real average cigarette-specific tax during fiscal year in state i
- lperinc: log per capita real income

. regress lpackpc rtax lperinc

      Source |        SS      df        MS
-------------+-------------------------------
       Model |  1.76908655     2  .884543277
    Residual |  3.87049389    93  .041618214
-------------+-------------------------------
       Total |  5.63958045    95  .059364005

     lpackpc |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
        rtax |
     lperinc | -.0139092    .158696    -0.09   0.930   -.3290481   .3012296
       _cons |

Panel data: Cigarette taxes and smoking
Before-After estimation

. gen diff_rtax = rtax1995 - rtax1985
. gen diff_lpackpc = lpackpc1995 - lpackpc1985
. gen diff_lperinc = lperinc1995 - lperinc1985
. regress diff_lpackpc diff_rtax diff_lperinc, nocons

      Source |        SS      df        MS
-------------+-------------------------------
       Model |  3.33475011     2  1.66737506
    Residual |  .526571782    46  .011447213
-------------+-------------------------------
       Total |  3.86132189    48  .080444206

diff_lpackpc |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
   diff_rtax | -.0169369   .0020119    -8.42   0.000   -.0209865  -.0128872
diff_lperinc | -1.011625   .1325691    -7.63   0.000   -1.278473  -.7447771

Panel data: Cigarette taxes and smoking
Least squares with dummy variables (no constant term)

. regress lpackpc rtax lperinc stateB*, nocons

      Source |        SS      df        MS            Number of obs =      96
-------------+-------------------------------         F( 50, 46)    = 7317.61
       Model |  2094.15728    50  41.8831457          Prob > F      =  0.0000
    Residual |                                        R-squared     =
-------------+-------------------------------         Adj R-squared =  0.9997
       Total |  2094.42057    96  21.8168809          Root MSE      =  .07565

     lpackpc |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
        rtax |
     lperinc |
     stateB1 |  7.663688   .3037711    25.23   0.000    7.052229   8.275148
     stateB2 |  7.834448   .2926539    26.77   0.000    7.245367    8.42353
     stateB3 |  7.678433   .3121525    24.60   0.000    7.050103   8.306763
     stateB4 |   7.66627   .3392221    22.60   0.000    6.983451   8.349088
     stateB5 |  7.817715   .3369548    23.20   0.000     7.13946    8.49597
     stateB6 |  8.261411   .3538431    23.35   0.000    7.549161    8.97366
     stateB7 |  8.189483   .3402545    24.07   0.000    7.504586   8.874379
     stateB8 |  7.989006   .3242982    24.63   0.000    7.336228   8.641784
     stateB9 |  7.754668   .3228567    24.02   0.000    7.104791   8.404545
    stateB10 |  7.837622   .3121558    25.11   0.000    7.209285   8.465959
    stateB11 |  7.459151   .3036824    24.56   0.000     6.84787   8.070432
    stateB12 |  7.993558   .3339735    23.93   0.000    7.321305   8.665812
    stateB13 |  7.952852   .3213272    24.75   0.000    7.306054    8.59965
    stateB14 |  7.798515   .3224854    24.18   0.000    7.149385   8.447644
    stateB15 |  7.970896   .3115618    25.58   0.000    7.343755   8.598038
    stateB16 |   7.76369   .3054667    25.42   0.000    7.148817   8.378562
    stateB17 |  8.153021   .3379089    24.13   0.000    7.472845   8.833196
    stateB18 |  7.981185   .3445702    23.16   0.000    7.287601   8.674769
    stateB19 |  7.913551   .3040506    26.03   0.000    7.301528   8.525573
    stateB20 |  8.184433   .3161916    25.88   0.000    7.547972   8.820893
    stateB21 |  7.982302   .3263579    24.46   0.000    7.325377   8.639226
    stateB22 |  7.940574   .3240931    24.50   0.000    7.288208    8.59294
    stateB23 |  7.510587    .286568    26.21   0.000    6.933755   8.087418
    stateB24 |  7.528216   .3022437    24.91   0.000    6.919831   8.136601
    stateB25 |  7.820886    .325862    24.00   0.000     7.16496   8.476812
    stateB26 |  7.695812   .3006968    25.59   0.000    7.090541   8.301083
    stateB27 |  7.805769    .317306    24.60   0.000    7.167064   8.444473
    stateB28 |  8.476793   .3368112    25.17   0.000    7.798827   9.154759
    stateB29 |   8.16063   .3471406    23.51   0.000    7.461872   8.859388
    stateB30 |                         24.00   0.000    6.678332   7.901178
    stateB31 |  8.093636   .3349122    24.17   0.000    7.419493    8.76778
    stateB32 |  8.100707   .3376925    23.99   0.000    7.420967   8.780447
    stateB33 |
    stateB34 |  7.852661   .3092282    25.39   0.000    7.230217   8.475106
    stateB35 |                                                      8.556589
    stateB36 |  7.940046   .3256142    24.38   0.000    7.284619   8.595473
    stateB37 |  8.178333   .3197929    25.57   0.000    7.534623   8.822043
    stateB38 |  7.611761    .310087    24.55   0.000    6.987588   8.235933
    stateB39 |  7.644086   .3051323    25.05   0.000    7.029887   8.258286
    stateB40 |  7.846138   .3163243    24.80   0.000     7.20941   8.482865
    stateB41 |  7.801418   .3152238    24.75   0.000    7.166906   8.435931
    stateB42 |  7.045477   .3014862    23.37   0.000    6.438617   7.652337
    stateB43 |  7.816716   .3458507    22.60   0.000    7.120554   8.512877
    stateB44 |   7.99247   .3153114    25.35   0.000    7.357781   8.627159
    stateB45 |  7.844359   .3193189    24.57   0.000    7.201603   8.487114
    stateB46 |   7.92666   .3154175    25.13   0.000    7.291758   8.561563
    stateB47 |  7.644741   .2936826    26.03   0.000    7.053589   8.235894
    stateB48 |  7.825943   .3275694    23.89   0.000     7.16658   8.485306

Panel data: Cigarette taxes and smoking
Least squares with dummy variables with constant term

. regress lpackpc rtax lperinc stateB*

      Source |        SS      df        MS            Number of obs =      96
-------------+-------------------------------         F( 49, 46)    =   19.17
       Model |  5.37629455    49  .109720297          Prob > F      =  0.0000
    Residual |  .263285891    46  .005723606          R-squared     =  0.9533
-------------+-------------------------------         Adj R-squared =  0.9036
       Total |  5.63958045    95  .059364005          Root MSE      =  .07565

     lpackpc |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
        rtax |
     lperinc |
     stateB1 |
     stateB2 |  .0177322   .1005272     0.18   0.861   -.1846185    .220083
     stateB3 |  -.138283    .090497    -1.53   0.133    -.320444    .043878
     stateB4 |
     stateB5 |  .0009988    .078887     0.01   0.990   -.1577924   .1597901
     stateB6 |  .4446946   .0876663     5.07   0.000    .2682314   .6211578
     stateB7 |  .3727666    .078856     4.73   0.000    .2140378   .5314954
     stateB8 |  .1722899    .086112     2.00   0.051   -.0010446   .3456245
     stateB9 |
    stateB10 |  .0209059   .0902435     0.23   0.818   -.1607448   .2025567
    stateB11 |
    stateB12 |  .1768425   .0830081     2.13   0.039     .009756   .3439291
    stateB13 |
    stateB14 |
    stateB15 |  .1541802   .0833424     1.85   0.071   -.0135794   .3219398
    stateB16 |
    stateB17 |  .3363049   .0892101     3.77   0.000    .1567343   .5158755
    stateB18 |  .1644693   .0802952     2.05   0.046    .0028434   .3260952
    stateB19 |  .0968347   .0950611     1.02   0.314   -.0945133   .2881827
    stateB20 |  .3677169   .1012653     3.63   0.001    .1638804   .5715534
    stateB21 |  .1655858   .0879262     1.88   0.066   -.0114005   .3425721
    stateB22 |  .1238581   .0809845     1.53   0.133   -.0391553   .2868715
    stateB23 |
    stateB24 | -.2885003   .0909945    -3.17   0.003   -.4716627  -.1053379
    stateB25 |  .0041703   .0783667     0.05   0.958   -.1535736   .1619142
    stateB26 | -.1209041    .097897    -1.24   0.223   -.3179605   .0761523
    stateB27 |
    stateB28 |  .6600769    .080162     8.23   0.000    .4987191   .8214346
    stateB29 |  .3439141   .0847627     4.06   0.000    .1732956   .5145326
    stateB30 |
    stateB31 |  .2769205   .0818311     3.38   0.001     .112203    .441638
    stateB32 |
    stateB33 |  .1457052   .0816672     1.78   0.081   -.0186823   .3100927
    stateB34 |
    stateB35 |  .1030576   .0892666     1.15   0.254   -.0766268   .2827421
    stateB36 |  .1233301   .0841296     1.47   0.149   -.0460139   .2926741
    stateB37 |  .3616172   .0944748     3.83   0.000    .1714493    .551785
    stateB38 |
    stateB39 |
    stateB40 |  .0294217   .0831769     0.35   0.725   -.1380046   .1968481
    stateB41 |
    stateB42 |
    stateB43 |  (dropped)
    stateB44 |  .1757536   .0854144     2.06   0.045    .0038233    .347684
    stateB45 |  .0276429   .0948094     0.29   0.772   -.1631985   .2184843
    stateB46 |  .1099444   .0918156     1.20   0.237   -.0748708   .2947597
    stateB47 |
    stateB48 |  .0092272   .0787188     0.12   0.907   -.1492255     .16768
       _cons |  7.816716   .3458507    22.60   0.000    7.120554   8.512877

Panel data: Cigarette taxes and smoking
Within estimation

. xtreg lpackpc rtax lperinc, fe i(STATE)

Fixed-effects (within) regression          Number of obs      =    96
Group variable: STATE                      Number of groups   =    48

R-sq:  within  = 0.8636                    Obs per group: min =     2
       between = 0.0896                                   avg =   2.0
       overall = 0.2354                                   max =     2

corr(u_i, Xb)  = -0.5687                   F(2,46)            =
                                           Prob > F           =

     lpackpc |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
        rtax |
     lperinc | -1.011625   .1325691    -7.63   0.000   -1.278473  -.7447771
       _cons |
-------------+---------------------------------------------------------------
     sigma_u |  .25232518
     sigma_e |  .07565452
         rho |  .91751731   (fraction of variance due to u_i)

F test that all u_i=0:  F(47, 46) = 13.41    Prob > F = 0.0000