Stata Illustration Simple And Multiple Linear Regression - UMass

Upload by : Arnav Humphrey
Transcription

Stata Version 13 – Spring 2015
Illustration: Simple and Multiple Linear Regression
February 2015

I – Simple Linear Regression
   1. Introduction to Example
   2. Preliminaries: Descriptives
   3. Model Fitting (Estimation)
   4. Model Examination
   5. Checking Model Assumptions and Fit

II – Multiple Linear Regression
   1. A General Approach to Model Building
   2. Introduction to Example
   3. Preliminaries: Descriptives
   4. Handling of Categorical Predictors: Indicator Variables
   5. Model Fitting (Estimation)
   6. Checking Model Assumptions and Fit

\1. Teaching\stata\stata version 13 – SPRING 2015\stata v 13 first session.docx   Page 1 of 27

I – Simple Linear Regression

1. Introduction to Example

Source: Chatterjee S, Handcock MS and Simonoff JS. A Casebook for a First Course in Statistics and Data Analysis. New York: John Wiley, 1995, pp 145-152.

Setting: Calls to the New York Auto Club are possibly related to the weather, with more calls occurring during bad weather. This example illustrates descriptive analyses and simple linear regression to explore this hypothesis in a data set containing information on calendar day, weather, and numbers of calls.

Stata Data Set: ers.dta
In this illustration, the data set ers.dta is accessed directly from the PubHlth 640 website. It is then saved to your current working directory.

Simple Linear Regression Variables:
   Outcome    Y = calls
   Predictor  X = low

Launch Stata and input Stata data set ers.dta

. ***** Set working directory to directory of choice.
. ***** Command is cd /YOURDIRECTORY
. cd /Users/cbigelow/Desktop

. ***** Input data from url on internet.
. ***** Command is use "http:FULLURL"
. use "http://people.umass.edu/biep640w/datasets/ers"

. ***** Save the inputted data to the directory you have chosen above.
. ***** Command is save "NAME", replace
. save "ers.dta", replace
(note: file ers.dta not found)
file ers.dta saved

2. Preliminaries: Descriptives

. * Describe data set
. codebook, compact
(Output table with columns Obs through Max and Label not shown. The variables listed include day, year, sunday and subzero.)

We see that this data set has n = 28 observations on several variables. For this illustration of simple linear regression, we will consider just two variables: calls and low.

BEWARE – Stata is case sensitive!

. ***** Numerical Summaries
. ***** tabstat XVARIABLE YVARIABLE, stat(n mean sd min max)
. tabstat low calls, stat(n mean sd min max)

   stats |       low     calls
---------+--------------------
       N |        28        28
    mean |     21.75   4318.75
      sd |  13.27383  2692.564
     min |        -2      1674
     max |        41      8947
------------------------------

. ***** Scatterplot
. ***** graph twoway (scatter YVARIABLE XVARIABLE, symbol(d)), title("TITLE")
. graph twoway (scatter calls low, symbol(d)), title("Calls to NY Auto Club 1993-1994")

The scatterplot suggests, as we might expect, that lower temperatures are associated with more calls to the NY Auto Club. We also see that the data are a bit messy.

. ***** Scatterplot with Lowess Regression
. ***** graph twoway (scatter YVARIABLE XVARIABLE, symbol(d)) (lowess YVARIABLE XVARIABLE, bwidth(.99) lpattern(solid)), title("TITLE") subtitle("TITLE")
. graph twoway (scatter calls low, symbol(d)) (lowess calls low, bwidth(.99) lpattern(solid)), title("Calls to NY Auto Club 1993-1994")

Unfamiliar with LOWESS regression? LOWESS stands for “locally weighted scatterplot smoother”. It is a technique for drawing a smooth line through the scatterplot to obtain a sense of the functional form that relates X to Y, which is not necessarily linear. The method works as follows: at each observation (x, y), a line is fit to the observed data point together with some “adjacent” points. It’s handy for seeing where in the data linearity holds and where it no longer holds.
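The role of the “adjacent” points can be seen by changing the bandwidth. The following is a minimal sketch, assuming ers.dta has been saved to the working directory as above; the second bandwidth, bwidth(0.4), is an arbitrary illustrative value and not one used in this handout. The `///` line continuations work in a do-file, not when typed in the Command window.

```stata
* Sketch only: overlay a wide and a narrow LOWESS bandwidth on one scatterplot.
* bwidth(0.4) is an illustrative assumption, not a value from the handout.
use "ers.dta", clear
graph twoway (scatter calls low) ///
    (lowess calls low, bwidth(0.99) lpattern(solid)) ///
    (lowess calls low, bwidth(0.4) lpattern(dash)), ///
    title("LOWESS with two bandwidths") ///
    legend(order(2 "bwidth 0.99" 3 "bwidth 0.4"))
```

A bandwidth near 1 uses nearly all of the data at each point and so gives a smoother curve; a smaller bandwidth follows local wiggles more closely.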

. ***** Shapiro Wilk Test of Normality of Y (Reject normality for small p-value)
. ***** swilk YVARIABLE
. swilk calls

                   Shapiro-Wilk W test for normal data

    Variable |    Obs       W           V         z       Prob>z
-------------+--------------------------------------------------
       calls |     28    0.82916      5.159     3.378    0.00037

The null hypothesis of normality of Y = calls is rejected (p-value = .00037). Tip - sometimes the cure is worse than the original violation. For now, we’ll charge on.

. ***** Histogram with Overlay Normal for Assessment of Normality of Outcome
. ***** histogram YVARIABLE, frequency normal title("TITLE")
. histogram calls, frequency normal title("Distribution of Y=Calls w Overlay Normal")
(bin=5, start=1674, width=1454.6)

No surprise here, given that the Shapiro Wilk test rejected normality. This graph confirms the non-normality of the distribution of Y = calls.

3. Model Fitting (Estimation)

. ***** Fit and ANOVA Table
. ***** regress YVARIABLE XVARIABLE
. regress calls low

      Source |       SS       df       MS
-------------+------------------------------
       Model |   100233719     1   100233719
    Residual |  95513596.2    26  3673599.85
-------------+------------------------------
       Total |   195747315    27  7249900.56

Number of obs, F(1, 26), Prob > F, R-squared, Adj R-squared, Root MSE: (values not shown)

------------------------------------------------------------------------------
       calls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         low |   -145.154   27.78868    -5.22   0.000    -202.2744   -88.03352
       _cons |   ...
------------------------------------------------------------------------------

Remarks
The fitted line is \hat{calls} = 7,475.85 - 145.15*[low]
R^2 = .51 indicates that 51% of the variability in calls is explained.
The overall F test significance level “Prob > F” < .0001 suggests that the straight line fit performs better in explaining variability in calls than does \bar{Y} = average # calls.

From this output, the analysis of variance is the following:

Source                  Df             Sum of Squares                                                   Mean Square
Model “Regression”      1              MSS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 = 100,233,719       MSS/1 = 100,233,719
Residual “Error”        (n-2) = 26     RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = 95,513,596.2          RSS/(n-2) = 3,673,599.85
Total, corrected        (n-1) = 27     TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 195,747,315
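The summary statistics quoted in the remarks can be recomputed from the sums of squares in the ANOVA table. The lines below are a small arithmetic sketch using Stata's display command with the numbers printed above; nothing here is new output from the data.

```stata
* R-squared = MSS / TSS (should be roughly 0.51)
display 100233719 / 195747315
* Overall F = (MSS/1) / (RSS/26) (should be roughly 27.3)
display 100233719 / 3673599.85
* In simple linear regression the overall F equals the square of the slope t statistic
display (-145.154 / 27.78868)^2
* Root MSE = square root of the residual mean square
display sqrt(3673599.85)
```

Immediately after running regress, the same quantities are also available as the stored results e(r2), e(F) and e(rmse), for example `display e(r2)`.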

4. Model Examination

. * Scatterplot with overlay fit and overlay 95% confidence band
. ***** graph twoway (scatter YVARIABLE XVARIABLE, symbol(d)) (lfit YVARIABLE XVARIABLE) (lfitci YVARIABLE XVARIABLE), title("TITLE") subtitle("TITLE")
. graph twoway (scatter calls low, symbol(d)) (lfit calls low) (lfitci calls low), title("Calls to NY Auto Club 1993-1994") subtitle("95% Confidence Bands")

Remarks
The overlay of the straight line fit is reasonable but substantial variability is seen, too.
There is a lot we still don’t know, including but not limited to the following: case influence, omitted variables, variance heterogeneity, incorrect functional form, etc.

5. Checking Model Assumptions and Fit

. ***** Residuals Analysis - Normality of residuals
. ***** Look for points falling on the line
. ***** predict NAME, residuals
. predict ehat, residuals
. ***** pnorm NAME, title("TITLE")
. pnorm ehat, title("Normality of Residuals of Y=calls on X=low")

Not bad actually!

. ***** Residuals Analysis - Cook Distances
. ***** Look for even band of Cook Distance values with no extremes
. ***** predict NAMECOOK, cooksd
. predict cookhat, cooksd
. generate id = _n
. ***** graph twoway (scatter NAMECOOK id, symbol(d)), title("TITLE IN QUOTES") subtitle("TITLE IN QUOTES")
. graph twoway (scatter cookhat id, symbol(d)), title("Calls to NY Auto Club 1993-1994") subtitle("Cooks Distances")

Remarks
For straight line regression, the suggestion is to regard Cook’s Distance values > 1 as significant.
Here, there are no unusually large Cook Distance values.
Not shown but useful, too, are examinations of leverage and jackknife residuals.
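The leverage and jackknife examinations mentioned in the remarks can be obtained with the same predict command. This is a minimal sketch, assuming ers.dta is in the working directory; the variable names d_cook, h_lev, r_jack and obs_id are hypothetical, and the leverage cutoff 2(k+1)/n is a common rule of thumb rather than a recommendation from this handout.

```stata
* Sketch only: influence diagnostics after the simple linear regression.
use "ers.dta", clear
regress calls low
predict d_cook, cooksd       // Cook's distance
predict h_lev, leverage      // diagonal elements of the hat matrix
predict r_jack, rstudent     // jackknife (studentized) residuals
generate obs_id = _n
* Rule-of-thumb cutoffs (assumptions): Cook's distance > 1, leverage > 2(k+1)/n with k = 1
list obs_id low calls d_cook h_lev r_jack if d_cook > 1 | h_lev > 2*(1+1)/_N
```

If the list comes back empty, no observation exceeds either cutoff.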

. ***** Check Linearity, Heteroscedasticity & Independence Using Jackknife Residuals
. ***** note - Stata calls these studentized
. ***** predict NAMEPREDICTED, xb
. ***** predict NAMEJACKKNIFE, rstudent
. predict yhat, xb
. predict jack, rstudent
. ***** graph twoway (scatter NAMEJACKKNIFE NAMEPREDICTED, symbol(d)), title("TITLE") subtitle("TITLE")
. graph twoway (scatter jack yhat, symbol(d)), title("Calls to NY Auto Club 1993-1994") subtitle("Jackknife Residuals v Predicted")

Remarks
Recall – A jackknife residual for an individual is a modification of the studentized residual in which the mean square error is replaced by the mean square error obtained after deleting that individual from the analysis.
Departures of this plot from a parallel band about the horizontal line at zero are significant.
The plot here is a bit noisy but not too bad considering the small sample size.
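The definition recalled above can be written out as a formula. The notation is an assumption introduced here for illustration: e_i is the ordinary residual, h_{ii} is the leverage of observation i, and s_{(-i)}^2 is the mean square error from the fit with observation i deleted.

```latex
% Studentized residual, using the mean square error s^2 from the full fit
r_i = \frac{e_i}{s\,\sqrt{1 - h_{ii}}}
% Jackknife residual: s is replaced by s_{(-i)}, computed with observation i deleted
r_{(-i)} = \frac{e_i}{s_{(-i)}\,\sqrt{1 - h_{ii}}}
```

The second quantity is what Stata returns for `predict NAME, rstudent`.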

II – Multiple Linear Regression

1. A General Approach for Model Development

There are no rules and no single best strategy. In fact, different study designs and different research questions call for different approaches to model development. Tip – Before you begin model development, make a list of your study design, research aims, outcome variable, primary predictor variables, and covariates.

As a general suggestion, the following approach has the advantages of providing a reasonably thorough exploration of the data and relatively little risk of missing something important.

Preliminary – Be sure you have: (1) checked, cleaned and described your data, (2) screened the data for multivariate associations, and (3) thoroughly explored the bivariate relationships.

Step 1 – Fit the “maximal” model.
The maximal model is the large model that contains all the explanatory variables of interest as predictors. This model also contains all the covariates that might be of interest. It also contains all the interactions that might be of interest. Note the amount of variation explained.

Step 2 – Begin simplifying the model.
Inspect each of the terms in the “maximal” model with the goal of removing the predictor that is the least significant. Drop from the model the predictors that are the least significant, beginning with the higher order interactions (Tip - interactions are complicated and we are aiming for a simple model).
Fit the reduced model.
Compare the amount of variation explained by the reduced model with the amount of variation explained by the “maximal” model.
   If the deletion of a predictor has little effect on the variation explained, then leave that predictor out of the model and inspect each of the terms in the model again.
   If the deletion of a predictor has a significant effect on the variation explained, then put that predictor back into the model.

Step 3 – Keep simplifying the model.
Repeat step 2, over and over, until the model remaining contains nothing but significant predictor variables.

Beware of some important caveats!!!
   Sometimes, you will want to keep a predictor in the model regardless of its statistical significance (an example is randomization assignment in a clinical trial).
   The order in which you delete terms from the model matters.
   You still need to be flexible to considerations of biology and what makes sense.
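Steps 1 and 2 can be sketched in Stata as follows. This is an illustration only: it uses the auto data set that ships with Stata as a stand-in, and the choice of price, mpg, weight and foreign as outcome, predictors and covariate is an assumption made here, not part of the handout's examples.

```stata
* Sketch only: one pass of "fit maximal model, drop the interaction, compare".
sysuse auto, clear

* Step 1: maximal model with main effects, their interaction, and a covariate
regress price c.mpg##c.weight foreign
scalar r2_max = e(r2)

* Step 2: test the highest-order term first
testparm c.mpg#c.weight

* Fit the reduced model without the interaction and compare variation explained
regress price mpg weight foreign
display "Drop in R-squared = " r2_max - e(r2)
```

A small drop in R-squared together with a non-significant testparm result would argue for leaving the interaction out and repeating the inspection on the remaining terms.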

2. Introduction to Example

Source: Matthews et al. Parity Induced Protection Against Breast Cancer, 2007.

Research Question: What is the relationship of Y = p53 expression to parity and age at first pregnancy, after adjustment for the potentially confounding effects of current age and menopausal status? Age at first pregnancy has been grouped and is either ≤ 24 years or > 24 years.

Input Stata data set p53paper_small.dta

. ***** Just to be safe! Save the ers.dta data again.
. save "ers.dta", replace
file ers.dta saved

. ***** Clear the workspace.
. ***** Command is clear
. clear

. ***** Input data from url on internet.
. ***** Command is use "http:FULLURL", clear
. use "http://people.umass.edu/biep691f/data/p53paper_small.dta", clear

. ***** Save the inputted data to the directory you have chosen above.
. ***** Command is save "NAME", replace
. save "p53paper_small.dta", replace

3. Preliminaries: Descriptives

. *
. ***** Explore the data for shape, range, outliers and completeness
. summarize p53 pregnum agefirst agecurr menop

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         p53 |        67    3.251493    1.054454          1          6
     pregnum |        67    1.656716    1.122122          0          3
    agefirst |        67    1.044776    .7268203          0          2
     agecurr |        67    39.62687    13.69786         15         75
       menop |        67    .2835821    .4541382          0          1

Data are complete; n = 67 for every variable. Y = p53 has a limited range, so that the assumption of normality is a bit dicey, but we’ll proceed anyway. Current age (agecurr) ranges from 15 to 75.

. *
. ***** Pairwise correlations for all the variables
. pwcorr p53 pregnum agefirst agecurr menop, star(0.05) sig

             |      p53  pregnum agefirst  agecurr    menop
-------------+---------------------------------------------
         p53 |   1.0000
             |
     pregnum |   0.4419*  1.0000
             |   0.0002
             |
    agefirst |   0.2021   0.5765*  1.0000
             |   0.1011   0.0000
             |
     agecurr |   0.1340   0.5416*  0.4765*  1.0000
             |   0.2798   0.0000   0.0000
             |
       menop |   0.0450   0.4021*  0.2823*  0.7285*  1.0000
             |   0.7178   0.0007   0.0207   0.0000

Reading the table: correlation(pregnum, p53) = 0.4419. The entry beneath it, .0002, is the p-value for the null hypothesis of zero correlation, so the null is rejected.

Only one correlation with Y = p53 is statistically significant: r(p53, pregnum) = .44 with p-value = .0002.
Note that some of the predictors are statistically significantly correlated with each other: r(agefirst, pregnum) = .58 with p-value < .0001.
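The p-value that pwcorr reports for a correlation comes from a t test on n - 2 degrees of freedom. The lines below are an arithmetic sketch with display, using only r = 0.4419 and n = 67 from the output above.

```stata
* t statistic for H0: correlation = 0, with n = 67 so n - 2 = 65 df
display 0.4419 * sqrt(65) / sqrt(1 - 0.4419^2)
* Two-sided p-value for t of about 3.97 on 65 df
display 2 * ttail(65, 3.97)
* Squared correlation = R-squared of the simple regression of p53 on pregnum
display 0.4419^2
```

Up to rounding, these should agree with the t statistic of 3.97 and the R-squared of .1953 reported later for the regression of p53 on pregnum alone.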

. *
. ***** Pairwise scatterplots for all the variables
. set scheme lean2
. graph matrix p53 pregnum agefirst agecurr menop, half maxis(ylabel(none) xlabel(none)) title("Pairwise Scatter Plots") note("matrixplot.png", size(vsmall))

Admittedly, it’s a little hard to see a lot going on here.

. *
. ***** Graphical assessment of linearity of Y = p53 in predictors, with line and lowess fits
. *
. ***** pregnum
. graph twoway (scatter p53 pregnum, symbol(d)) (lfit p53 pregnum) (lowess p53 pregnum), title("Assessment of Linearity") subtitle("Y=p53, X=pregnum") note("pregnum.png", size(vsmall))

Looks reasonably linear. Probably okay to model Y = p53 on X = pregnum as is, instead of with dummies.

. *
. ***** agefirst
. graph twoway (scatter p53 agefirst, symbol(d)) (lfit p53 agefirst) (lowess p53 agefirst), title("Assessment of Linearity") subtitle("Y=p53, X=agefirst") note("agefirst.png", size(vsmall))

This does not look linear. So we will create dummies for age at 1st pregnancy.

. *
. ***** agecurr
. graph twoway (scatter p53 agecurr, symbol(d)) (lfit p53 agecurr) (lowess p53 agecurr), title("Assessment of Linearity") subtitle("Y=p53, X=agecurr") note("agecurr.png", size(vsmall))

Looks reasonably linear, albeit pretty flat. Probably okay to model Y = p53 on X = agecurr as is.

4. Handling of Categorical Predictors: Indicator Variables

. *
. ***** Create Dummy variables for age at first pregnancy: early, late
. generate early = 0
. replace early = 1 if agefirst == 1
(32 real changes made)

. generate late = 0
. replace late = 1 if agefirst == 2
(19 real changes made)

Check.

. tab2 agefirst early
-> tabulation of agefirst by early

    Age at 1st |         early
     Pregnancy |         0          1 |     Total
---------------+----------------------+----------
never pregnant |        16          0 |        16
     age le 24 |         0         32 |        32
      age > 24 |        19          0 |        19
---------------+----------------------+----------
         Total |        35         32 |        67

Check using tab2 confirms that the new variable, early, is well defined.

. tab2 agefirst late
-> tabulation of agefirst by late

    Age at 1st |         late
     Pregnancy |         0          1 |     Total
---------------+----------------------+----------
never pregnant |        16          0 |        16
     age le 24 |        32          0 |        32
      age > 24 |         0         19 |        19
---------------+----------------------+----------
         Total |        48         19 |        67

Ditto. The new variable, late, is well defined.

. label variable early "Age le 24"
. label variable late "Age gt 24"
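The same indicators can be built in one step each, or left to Stata's factor-variable notation. This is a minimal sketch, assuming the p53 data set loaded above is in memory; early2 and late2 are hypothetical names chosen so as not to clash with early and late.

```stata
* Sketch only: one-line indicator creation (set to missing when agefirst is missing)
generate early2 = (agefirst == 1) if !missing(agefirst)
generate late2  = (agefirst == 2) if !missing(agefirst)
tab2 agefirst early2

* Factor-variable notation: i.agefirst creates indicators for levels 1 and 2,
* with level 0 (never pregnant) as the reference group
regress p53 pregnum i.agefirst
testparm i.agefirst
```

With agefirst coded 0, 1, 2, the model with i.agefirst is the same fit as the model with early and late entered by hand, and testparm i.agefirst gives the corresponding 2 df test.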

5. Model Fitting (Estimation)

. * --------------------------------
. * Model Estimation Set I: Determination of best model in the predictors of interest.
. * Goal is to obtain best parameterization before considering
. * --------------------------------
. *
. ***** Maximal model: Regression of Y = p53 on all: pregnum + [early, late]
. regress p53 pregnum early late

      Source |       SS       df       MS
-------------+------------------------------
       Model |  14.8967116     3  4.96557054
    Residual |  58.4868896    63  .928363317
-------------+------------------------------
       Total |  73.3836006    66  1.11187274

Number of obs, F(3, 63), Prob > F, R-squared, Adj R-squared, Root MSE: (values not shown)

------------------------------------------------------------------------------
         p53 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pregnum |   .3764082   .2008711     1.87   0.066    -.0250006    .7778171
       early |    .160762   .5555887     0.29   0.773    -.9494935    1.271017
        late |  -.0677176   .5017357    -0.13   0.893    -1.070356    .9349211
       _cons |   ...
------------------------------------------------------------------------------

The fitted line is p53 = 2.57 + (0.38)*pregnum + (0.16)*early - (0.07)*late.
20% of the variability in Y = p53 is explained by this model (R-squared = .20).
This model is statistically significantly better than the null model (p-value of F test = .0024).
NOTE!! We see a consequence of the multi-collinearity of our predictors [early, late] and pregnum:
   [early, late] have NON-significant t-statistic p-values.
   pregnum has a t-statistic p-value that is only marginally significant.

. *
. ***** 2 df Partial F-test (Null: [early, late] are not significant, controlling for pregnum)
. testparm early late

 ( 1)  early = 0
 ( 2)  late = 0

       F(  2,    63) =    0.31
            Prob > F =    0.7381

Not significant (p-value = .74). Conclude that, in the adjusted model containing pregnum, [early, late] are not statistically significantly associated with Y = p53.

. *
. ***** 1 df Partial F-test (Null: pregnum is not significant, controlling for [early, late])
. testparm pregnum

 ( 1)  pregnum = 0

       F(  1,    63) =    3.51
            Prob > F =    0.0656

Marginally statistically significant (p-value = .0656). The null hypothesis is not rejected at the 0.05 level, but the evidence against it is suggestive. Conclude that, in the model that contains [early, late], pregnum is marginally statistically significantly associated with Y = p53.

. *
. ***** Save results from model above to “model1” for tabulation later
. eststo model1

. *
. ***** Regression of Y = p53 on pregnum only. [early, late] dropped.
. regress p53 pregnum

      Source |       SS       df       MS
-------------+------------------------------
       Model |   14.330079     1   14.330079
    Residual |  59.0535216    65  .908515716
-------------+------------------------------
       Total |  73.3836006    66  1.11187274

Number of obs, F(1, 65), Prob > F, R-squared, Adj R-squared, Root MSE: (values not shown)

------------------------------------------------------------------------------
         p53 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pregnum |   .4152523   .1045572     3.97   0.000     .2064372    .6240675
       _cons |   ...
------------------------------------------------------------------------------

The fitted line is p53 = 2.56 + (0.41)*pregnum.
19.5% of the variability in Y = p53 is explained by this model (R-squared = .1953).
This model is statistically significantly more explanatory than the null model (p-value = .0002).

. *
. ***** Save results from model above to “model2” for tabulation later
. eststo model2

. *
. ***** Regression of Y = p53 on design variables [early, late] only. pregnum dropped.
. regress p53 early late

      Source |       SS       df       MS
-------------+------------------------------
       Model |  11.6368338     2  5.81841692
    Residual |  61.7467667    64   .96479323
-------------+------------------------------
       Total |  73.3836006    66  1.11187274

Number of obs, F(2, 64), Prob > F, R-squared, Adj R-squared, Root MSE: (values not shown)

------------------------------------------------------------------------------
         p53 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       early |   1.042969    .300748     3.47   0.001     .4421555    1.643782
        late |    .645477   .3332839     1.94   0.057    -.0203342    1.311288
       _cons |   ...
------------------------------------------------------------------------------

. *
. ***** Save results from model above to “model3” for tabulation later
. eststo model3

. *
. ***** SUMMARY of Model Estimation Set I
. esttab, r2 se
(Table of models (1) to (3) not shown)
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Choose model “(2)” as a good “minimally adequate” model: Y = p53 and X = pregnum. This is why.
(1) Model “(1)” is the maximal model. R-squared = .20
(2) Model “(2)” drops [early, late]. R-squared is minimally lower: R-squared = .195
(3) Model “(3)” drops pregnum. The drop in R-squared is more substantial: R-squared = .159
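The two partial F tests reported by testparm can be recovered from the residual sums of squares of the three models in Set I. The following is an arithmetic sketch with scalars and display; the scalar names are hypothetical and all numbers are copied from the regression output in this section.

```stata
* Full model: p53 on pregnum early late (residual SS 58.4868896 on 63 df)
scalar rss_full = 58.4868896
scalar mse_full = rss_full / 63

* Reduced model without [early, late]: residual SS 59.0535216, 2 parameters dropped
scalar F_el = ((59.0535216 - rss_full) / 2) / mse_full
display "F(2,63) for [early, late] = " F_el "   p = " Ftail(2, 63, F_el)

* Reduced model without pregnum: residual SS 61.7467667, 1 parameter dropped
scalar F_preg = ((61.7467667 - rss_full) / 1) / mse_full
display "F(1,63) for pregnum = " F_preg "   p = " Ftail(1, 63, F_preg)
```

Up to rounding, these should match the testparm results of 0.31 and 3.51. The second is also the square of the t statistic for pregnum in the maximal model (1.87).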

. * --------------------------------
. * Model Estimation Set II: Regression of Y = p53 on parity with adjustment for
. * --------------------------------
. *
. ***** Preliminary: Clear the saved models
. eststo clear

. *
. ***** Maximal model: Regression of Y = p53 on pregnum + covariates
. regress p53 pregnum agecurr menop

      Source |       SS       df       MS
-------------+------------------------------
       Model |  15.9827039     3  5.32756796
    Residual |  57.4008967    63  .911125345
-------------+------------------------------
       Total |  73.3836006    66  1.11187274

Number of obs, F(3, 63), Prob > F, R-squared, Adj R-squared, Root MSE: (values not shown)

------------------------------------------------------------------------------
         p53 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pregnum |   .4923299   .1245663     3.95   0.000     .2434041    .7412557
     agecurr |  -.0047726   .0136385    -0.35   0.728     -.032027    .0224819
       menop |  -.2797843   .3776867    -0.74   0.462    -1.034531    .4749624
       _cons |   ...
------------------------------------------------------------------------------

. eststo model1

. *
. ***** Regression of Y = p53 on pregnum + menop only. agecurr dropped.
. regress p53 pregnum menop

      Source |       SS       df       MS
-------------+------------------------------
       Model |  15.8711336     2  7.93556682
    Residual |  57.5124669    64  .898632296
-------------+------------------------------
       Total |  73.3836006    66  1.11187274

Number of obs, F(2, 64), Prob > F, R-squared, Adj R-squared, Root MSE: (values not shown)

------------------------------------------------------------------------------
         p53 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pregnum |   .4750472   .1135703     4.18   0.000     .2481644      .70193
       menop |  -.3674811    .280619    -1.31   0.195    -.9280819    .1931197
       _cons |   ...
------------------------------------------------------------------------------

. eststo model2

. *
. ***** SUMMARY of Model Estimation Set II
. esttab, r2 se
(Table of models (1) and (2) not shown)
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Choose as “candidate” final model Y = p53 and X = pregnum.
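The eststo and esttab commands used for these summaries are not built into Stata; they come from the user-written estout package, which has to be installed once. The following is a minimal sketch of the Set II comparison, assuming the p53 data set is in memory; storing the pregnum-only model as model3 is an addition made here so that the candidate final model appears in the same table.

```stata
* Sketch only: install the user-written estout package once (requires internet access)
ssc install estout

* Store the Set II models, plus the candidate final model, and tabulate them
eststo clear
eststo model1: regress p53 pregnum agecurr menop
eststo model2: regress p53 pregnum menop
eststo model3: regress p53 pregnum
esttab model1 model2 model3, r2 se
```

The prefix form `eststo NAME: command` fits and stores a model in one line; it is equivalent to running the regression and then typing `eststo NAME`.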

6. Checking Model Assumptions and Fit

. *
. ***** PRELIMINARY to checks: Model checks require that you have just fit the model you are checking
. regress p53 pregnum

      Source |       SS       df       MS
-------------+------------------------------
       Model |   14.330079     1   14.330079
    Residual |  59.0535216    65  .908515716
-------------+------------------------------
       Total |  73.3836006    66  1.11187274

Number of obs, F(1, 65), Prob > F, R-squared, Adj R-squared, Root MSE: (values not shown)

------------------------------------------------------------------------------
         p53 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     pregnum |   .4152523   .1045572     3.97   0.000     .2064372    .6240675
       _cons |   ...
------------------------------------------------------------------------------

. *
. ***** Save the predicted values of Y in a new variable called yhat
. predict yhat
(option xb assumed; fitted values)

. *
. ***** Plot Observed versus Predicted – Ideally, points will fall on the X = Y line
. graph twoway (scatter yhat p53, symbol(d)) (lfit yhat p53) (lfitci yhat p53), title("Model Check") subtitle("Plot of Observed v Predicted") xlabel(1(1)6) ylabel(1(1)6) note("plot1.png", size(vsmall))

. *
. ***** Normality of Residuals - Look for normality
. ***** Preliminary: Save the residuals in a new variable called residuals
. predict residuals, resid

. *
. ***** pnorm plot check of normality of residuals in middle range – Ideally, points fall on X = Y line
. pnorm residuals, title("Model Check") subtitle("Standardized Normality Plot of Residuals") note("plot2.png", size(vsmall))

Very reasonable. No worries here.

. *
. ***** qnorm plot check of normality of residuals in the tails – Ideally, points fall on X = Y line
. qnorm residuals, title("Model Check") subtitle("Quantile-Normal Plot of Residuals") note("plot3.png", size(vsmall))

A little off the line in the tails, but okay.

. *
. ***** Shapiro Wilk test of normality of residuals (Null: residuals are normal)
. swilk residuals

                   Shapiro-Wilk W test for normal data

    Variable |    Obs       W           V         z       Prob>z
-------------+--------------------------------------------------
