

DOCUMENT RESUME

ED 382 635    TM 023 064

AUTHOR  Thompson, Bruce
TITLE  Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply.
PUB DATE  20 Apr 95
NOTE  22p.; Paper presented at the Annual Meeting of the American Educational Research Association (San Francisco, CA, April 18-22, 1995).
PUB TYPE  Reports - Evaluative/Feasibility (142); Speeches/Conference Papers (150)
EDRS PRICE  MF01/PC01 Plus Postage.
DESCRIPTORS  *Educational Research; *Error of Measurement; Heuristics; *Psychological Testing; *Regression (Statistics); *Research Methodology; Sampling
IDENTIFIERS  Research Replication; *Stepwise Regression

ABSTRACT
Stepwise methods are frequently employed in educational and psychological research, both to select useful subsets of variables and to evaluate the order of importance of variables. Three problems with stepwise applications are explored in some detail. First, computer packages use incorrect degrees of freedom in their stepwise computations, resulting in artifactually greater likelihood of obtaining spurious statistical significance. Second, stepwise methods do not correctly identify the best variable set of a given size, as illustrated by a concrete heuristic example. Third, stepwise methods tend to capitalize on sampling error, and thus tend to yield results that are not replicable. (Contains 22 references, 4 tables, and 1 figure.)

Reproductions supplied by EDRS are the best that can be made from the original.

STEPWISE REGRESSION AND STEPWISE DISCRIMINANT ANALYSIS NEED NOT APPLY

Bruce Thompson
Texas A&M University 77843-4225
and
Baylor College of Medicine

Paper presented at the annual meeting of the American Educational Research Association (training session #25.16), San Francisco, April 20, 1995.

ABSTRACT

Stepwise methods are frequently employed in educational and psychological research, both to select useful subsets of variables and to evaluate the order of importance of variables. Three problems with stepwise applications are explored in some detail. First, computer packages use incorrect degrees of freedom in their stepwise computations, resulting in artifactually greater likelihood of obtaining spurious statistical significance. Second, stepwise methods do not correctly identify the best variable set of a given size, as illustrated by a concrete heuristic example. Third, stepwise methods tend to capitalize on sampling error, and thus tend to yield results that are not replicable.

It is the practice within Educational and Psychological Measurement and other journals to present occasional [...] APA style requirements. For example, Thompson (1994b) discussed requirements involving both statistical significance testing and language regarding score reliability. The present paper focuses on major problems with stepwise analyses, and suggests that these methods ought to be avoided in favor of more suitable alternatives.

Huberty (1994) recently noted that, "It is quite common to find the use of 'stepwise analyses' reported in empirically [...] have presented scathing indictments of many of these applications (cf. Huberty, 1989; Snyder, 1991; Thompson, 1989). Three major problems can be noted.

The heuristic examples employed here to illustrate these three problems involve stepwise regression analysis. However, since all commonly applied analytic methods are correlational (Cohen, 1968), and are special cases of canonical correlation analysis (Knapp, 1978; Thompson, 1991), the present discussion generalizes across the full family of these various applications.

Some researchers employ stepwise methods to select a subset of better variables from among a larger constellation of predictors, for use in present or future research (i.e., so-called "variable selection"). The methods are also sometimes used to interpret data dynamics, under a premise that selected variables are more important than predictors that are not selected, or that entry
order reflects variable importance (i.e., so-called "variable ordering"). Stepwise methods are not usually useful for either purpose.

Horrendously Wrong Degrees of Freedom

Problem

Degrees of freedom in statistical analyses reflect the number of unique pieces of information present for a given research situation. These degrees of freedom constrain the number of inquiries we may direct at our data, and are the currency we spend in analysis. Regrettably, commonly used statistical packages incorrectly compute the degrees of freedom in stepwise analyses. The use of incorrect degrees of freedom in practice often has dire consequences as regards the accuracy of our inferences.

Table 1 presents an illustration. Presume that we have data from 101 subjects on a dependent variable ("Y") and 50 predictor variables. After five steps of stepwise regression analysis, the five entered predictor variables may "explain" 20% of the variability in the Y scores (i.e., R² = 20/100 = 20%), as illustrated in Table 1.

INSERT TABLE 1 ABOUT HERE.

Computer packages compute the total degrees of freedom correctly, as n - 1. However, the degrees of freedom "explained" (also variously called "model", "regression", "between", etc.) is computed as the number of "entered" predictor variables (i.e., 5). The degrees of
freedom "unexplained" (also variously called "error", "residual", "within", etc.) is then computed as the total degrees of freedom minus the explained degrees of freedom (here, 100 - 5 = 95). These calculations yield a statistically significant (α = .05) result in the Table 1 illustration, F(5, 95) = 4.75.

However, various researchers (cf. Snyder, 1991) have correctly noted that these degrees of freedom calculations for the explained and unexplained variance partitions are simply wrong. If the five entered predictor variables had been randomly selected, an explained degrees of freedom of 5 might arguably be correct. But our five predictors were selected by, at each step, looking at the results for all the predictor variables not yet entered! Viewed differently, at each step all 50 predictor variables were entered, though we may have constrained the b and β weights for most of the predictors to be 0 at each step (Cliff, 1987, p. 187). Thus, the computer packages are erroneously not charging us any degrees of freedom for consulting our data in this manner. Charging all 50 explained degrees of freedom leaves only 100 - 50 = 50 unexplained degrees of freedom.

This statistical welfare system may cause us to radically overestimate the atypicality of our results, i.e., create an artifactually small p_CALCULATED. Table 1 dramatically illustrates how the use of the incorrect degrees of freedom can (a) radically inflate MS_EXPLAINED, (b) radically deflate MS_UNEXPLAINED, and consequently (c) very radically inflate F_CALCULATED (e.g., 4.75 versus 0.25). No wonder Cliff (1987, p. 185) noted that "most computer programs for [stepwise] multiple regression are positively satanic in their temptations toward Type I errors."
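The Table 1 arithmetic is short enough to check directly. The following is a minimal Python sketch, not code from the paper: it assumes, as argued above, that the honest explained degrees of freedom are the 50 predictors consulted rather than the 5 entered, and the function name f_ratio is hypothetical.

```python
def f_ratio(ss_explained, ss_total, df_explained, df_total):
    """Return (MS explained, MS unexplained, F) for a regression partition."""
    ss_unexplained = ss_total - ss_explained
    df_unexplained = df_total - df_explained
    ms_explained = ss_explained / df_explained
    ms_unexplained = ss_unexplained / df_unexplained
    return ms_explained, ms_unexplained, ms_explained / ms_unexplained


n = 101              # subjects in the Table 1 illustration
ss_total = 100.0     # sum of squares of Y
ss_explained = 20.0  # R-squared = 20/100 = 20%
steps = 5            # predictors "entered" after five steps
candidates = 50      # predictors consulted at every step

df_total = n - 1
naive = f_ratio(ss_explained, ss_total, steps, df_total)           # package df: 5 and 95
corrected = f_ratio(ss_explained, ss_total, candidates, df_total)  # charged df: 50 and 50

print("naive:     MS_exp = %.4f  MS_unexp = %.4f  F = %.2f" % naive)
print("corrected: MS_exp = %.4f  MS_unexp = %.4f  F = %.2f" % corrected)
```

The two printed F values are 4.75 and 0.25; only the degrees of freedom differ between the two calls, which is exactly the recalculation from printed sums of squares that the text recommends.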

Caveats

Of course, it is important in evaluating statistical [...] "should/would" error (Hudson, 1969; Hume, 1957). As Strike (1979) explains,

To deduce a proposition with an "ought" in it from premises containing only "is" assertions is to get something in the conclusion not contained in the premises, something impossible in a valid deductive argument. (p. 13)

The fact that most researchers "are" using the wrong degrees of freedom in their stepwise analyses does not mean that we therefore "should" abandon these methods. Instead, logically we ought simply to use the correct degrees of freedom.

We need not even somehow persuade the software companies to fix their computer programs; we need only take the printed sums of squares, together with the correct degrees of freedom we derive ourselves, and recalculate the remaining statistical tests. Doing so merely requires a willingness to believe that computer programs are not infallible, because computer programs were written by fallible people and not by higher beings.

It is important to note that not all stepwise applications are equally evil as regards the inflation of Type I error. For example, the stepwise results after one step for a problem involving only two predictors might not be so seriously distorted.

Some readers may protest that no one would ever invoke stepwise
methods with a small number of predictor variables. However, a colleague only a few days ago described a manuscript for which he was serving as a referee; in that study, submitted to a prominent national journal, the authors conducted several dozen stepwise analyses for problems each involving only three predictor variables!

The seriousness of problems with wrong degrees of freedom being used, as with most statistical (and life) issues, is situationally conditional. Stepwise methods will be somewhat less evil, for example, when (a) the sample size is very large, (b) the number of predictor variables is small, and/or (c) the sum of squares explained remains near zero across steps.

Does Not Identify the Best Predictor Set of Size "q"

Problem

Unfortunately, many researchers erroneously believe that conducting two or five steps of analysis will identify the best predictor set of size two or five. This simply is not what stepwise methods typically do.

Ignoring for present purposes the variable deletion aspect of a true stepwise analysis, at step number five forward stepwise methods address the question, "Given the four predictors already entered, which one additional predictor will most improve the analysis?" Thus, the question is conditional and highly situation-specific, being conditioned on the presence of (a) the specific variables already entered and (b) only those variables used in the
particular study but not yet entered. If the first variable entered were different, the variables entered in the remaining steps might also differ. Furthermore, even if the first four entered variables remained constant, deleting predictors from or adding predictors to the study might well yield a different answer to the context-specific stepwise question.

But if we wish to determine the best set of predictor variables of size q, the question, "What is the best set of q = 5 predictors?", does not ask a conditional question invoking a linear sequence of variable entry. Of course, if we desire this second question to be answered, it is not reasonable to invoke the answer to a question one is not posing!

Thus, the five predictors entered in five steps of forward entry will not typically answer the question as to what are the best q = 5 predictors, and it is even conceivable that none of the five variables selected by stepwise will be included in the best subset of five predictors.

Figure 1 presents the Venn diagram of a heuristic example to make this dynamic concrete. Since Venn diagrams are two-dimensional, they can be only figurative portrayals of simultaneous relationships among three or more variables (Craeger, 1969). However, bivariate relationships can be literally presented in this manner.

INSERT FIGURE 1 ABOUT HERE.

The example involves a dependent variable, Y, and four
predictor variables. Table 2 presents sums-of-squares variance partitions associated with Figure 1; e.g., X1 explains 100 of the 400 units of individual differences (i.e., variability) in the Y scores. Table 3 translates the sums of squares into correlation coefficients.

INSERT TABLES 2 AND 3 ABOUT HERE.

Table 4 presents the regression analyses for the data. If a stepwise analysis were conducted, predictor X1 would be entered first, because this variable has the largest squared bivariate correlation (r² = 25%) with Y. In the second step, predictor X2 would be entered, and the resulting R² would be 45.00%.

INSERT TABLE 4 ABOUT HERE.

However, if an all-possible-subsets analysis is conducted with the same data, the best predictor set of size q = 2 is determined to be predictors X3 and X4, with an R² of 47.50%. The best predictor set of size q = 2 does not include either of the two predictors entered in the two steps of the stepwise analysis!

Caveats

Again, few behaviors either in life or in statistics are always wrong. Some behaviors are only usually wrong, and we have to think about whether special exceptions have arisen. This is what makes teaching methodology so difficult: we must teach our students to think rather than only to memorize universal principles of lock-step rote behaviors.

First, our two questions ("which one additional predictor...?"
and "what is the best set...?") are logically equivalent when we are investigating the subset q = 1. Stepwise analysis does correctly identify the best single predictor.

Second, the two types of analyses do yield the same answers whenever the predictors are perfectly uncorrelated. This occurs when we use orthogonally rotated principal components scores in an analysis, for example. Of course, 30 steps of stepwise with such predictors tell us nothing we don't already know, if we already know the 30 correlation coefficients involving Y and each of the 30 uncorrelated component scores.

Tendency to Yield Non-replicable Results

Problem

Stepwise methods tend to yield conclusions that will not replicate in future research. This is because stepwise methods tend to capitalize outrageously on sampling error. Sampling error is variability in sample data that is unique to the given sample, and therefore cannot be reproduced in subsequent samples. Snyder (1991) presents an excellent heuristic example of these dynamics.

At a given step, the entry rule will select variable X1 over variables X2, X3, and X4 even if X1 is only infinitesimally superior to the other three variables. It is entirely possible that this infinitesimal advantage of variable X1 over another variable is sampling error, given that the competitive advantage of X1 is so small.

Stepwise analysis is a linear series of conditional decisions, not unlike the choices one makes in working through a maze. An
early mistake in the sequence will corrupt the remaining choices. If X1 is incorrectly entered first in the analysis due to an infinitesimal advantage representing only a small amount of sampling error, all remaining conditional entry decisions may therefore also be incorrect.

Since small differences may reflect sampling error, but these small differences can greatly affect the sample results, stepwise sample results often do not generalize. Thus, Cliff (1987, pp. 120-121) suggested that "a large proportion of the published results using this method probably present conclusions that are not supported by the data."

Caveats

Obviously, less sampling error tends to be present in data sets involving (a) larger samples, (b) fewer predictor variables, and (c) larger effect sizes, as reflected in the factors involved in most statistical corrections for positive bias in [...] effect sizes (Snyder & Lawson, 1993; [...]). Thus, use of stepwise methods in these circumstances might be somewhat less sinful. And again, if the predictor variables are uncorrelated, the analysis is not distorted by the sampling error in the relationships among the predictors.

Summary

Stepwise methods do not do what most researchers believe the methods do. Stepwise methods are especially problematic when statistical significance tests are invoked to determine stopping positions, because the methods have all the problems associated
with conventional statistical significance applications (Carver, 1978; Cohen, 1994; Thompson, 1993, 1994a, 1994b, 1994c), in spades.

As a general proposition, readily available software programs can assist with appropriate variable selection by conducting almost instantaneous and painless all-possible-subsets analyses. Thus, stepwise analyses should be eschewed in favor of programs such as those offered by McCabe (1975), the Morris program distributed within Huberty's (1994) book, or SAS [...] origins of explained variance, i.e., variable ordering, a useful alternative is simply to consult standardized weights (called different names across analyses to confuse graduate students, e.g., beta weights, factor pattern coefficients, standardized discriminant function coefficients) and structure coefficients (Thompson & Borrello, 1985). [...] a variety of other helpful variable ordering strategies for the discriminant analysis case.
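For the q = 2 heuristic example, the all-possible-subsets idea fits in a few lines. This is a hypothetical Python illustration, not the McCabe, Morris, or SAS programs named above. It takes as given the squared correlations with Y (.2500, .2000, .2275, .2475 for X1 through X4) and the six two-predictor R² values of Table 4, assuming the Table 4 rows correspond, in order, to the pairs (X1, X2), (X1, X3), (X1, X4), (X2, X3), (X2, X4), (X3, X4). All names are illustrative.

```python
from itertools import combinations

# Squared correlations with Y and two-predictor R-squared values from the
# heuristic example (row-to-pair assignment assumed as described above).
R2_WITH_Y = {"X1": 0.2500, "X2": 0.2000, "X3": 0.2275, "X4": 0.2475}
PAIR_R2 = {
    ("X1", "X2"): 0.4500, ("X1", "X3"): 0.3750, ("X1", "X4"): 0.3586,
    ("X2", "X3"): 0.2556, ("X2", "X4"): 0.4121, ("X3", "X4"): 0.4750,
}


def pair_key(a, b):
    """Order a pair of predictor names the way PAIR_R2 stores them."""
    return tuple(sorted((a, b)))


def forward_two_steps():
    """Step 1: largest r-squared with Y. Step 2: best partner for that variable."""
    first = max(R2_WITH_Y, key=R2_WITH_Y.get)
    second = max((v for v in R2_WITH_Y if v != first),
                 key=lambda v: PAIR_R2[pair_key(first, v)])
    chosen = pair_key(first, second)
    return chosen, PAIR_R2[chosen]


def best_subset_of_two():
    """All-possible-subsets answer for q = 2: examine every pair."""
    best = max(combinations(sorted(R2_WITH_Y), 2), key=PAIR_R2.get)
    return best, PAIR_R2[best]


print("forward stepwise, two steps:", forward_two_steps())
print("all possible subsets, q = 2:", best_subset_of_two())
```

The two functions answer the two different questions discussed in the paper: the conditional stepwise question picks X1 and X2 (R² = .45), while the exhaustive search picks X3 and X4 (R² = .475). For larger q the same exhaustive loop applies, but it needs the R² of every subset, which is what dedicated all-subsets software computes.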

References

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378-399.
Cliff, N. (1987). Analyzing multivariate data. San Diego: Harcourt Brace Jovanovich.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426-443.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Craeger, J. (1969). The interpretation of multiple regression via overlapping rings. American Educational Research Journal, [...]
Huberty, C. (1989). Problems with stepwise methods--better alternatives. In B. Thompson (Ed.), Advances in social science methodology (Vol. 1, pp. 43-70). Greenwich, CT: JAI Press.
Huberty, C. (1994). Applied discriminant analysis. New York: Wiley and Sons.
Hudson, W. D. (1969). The is/ought question. London: MacMillan.
Hume, D. (1957). An inquiry concerning human understanding. New York: The Liberal Arts Press.
Knapp, T. R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin, 85, 410-416.
McCabe, G. P. (1975). Computations for variable selection in discriminant analysis. Technometrics, 17, 103-109.
Snyder, P. (1991). Three reasons why stepwise regression methods should not be used by researchers. In B. Thompson (Ed.), Advances in educational research: Substantive findings, methodological developments (Vol. 1, pp. 99-105). Greenwich, CT: JAI Press.
Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334-349.
Strike, K. A. (1979). An epistemology of practical research. Educational Researcher, 8(1), 10-16.
Thompson, B. (1989). Why won't stepwise methods die? Measurement and Evaluation in Counseling and Development, 21(4), 146-148.
Thompson, B. (1990). Finding a correction for the sampling error in multivariate measures of relationship: A Monte Carlo study. Educational and Psychological Measurement, 50, 15-31.
Thompson, B. (1991). A primer on the logic and use of canonical correlation analysis. Measurement and Evaluation in Counseling and Development, 24(2), [...]
Thompson, B. (1993). [...] statistical significance testing, with comments from various journal editors. Journal of Experimental Education, 61(4).
Thompson, B. (1994a). The concept of statistical significance testing (An ERIC/AE Clearinghouse Digest #EDO-TM-94-1). Measurement Update, 4(1), 5-6. (ERIC Document Reproduction Service No. ED 366 654)
Thompson, B. (1994b). Guidelines for authors. Educational and Psychological Measurement, 54, 837-847.
Thompson, B. (1994c). [...] role of replication [...]: Empirically evaluating the replicability of sample results. Journal of Personality, 62(2), 157-176.
Thompson, B., & Borrello, G. M. (1985). The importance of structure coefficients in regression research. Educational and Psychological Measurement, 45, 203-209.

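Table 1
Hypothetical Five-Step Regression Model with 101 Subjects and 50 Predictor Variables

Degrees of freedom as computed by statistical packages
Source          SOS    df       MS       F
Explained     20.00     5   4.0000    4.75
Unexplained   80.00    95   0.8421
Total        100.00   100

Degrees of freedom charged for all 50 predictors consulted
Source          SOS    df       MS       F
Explained     20.00    50   0.4000    0.25**
Unexplained   80.00    50   1.6000
Total        100.00   100

R² = 20.00%

**Since F critical at infinite and infinite degrees of freedom equals 1, an F calculated less than 1 cannot be statistically significant.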
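A Table 1 pattern is easy to provoke with data that contain no signal at all. The sketch below is a hypothetical Python illustration, not a procedure from the paper: it generates 101 cases of pure noise for Y and for 50 candidate predictors (normal errors and the seed 1995 are arbitrary assumptions), runs five steps of forward entry while ignoring deletion, as the text does, and then computes F with the package-style 5 and 95 degrees of freedom. The helper names are illustrative.

```python
import random


def center(v):
    """Subtract the mean so every vector has mean zero."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_select(y, predictors, steps):
    """Forward entry by largest gain in R-squared (no deletion step).

    Each candidate is residualized on the predictors already entered
    (Gram-Schmidt), so each step answers the conditional question
    "given what is already in, which single variable adds most?".
    Returns (indices of entered predictors, final R-squared).
    """
    y = center(y)
    ss_y = dot(y, y)
    basis = []      # orthonormal vectors spanning the entered predictors
    entered = []
    r2 = 0.0
    for _ in range(steps):
        best = None
        for j, x in enumerate(predictors):
            if j in entered:
                continue
            resid = center(x)
            for q in basis:
                c = dot(resid, q)
                resid = [a - c * b for a, b in zip(resid, q)]
            norm2 = dot(resid, resid)
            if norm2 < 1e-12:  # candidate is redundant with the entered set
                continue
            gain = dot(resid, y) ** 2 / (norm2 * ss_y)  # increase in R-squared
            if best is None or gain > best[0]:
                best = (gain, j, resid, norm2)
        gain, j, resid, norm2 = best
        entered.append(j)
        basis.append([a / norm2 ** 0.5 for a in resid])
        r2 += gain
    return entered, r2


random.seed(1995)                  # arbitrary seed
n, k, steps = 101, 50, 5           # Table 1 dimensions
y = [random.gauss(0, 1) for _ in range(n)]
noise = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]

entered, r2 = forward_select(y, noise, steps)
df_exp, df_unexp = steps, n - 1 - steps   # what a package would report
naive_f = (r2 / df_exp) / ((1 - r2) / df_unexp)
print("entered predictors:", entered)
print("R-squared = %.3f, naive F(%d, %d) = %.2f" % (r2, df_exp, df_unexp, naive_f))
```

Because every predictor here is noise, whatever R² the five entered variables achieve is sampling error by construction. Five predictors chosen at random would explain about 5/100 of the Y variance on average; rerunning with different seeds shows how far above that the selected five tend to land.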

Table 2
Variance Partitions of the Predictive Abilities of the Four Predictor Variables

Single Partitions
Partition   SOS
A            20
B            50
C            27
D             3
E            21
F            49
G            30
H            66

Partitions in Combinations
Predictor   Partitions   Total
X1          [...]          100
X2          [...]           80
X3          [...]           91
X4          [...]           99

Table 3
Pairwise r Values

Variable Pair   Common SOS     r²       r
X1, X2                 0    .0000   .0000
X1, X3                30    .0750   .2739
X1, X4             [...]
X2, X3             [...]
X2, X4             [...]
X3, X4                 0    .0000   .0000
X1, Y                100    .2500   .5000
X2, Y                 80    .2000   .4472
X3, Y                 91    .2275   .4770
X4, Y                 99    .2475   .4975

Note. r² = Common SOS / 400. For example, r²(X1,Y) = 100/400 = .2500, while r(X1,Y) = the square root of r²(X1,Y) = the square root of .2500 = .5000.
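The conversion from shared sums of squares to correlations can be checked directly. A minimal Python sketch, assuming (as the 100/400 example in the Table 3 note implies) that each variable's total sum of squares is 400; the common SOS values 80, 91, and 99 are back-calculated from the r² values of .2000, .2275, and .2475, and 30 for X1, X3 from the .0750 in the Table 4 worked example. The function name is hypothetical.

```python
import math

TOTAL_SOS = 400.0  # assumed total sum of squares of each variable (Table 3 note)


def r_from_common_sos(common_sos, total_sos=TOTAL_SOS):
    """Return (r-squared, r) for two variables sharing `common_sos` units."""
    r_squared = common_sos / total_sos
    return r_squared, math.sqrt(r_squared)


common_sos = {
    "X1, Y": 100,
    "X2, Y": 80,
    "X3, Y": 91,
    "X4, Y": 99,
    "X1, X3": 30,
}

table3 = {pair: r_from_common_sos(sos) for pair, sos in common_sos.items()}
for pair, (r_squared, r) in table3.items():
    print("%-7s r2 = %.4f  r = %.4f" % (pair, r_squared, r))
```

Rounded to four places, the printed r values (.5000, .4472, .4770, .4975, .2739) agree with those used in the Table 4 calculations.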

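Table 4
Calculation of β's and R²'s for the Six Pairwise Combinations of the Four Predictor Variables

Predictor Pair   β1(r1)   β2(r2)     R²
X1, X2            .2500    .2000   .4500
X1, X3            .1997    .1753   .3750
X1, X4            .1808    .1778   .3586
X2, X3            .1022    .1534   .2556
X2, X4            .1821    .2300   .4121
X3, X4            .2275    .2475   .4750

Note. β1 = (r1 - (r2 * rxx)) / (1 - rxx²), where r1 and r2 are the correlations of the two predictors with Y and rxx is the correlation between the two predictors; R² = β1(r1) + β2(r2). For example, for predictor pair X1 and X3, β1 = (.5000 - (.4770 * .2739)) / (1 - .2739²) = (.5000 - .1306) / (1 - .0750) = .3694 / .9250 = .3993. For example, for predictor pair X1 and X3, R² = (.3993 * .5000) + (.3676 * .4770) = .1997 + .1753 = .3750.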
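The two-predictor formula in the Table 4 note can be expressed as a small function. This is a minimal Python sketch, not the author's worksheet: β2 is obtained from the same formula with the roles of the two predictors swapped, and the function name is hypothetical.

```python
def two_predictor_betas(r1, r2, rxx):
    """Standardized weights and R-squared for Y on two predictors.

    r1, r2: correlations of each predictor with Y; rxx: correlation between
    the two predictors. Implements beta1 = (r1 - r2 * rxx) / (1 - rxx**2),
    beta2 by symmetry, and R-squared = beta1 * r1 + beta2 * r2.
    """
    denom = 1.0 - rxx ** 2
    beta1 = (r1 - r2 * rxx) / denom
    beta2 = (r2 - r1 * rxx) / denom
    return beta1, beta2, beta1 * r1 + beta2 * r2


# Worked example from Table 4: predictor pair X1 and X3.
b1, b3, r_sq = two_predictor_betas(r1=0.5000, r2=0.4770, rxx=0.2739)
print("beta1 = %.4f  beta3 = %.4f  R2 = %.4f" % (b1, b3, r_sq))

# Uncorrelated pair X1 and X2: each beta equals r, so R2 is the sum of r-squareds.
print("X1, X2: R2 = %.4f" % two_predictor_betas(0.5000, 0.4472, 0.0)[2])
```

The first line prints beta1 = 0.3993, beta3 = 0.3676, R2 = 0.3750, matching the worked example; the second shows why the uncorrelated pair X1, X2 reaches .4500 simply by adding .2500 and .2000.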

Figure 1
Venn Diagram of Relationships Among Five Variables
