
Statistical Science, 2010, Vol. 25, No. 3, 289–310
DOI: 10.1214/10-STS330
© Institute of Mathematical Statistics, 2010

To Explain or to Predict?

Galit Shmueli

Galit Shmueli is Associate Professor of Statistics, Department of Decision, Operations and Information Technologies, Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742, USA (e-mail: gshmueli@umd.edu).

Abstract. Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.

Key words and phrases: Explanatory modeling, causality, predictive modeling, predictive power, statistical strategy, data mining, scientific research.

1. INTRODUCTION

Looking at how statistical models are used in different scientific disciplines for the purpose of theory building and testing, one finds a range of perceptions regarding the relationship between causal explanation and empirical prediction. In many scientific fields such as economics, psychology, education, and environmental science, statistical models are used almost exclusively for causal explanation, and models that possess high explanatory power are often assumed to inherently possess predictive power. In fields such as natural language processing and bioinformatics, the focus is on empirical prediction with only a slight and indirect relation to causal explanation. And yet in other research fields, such as epidemiology, the emphasis on causal explanation versus empirical prediction is more mixed. Statistical modeling for description, where the purpose is to capture the data structure parsimoniously, and which is the most commonly developed within the field of statistics, is not commonly used for theory building and testing in other disciplines. Hence, in this article I focus on the use of statistical modeling for causal explanation and for prediction. My main premise is that the two are often conflated, yet the causal versus predictive distinction has a large impact on each step of the statistical modeling process and on its consequences. Although not explicitly stated in the statistics methodology literature, applied statisticians instinctively sense that predicting and explaining are different. This article aims to fill a critical void: to tackle the distinction between explanatory modeling and predictive modeling. Clearing the current ambiguity between the two is critical not only for proper statistical modeling, but more importantly, for proper scientific usage.

Both explanation and prediction are necessary for generating and testing theories, yet each plays a different role in doing so. The lack of a clear distinction within statistics has created a lack of understanding in many disciplines of the difference between building sound explanatory models versus creating powerful predictive models, as well as confusing explanatory power with predictive power. The implications of this omission and the lack of clear guidelines on how to model for explanatory versus predictive goals are considerable for both scientific research and practice and have also contributed to the gap between academia and practice.

I start by defining what I term explaining and predicting. These definitions are chosen to reflect the distinct scientific goals that they are aimed at: causal explanation and empirical prediction, respectively. Explanatory modeling and predictive modeling reflect the process of using data and statistical (or data mining) methods for explaining or predicting, respectively. The term modeling is intentionally chosen over models to highlight the entire process involved, from goal definition, study design, and data collection to scientific use.

1.1 Explanatory Modeling

In many scientific fields, and especially the social sciences, statistical methods are used nearly exclusively for testing causal theory. Given a causal theoretical model, statistical models are applied to data in order to test causal hypotheses. In such models, a set of underlying factors that are measured by variables X are assumed to cause an underlying effect, measured by variable Y. Based on collaborative work with social scientists and economists, on an examination of some of their literature, and on conversations with a diverse group of researchers, I conjecture that, whether statisticians like it or not, the type of statistical models used for testing causal hypotheses in the social sciences are almost always association-based models applied to observational data. Regression models are the most common example. The justification for this practice is that the theory itself provides the causality. In other words, the role of the theory is very strong and the reliance on data and statistical modeling are strictly through the lens of the theoretical model. The theory–data relationship varies in different fields. While the social sciences are very theory-heavy, in areas such as bioinformatics and natural language processing the emphasis on a causal theory is much weaker. Hence, given this reality, I define explaining as causal explanation and explanatory modeling as the use of statistical models for testing causal explanations.
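As a concrete, entirely hypothetical sketch of this practice (not taken from any study discussed here), the following Python fragment fits an association-based simple regression to invented observational data and computes the t-statistic that would be used to test a causal hypothesis of the form "X positively affects Y". The data, variable names, and helper functions are illustrative assumptions.

```python
# Hypothetical sketch: testing a causal hypothesis H1 ("X positively
# affects Y") with an association-based regression on observational data.
# The data, variable names, and helpers are invented for illustration.

def ols_slope(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def t_statistic(x, y):
    """t-statistic for H0: slope = 0 in simple linear regression."""
    n = len(x)
    b0, b1 = ols_slope(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return b1 / (s2 / sxx) ** 0.5  # slope / standard error of slope

# Invented measurements, e.g., perceived usefulness (x), intention (y)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

b0, b1 = ols_slope(x, y)
print(f"estimated slope = {b1:.3f}, t = {t_statistic(x, y):.1f}")
```

In the explanatory workflow described above, a large t-statistic would be read as statistical support for the causal hypothesis, with causality supplied by the theory rather than by the model itself.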
To illustrate how explanatory modeling is typically done, I describe the structure of a typical article in a highly regarded journal in the field of Information Systems (IS). Researchers in the field of IS usually have training in economics and/or the behavioral sciences. The structure of articles reflects the way empirical research is conducted in IS and related fields.

The example used is an article by Gefen, Karahanna and Straub (2003), which studies technology acceptance. The article starts with a presentation of the prevailing relevant theory(ies):

    Online purchase intentions should be explained in part by the technology acceptance model (TAM). This theoretical model is at present a preeminent theory of technology acceptance in IS.

The authors then proceed to state multiple causal hypotheses (denoted H1, H2, . . . in Figure 1, right panel), justifying the merits for each hypothesis and grounding it in theory. The research hypotheses are given in terms of theoretical constructs rather than measurable variables. Unlike measurable variables, constructs are abstractions that "describe a phenomenon of theoretical interest" (Edwards and Bagozzi, 2000) and can be observable or unobservable. Examples of constructs in this article are trust, perceived usefulness (PU), and perceived ease of use (PEOU). Examples of constructs used in other fields include anger, poverty, well-being, and odor. The hypotheses section will often include a causal diagram illustrating the hypothesized causal relationship between the constructs (see Figure 1, left panel).

FIG. 1. Causal diagram (left) and partial list of stated hypotheses (right) from Gefen, Karahanna and Straub (2003).

The next step is construct operationalization, where a bridge is built between theoretical constructs and observable measurements, using previous literature and theoretical justification. Only after the theoretical component is completed, and measurements are justified and defined, do researchers proceed to the next step where data and statistical modeling are introduced alongside the statistical hypotheses, which are operationalized from the research hypotheses. Statistical inference will lead to "statistical conclusions" in terms of effect sizes and statistical significance in relation to the causal hypotheses. Finally, the statistical conclusions are converted into research conclusions, often accompanied by policy recommendations.

In summary, explanatory modeling refers here to the application of statistical models to data for testing causal hypotheses about theoretical constructs. Whereas "proper" statistical methodology for testing causality exists, such as designed experiments or specialized causal inference methods for observational data [e.g., causal diagrams (Pearl, 1995), discovery algorithms (Spirtes, Glymour and Scheines, 2000), probability trees (Shafer, 1996), and propensity scores (Rosenbaum and Rubin, 1983; Rubin, 1997)], in practice association-based statistical models, applied to observational data, are most commonly used for that purpose.

1.2 Predictive Modeling

I define predictive modeling as the process of applying a statistical model or data mining algorithm to data for the purpose of predicting new or future observations. In particular, I focus on nonstochastic prediction (Geisser, 1993, page 31), where the goal is to predict the output value (Y) for new observations given their input values (X). This definition also includes temporal forecasting, where observations until time t (the input) are used to forecast future values at time t + k, k > 0 (the output). Predictions include point or interval predictions, prediction regions, predictive distributions, or rankings of new observations.
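A minimal sketch of this definition, with invented data: observations up to time t serve as input, and a model (here a simple trailing-mean forecaster, chosen only for illustration) outputs a prediction for time t + k with k = 1. The series, window length, and function name are assumptions, not part of the article.

```python
# Minimal sketch of prediction as defined above: observations up to time t
# are the input; the output is a forecast of the value at time t + k.
# The series, window length, and function name are invented.

def trailing_mean_forecast(series, window=3):
    """One-step-ahead forecast: mean of the last `window` observations."""
    return sum(series[-window:]) / window

# Invented series; the final value is held out as the "future" observation.
history = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
train, actual = history[:-1], history[-1]

forecast = trailing_mean_forecast(train, window=3)
error = abs(actual - forecast)
print(f"forecast = {forecast:.2f}, actual = {actual}, |error| = {error:.2f}")
```

The point of the sketch is the evaluation setup: predictive performance is judged on observations the model has not seen, regardless of whether the model has any explanatory interpretation.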
A predictive model is any method that produces predictions, regardless of its underlying approach: Bayesian or frequentist, parametric or nonparametric, data mining algorithm or statistical model, etc.

1.3 Descriptive Modeling

Although not the focus of this article, a third type of modeling, which is the most commonly used and developed by statisticians, is descriptive modeling. This type of modeling is aimed at summarizing or representing the data structure in a compact manner. Unlike explanatory modeling, in descriptive modeling the reliance on an underlying causal theory is absent or incorporated in a less formal way. Also, the focus is at the measurable level rather than at the construct level. Unlike predictive modeling, descriptive modeling is not aimed at prediction. Fitting a regression model can be descriptive if it is used for capturing the association between the dependent and independent variables rather than for causal inference or for prediction. We mention this type of modeling to avoid confusion with causal-explanatory and predictive modeling, and also to highlight the different approaches of statisticians and nonstatisticians.

1.4 The Scientific Value of Predictive Modeling

Although explanatory modeling is commonly used for theory building and testing, predictive modeling is nearly absent in many scientific fields as a tool for developing theory. One possible reason is the statistical training of nonstatistician researchers. A look at many introductory statistics textbooks reveals very little in the way of prediction. Another reason is that prediction is often considered unscientific.
Berk (2008) wrote, "In the social sciences, for example, one either did causal modeling econometric style or largely gave up quantitative work." From conversations with colleagues in various disciplines it appears that predictive modeling is often valued for its applied utility, yet is discarded for scientific purposes such as theory building or testing. Shmueli and Koppius (2010) illustrated the lack of predictive modeling in the field of IS. Searching the 1072 papers published in the two top-rated journals Information Systems Research and MIS Quarterly between 1990 and 2006, they found only 52 empirical papers with predictive claims, of which only seven carried out proper predictive modeling or testing.

Even among academic statisticians, there appears to be a divide between those who value prediction as the main purpose of statistical modeling and those who see it as unacademic. Examples of statisticians who emphasize predictive methodology include Akaike ("The predictive point of view is a prototypical point of view to explain the basic activity of statistical analysis" in Findley and Parzen, 1998), Deming ("The only useful function of a statistician is to make predictions" in Wallis, 1980), Geisser ("The prediction of observables or potential observables is of much greater relevance than the estimate of what are often artificial constructs-parameters," Geisser, 1975), Aitchison and Dunsmore ("prediction analysis . . . is surely at the heart of many statistical applications," Aitchison and Dunsmore, 1975) and Friedman ("One of the most common and important uses for data is prediction," Friedman, 1997). Examples of those who see it as unacademic are Kendall and Stuart ("The Science of Statistics deals with the properties of populations. In considering a population of men we are not interested, statistically speaking, in whether some particular individual has brown eyes or is a forger, but rather in how many of the individuals have brown eyes or are forgers," Kendall and Stuart, 1977) and more recently Parzen ("The two goals in analyzing data . . . I prefer to describe as 'management' and 'science.' Management seeks profit . . . Science seeks truth," Parzen, 2001). In economics there is a similar disagreement regarding "whether prediction per se is a legitimate objective of economic science, and also whether observed data should be used only to shed light on existing theories or also for the purpose of hypothesis seeking in order to develop new theories" (Feelders, 2002).

Before proceeding with the discrimination between explanatory and predictive modeling, it is important to establish prediction as a necessary scientific endeavor beyond utility, for the purpose of developing and testing theories. Predictive modeling and predictive testing serve several necessary scientific functions:

1. Newly available large and rich datasets often contain complex relationships and patterns that are hard to hypothesize, especially given theories that exclude newly measurable concepts. Using predictive modeling in such contexts can help uncover potential new causal mechanisms and lead to the generation of new hypotheses. See, for example, the discussion between Gurbaxani and Mendelson (1990, 1994) and Collopy, Adya and Armstrong (1994).

2. The development of new theory often goes hand in hand with the development of new measures (Van Maanen, Sorensen and Mitchell, 2007). Predictive modeling can be used to discover new measures as well as to compare different operationalizations of constructs and different measurement instruments.

3. By capturing underlying complex patterns and relationships, predictive modeling can suggest improvements to existing explanatory models.

4. Scientific development requires empirically rigorous and relevant research.
Predictive modeling enables assessing the distance between theory and practice, thereby serving as a "reality check" to the relevance of theories.¹ While explanatory power provides information about the strength of an underlying causal relationship, it does not imply its predictive power.

5. Predictive power assessment offers a straightforward way to compare competing theories by examining the predictive power of their respective explanatory models.

6. Predictive modeling plays an important role in quantifying the level of predictability of measurable phenomena by creating benchmarks of predictive accuracy (Ehrenberg and Bound, 1993). Knowledge of un-predictability is a fundamental component of scientific knowledge (see, e.g., Taleb, 2007). Because predictive models tend to have higher predictive accuracy than explanatory statistical models, they can give an indication of the potential level of predictability. A very low predictability level can lead to the development of new measures, new collected data, and new empirical approaches. An explanatory model that is close to the predictive benchmark may suggest that our understanding of that phenomenon can only be increased marginally. On the other hand, an explanatory model that is very far from the predictive benchmark would imply that there are substantial practical and theoretical gains to be had from further scientific development.

¹Predictive models are advantageous in terms of negative empiricism: a model either predicts accurately or it does not, and this can be observed. In contrast, explanatory models can never be confirmed and are harder to contradict.

For a related, more detailed discussion of the value of prediction to scientific theory development see the work of Shmueli and Koppius (2010).

1.5 Explaining and Predicting Are Different

In the philosophy of science, it has long been debated whether explaining and predicting are one or distinct.
The conflation of explanation and prediction has its roots in the philosophy of science literature, particularly the influential hypothetico-deductive model (Hempel and Oppenheim, 1948), which explicitly equated prediction and explanation. However, as later became clear, the type of uncertainty associated with explanation is of a different nature than that associated with prediction (Helmer and Rescher, 1959). This difference highlighted the need for developing models geared specifically toward dealing with predicting future events and trends, such as the Delphi method (Dalkey and Helmer, 1963). The distinction between the two concepts has been further elaborated (Forster and Sober, 1994; Forster, 2002; Sober, 2002; Hitchcock and Sober, 2004; Dowe, Gardner and Oppy, 2007). In his book Theory Building, Dubin (1969, page 9) wrote:

    Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding. It will be argued that these are separate goals [. . .] I will not, however, conclude that they are either inconsistent or incompatible.

Herbert Simon distinguished between "basic science" and "applied science" (Simon, 2001), a distinction similar to explaining versus predicting. According to Simon, basic science is aimed at knowing ("to describe the world") and understanding ("to provide explanations of these phenomena"). In contrast, in applied science, "Laws connecting sets of variables allow inferences or predictions to be made from known values of some of the variables to unknown values of other variables."

Why should there be a difference between explaining and predicting? The answer lies in the fact that measurable data are not accurate representations of their underlying constructs. The operationalization of theories and constructs into statistical models and measurable data creates a disparity between the ability to explain phenomena at the conceptual level and the ability to generate predictions at the measurable level.

To convey this disparity more formally, consider a theory postulating that construct 𝒳 causes construct 𝒴, via the function F, such that 𝒴 = F(𝒳). F is often represented by a path model, a set of qualitative statements, a plot (e.g., a supply and demand plot), or mathematical formulas. Measurable variables X and Y are operationalizations of 𝒳 and 𝒴, respectively. The operationalization of F into a statistical model f, such as E(Y) = f(X), is done by considering F in light of the study design (e.g., numerical or categorical Y; hierarchical or flat design; time series or cross-sectional; complete or censored data) and practical considerations such as standards in the discipline. Because F is usually not sufficiently detailed to lead to a single f, often a set of f models is considered. Feelders (2002) described this process in the field of economics.
In the predictive context, we consider only X, Y and f. The disparity arises because the goal in explanatory modeling is to match f and F as closely as possible for the statistical inference to apply to the theory.
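The step from F to a set of candidate f models can be sketched as follows. The constructs, data, and the two functional forms are hypothetical assumptions chosen only to illustrate that one coarse theory F ("more of 𝒳 leads to more of 𝒴") may admit several operationalizations f.

```python
import math

# Hypothetical sketch: a construct-level theory F ("more of construct X
# leads to more of construct Y") is not detailed enough to pin down a
# single statistical model f, so several candidate f's are considered.
# Two operationalizations of E(Y) = f(X) -- linear in X and linear in
# log(X) -- are fit by least squares to invented data.

def fit_least_squares(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def sse(x, y, b0, b1):
    """Sum of squared errors of the fitted line on (x, y)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Invented measurements operationalizing the constructs
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [0.2, 1.1, 2.1, 2.9, 4.1]

# Candidate f1: E(Y) = b0 + b1 * X
f1 = fit_least_squares(x, y)
# Candidate f2: E(Y) = b0 + b1 * log(X)
logx = [math.log(xi) for xi in x]
f2 = fit_least_squares(logx, y)

print("SSE f1 (linear):    ", round(sse(x, y, *f1), 3))
print("SSE f2 (log-linear):", round(sse(logx, y, *f2), 3))
```

In explanatory modeling the choice among such candidate f's leans on theoretical grounds and on how faithfully f represents F; in predictive modeling it would typically be made by comparing out-of-sample predictive accuracy.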
