SOME ISSUES IN THE MEASUREMENT-STATISTICS CONTROVERSY

JOHN GAITO
York University

Canadian Psychology/Psychologie Canadienne, 1986, 27:1

ABSTRACT

Some problems generated by Stevens's pronouncement that measurement scales (nominal, ordinal, interval, ratio) determine specific statistical procedures are discussed. It appears that proponents of this view may think of the statistical analysis stage in research design as equivalent to the overall design process, or that the interpretation stage is included in the statistical analysis stage. This aspect leads to the introduction of irrelevant empirical considerations within conclusions emanating from a statistical analysis. Such pronouncements are faced with certain logical inconsistencies. For example, two or three procedures having different scales yield similar results. Within a statistical analysis there are different contexts or levels of number analysis of different scale nature; yet these differences are not considered in the Stevens approach. Furthermore, the Stevens admonitions can impede progress with theoretical and/or empirical problems. An example is provided in the intelligence measurement area to indicate how important developments have occurred when these admonitions were ignored.

The author thanks Arnold Binder for reading an early draft of this paper and providing a number of useful suggestions. Requests for reprints should be addressed to Dr. John Gaito, Department of Psychology, York University, Downsview, Ont. M3J 1P3.

The controversy as to the independence or the non-independence of measurement properties in a statistical analysis is one that generates much intellectual stimulation, but wastes many journal pages. It began in the late 1940s and continued to the early '60s. For over a decade there was an abundance of papers or books either accepting or rejecting the thesis of Stevens (1946) that the specific measurement scale involved with data (nominal, ordinal, interval, ratio) determines the specific operations of a statistical analysis. For example, Burke (1953), Lord (1953), Kaiser (1960), and Anderson (1961) were some of those who claimed independence of measurement and statistical analysis operations. In 1980 Gaito summarized the position of those who opposed the Stevens thesis (hereafter referred to as the AM group, for anti-measurement). On the other hand, the major proponents of Stevens's argument (in extreme form) were Siegel (1956) and Senders (1958). Since that period, the issue has been less prominent, but it surfaces periodically, especially in recent years in elementary psychological statistics books (e.g., Horvath, 1985; Pagano, 1981; Walker, 1985). Recently the pro-Stevens approach (hereafter referred to as the PM group, for pro-measurement) has been championed by Townsend and Ashby (1984). These authors presented the usual arguments offered by PM personnel, that is, that statistical analyses involve measurement aspects. For example, for an ANOVA situation, PM individuals would require that the data conform to an interval scale, in addition to the usual three assumptions relative to random errors (normal distribution, independence, homogeneity of variance) as expressed in the statement "The e's are NID(0, σ²)." Furthermore, Townsend and Ashby maintain that for the AM group, "essentially 'anything goes' relative to measurement stipulations" (p. 394).

These statements are complemented by comments that I have heard within various settings to the effect that the AM position takes meaning out of research aspects, implying that the AM group are not interested in the relationship between numbers and the underlying referents.

The purpose of this paper is to support the AM approach and to show that "anything does not go": there are specific requirements for measurement aspects within a research effort. That is, although measurement considerations do not determine the choice of statistical tests, there are other ways in which measurement aspects are important in the overall experimental design.

Furthermore, the statement accusing the AM group of nonconcern with meaning shows a lack
of understanding of what those in this group are stating. I suggest that one main difference between the AM and PM positions is that the term "statistical analysis" has a different meaning and emphasis for the two groups. The AM group is defining or emphasizing statistical analysis as one of a number of stages within experimental design, but the PM group may be equating statistical analysis with the overall research effort, or confounding the interpretation phase with the statistical analysis operations. Implicit within this aspect is a second point of difference, that of statistical conclusions and empirical conclusions. Each of these will be considered. Then some serious logical inconsistencies and empirical shortcomings of the PM position will be discussed.¹

¹ A detailed discussion of formal measurement theory is not the intention of this paper. An excellent detailed discussion is provided by Binder (1984). For more comprehensive formal treatments, see Adams, Fagot, and Robinson (1965) and Pfanzagl (1968).

Experimental Design vs. Statistical Analysis

Experimental design can be conceived of as involving four stages in an overall research effort. More stages might be suggested, but four will suffice for the purpose of this article. These four stages are: planning and design of the experiment; conduct of the experiment; statistical analysis of data; interpretation of results.

It is possible that the PM group is concerned with the overall experimental design stages (or the last stage, interpretation) when they talk of measurement properties as being important considerations in statistical analyses (the third stage). However, there are important distinctions between the operations involved in each of the separate stages, relevant to measurement aspects.

The AM group would agree that measurement properties are important in the overall experimental design, specifically in the planning/design stage and in the interpretation of results. No AM member would dispute the fact that measurement considerations such as reliability, validity, relevancy, and (especially) meaningfulness of the dependent variables should enter into consideration during the planning stage. Furthermore, these measurement aspects are important in making sense (i.e., providing meaning) of the results in the interpretation stage. It would be unrealistic not to accept these measurement notions, for otherwise the research effort would be meaningless.

However, the AM group would not equate statistical analysis with experimental design; their ideas refer only to the statistical analysis stage. This stage is merely one in the overall research effort in which mathematical operations hold sway and measurement scale considerations are irrelevant. This is the domain to which the many papers of the AM group are directed. (See, for example, the excellent papers by Burke, 1953, and Lord, 1953, which should have settled the problem over three decades ago.)

Let us look closely at a statistical analysis as viewed by the AM group. This stage is concerned with analyses involving events such as determining medians or means, variances, correlations, etc. and with tests of null hypotheses (H₀). In the latter case, the observed results are contrasted with the values expected based on specific mathematical assumptions that are present in the mathematical model for the procedure. The investigator then decides whether to reject, or not reject, H₀ on the basis of a specific probability level. With the decision to reject, or not reject, H₀, the statistical conclusion is that there is one, or more than one, population distribution from which the samples have been chosen. These are statistical conclusions that emanate from the mathematical operations involved in the specific procedure. These conclusions are completely devoid of empirical aspects (i.e., those inherent in the experimental and theoretical nature of the research effort), and characteristics of data such as reliability, validity, meaningfulness, and relevancy do not enter the picture. Specifically, as Lord implied, "The numbers do not know where they came from."

Likewise, the conduct of the experiment is a physical act and measurement aspects are irrelevant. But this stage is of little consequence for this paper.

Statistical Conclusions vs. Empirical Conclusions

Another point of contention that seems to be present in the controversy is the possible confusion between conclusions of a statistical nature and those falling in the empirical domain. As indicated in the last section, statistical conclusions are defined within the mathematical context of the procedure and follow the decision to
reject H₀ or not. For example, let us take an ANOVA situation; the conclusion in the case wherein rejection of H₀ occurs is that the samples come from two or more population distributions. The conclusion is that at least one set of numbers is different from other sets. There is no concern with what the numbers refer to. The set of numbers is merely a distribution of values. This is a gross statistical statement, that the two or more samples are from different population distributions. In the case in which H₀ is not rejected, the conclusion is that there is no evidence to indicate that the samples come from more than one population distribution. This also is a gross statistical statement, that there is no evidence to indicate that the populations from which the samples were derived are different; that is, only one population distribution is involved. In statistical analyses there is no reference to measurement aspects involved in the numbers. The conclusion is that the populations are different, or not.

On the other hand, empirical or theoretical conclusions do have reference to what the numbers stand for and mean. Thus measurement properties enter the picture. The researcher takes the results of the statistical analysis stage and places them within the context of the research effort. For example, if one is conducting a learning experiment and is using number of errors as an indicator of degree of learning, measurement considerations arise concerning this choice, for example, reliability, validity, meaningfulness, and relevancy of the index. If these aspects were handled in an adequate fashion before the experiment was conducted, then the researcher can conclude that different degrees of learning occurred in the specific situation of concern (if H₀ was rejected) or that there is no evidence to indicate different degrees of learning (if H₀ was not rejected). The interpretation stage brings in the specific empirical operations with their associated measurement requirements so as to provide meaningful interpretation of the results of the experiment.

In summary, for measurement purposes numbers are important because they relate to some underlying referent. However, in a statistical analysis, these referents do not enter the picture; it is only the numbers (which have no uniqueness except as numbers) that are involved in the statistical operations in a manner prescribed by the mathematical properties of the method. These statistical operations allow an effective ordering of the sets of numbers so that empirical statements (and associated meaning) can be added in the interpretation stage.

Logical Inconsistencies in the PM Position

The position of the PM group, that measurement scale properties of the data determine the specific statistical analyses, encounters a number of logical inconsistencies. These are:

Levels of Number Analysis or Context

The context of the number analysis in which the assignment of the measurement scale property occurs is of major importance (Gaito, 1960). There can be more than one specific level of number analysis involved in this assignment. For example, take the case wherein a number is given to a single response of one subject on one occasion: S gives a response to a test item and is scored right (1) or wrong (0). The number 1 or 0 in this case would indicate the lowest scale level, nominal data, according to the PM group. However, if we determine the total number of correct responses of one subject, then a different scale should appear. This scale is at least an ordinal one; for example, 20 correct of 20 responses is greater than 19 correct in 20 items. The same result would occur if there were more than one subject. Likewise, if the mean or median of the set of scores for one subject (or more than one subject) were determined, at least an ordinal scale would appear.

Finally, if these correct responses are considered as a sample drawn from a population distribution of correct responses and the characteristics of this distribution are determinable, then an interval scale is involved: the differences between various points on the curve can be of known value.

In this example, we have demonstrated three different contexts or levels of number analyses that can be present in a statistical analysis. It appears that the PM group has been concerned only with one of these levels in each case. Yet in a statistical analysis all three contexts can appear. Even with the use of a χ² test of the null hypothesis, which the PM group would specify as an example of nominal scale statistics, all three levels are involved. One could ask why there is concern with only the first level in this case when
this statistical procedure involves also the obtaining of the frequency of responses (level 2: a frequency of 10 is greater than a frequency of 9, etc.) and relating these obtained frequencies to the expected frequencies using a familiar distribution (χ², a level 3 analysis).

Different Scales Give Same Result

A serious problem with the notion that the measurement scale of data determines the statistical procedures has been pointed out by a number of writers (e.g., Binder, 1984; Gaito, 1980; Savage, 1957). This is the logical inconsistency involved when two or more procedures exemplify different types of scale properties but produce the same result. Three examples should be sufficient.

(a) The Binomial Test and the Sign Test are supposed to consist of nominal data and ordinal data, respectively. However, underlying both techniques is the binomial distribution and both allow for the rejection, or non-rejection, of H₀ at the same probability level.

(b) The normal distribution (interval scale) provides an excellent approximation to the exact probabilities given by the binomial distribution (nominal scale, according to the PM group), especially when p = q and n is 10 or more.

(c) With classificatory data, χ² (nominal scale) and the normal distribution (interval scale) can give the same result under some conditions. This is to be expected since the square of a unit normal variate has a chi-square distribution with 1 df, i.e., z² = χ² when 1 df occurs.

Actually, examples (b) and (c) can be combined. In some cases the use of χ², the binomial distribution, and the normal approximation provide similar results.

The first example illustrates data that have adjacent scale properties (nominal and ordinal). However, the second and third ones would appear to be the most difficult ones for PM advocates to handle, because these examples involve data that are two scales apart (nominal and interval).
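
Examples (a) through (c) can be checked numerically. The Python sketch below is an illustration only and is not taken from the paper: the function names, the data (15 successes in 20 trials, and 20 paired differences with 15 positive signs), and the continuity correction used in (b) are all assumptions. It uses only the standard library, and the exact two-sided test is taken as twice the smaller binomial tail under p = q = 0.5.

```python
import math


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_p(k, n):
    """Exact two-sided binomial test of H0: p = 0.5 (twice the smaller tail)."""
    lower = sum(binom_pmf(i, n) for i in range(0, k + 1))
    upper = sum(binom_pmf(i, n) for i in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))


def sign_test_p(differences):
    """Sign test: drop zero differences, count the positive signs, and
    refer that count to the binomial distribution with p = 0.5."""
    signs = [d for d in differences if d != 0]
    positives = sum(1 for d in signs if d > 0)
    return binomial_test_p(positives, len(signs))


def normal_two_sided_p(z):
    """Two-sided tail area of the unit normal, P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


def chi2_1df_upper_p(x):
    """Upper tail of chi-square with 1 df, which equals P(|Z| >= sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


def z_statistic(k, n, continuity=False):
    """Normal approximation to a binomial count k under H0: p = q = 0.5.
    The optional continuity correction is an assumption of this sketch."""
    mean, sd = n * 0.5, math.sqrt(n * 0.25)
    shift = 0.5 if continuity else 0.0
    return max(0.0, abs(k - mean) - shift) / sd


def chi2_statistic(k, n):
    """Goodness-of-fit chi-square for two categories with equal expected counts."""
    expected = n / 2
    return ((k - expected) ** 2 + ((n - k) - expected) ** 2) / expected


if __name__ == "__main__":
    k, n = 15, 20                        # hypothetical counts
    diffs = [1.0] * 15 + [-1.0] * 5      # hypothetical paired differences
    print("(a) binomial test p:", binomial_test_p(k, n))
    print("    sign test p:    ", sign_test_p(diffs))
    z_cc = z_statistic(k, n, continuity=True)
    print("(b) normal approximation p:", normal_two_sided_p(z_cc))
    z = z_statistic(k, n)
    print("(c) z squared:", z * z, " chi-square:", chi2_statistic(k, n))
    print("    p from z:", normal_two_sided_p(z),
          " p from chi-square:", chi2_1df_upper_p(chi2_statistic(k, n)))
```

In this sketch the sign test reduces to the binomial test on the count of positive signs, so (a) gives the same probability by construction. For (c), the uncorrected z squared equals the two-category chi-square, so the two tail areas agree up to floating-point rounding.
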
Unfortunately, members of the PM group have not noted or commented on this apparent inconsistency. In any event, these three examples should indicate clearly the independence of measurement scales and statistical analyses. In actual fact, there are only two types of data, continuous and discontinuous. However, in some cases even this distinction becomes blurred, for example, when the normal distribution (continuous in form) is used as an approximation to the discontinuous distributions cited above.²

² See Binder (1984) and Savage (1957) for additional examples of inconsistencies.

Other Examples

Another example of logical inconsistency in the PM position is cited by Binder (1984). He describes an example in a book by Johnson (1981). The latter specifies that use of Pearson's r requires interval (or ratio) type data. Johnson then adds that Spearman's rho is the Pearson correlation for the same data in ranks, and is of ordinal scale nature, and he apparently did not recognize the inconsistency involved. However, Spearman's method is a type of product moment procedure that provides simplified calculations that depend on the numerical properties of ranks; that is, Spearman's rho is an estimate of the Pearson r when the numerical values of the latter are converted to ranks.

Furthermore, the PM group allows for only certain transformations (permissible) to occur with each measurement scale. Yet a number of researchers have shown clearly that non-permissible transformations of data produce similar results to those with permissible transformations, indicating that statistical tests depend upon numbers and not their histories or source (Anderson, 1961; Baker, Hardyck, & Petrinovich, 1966; Binder, 1984).

Also, in many cases there is the statement or implication that the operations of addition, subtraction, multiplication, and division cannot occur with subinterval type data (and that nonparametric procedures should be used). To this one can ask, as did Lubin (1962, p. 359), for example:

    How does one compute chi-square, Spearman's rho, Wilcoxon's U, or any other nonparametric statistic without adding, subtracting, multiplying, or dividing?

In summary, it seems that the PM group either have not recognized the many inconsistencies in their position or else have superficially cast them aside. The latter aspect occurred on a number of occasions in the paper by Townsend and Ashby (1984). For example, in response to the Gaito article (1980) they state "we were simply not
able to make sense of" (p. 395). In commenting on the central point implied by Lord (1953) in his much cited paper, "that the numbers do not know where they came from," they indicate that "just exactly what this curious statement has to do with statistics or measurement eludes us" (p. 396). The statements of Savage (1957) "seem beside the point" (p. 396). It appears that if they do not understand a point, it must be incorrect. Yet another interpretation of these responses could be that the authors are showing a lack of understanding of the basic points involved.

Theoretical and Empirical Shortcomings in the PM Position

It should be emphasized that measurement and statistical procedures are tools that the scientist uses to attain certain empirical and theoretical objectives. Thus, the scientist should make use of any tools that will facilitate movement toward the goals. The consequences of following slavishly the pronouncements of the PM group can result in the loss of potential theoretical and empirical gains. For example, Binder (1984) indicated that by disregarding the suggestion that IQ is measurable only on an ordinal scale, [...]

[...] of measurement properties (Binder, personal communication).

Conclusions

A central point in the argument of the PM group is that the AM group does not allow measurement aspects with associated meaning to enter into the overall research picture. As the above discussion indicates, this is not a correct description of the AM approach. The AM group allows measurement and meaning to enter into some stages of experimental design (i.e., planning, interpretation of results), but not into others, specifically not into statistical analyses. Only mathematical, not measurement, aspects enter at this point. This is the point on which the excellent early papers by Burke and by Lord are based.

Furthermore, the logical inconsistency and empirical shortcomings of the examples involved in the last two sections would appear to be difficult to rationalize by the PM group. However, it seems that members of this group overlook these types of possibilities. Binder (1984) indicated that such inconsistencies occur because PM advocates miss the point that "levels of measurement" refers to relations [...]
