THE ASSESSMENT OF WRITING ABILITY: A REVIEW OF RESEARCH


THE ASSESSMENT OF WRITING ABILITY: A REVIEW OF RESEARCH

Peter L. Cooper

GRE Board Research Report GREB No. 82-15R
ETS Research Report 84-12

May 1984

This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board.

ERRATUM

The first three lines on page 35 should read: "It is not clear whether 6-point or even more extended scales produce more reliable ratings than does a 4-point scale, or whether they significantly lengthen reading time."

The Assessment of Writing Ability: A Review of Research

Peter L. Cooper

GRE Board Research Report GREB No. 82-15R

May 1984

Copyright © 1984 by Educational Testing Service. All rights reserved.

-i-

ABSTRACT

The assessment of writing ability has recently received much attention from educators, legislators, and measurement experts, especially because the writing of students in all disciplines and at all educational levels seems, on the whole, less proficient than the writing produced by students five or ten years ago. The GRE Research Committee has expressed interest in the psychometric and practical issues that pertain to the assessment of writing ability. This paper presents not a new study but a review of major research in light of GRE Board concerns. Specifically, recent scholarship and information from established programs are used to investigate the nature and limitations of essay and multiple-choice tests of writing ability, the statistical relationship of performances on these types of tests, the performance of population subgroups on each kind of task, the possible need of different disciplines for different tests of composition skill, and the cost and usefulness of various strategies for evaluating writing ability.

The literature indicates that essay tests are often considered more valid than multiple-choice tests as measures of writing ability. Certainly they are favored by English teachers. But although essay tests may sample a wider range of composition skills, the variance in essay test scores can reflect such irrelevant factors as speed and fluency under time pressure or even penmanship. Also, essay test scores are typically far less reliable than multiple-choice test scores. When essay test scores are made more reliable through multiple assessments, or when statistical corrections for unreliability are applied, performance on multiple-choice and essay measures can correlate very highly. The multiple-choice measures, though, tend to overpredict the performance of minority candidates on essay tests. It is not certain whether multiple-choice tests have essentially the same predictive validity for candidates in different academic disciplines, where writing requirements may vary. Still, at all levels of education and ability, there appears to be a close relationship between performance on multiple-choice and essay tests of writing ability. And yet each type of measure contributes unique information to the overall assessment. The best measures of writing ability have both essay and multiple-choice sections, but this design can be costly. Cost-cutting alternatives such as an unscored or locally scored writing sample may compromise the quality of the essay assessment. For programs considering an essay writing exercise, a discussion of the cost and uses of different scoring methods is included. The holistic method, although having little instructional value, offers the cheapest and best means of rating essays for the rank ordering and selection of candidates.

-iii-

Table of Contents

Abstract ............................................................ i
Introduction ........................................................ 1
Direct versus Indirect Assessment: The Issues ....................... 1
Sources of Variance in Direct Assessment:
  Questions of Validity and Reliability ............................. 4
  Writer ............................................................ 4
  Topic ............................................................. 4
  Mode .............................................................. 5
  Time limit ........................................................ 5
  Reader inconsistencies ............................................ 9
Sources of Variance in Indirect Assessment:
  Questions of Validity and Reliability ............................. 11
  Writing context and sampling ...................................... 12
The Relation of Direct and Indirect Writing Assessment .............. 21
  Correlations ...................................................... 23
  Questions addressed ............................................... 24
Group Comparison by Direct and Indirect Writing Assessment .......... 24
Writing Assessment by Disciplines ................................... 27
Combined Measures or Combined Methods ............................... 29

-iv-

Compromise Measures ................................................. 30
Direct and Indirect Assessment Costs ................................ 32
Scoring Methods for Direct Assessment ............................... 34
  Description of the methods and their uses ......................... 34
  Advantages and disadvantages of the methods ....................... 36
Summary and Conclusion .............................................. 40
References and Bibliography ......................................... 42

Introduction

Educators in all disciplines and at all academic levels have become concerned in recent years about the apparent decline in the writing ability of their students, and now legislators share that concern. At present, 35 states have passed or are considering legislation that mandates the statewide testing of students' writing skills (Notes from the National Testing Network in Writing, October 1982, p. 1). As part of a recent research study, faculty members from 190 departments at 34 universities in Canada and the United States completed a questionnaire on the importance of academic writing skills in their fields: business management, psychology, computer science, chemistry, civil engineering, and electrical engineering. In all six areas, writing ability was judged important to success in graduate training. One can assume that it is not merely important but essential in the liberal arts. Even first-year students in programs such as electrical engineering must write laboratory reports and article summaries. Long research papers are commonly assigned to graduates in business, civil engineering, psychology, the life sciences, and of course the humanities and the social sciences. Respondents in the study also agreed that writing ability is even more important to professional than to academic success (Bridgeman & Carlson, 1983).

Understandably, there has been a growing interest, among legislators and educators as well as writing specialists, in the methods used to measure, evaluate, and predict writing skills. Two distinct methods have evolved: "direct assessment" requires the examinee to write an essay or several essays, typically on assigned topics; "indirect assessment" requires the examinee to answer multiple-choice items. Hence direct assessment is sometimes referred to as a "production" measure and indirect assessment as a "recognition" measure. Richard Stiggins (1981) observes that "indirect assessment tends to cover highly explicit constructs in which there are definite right and wrong responses (e.g., a particular language construction is either correct or it is not). Direct assessment, on the other hand, tends to measure less tangible skills (e.g., persuasiveness), for which the concept of right and wrong is less relevant" (p. 3).

Direct versus Indirect Assessment: The Issues

Each method has its own advantages and disadvantages, its own special uses and shortcomings, its own operational imperatives. Proponents of the two approaches have conducted a debate -- at times almost a war -- since the beginning of the century. The battle lines were drawn chiefly over the issues of validity (the extent to which a test measures what it purports to measure -- in this case, writing ability) and reliability (the extent to which a test measures whatever it does measure consistently). At first it was simply assumed that one must test writing ability by having examinees write. But during the 1920s and 1930s, educational psychologists began experimenting with indirect measures because essay scorers (also called "readers" or "raters") were shown to be generally inconsistent, or unreliable, in their ratings. The objective tests not only could achieve

-2-

extremely high statistical reliabilities but also could be administered and scored economically, thus minimizing the cost of testing a growing number of candidates.

As late as 1966, Orville Palmer wrote: "Sixty years of [College] Board English testing have amply proved that essay tests are neither reliable nor valid, and that, whatever their faults, objective tests do constitute a reliable and valid method of ascertaining student compositional ability. Such a conclusion was very painfully and reluctantly arrived at" (p. 286). And, perhaps, prematurely arrived at. In a landmark study of the same year, Godshalk, Swineford, and Coffman (1966) showed that, under special circumstances, scores on brief essays could be reliable and also valid in making a unique contribution to the prediction of performance on a stable criterion measure of writing ability.

Since then, direct assessment has rapidly gained adherents. One reason is that, while direct and indirect assessments appear statistically to measure very similar skills, as will be discussed later, indirect measures lack "face validity" and credibility among English teachers. That is, multiple-choice tests do not seem to measure writing ability because the examinees do not write. Also, reliance on indirect methods exclusively can entail undesirable side effects. Students may presume that writing is not important, or teachers that writing can be taught -- if it need be taught at all -- through multiple-choice exercises instead of practice in writing itself. One has to reproduce a given kind of behavior in order to practice or perfect it, but not necessarily in order to measure it. Still, that point can easily be missed, and the College Board has committed itself to supporting the teaching of composition by reinstating the essay in the Admissions Testing Program English Composition Test.

The Board's practice is widely followed. Only about 5 percent of all statewide writing assessments rely solely on objective tests; of those remaining, about half combine direct and indirect measures, but the proportion using direct measures alone has increased in recent years and continues to increase. Writing tests for a college-age population are especially likely to require an essay. Along with the popularity of direct assessment has grown the tendency of writing tests, especially in higher education, to focus on "higher-level" skills such as organization, clarity, sense of purpose, and development of ideas rather than on "lower-level" skills such as spelling, mechanics, and usage. Of course, higher-level skills appear naturally to be the province of direct assessment and lower-level skills the humbler domain of indirect assessment; hence the greater face validity and credibility of essay tests for those who teach English.

Earle G. Eley (1955) has argued that "an adequate essay test of writing is valid by definition . . . since it requires the candidate to perform the actual behavior which is being measured" (p. 11). In direct assessment, examinees spend their time planning, writing, and perhaps revising an essay. But in indirect assessment, examinees spend their time reading items, evaluating options, and selecting responses.

-3-

Consequently, multiple-choice tests confound the variables by measuring reading as well as writing skills -- or rather, some insist, editorial and error-recognition skills, in that the examinees produce no writing. Multiple-choice tests, of course, do not require -- or even permit -- any such performance. And they have also been criticized for making "little or no attempt to measure the 'larger elements' of composition, even indirectly" (Braddock, Lloyd-Jones, & Schoer, 1963, p. 42). Their critics generally maintain that objective tests cannot assess originality, creativity, logical coherence, effectiveness of rhetorical strategy, management and flexibility of tone, the ability to generate ideas and supporting examples, the ability to compose for different purposes and audiences, the ability to stick to a subject, or the ability to exercise any other higher-level writing skill -- in short, that objective tests cannot assess anything very important about writing.

Charles Cooper and Lee Odell (1977), leaders in the field of writing research, voice a position popular with the National Council of Teachers of English in saying that such examinations "are not valid measures of writing performance. The only justifiable uses for standardized, norm-referenced tests of editorial skills are for prediction or placement or for the criterion measure in a research study with a narrow 'correctness' hypothesis." But even for placement they are less valid than a "single writing sample quickly scored by trained raters, as in the screening for Subject A classes (remedial writing) at the University of California campuses" (p. viii).

But Cooper and Odell make some unwarranted assumptions: that validity equals face validity, that what appears to be invalid is in fact invalid, and that one cannot draw satisfactory conclusions by indirect means -- in other words, that one cannot rely on smoke to indicate the presence of fire. Numerous studies show a strong relationship between the results of direct and indirect measures, as will be discussed in greater detail below. Although objective tests do not show whether a student has developed higher-order skills, much evidence suggests the usefulness of a well-crafted multiple-choice test for placement and prediction of performance. Moreover, objective tests may focus more sharply on the particular aspects of writing skill that are at issue. Students will pass or fail the "single writing sample quickly scored" for a variety of reasons -- some for being weak in spelling or mechanics, some for being weak in grammar and usage, some for lacking thesis development and organization, some for not being inventive on unfamiliar and uninspiring topics, some for writing papers that were read late in the day, and so on. An essay examination of the sort referred to by Cooper and Odell may do a fair job in ranking students by the overall merit of their compositions but fail to distinguish between those who are grammatically or idiomatically competent and those who are not.
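The statistical machinery behind these claims -- that pooling several independent readings raises essay score reliability, and that correcting an observed correlation for unreliability reveals a much closer direct/indirect relationship -- can be sketched briefly. The two standard formulas are the Spearman-Brown prophecy formula and Spearman's correction for attenuation; the numbers below are hypothetical, chosen only for illustration, not figures from the studies reviewed here:

```python
import math

def spearman_brown(r_single: float, k: int) -> float:
    """Spearman-Brown prophecy formula: reliability of the pooled score
    from k parallel readings, given the reliability of one reading."""
    return k * r_single / (1 + (k - 1) * r_single)

def corrected_correlation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation: the observed correlation
    divided by the geometric mean of the two measures' reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Suppose (hypothetically) one reading of one essay has reliability .40;
# pooling four independent readings raises it substantially.
essay_rel = spearman_brown(0.40, 4)

# If the observed essay/multiple-choice correlation is .55 and the
# multiple-choice reliability is .90, the disattenuated correlation
# between the underlying abilities is considerably higher.
true_corr = corrected_correlation(0.55, essay_rel, 0.90)
print(round(essay_rel, 2), round(true_corr, 2))  # prints "0.73 0.68"
```

Under these assumed figures, two measures that correlate only moderately in raw scores turn out to reflect closely related underlying skills once the unreliability of a single essay reading is taken into account -- the pattern the literature summarized above reports.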

-4-

Sources of Variance in Direct Assessment: Questions of Validity and Reliability

Several sources of score variance reduce the validity and reliability of essay tests.

Writer: Braddock et al. (1963) say that composition exams, "often referred to as measures of writing ability, . . . are always measures of writing performance; that is, when one evaluates an example of a student's writing, he cannot be sure that the student is fully using his ability, is writing as well as he can." The authors cite several studies on this point, especially an unpublished Ed.D. dissertation by Gerald L. Kincaid (1953, cited in Braddock et al., 1963) that demonstrates how "the day-to-day writing performance of individuals varies, especially the performance of better writers." If the performance of good writers varies more than that of poor writers, then "variations in the day-to-day writing performance of individual students [do not] 'cancel each other out' when the mean rating of a large group of students is considered" (pp. 6-7). In addition to the "writer variable" as a source of error, Braddock et al. identify "the assignment variable, with its four aspects: the topic, the mode of discourse, the time afforded for writing, and the examination situation" (p. 7).

Topic: If no topic is prescribed, it is not psychometrically defensible to compare results. And, too, the variable being measured -- writing ability -- is confounded by other variables, such as the ability to generate a reasonably good topic on the spur of the moment. A test of writing ability is invalid to the extent that the scores reflect anything besides the students' writing ability. But prescribed topics entail their own sorts of difficulties. As Gertrude Conlan (1982) observes, "Between the time an essay question is readied for printing and the time it is administered, any number of things can happen to affect the quality and variety of responses readers must score" (p. 11). For example, developments in the "real world" may make the topic obsolete or emotionally charged. On the other hand, some topics manifest a surprising power to elicit platitudes and generalities; the papers "all sound alike," and readers find it nearly impossible to keep their attention sharp, apply consistent standards, and make sound discriminations. Of course, such circumstances lower the reading and the score reliability of the examination -- that is, the extent to which another reading session would produce the same scores for the same essays or for the same students writing on a new topic.

Score reliability is typically a problem in direct assessment because different topics often require different skills or make different conceptual demands on the candidates. In such cases, comparability of scores across topics and administrations is hard to achieve. As Palmer (1966) remarks, "English teachers hardly need to be told that there exists a great deal of variability in student writing from one theme to another and from one essay to another. The most brilliant students may do well on one essay topic and badly on another. An objective test greatly reduces this inconsistency or variability" (p. 288) by sampling a wider and better

-5-

controlled range of behavior with a number of items. Braddock et al. (1963) cite various studies demonstrating that "the topic a person writes on affects the caliber of his writing" (p. 17). By offering only one topic, examiners apparently introduce an invalid source of variance. For this reason, some researchers have recommended offering a variety of topics if the test population has a broad range of abilities (Wiseman & Wrigley, 1958). Others consider it the lesser evil to provide only one topic per administration. Godshalk et al. (1966) observe, "In the first place, there is no evidence that the average student is able to judge which topic will give him the advantage. In the second place, the variability in topics . . . would be introducing error at the same time that students might be eliminating error by choosing the topic on which they were most adequately prepared" (pp. 13-14). The problem, of course, is exacerbated if the alternate topics are not carefully matched.

Mode: Just as any given writer is not equally prepared to write on all topics, so he or she is not equally equipped to write for all purposes or in all modes of discourse, the most traditional being description, narration, exposition, and argumentation. Variations in mode of discourse "may have more effect than variations in topic on the quality of writing," especially for less able candidates. Even sentence structure and other syntactic features are influenced by mode of discourse (see pages 8, 93, 17 in Braddock et al., 1963). Edward White (1982) says, "We know that assigned mode of discourse affects test score distribution in important ways. We do not know how to develop writing tests that will be fair to students who are more skilled in the modes not usually tested" (p. 17) -- or in the modes not tested by the writing exercise they must perform. Research data presented by Quellmalz, Capell, and Chih-Ping (1982) "cast doubt on the assumption that 'a good writer is a good writer' regardless of the assignment. The implication is that writing for different aims draws on different skill constructs which must therefore be measured and reported separately to avoid erroneous, invalid interpretations of performance, as well as inaccurate decisions based on such performances" (pp. 255-56). Obviously, any given writing exercise will provide an incomplete and partially distorted representation of an examinee's overall writing ability.

Time limit: The time limit of a writing exercise, another aspect of the assignment variable, raises additional measurement problems. William Coffman (1971) observes that systematic and variable errors in measurement result when the examinee has only s

