Formative Assessment and the Design of Instructional Systems

Instructional Science 18: 119-144 (1989)
Kluwer Academic Publishers, Dordrecht. Printed in the Netherlands.

Formative assessment and the design of instructional systems

D. ROYCE SADLER
Assessment and Evaluation Research Unit, Department of Education, University of Queensland, St Lucia, Queensland 4067, Australia

Abstract. The theory of formative assessment outlined in this article is relevant to a broad spectrum of learning outcomes in a wide variety of subjects. Specifically, it applies wherever multiple criteria are used in making judgments about the quality of student responses. The theory has less relevance for outcomes in which student responses may be assessed simply as correct or incorrect. Feedback is defined in a particular way to highlight its function in formative assessment. This definition differs in several significant respects from that traditionally found in educational research. Three conditions for effective feedback are then identified and their implications discussed. A key premise is that for students to be able to improve, they must develop the capacity to monitor the quality of their own work during actual production. This in turn requires that students possess an appreciation of what high quality work is, that they have the evaluative skill necessary for them to compare with some objectivity the quality of what they are producing in relation to the higher standard, and that they develop a store of tactics or moves which can be drawn upon to modify their own work. It is argued that these skills can be developed by providing direct authentic evaluative experience for students. Instructional systems which do not make explicit provision for the acquisition of evaluative expertise are deficient, because they set up artificial but potentially removable performance ceilings for students.

Introduction

This article is about the nature and function of formative assessment in the development of expertise. It is relevant to a wide variety of instructional systems in which student outcomes are appraised qualitatively using multiple criteria. The focus is on judgments about the quality of student work: who makes the judgments, how they are made, how they may be refined, and how they may be put to use in bringing about improvement. The article is prompted by two overlapping concerns. The first is with the lack of a general theory of feedback and formative assessment in complex learning settings. The second concern follows from the common but puzzling observation that even when teachers provide students with valid and reliable judgments about the quality of their work, improvement does not necessarily follow. Students often show little or no growth or development despite regular, accurate feedback. The concern itself is with whether some learners fail to acquire expertise because of specific deficiencies in the instructional system associated with formative assessment.

The discussion begins with definitions of feedback, formative assessment and qualitative judgments. This is followed by an analysis of certain patterns in teacher-student assessment interactions. A number of causal and conditional linkages are then identified. These in turn are shown to have implications for the design of instructional systems which are intended to develop the ability of students to exercise executive control over their own productive activities, and eventually to become independent and fully self-monitoring.

Formative assessment, feedback and self-monitoring

Etymology and common usage associate the adjective formative with forming or moulding something, usually to achieve a desired end. In this article, assessment denotes any appraisal (or judgment, or evaluation) of a student's work or performance. (In some contexts, assessment is given a narrower and more specialized meaning; some North American readers in particular may prefer to substitute the term evaluation for assessment.)

Formative assessment is concerned with how judgments about the quality of student responses (performances, pieces, or works) can be used to shape and improve the student's competence by short-circuiting the randomness and inefficiency of trial-and-error learning.

Summative contrasts with formative assessment in that it is concerned with summing up or summarizing the achievement status of a student, and is geared towards reporting at the end of a course of study, especially for purposes of certification. It is essentially passive and does not normally have immediate impact on learning, although it often influences decisions which may have profound educational and personal consequences for the student. The primary distinction between formative and summative assessment relates to purpose and effect, not to timing. It is argued below that many of the principles appropriate to summative assessment are not necessarily transferable to formative assessment; the latter requires a distinctive conceptualization and technology.

Feedback is a key element in formative assessment, and is usually defined in terms of information about how successfully something has been or is being done. Few physical, intellectual or social skills can be acquired satisfactorily simply through being told about them. Most require practice in a supportive environment which incorporates feedback loops. This usually includes a teacher who knows which skills are to be learned, and who can recognize and describe a fine performance, demonstrate a fine performance, and indicate how a poor performance can be improved. Feedback can also be defined in terms of its effect rather than its informational content: "Feedback is information about the gap between the actual level and the reference level of a system parameter which is used to alter the gap in some way" (Ramaprasad, 1983, p. 4). This alternative definition emphasizes the system-control function.
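Ramaprasad's definition can be pictured as a control loop. The sketch below is a minimal illustration in Python, not a construct from the paper; the names close_gap, adjust and tolerance are invented for the example. Its point is simply that the gap between reference and actual levels counts as feedback only while it is being used to alter the gap.

    # Minimal gap-closing loop in the spirit of Ramaprasad's definition.
    # All names are illustrative, not drawn from the paper.
    def close_gap(actual, reference, adjust, tolerance=0.01):
        """Compare actual with reference and act on the gap until it closes."""
        while abs(reference - actual) > tolerance:
            gap = reference - actual        # information about the gap ...
            actual = adjust(actual, gap)    # ... used to alter the gap: feedback
        return actual

    # Example: each adjustment removes half of the remaining gap.
    close_gap(0.0, 1.0, lambda level, gap: level + 0.5 * gap)

    # If the gap were merely recorded (say, logged as a summary grade) and
    # never passed to adjust(), the loop could not close: "dangling data".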

Broadly speaking, feedback provides for two main audiences, the teacher and the student. Teachers use feedback to make programmatic decisions with respect to readiness, diagnosis and remediation. Students use it to monitor the strengths and weaknesses of their performances, so that aspects associated with success or high quality can be recognized and reinforced, and unsatisfactory aspects modified or improved.

An important feature of Ramaprasad's definition is that information about the gap between actual and reference levels is considered as feedback only when it is used to alter the gap. If the information is simply recorded, passed to a third party who lacks either the knowledge or the power to change the outcome, or is too deeply coded (for example, as a summary grade given by the teacher) to lead to appropriate action, the control loop cannot be closed and "dangling data" substitute for effective feedback. In any area of the curriculum where a grade or score assigned by a teacher constitutes a one-way cipher for students, attention is diverted away from fundamental judgments and the criteria for making them. A grade therefore may actually be counterproductive for formative purposes.

In assessing the quality of a student's work or performance, the teacher must possess a concept of quality appropriate to the task, and be able to judge the student's work in relation to that concept. But although the students may accept a teacher's judgment without demur, they need more than summary grades if they are to develop expertise intelligently. The indispensable conditions for improvement are that the student comes to hold a concept of quality roughly similar to that held by the teacher, is able to monitor continuously the quality of what is being produced during the act of production itself, and has a repertoire of alternative moves or strategies from which to draw at any given point. In other words, students have to be able to judge the quality of what they are producing and be able to regulate what they are doing during the doing of it. As Shenstone (correctly) put it over two centuries ago, "Every good poet includes a critick; the reverse will not hold" (Shenstone, 1768, p. 172).

Stated explicitly, therefore, the learner has to (a) possess a concept of the standard (or goal, or reference level) being aimed for, (b) compare the actual (or current) level of performance with the standard, and (c) engage in appropriate action which leads to some closure of the gap. These three conditions form the organizing framework for this article. It will be argued that they are necessary conditions, which must be satisfied simultaneously rather than as sequential steps. It is nevertheless useful to make a conceptual distinction between the conditions. The (macro) process of grading involves the first two in that it is essentially comparing a particular case either with a standard or with one or more other cases. Control during production involves all three conditions and is, by contrast, a (micro) process carried out in real time. Judging from assessment practices common in many subjects, information generated without the participation of the learner but made available to the learner from time to time (as intelligence) is evidently assumed to satisfy these conditions. A detailed examination of the three conditions shows why this assumption falls short of what is actually necessary.
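The three conditions can be read as a single loop that the learner runs in real time during production. The sketch below is an illustration only; monitor_production, appraise, moves and the stopping rule are hypothetical stand-ins, not constructs from the article. It shows why the conditions must hold simultaneously: remove any one of the three arguments and the loop cannot run at all.

    # Conditions (a)-(c) as a production-time monitoring loop.
    # standard: (a) the learner's concept of the goal
    # appraise: (b) the skill of comparing work against the standard,
    #               returning a non-negative gap (0 means "good enough")
    # moves:    (c) a repertoire of tactics for modifying the work
    def monitor_production(draft, standard, appraise, moves, max_rounds=20):
        for _ in range(max_rounds):
            if appraise(draft, standard) == 0:        # (b): gap closed
                return draft
            # (c): apply whichever available move most reduces the gap
            draft = min((move(draft) for move in moves),
                        key=lambda candidate: appraise(candidate, standard))
        return draft

In these terms, grading uses only the first two conditions (comparing a case with a standard or with other cases), while control during production needs all three, which is why the repertoire of moves cannot be dropped.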

For purposes of discussion, it is convenient to make a distinction between feedback and self-monitoring according to the source of the evaluative information. If the learner generates the relevant information, the procedure is part of self-monitoring. If the source of information is external to the learner, it is associated with feedback. In both cases, it is assumed that there has to be some closure of the gap for feedback and self-monitoring to be labelled as such. Formative assessment includes both feedback and self-monitoring. The goal of many instructional systems is to facilitate the transition from feedback to self-monitoring.

Feedback and formative assessment in the literature

Authors of textbooks on measurement and assessment published during the past 25 years have placed great emphasis on achieving high content validity in teacher-made tests, producing reliable scores or grades, and the statistical manipulation or interpretation of scores. Only cursory attention has usually been given to feedback and formative assessment, and then it is mostly hortatory, recipe-like and atheoretic. In many cases feedback and formative assessment (or their equivalents) are not mentioned at all in either the body of the text or the index, although the books by Rowntree (1977), Bloom, Madaus and Hastings (1981), Black and Dockrell (1984) and Chater (1984) are notable exceptions.

In general, a concern with the aims of summative assessment has dominated the field in terms of both research and the guidance given to teachers (Black, 1986). This dominance is implicit in the treatment given, for instance, to reliability and validity. Textbooks almost invariably describe how the validity (of assessments) is to be distinguished from the reliability (of grades or classifications). Reliability is usually (and correctly) said to be a necessary but not sufficient condition for validity, because measurements or judgments may be reliable in the sense of being consistent over time or over judges and still be off-target (or invalid). Reliability is therefore presented as a precondition for a determination of validity. In discussing formative assessment, however, the relation between reliability and validity is more appropriately stated as follows: validity is a sufficient but not necessary condition for reliability. Attention to the validity of judgments about individual pieces of work should take precedence over attention to reliability of grading in any context where the emphasis is on diagnosis and improvement. Reliability will follow as a corollary. Acceptance of this principle, which is emphasized by only a few writers (such as Nitko, 1983), has implications for how the process of appraisal is conceptualized, and the mechanisms of improvement understood.

In the literature on learning research, feedback is usually identified with knowledge of results (often abbreviated to KR), a concept which gained considerable currency through Thorndike's (1913) so-called Law of Effect. Reviewing a series of experimental studies on learning from written materials (texts and programmed instruction), Kulhavy (1977, p. 211) defined feedback as "any of the numerous procedures that are used to tell a learner if an instructional response is right or wrong". Kulik and Kulik (1988) adopted a similar definition in their review of research on the timing of feedback. Learning researchers have been particularly interested in the effect of various feedback characteristics (such as immediacy, pertinence, data form and type of reward) on the retention of learned material. The research hypotheses tested have almost invariably been based on stimulus-response learning theories, the aim being to discover the types of stimuli and incentives that promote learning. For the most part, this line of research has been confined to learning outcomes that can be assessed by quizzes and progress tests consisting of problems to be solved or objective items that can be scored correct or incorrect. The learning programs are conceived of as divisible into logically dependent units which can be mastered more or less sequentially, one by one. The resulting technology is associated with test scores, diagnostic items, criterion-referencing and mastery learning.

Other lines of research occur in specific subject areas. Of particular interest is the literature on the assessment of writing, which contains descriptions of a number of different approaches, including assessment by means of general impression, analytic scales, primary traits, syntactic features, relative readability and intellectual strategy (Gere, 1980). These differ not only in procedural detail, but also in their theoretical bases. Much of the discussion about and evaluation of the various possibilities has revolved around which assessment criteria should be used (and how), which of the techniques has the soundest theoretical foundation (such as a theory of composition), or which produces the best agreement among competent judges (reliability considerations). An alternative criterion for adjudicating among assessment approaches is the extent to which students improve either as consumers of assessments arrived at by different methods, or through being trained to use a particular assessment approach themselves. With respect to the teaching of writing, these issues have not been thoroughly explored, although they are touched upon by Cooper (1977), Odell and Cooper (1980) and several others.

While the line of development in this article is different from that in the literature on writing assessment, it shares an interest in learning outcomes which are complex in the sense that qualitative judgments (defined below) are invariably involved in appraising a student's performance. In such learnings, student development is multidimensional rather than sequential, and prerequisite learnings cannot be conceptualized as neatly packaged units of skills or knowledge. Growth takes place on many interrelated fronts at once and is continuous rather than lockstep. The outcomes are not easily characterized as correct or incorrect, and it is more appropriate to think in terms of the quality of a student's response or the degree of expertise than in terms of facts memorized, concepts acquired or content mastered.

Qualitative judgments defined and characterized

A qualitative judgment is defined (Sadler, 1987) as one made directly by a person, the person's brain being both the source and the instrument for the appraisal. Such a judgment is not reducible to a formula which can be applied by a non-expert. In general, qualitative judgments have some or all of the following five characteristics:

1. Multiple criteria are used in appraising the quality of performances. As well as the individual dimensions represented by the criteria, the total pattern of relationships among those dimensions is important. In this sense the criteria interlock, so that the overall configuration amounts to more than the sum of its parts. Decomposing a configuration tends to reduce the validity of an appraisal.

2. At least some of the criteria used in appraisal are fuzzy rather than sharp. A sharp criterion contains an essential discontinuity which is identifiable as an abrupt transition from one state to another, such as from correct to incorrect. There may be two or more well-defined states, but it is always possible in principle to determine which state applies. Sharp criteria are involved in all objective testing (including that in the arts and humanities), and the assessment of many outcomes in mathematics and the sciences which involve problem solving and theorem proving. By contrast, fuzzy criteria are characterized by a continuous gradation from one state to another. Originality, as applied to an essay, is an example of a fuzzy criterion because everything between wholly unoriginal and wholly original is possible. A fuzzy criterion is an abstract mental construct denoted by a linguistic term which has no absolute and unambiguous meaning independent of its context. If a student is to be able to consciously use a fuzzy criterion in making a judgment, it is necessary for the student to understand what the fuzzy criterion means, and what it implies for practice. Therefore, learning these contextualized meanings and implications is itself an important task for the student. (The first sketch following this list makes the sharp-fuzzy contrast concrete.)

3. Of the large pool of potential criteria that could legitimately be brought to bear for a class of assessments, only a relatively small subset are typically used at any one time. The competent judge is able not only to make an appraisal, but also to decide which criteria are relevant, and to substantiate a completed judgment by reference to them. In many cases, the teacher may find it impossible to specify all of the relevant criteria in advance, or may find that a fixed set of criteria is not uniformly applicable to different student responses, even though those responses may ostensibly be to the same task. Professional qualitative judgment consists in knowing the rules for using (or occasionally breaking) the rules. The criteria for using criteria are known as metacriteria.

4. In assessing the quality of a student's response, there is often no independent method of confirming, at the time when a judgment is made, whether the decision or conclusion (as distinct from the student's response) is correct. Indeed, it may be meaningless to speak of correctness at all. The final court of appeal is to another qualitative judgment. To give an example of methodological independence, suppose that two essays are to be compared. One approach is to ask a competent person to judge which is of higher quality, with or without specifying the criteria. A different method of judging quality would be to use a computer program to analyse certain textual properties such as the frequency of commas, and the proportions of prepositions, conjunctions and uncommon words (the second sketch below illustrates such a program). These two methods are independent because they use essentially different means for arriving at a conclusion. But having two persons instead of just one would not constitute independent methods, even if both persons were to make the judgments without reference to each other, and in that sense work independently.

5. If numbers (or marks, or scores) are used, they are assigned after the judgment has been made, not the reverse. In making qualitative judgments, the final decision is never arrived at by counting things, making physical measurements, or compounding numbers and looking at the sheer magnitude of the result.
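To make the contrast in point 2 concrete, a sharp criterion and a fuzzy criterion can be caricatured in Python. This is an illustration only; both functions and the originality scale are invented for the example, not drawn from the article.

    # A sharp criterion: an abrupt transition between well-defined states.
    def sharp_correctness(answer, key):
        return answer == key          # exactly two states, always decidable

    # A fuzzy criterion: a continuous gradation between extreme states.
    # An essay's originality may fall anywhere between 0.0 (wholly
    # unoriginal) and 1.0 (wholly original); the judge supplies the value.
    def fuzzy_originality(judged_degree):
        return max(0.0, min(1.0, judged_degree))

Nothing in fuzzy_originality computes the degree; the point is that the value is a matter of judgment, admits a continuum of states, and takes its meaning from context.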
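The computer-program route mentioned in point 4 can also be sketched. The profile below is a deliberately crude illustration of a mechanical, judgment-free analysis; the word lists are invented and far from exhaustive, and a serious analysis would use proper part-of-speech tagging. Its only role is to show a method that arrives at a conclusion by essentially different means from a human judge.

    from collections import Counter

    # Illustrative (hypothetical) word lists, not a real lexicon.
    PREPOSITIONS = {"in", "on", "at", "by", "with", "from", "to", "of"}
    CONJUNCTIONS = {"and", "but", "or", "nor", "so", "yet"}
    COMMON_WORDS = PREPOSITIONS | CONJUNCTIONS | {"the", "a", "an", "is", "was"}

    def textual_profile(essay):
        """Frequency of commas and proportions of selected word classes."""
        tokens = essay.lower().replace(",", " , ").split()
        words = [t for t in tokens if t != ","]
        counts = Counter(words)
        n = len(words) or 1
        return {
            "commas_per_word": tokens.count(",") / n,
            "prepositions":    sum(counts[w] for w in PREPOSITIONS) / n,
            "conjunctions":    sum(counts[w] for w in CONJUNCTIONS) / n,
            "uncommon":        sum(c for w, c in counts.items()
                                   if w not in COMMON_WORDS) / n,
        }

Comparing textual_profile(essay_a) with textual_profile(essay_b) involves no qualitative judgment at all, which is what makes the method independent of an expert's appraisal; a second human judge, by contrast, would be using essentially the same means as the first.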

Complex learning outcomes of the type that are assessed by making direct qualitative judgments are common in a wide variety of subjects in secondary, vocational, further and higher education. These subjects include English, foreign languages, humanities, manual and pra
