Item Response Theory And Computerized Adaptive Testing


Item Response Theory and Computerized Adaptive Testing
  Richard C. Gershon, PhD
  Department of Medical Social Sciences
  Feinberg School of Medicine
  Northwestern University
  gershon@northwestern.edu

Outline
  Item Response Theory versus Classical Test Theory
  Uses of IRT
  Item Banking
  Short Forms
  Computerized Adaptive Tests

Requirements for Measurement
  Measurement requires the concept of an underlying trait that can be expressed in terms of more or less
  Test items are the operational definition of the underlying trait
  Test items can be ordered from easy to hard
  Test takers can be ordered from less able to more able

IRT Modeling is Latent Trait Modeling
  A latent trait is an unobservable dimension that is thought to give rise to a set of observed item responses.
  [Figure: the item "I am too tired to do errands" (False / True) placed on a continuum from Energetic to Severe Fatigue]

Latent Traits (cont.)
  These latent traits (constructs, variables, θ) are measured on a continuum of severity.
  [Figure: the item "I am too tired to do errands?" (False / True) placed on a continuum from Energetic to Severe Fatigue]

Advantages of Using IRT
  Equal Interval Measure
  Test-takers and items are represented on the same scale
  Item calibrations are independent of the test takers used for calibration
  Candidate ability estimates are independent of the particular set of items used for estimation
  Measurement precision is estimated for each person and each item

Test-takers and Items are Represented on the Same Scale
  Item: Difficulty, Severity, Measure, Theta, Item Calibration, Location
  Person: Ability, Measure, Theta, Person Calibration, Location

[Figure: Physical Functioning items placed on a 0 to 100 scale. Example items: "Are you able to get in and out of bed?", "Are you able to walk a block on flat ground?", "Are you able to run five miles?"]

More Basic Terms
  Discrimination: the degree to which an item discriminates among levels of person ability
  Item Information: the region of the trait scale where an item discriminates
  Test Information: the region of the trait scale where the test discriminates

Item “Parameters”
  IRT statistics about an item
  Primary: Item Difficulty
  Often: Item Discrimination
  Sometimes: Guessing
  Lots of other “ugly looking numbers”

The Item Characteristic Curve

Differential Item Functioning (DIF)
  Does an item have different item parameters for different subgroups?
    Gender
    Race
    Age
    Disease

The Three Main IRT Models
  Rasch model / one parameter logistic model (1PL)
  Two parameter logistic model (2PL)
  Three parameter logistic model (3PL)

How to choose an appropriate IRT Model
OR
My religion is better than your religion!

WARNING! You are about to see mathematical formulas!

One Parameter Logistic Model

$$P_{1,0} = \frac{e^{(\text{ability} - \text{difficulty})}}{1 + e^{(\text{ability} - \text{difficulty})}}$$

When the difficulty of a given item exactly matches the examinee’s ability level, the person has a 50% chance of answering that item correctly:

$$P_{1,0} = \frac{e^{(0)}}{1 + e^{(0)}} = \frac{1}{2} = .50$$
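The one parameter formula can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `p_1pl` and the example ability values are illustrative assumptions.

```python
import math

def p_1pl(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the one parameter logistic model."""
    logit = ability - difficulty
    return math.exp(logit) / (1.0 + math.exp(logit))

# When ability equals difficulty the probability is exactly one half.
print(p_1pl(1.0, 1.0))  # 0.5

# Hypothetical examinees below, at, and above an item of difficulty 0.
for ability in (-2.0, 0.0, 2.0):
    print(ability, round(p_1pl(ability, 0.0), 3))
```

The probability rises smoothly from near 0 to near 1 as ability moves from well below to well above the item difficulty.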

One Parameter Logistic Model
  Only option for small sample sizes
  Often the real model underlying a test labeled as three parameter
  Less costly
  “The simple solution is always the best”

Two Parameter Logistic Model

$$P_{1,0} = \frac{e^{a(\text{ability} - b)}}{1 + e^{a(\text{ability} - b)}}$$

Two parameters
  a = Discrimination
  b = Item Difficulty

Two Parameter Examples
  [Figure: item characteristic curves for a = .5, b = .5; a = 1.5, b = .5; a = 2.5, b = .5]
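The three example curves can be reproduced with a short sketch. This is an illustration rather than the presenter's code; `p_2pl` is a hypothetical name, and the ability values in the loop are arbitrary. The form 1 / (1 + e^(-z)) used here is algebraically identical to e^z / (1 + e^z).

```python
import math

def p_2pl(ability: float, a: float, b: float) -> float:
    """Two parameter logistic model: a = discrimination, b = item difficulty."""
    z = a * (ability - b)
    return 1.0 / (1.0 + math.exp(-z))

# The slide's three items share b = .5 and differ only in discrimination.
for a in (0.5, 1.5, 2.5):
    row = [round(p_2pl(ability, a, 0.5), 3) for ability in (-0.5, 0.5, 1.5)]
    print("a =", a, row)
```

All three curves cross 0.5 at ability = b; a larger a makes the curve steeper around that point.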

Three Parameter Logistic Model

$$P_{1,0} = c + (1 - c)\,\frac{e^{a(\text{ability} - b)}}{1 + e^{a(\text{ability} - b)}}$$

Three parameters
  a = Discrimination
  b = Item Difficulty
  c = Guessing

Three Parameter Logistic Model (3PL)
  Requires a large sample size
  Significant research shows that 3PL is better in theory, but in practice it has little advantage over 1PL
  “Most accepted theoretical model”

Three Parameter Examples
  [Figure: item characteristic curves for a = 1.5, b = .5, c = .1 and a = 2.5, b = .5, c = .25]
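The effect of the guessing parameter can be seen in a small sketch using the two example items. The function name `p_3pl` and the choice of ability values are assumptions made for illustration.

```python
import math

def p_3pl(ability: float, a: float, b: float, c: float) -> float:
    """Three parameter logistic model: a = discrimination, b = difficulty, c = guessing."""
    logistic = 1.0 / (1.0 + math.exp(-a * (ability - b)))
    return c + (1.0 - c) * logistic

# The two example items from the slide.
items = {"a=1.5, b=.5, c=.1": (1.5, 0.5, 0.1),
         "a=2.5, b=.5, c=.25": (2.5, 0.5, 0.25)}

for label, (a, b, c) in items.items():
    # Very low ability, ability equal to b, and very high ability.
    print(label, [round(p_3pl(t, a, b, c), 3) for t in (-10.0, 0.5, 10.0)])
```

For very low ability the probability levels off at c instead of 0, and at ability = b it equals c + (1 - c) / 2 rather than 0.5.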

Polytomous Models
  One Parameter
    Rating Scale Model
    Partial Credit Model
  Two Parameter
    Graded Response Model
    Generalized Partial Credit Model

Multi-dimensional Models
  There are also IRT models which consider more than one unidimensional trait at a time

How does IRT differ from conventional test theory?

Classical Test Theory
  An individual takes an assessment
  Their total score on that assessment is used for comparison purposes
    High score: the person is higher on the trait
    Low score: the person is lower on the trait

Item Response Theory
  Each individual item can be used for comparison purposes
    Person endorses better rating on “hard items”: the person is higher on the trait
    Person endorses worse rating on “easy items”: the person is lower on the trait
  Items that measure the same construct can be aggregated into longer assessments

Reliability
  CTT
    Reliability is based upon the total test.
    Regardless of patient “ability”, reliability is the same.
  IRT
    Reliability is calculated for each patient “ability” and varies across the continuum.
    Typically, there is better reliability in the middle of the distribution.
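One way to see why IRT precision varies across the continuum is to compute the standard error at several ability levels. The sketch below assumes the one parameter model, a hypothetical 9-item test, and the standard result that the standard error is approximately 1 / sqrt(test information), where each 1PL item contributes P(1 - P). None of the names or numbers come from the slides.

```python
import math

def p_1pl(theta: float, b: float) -> float:
    """1PL probability of a correct or endorsed response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta: float, difficulties: list) -> float:
    """Standard error of the ability estimate: 1 / sqrt(sum of item information)."""
    info = sum(p_1pl(theta, b) * (1.0 - p_1pl(theta, b)) for b in difficulties)
    return 1.0 / math.sqrt(info)

# Hypothetical test with item difficulties spread around 0 logits.
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(theta, round(standard_error(theta, difficulties), 2))
```

With items centered on the middle of the scale, the standard error is smallest near 0 and grows toward the extremes, which matches the pattern described for IRT reliability.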

Validity
  CTT
    Validity is based upon the total test.
    Typically, validity would need to be re-assessed if the instrument is modified in any way.
  IRT
    Validity is assessed for the entire item bank.
    Subsets of items (full length tests, short forms and CAT) all inherit the validity assessed for the original item bank.

How Scores Depend on the Difficulty of Test Items
  [Figure: one person located against three tests. Very easy test: expected score 8. Medium test: expected score 5. Very hard test: expected score 0.]
  Reprinted with permission from: Wright, B.D. & Stone, M. (1979) Best test design, Chicago: MESA Press, p. 5.

Raw Scores vs. IRT Measures
  IRT has Equal Interval Measurement
  [Figure: a 4 item test with raw scores 1, 2, 3, 4 shown against logit measures; values labeled on the logit scale: 1.00, 1.25, 1.50, 2.50]

I Have a Lack of Energy
  Traditional Test Theory
    4 = Not at All
    3 = A Little Bit
    2 = Somewhat
    1 = Quite a Bit
    0 = Very Much

I Have a Lack of Energy
  [Figure: the response options 4 = Not at All, 3 = A Little Bit, 2 = Somewhat, 1 = Quite a Bit, 0 = Very Much, shown under Traditional Test Theory and under Item Response Theory]

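The IRT “Reality” of a 10 Point Rating-Scale Item
  [Figure: a pain rating scale from 0 (No Pain) to 10 (Worst Pain), drawn once with evenly spaced categories and once with unevenly spaced categories]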

I have a lack of energy
  This is an Item Characteristic Curve (ICC) for a rating scale item (each option has its own curve)
  [Figure: probability curves (0 to 1) against Trait Measure (0 to 30) for the options 4 = Not at All, 3 = A Little Bit, 2 = Somewhat, 1 = Quite a Bit, 0 = Very Much]

IRT Polytomous Responses
  I have been too tired to feel happy.
  [Figure: Probability of Response (0.0 to 1.0) for the options None of the time, A little of the time, Some of the time, Most of the time, All of the time, plotted on a scale from 80 (Energetic) to 20 (Severe Fatigue)]

IRT Polytomous Responses
  I have felt energetic
  [Figure: Probability of Response (0.0 to 1.0) for the options None of the time, A little of the time, Some of the time, Most of the time, All of the time, plotted on a scale from 80 (Energetic) to 20 (Severe Fatigue)]

IRT Polytomous Responses
  I have been too tired to read
  [Figure: Probability of Response (0.0 to 1.0) for the options None of the time, A little of the time, Some of the time, Most of the time, All of the time, plotted on a scale from 80 (Energetic) to 20 (Severe Fatigue)]

Item Banking

Calibrated Item Banks can be used to Create Numerous Instrument Types
  Short Forms
    5-7 items in each HRQL area
    Constructed to cover full range of trait
    OR
    Multiple forms constructed to only cover a narrow range of trait (e.g., high, medium, or low)
  Computerized Adaptive Testing (CAT)
    Custom individualized assessment
    Suitable for clinical use
    Accuracy level chosen by researcher
  Custom Item Selection
    3 Diseases, 3 Trials, 3 Unique Instruments
    Each based on content interest of individual researchers
  [Figure: items drawn from the item bank for each instrument type; one custom instrument is labeled Brain Tumor]
  Source: Expert Rev. of Pharmacoeconomics Outcomes Res. (2003)

Short Forms
  5-7 items in each HRQL area
  Constructed to cover full range of trait
  OR
  Multiple forms constructed to only cover a narrow range of trait (e.g., high, medium, or low)
  [Figure: item banks for Emotional Distress, Pain, and Physical Function, with items selected for a short form]
  Source: Expert Rev. of Pharmacoeconomics Outcomes Res. (2003)

[Figure: Physical Function Forms A, B, and C, each drawn from the Physical Functioning Item Bank (Items 1 to 16) on a 0 to 100 scale]

Computerized Adaptive Testing
  Custom individualized assessment
  Suitable for clinical use
  Accuracy level chosen by researcher
  [Figure: items selected from the item bank]
  Source: Expert Rev. of Pharmacoeconomics Outcomes Res.

Custom Item Selection
  3 Diseases, 3 Trials, 3 Unique Instruments
  Each based on content interest of individual researchers
  [Figure: item banks for Emotional Distress, Pain, and Physical Function, with a different subset of items selected for each of three custom instruments; one is labeled Brain Tumor]
  Source: Expert Rev. of Pharmacoeconomics Outcomes Res. (2003)

In Summary, Calibrated Item Banks can be used to:
  Create a standard static instrument
  Construct short forms
  Enable CAT
  Select items based on unique content interests and formulate custom short-form or full-length instruments

In every case, using a validated, pre-calibrated item bank allows any of these instruments to be pre-validated and produce standardized scores on the same scale

Computerized Adaptive Testing

What is Computerized Adaptive Testing?
  Shorter
  Targeting
  Computerized Algorithm

CAT in the Military
  Armed Services Vocational Aptitude Battery (ASVAB)

CAT for Certification

CAT for Licensure

CAT for College Entrance
  ACCUPLACER OnLine

CAT for Education

[Figure: ability scale from Low Able to High Able with a Pass Point; after a series of items (? ? ? ?) the decision is PASS!]

[Figure: ability scale from Low Able to High Able with a Pass Point; after a series of items (? ? ? ?) the decision is FAIL]

Example – Binary Search
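The binary search analogy can be sketched in a few lines. This is a deliberately simplified illustration: it assumes a hypothetical examinee who always passes items at or below their ability and always fails harder ones, whereas real CAT responses are probabilistic. The names `locate_ability` and `answers_correctly` are made up for the example.

```python
def locate_ability(difficulties, answers_correctly):
    """Binary search over items sorted from easy to hard.

    Returns (index of the hardest item passed or -1, number of items asked).
    """
    lo, hi = 0, len(difficulties) - 1
    hardest_passed = -1
    asked = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        asked += 1
        if answers_correctly(difficulties[mid]):
            hardest_passed = mid   # examinee is at least this able; go harder
            lo = mid + 1
        else:
            hi = mid - 1           # item too hard; go easier
    return hardest_passed, asked

# Hypothetical bank of 16 items with difficulties 1..16 and a true ability of 11.5.
bank = list(range(1, 17))
index, asked = locate_ability(bank, lambda difficulty: difficulty <= 11.5)
print(bank[index], asked)
```

Each answer halves the range of difficulties still under consideration, so a 16 item bank needs at most 5 questions instead of 16.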

[Figure: ability estimate (Grade Level, -2 to 2) plotted against Item Number (1 to 25). With each successive item, the standard error decreases.]

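$$\ln L(\theta) = \sum_i \left[ u_i \ln P_i + (1 - u_i) \ln Q_i \right]$$

Here u_i is the scored response to item i (1 or 0), P_i is the probability of a correct response at ability θ, and Q_i = 1 - P_i.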
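The log-likelihood can be maximized over θ to estimate a person's ability from their scored responses. The sketch below assumes the one parameter model and uses a plain grid search for simplicity; operational software typically uses Newton-Raphson or a Bayesian estimator instead. The function names, grid limits, and example data are hypothetical.

```python
import math

def p_1pl(theta: float, b: float) -> float:
    """1PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses, difficulties):
    """Sum over items of u*ln(P) + (1-u)*ln(Q), with Q = 1 - P."""
    total = 0.0
    for u, b in zip(responses, difficulties):
        p = p_1pl(theta, b)
        total += u * math.log(p) + (1 - u) * math.log(1.0 - p)
    return total

def estimate_theta(responses, difficulties, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of ability (step 0.01 by default)."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))

# Hypothetical examinee: passes the easy and medium items, fails the hard one.
difficulties = [-1.0, 0.0, 1.0]
responses = [1, 1, 0]
theta_hat = estimate_theta(responses, difficulties)
print(round(theta_hat, 2))
```

Note that a pattern of all correct or all incorrect responses has no finite maximum; the grid search would simply return the edge of the grid in that case.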

Specified # of items
Specified level of precision

Why bother?
  Reduce burden of responding
  Make room for measuring more domains

CAT Requirements
  Calibrated item bank
  Administration software

Test Specifications
  Starting rule
    With item which provides maximum information
    At cut point

Test Specifications
  Stopping Rule
    Fixed length
    Variable length
      By Total Test/Subtest
      Calculated
        Specified precision of measure
        Specified confidence in a pass/fail decision
      Maximum item count
      Minimum item count
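The variable-length stopping rules listed above can be combined into a single check that runs after each item. This is a sketch only: the function name, the default thresholds (5 and 30 items, a target standard error of 0.3, z = 1.96), and the order in which the rules are applied are all assumptions, not values from the slides.

```python
def should_stop(n_items, theta, se, *, min_items=5, max_items=30,
                target_se=0.3, cut_point=None, z=1.96):
    """Return (stop, reason) for a variable-length CAT.

    theta and se are the current ability estimate and its standard error.
    If cut_point is given, the test may also stop once the estimate is far
    enough from the cut to support a pass/fail decision.
    """
    if n_items < min_items:
        return False, "below minimum item count"
    if n_items >= max_items:
        return True, "maximum item count reached"
    if cut_point is not None and abs(theta - cut_point) > z * se:
        return True, "confident pass/fail decision"
    if se <= target_se:
        return True, "specified precision reached"
    return False, "continue"

print(should_stop(3, 0.0, 0.2))
print(should_stop(10, 1.5, 0.5, cut_point=0.0))
print(should_stop(10, 0.2, 0.5, cut_point=0.0))
print(should_stop(12, 0.2, 0.28))
```

A fixed-length test is the special case where min_items equals max_items, so the other rules never fire.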

Adaptive Algorithm
  Person ability algorithm
  Item selection algorithm
    Test difficulty
    Maximum jump size
    Content issues
    Item exposure control
    Option to not allow same items to be used during retesting
    Overlapping items (items that cue other items)
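A common item selection approach is to pick the unused item with maximum information at the current ability estimate. The sketch below assumes the two parameter model, for which item information is a² P (1 - P); the bank, its parameter values, and the function names are hypothetical. The `excluded` set stands in for exposure control, retest restrictions, and overlapping items; constraints such as maximum jump size and content balancing are not modeled.

```python
import math

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, excluded):
    """Return the id of the most informative item not in `excluded`, or None.

    bank maps item id -> (a, b). `excluded` holds ids already administered
    or blocked for this examinee.
    """
    candidates = [item for item in bank if item not in excluded]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item_information(theta, *bank[item]))

# Hypothetical calibrated bank: item id -> (discrimination a, difficulty b).
bank = {"Item2": (1.0, -1.5), "Item4": (1.0, -0.5),
        "Item6": (1.0, 0.4), "Item8": (1.0, 1.8)}

print(select_next_item(0.5, bank, excluded=set()))
print(select_next_item(0.5, bank, excluded={"Item6"}))
```

When all items have the same discrimination, maximum information reduces to choosing the item whose difficulty is closest to the current ability estimate.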

[Figure: CAT administration report. ID 339585909, test MLT, Entry 1, Ver: 10/01/01, Tested: 01/28/02, Status: 2 Clear Pass. Item-by-item listing (Item, Cont, Diff, Ans, Time) with the running measure plotted on a -3.0 to 3.0 scale.]

[Figure: CAT administration report. ID 434843789, test HT, Entry 1, Ver: 01/01/02, Tested: 01/28/02, Status: 1 Clear Fail. Item-by-item listing (Item, Diff, Ans) with the running measure plotted on a -3.0 to 3.0 scale.]

[Figure: CAT administration report. ID 411433522, test PBT, Entry 1, Ver: 10/01/01, Tested: 01/26/02, Status: 1 Fence Sitter. Item-by-item listing (Item, Cont, Diff, Ans, Time, Meas) with the running measure plotted on a -3.0 to 3.0 scale.]

[Figure: CAT simulation screens, each headed "Simulate Measure" with columns Item, Meas, and SE and the measure plotted on a 0 to 100 scale. The first item administered is GP1 – I have a lack of energy (0 = Very Much; 1 = Quite a Bit; 2 = Somewhat; 3 = A Little Bit; 4 = Not at All).]
