INTRO 2 IRT
Tim Croudace

2, Descriptions of IRT
“IRT refers to a set of mathematical models that describe, in probabilistic terms, the relationship between a person's response to a survey question/test item and his or her level of the latent variable being measured by the scale.”
Fayers and Hays, p. 55, Assessing Quality of Life in Clinical Trials, Oxford Univ Press: chapter on applying IRT for evaluating questionnaire item and scale properties.
This latent variable is usually a hypothetical construct (a trait, domain or ability) which is postulated to exist but cannot be measured by a single observable variable/item. Instead it is measured indirectly, using multiple items or questions in a multi-item test/scale.

3, The data
pattern  0000 1000 0001 0010 1001 1010 0011 1011 0100 1100 0101 0110 1101 1110 0111 1111
n         477   63   12  150    7   32   11    4  231   94   13  378   12  169   45   31
Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
A single latent dimension Z: Normal (mean 0; std dev = 1, so Var = 1 too!)
$\operatorname{logit}\,\pi_{hi} = \alpha_{h0} + \alpha_{h1} z_i$ (with $\pi_{hi}$ the probability that individual $i$ responds positively to item $h$)
[Diagram labels: $\alpha_{h0}$: $\alpha_{10}$, $\alpha_{40}$; $\alpha_{h1}$: $\alpha_{21}$]

4, Simple sum scores (n = 1729 new individual values)
q1 q2 q3 q4     n   Total score
 0  0  0  0   477   0   (477 zeros added to data set as a new column)
 1  0  0  0    63   1
 0  0  0  1    12   1
 0  0  1  0   150   1
 1  0  0  1     7   2
 1  0  1  0    32   2
 0  0  1  1    11   2
 1  0  1  1     4   3
 0  1  0  0   231   1
 1  1  0  0    94   2
 0  1  0  1    13   2
 0  1  1  0   378   2
 1  1  0  1    12   3
 1  1  1  0   169   3
 0  1  1  1    45   3
 1  1  1  1    31   4

5, Binary Factor / Latent Trait Analysis Results: logit-probit model
[Path diagram: latent factor F with observed items U1, U2, U3, ..., Up]
Warming up to this sort of thing soon.
2 items with similar thresholds and similar slopes
3 items with different thresholds but similar slopes

6, The key concept
Latent factor models for constructs underpinning multiple binary (0/1) responses, based on innovations in educational testing and psychometric statistics that are 50 years old
The same models used in educational testing with correct/incorrect answers can be applied to symptom present/absent data (both binary)
Extensions to ordinal outcomes (Likert scales)
Flexibility in parametric form available
Semi- and non-parametric approaches too

7, Binary IRT: The A B C D of it

8, Linear vs non-linear regression of response probability on latent variable
x-axis: score on latent construct being measured
y-axis: prob of response (“Yes”) on a simple binary (Yes/No) scale item
Adapted without permission from a slide by Prof H Goldstein

9, Ordinal IRT: The A B C D of GRM

10, IRT models
Simplest case of a latent trait analysis
Manifest variables are binary: only 2 distinctions are made, and these take 0/1 values
  Yes / No
  Right / Wrong
  Symptom present / absent
  Agree / disagree distinctions for attitudes are more likely to be ordinal (more than 2 response categories); see next lecture, IRT 2, on Friday
For scoring of individuals (not parameter estimation for items) it is frequently assumed that the UNOBSERVED (latent) variable is not only continuous but normally distributed, or that the prior distn is normal but the posterior distn may not be

11, IRT for binary data
The most commonly used model is the Lord-Birnbaum model (Lord, 1952;
Birnbaum): the 2-parameter logistic, a.k.a. the logit-probit model; Bartholomew (1987)
The model is essentially a non-linear single factor model
When applied to binary data, the traditional linear factor model is only an approximation to the appropriate item response model: sometimes satisfactory, but sometimes very poor (we can guess when)
Some accounts of Item Response Theory make it sound like a revolutionary & very modern development; this is not true!
It should not replace or displace classical concepts, and has suffered from being presented and taught as disconnected from these
A unified treatment can be given that builds one from the other (McDonald, 1999), but this would be a one-term course on its own

12, What IRT does
IRT models provide a clear statement (picture!) of the performance of each item in the scale/test and of how the scale/test functions, overall, for measuring the construct of interest in the study population
The objective is to model each item by estimating the properties describing item performance characteristics, hence Item Characteristic Curve or Symptom Response Function.

13, Very bland (but simple) example
Lombard and Doering (1947) data
Questions on cancer knowledge, with four addressing the source of the information
Fitting a latent variable model might be proposed as a way of constructing a measure of how well informed an individual is about cancer
A second stage might relate knowledge about cancer to knowledge about other diseases or general knowledge

14, Very bland (but simple) example
Lombard and Doering (1947) data
Questions on cancer knowledge, with four addressing the source of the information
  radio
  newspapers (solid)
  reading (books?)
  lectures
2 to the power 4, i.e. 16 possible response patterns, from 0000 to 1111

15, Data
Lombard and Doering (1947) data
2 to the power 4, i.e. 16 possible response patterns (all occur); with more items this is neither likely nor necessary
Frequency shown for 0000 to 1111; frequency is the number with each item response pattern
pattern  0000 1000 0001 0010 1001 1010 0011 1011 0100 1100 0101 0110 1101 1110 0111 1111
n         477   63   12  150    7   32   11    4  231   94   13  378   12  169   45   31

16, The data
Response patterns and frequencies as on slide 15
Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
A single latent dimension Z: Normal (mean 0; std dev = 1, so Var = 1 too!)
$\operatorname{logit}\,\pi_{hi} = \alpha_{h0} + \alpha_{h1} z_i$ (with $\pi_{hi}$ the probability that individual $i$ responds positively to item $h$)
[Diagram labels: $\alpha_{h0}$: $\alpha_{10}$, $\alpha_{40}$; $\alpha_{h1}$: $\alpha_{21}$]

17, Basic objectives of modelling
When multiple items are applied in a test / survey, one can use latent variable modelling to
  explore inter-relationships among observed responses
  determine whether the inter-relationships can be explained by a
small number of factors
  THEN, assign a SCORE to each individual on the basis of their responses
Basically, to rank order (arrange) or quantify (score) survey participants, test takers, individuals who have been studied
CAN BE THOUGHT OF AS ADDING A NEW SCORE TO YOUR DATASET FOR EACH INDIVIDUAL
This analysis will also help you to understand the properties of each item as a measure of the target construct (what properties?)
GRAPHICAL REPRESENTATION IS BEST

18, Item Properties that we are interested in are captured graphically by so-called Item Characteristic Curves (ICCs)

19, Item/Symptom & Test/Scale INFORMATION
Is useful and necessary to examine score precision (the accuracy of estimated scores)
We are interested in this for different individuals (individuals with different score values)
By inspecting the amount of information about each score level, across the score range (range of estimated scores), we are identifying variations in measurement precision (reliability of individuals' estimated scores)
This enables us to make statements about the effective measurement range of an instrument in a population

20, e.g. Item Characteristic Curves

21, Item information functions: add them together to get TIF
Beware y-axis scaling: not all the same

22, Test Information Function

23, Item information functions, shown alongside their ICCs
Beware y-axis scaling: not all the same
[Figure: y-axis values shown on the panels: 0.14, 0.40, 3.0, 0.14]

24, $\text{s.e.m.} = 1 / \sqrt{\text{Information}}$
Standard error of measurement is not constant (U-shaped, not symmetrical)

25, Approximate reliability
$\text{Reliability} = 1 - \dfrac{1}{\text{Info}} = 1 - \dfrac{1}{1/\text{s.e.m.}^2} = 1 - \text{s.e.m.}^2$
s.e.m. = standard error of measurement

26, Back to the Data
Lombard and Doering (1947) data: frequencies of the 16 possible response patterns, 0000 to 1111 (explanatory notes as on slide 15)
pattern  0000 1000 0001 0010 1001 1010 0011 1011 0100 1100 0101 0110 1101 1110 0111 1111
n         477   63   12  150    7   32   11    4  231   94   13  378   12  169   45   31
What would be the easiest thing to do with these numbers, to score the patterns?
27, Answer
patterns: 0000 1000 0001 0010 1001 1010 0011 1011 0100 1100 0101 0110 1101 1110 0111 1111
What would be the easiest thing to do with these numbers, to score the patterns?
Simply add them up

28, Simple sum scores (n = 1729 new individual values)
Table as on slide 4: each pattern's total score is added to the data set as a new column (e.g. 477 zeros for pattern 0000)

29, Weighted by discriminating power scores
q1 q2 q3 q4     n   Total   Factor   Component score (weighted by $\alpha_{h1}$)
                    score   score
 0  0  0  0   477     0     -0.98    0                         = 0
 1  0  0  0    63     1     -0.68    0.72                      = 0.72
 0  0  0  1    12     1     -0.67    0.77                      = 0.77
 0  0  1  0   150     1     -0.46    1.34                      = 1.34
 1  0  0  1     7     2     -0.41    0.72 + 0.77               = 1.48
 1  0  1  0    32     2     -0.23    0.72 + 1.34               = 2.06
 0  0  1  1    11     2     -0.22    1.34 + 0.77               = 2.10
 1  0  1  1     4     3      0.0     0.72 + 1.34 + 0.77        = 2.82
 0  1  0  0   231     1      0.16    3.40                      = 3.40
 1  1  0  0    94     2      0.42    0.72 + 3.40               = 4.12
 0  1  0  1    13     2      0.43    3.40 + 0.77               = 4.16
 0  1  1  0   378     2      0.66    3.40 + 1.34               = 4.74
 1  1  0  1    12     3      0.72    0.72 + 3.40 + 0.77        = 4.88
 1  1  1  0   169     3      0.99    0.72 + 3.40 + 1.34        = 5.46
 0  1  1  1    45     3      1.02    3.40 + 1.34 + 0.77        = 5.50
 1  1  1  1    31     4      1.41    0.72 + 3.40 + 1.34 + 0.77 = 6.22
Weights (q1 to q4): 0.72 3.40 1.34 0.77

30, The data
Data, sources of knowledge and model as on slide 16

31, Weighted by discriminating power scores
Table as on slide 29

32, Something a little more subtle
Simple sum scores assume all item responses are equally useful at defining the construct; this may not be the case
If items are differentially important (different discriminating power with respect to what we are measuring), we might want to take that into account
How? Weighted sum scores: Component scores
Weighted by what? Weighted by the estimates (factor loading type parameter) from a latent variable model: a latent trait model with a single latent factor

33, Weighted scores
Weights: $\alpha_{h1}$ parameters
  Q1 0.72
  Q2 3.40
  Q3 1.34
  Q4 0.77
These numbers are related to the slopes of the Ss (the S-shaped item curves)

34, Estimated component scores (weighted values)
Table as on slide 29
Weights? 0.72 3.40 1.34 0.77

35, But the bees knees are
The estimated factor scores from the model
Not just some simple sum of unweighted or weighted items
Takes into account the proposed score distribution (Gaussian normal) and the estimated model parameters (but not the fact that they are estimates rather than known values), and more besides (when missing data are present): the estimated factor scores

36, A graphical and interactive introduction to IRT
Play with the key features of IRT models
www2.uni-jena.de/svw/metheval/irt/VisualIRT.pdf

37, a b (see) 2 parameter IRT model
VisualIRT (pdf) Page
Individual's score = new ruler value
  Any hypothetical latent variable factor/trait continuum, expressed in a z-score metric (Gaussian normal (0,1))
Item properties
  slope = item discrimination
  location = item commonality: difficulty/prevalence/severity

38, IRT Resources
A visual guide to Item Response Theory, I. Partchev
Introduction to IRT, R. Baker, http /
An introduction to modern measurement theory, B Reeve
Chapter in Fayers and Machin QoL book, P Fayers
ABC of Item Response Theory, H Goldstein
Moustaki papers, and online slides (FA at 100)
LSE books (Bartholomew, Knott, Moustaki, Steele)

39,
Applications of Item Response Theory to Practical Testing Problems. Frederick M. Lord. 274 pages. 1980.
Applying The Rasch Model. Trevor G. Bond and Christine M. Fox. 255 pages. 2001.
Constructing Measures: An Item Response Modeling Approach. Mark Wilson. 248 pages. 2005.
The EM Algorithm and Related Statistical Models. Michiko Watanabe and Kazunori Yamaguchi. 250 pages. 2004.
Essays on Item Response Theory. Edited by Anne Boomsma, Marijtje A.J. van Duijn, Tom A.A. Snijders. 438 pages. 2001.
Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. Edited by Paul De Boeck and Mark Wilson. 382 pages. 2004.
Fundamentals of Item Response Theory. Ronald K. Hambleton, H. Swaminathan, and H. Jane Rogers. 184 pages. 1991.
Handbook of Modern Item Response Theory. Edited by Wim J. van der Linden and Ronald K. Hambleton. 510 pages. 1997.
Introduction to Nonparametric Item Response Theory. Klaas Sijtsma and Ivo W. Molenaar. 168 pages. 2002.
Item Response Theory. Mathilda Du Toit. 906 pages. 2003.
Item Response Theory for Psychologists. Susan E. Embretson and Steven P. Reise. 376 pages. 2000.
Item Response Theory: Parameter Estimation Techniques (Second Edition, Revised and Expanded w/CD). Frank Baker and Seock-Ho Kim. 495 pages. 2004.
Item Response Theory: Principles and Applications. Ronald K. Hambleton and Hariharan Swaminathan. 332 pages. 1984.
Logit and Probit: Ordered and Multinomial Models. Vani K. Borooah. 96 pages. 2002.
Markov Chain Monte Carlo in Practice. W.R. Gilks, Sylvia Richardson, and D.J. Spiegelhalter. 512 pages. 1995.
Monte Carlo Statistical Methods. Christian P. Robert and George Casella. 645 pages. 2004.
Polytomous Item Response Theory Models. Remo Ostini and Michael L. Nering. 120 pages. 2005.
Rasch Models for Measurement. David Andrich. 96 pages. 1988.
Rasch Models: Foundations, Recent Developments, and Applications. Edited by Gerhard H. Fischer and Ivo W. Molenaar. 436 pages. 1995.
The Sage Handbook of Quantitative Methodology for the Social Sciences. Edited by David Kaplan. 511 pages. 2004.
Test Equating, Scaling, and Linking: Methods and Practices (Second Edition). Michael J. Kolen and Robert L. Brennan. 548 pages. 2004.
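
The two simpler scoring schemes from the "Simple sum scores" and "Weighted scores" slides can be reproduced with a short Python sketch. The pattern frequencies and the four weights are copied from the slides; the variable and function names are illustrative only. One caveat, stated as an assumption: the weights are used as printed (two decimals), so several weighted sums come out 0.01 higher than the slide values (for example 0.72 + 0.77 gives 1.49 where the slide shows 1.48), presumably because the slide used unrounded estimates. The model-based factor scores are not reproduced here, because the item intercepts are not given on the slides.

```python
from collections import Counter

# Response pattern (q1 q2 q3 q4) -> frequency, as listed on the data slides.
FREQ = {
    "0000": 477, "1000": 63, "0001": 12, "0010": 150,
    "1001": 7, "1010": 32, "0011": 11, "1011": 4,
    "0100": 231, "1100": 94, "0101": 13, "0110": 378,
    "1101": 12, "1110": 169, "0111": 45, "1111": 31,
}

# Slope-type weights for q1..q4, rounded to two decimals as on the slides.
WEIGHTS = (0.72, 3.40, 1.34, 0.77)

def sum_score(pattern):
    """Simple sum score: the number of items endorsed."""
    return sum(int(d) for d in pattern)

def component_score(pattern, weights=WEIGHTS):
    """Weighted sum score: each endorsed item contributes its weight."""
    return sum(w * int(d) for w, d in zip(weights, pattern))

# Distribution of the simple sum score over all individuals.
score_distribution = Counter()
for pattern, n in FREQ.items():
    score_distribution[sum_score(pattern)] += n

print("individuals:", sum(FREQ.values()))
print("sum-score distribution:", dict(sorted(score_distribution.items())))

# Patterns in increasing order of component score.
for pattern in sorted(FREQ, key=component_score):
    print(pattern, FREQ[pattern], sum_score(pattern),
          round(component_score(pattern), 2))
```

Sorting by component score gives the same row order as the weighted-score table, while the simple sum score is not monotone in that order: pattern 0100 has a total of 1 but ranks above 1011, which has a total of 3, because q2 carries by far the largest weight.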