Cal State Northridge, Psy 427, Andrew Ainsworth, PhD

Classical Test Theory and Reliability

Basics of Classical Test Theory
Theory and assumptions
Types of reliability
Example

Classical Test Theory
Classical Test Theory (CTT) is often called the “true score model.”
It is called “classical” relative to Item Response Theory (IRT), which is a more modern approach.
CTT describes a set of psychometric procedures used to test items and scales: reliability, difficulty, discrimination, etc.

Classical Test Theory
CTT analyses are the easiest and most widely used form of analysis. The statistics can be computed by readily available statistical packages (or even by hand).
CTT analyses are performed on the test as a whole rather than on the item. Although item statistics can be generated, they apply only to that group of students on that collection of items.

Classical Test Theory
CTT assumes that every person has a true score on an item or a scale: the score we would obtain if we could measure it directly, without error.
CTT analyses assume that a person's test score is made up of their “true” score plus some measurement error. This is the common true score model:
$X = T + E$

Classical Test Theory
Based on the expected values of each component for each person, we can see that
$X_i = t_i + E_i$, with $\mathbb{E}(E_i) = 0$, so $\mathbb{E}(X_i) = t_i$
E and X are random variables; t is constant.
However, this is theoretical and is not done at the individual level.

Classical Test Theory
If we assume that people are randomly selected, then t becomes a random variable as well, and we get:
$X = T + E$
Therefore, in CTT we assume that the error is normally distributed, is uncorrelated with the true score, and has a mean of zero.
Because E is uncorrelated with T, the observed-score variance splits into $\sigma^2_X = \sigma^2_T + \sigma^2_E$.

[Figure: true scores T1, T2 and T3, each with observed scores X = T + E distributed around it; measurement error around a T can be large or small.]

Domain Sampling Theory
Another central component of CTT.
Another way of thinking about populations and samples.
Domain: the population or universe of all possible items measuring a single concept or trait (theoretically infinite).
Test: a sample of items from that universe.

Domain Sampling Theory
A person's true score would be obtained by having them respond to all items in the “universe” of items.
We only see responses to the sample of items on the test.
So, reliability is the proportion of variance in the “universe” explained by the test variance.

Domain Sampling Theory
A universe is made up of a (possibly infinitely) large number of items.
So, as tests get longer they represent the domain better; therefore, longer tests should have higher reliability.
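The claim that longer tests are more reliable can be checked with a small simulation. The Python below is a sketch under assumed conditions, not material from the lecture. Each person's response to every item is their true score T plus independent normal error, matching the error assumptions listed earlier. The test score is the mean of k sampled items. Reliability is measured as the squared correlation between observed and true scores, which is the proportion of observed variance due to true scores. The standard deviations (true_sd = 1, error_sd = 2), the sample size and all function names are arbitrary, hypothetical choices.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulated_reliability(n_items, n_people=5000, true_sd=1.0, error_sd=2.0, seed=0):
    """Squared correlation between observed and true scores for an n_items test.

    Assumed model (illustration only): every item response is T + e, where
    T ~ N(0, true_sd^2) is fixed per person and each item adds its own
    independent error e ~ N(0, error_sd^2). The test score X is the item mean.
    """
    rng = random.Random(seed)
    true_scores, observed = [], []
    for _ in range(n_people):
        t = rng.gauss(0.0, true_sd)
        items = [t + rng.gauss(0.0, error_sd) for _ in range(n_items)]
        true_scores.append(t)
        observed.append(statistics.fmean(items))
    return pearson_r(observed, true_scores) ** 2


def theoretical_reliability(n_items, true_sd=1.0, error_sd=2.0):
    """sigma_T^2 / sigma_X^2 for the same model: averaging k items divides error variance by k."""
    var_t = true_sd ** 2
    return var_t / (var_t + error_sd ** 2 / n_items)


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 40):
        print(k, round(simulated_reliability(k), 3), round(theoretical_reliability(k), 3))
```

With these assumed settings, the theoretical values equal k / (k + 4): 0.2, 0.556, 0.714, 0.833 and 0.909 for k = 1, 5, 10, 20 and 40. The simulated column should track them closely and rise with test length, as domain sampling theory predicts.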

Also, if we take multiple random samples from the population, we get a distribution of sample scores that represents the population.

Domain Sampling Theory
Each random sample of items from the universe would be “randomly parallel” to every other sample.
Unbiased estimate of reliability = (correlation between the test and the true score)² = average correlation between the test and all other randomly parallel tests:
$r_{1t}^2 = \bar{r}_{1j}$

Classical Test Theory: Reliability
Reliability is theoretically the squared correlation between a test score and the true score:
$\rho^2_{XT} = \sigma^2_T / \sigma^2_X$
Essentially, it is the proportion of the variance in X that is due to T.
This cannot be measured directly, so we use other methods to estimate it.

CTT: Reliability Index
Reliability can be viewed as a measure of consistency, or how well a test “holds together.”
Reliability is measured on a scale of 0 to 1. The greater the number, the higher the reliability.

CTT: Reliability Index
The approach to estimating reliability depends on the estimation of the “true” score and on the source of measurement error.
Types of reliability: test-retest, parallel forms, split-half, internal consistency.

CTT: Test-Retest Reliability
Evaluates the error associated with administering a test at two different times (time sampling error).
How to: give the test at Time 1, give the SAME TEST at Time 2, and calculate r for the two scores.
Easy to do; one test does it all.

CTT: Test-Retest Reliability
Assume two administrations, X1 and X2. The correlation between the two administrations is the reliability:
$r_{X_1 X_2}$, which estimates $\sigma^2_T / \sigma^2_X$

CTT: Test-Retest Reliability
Sources of error:
Random fluctuations in performance.
Uncontrolled testing conditions: extreme changes in weather, sudden or chronic noise, other distractions.
Internal factors: illness, fatigue, emotional strain, worry, recent experiences.

CTT: Test-Retest Reliability
Generally used to evaluate constant traits (intelligence, personality).
Not appropriate for qualities that change rapidly over time (mood, hunger).
Problem: carryover effects. Exposure to the test at Time 1 influences scores on the test at Time 2.
This is only a problem when the effects are random, i.e., they differ from person to person. If everybody goes up 5 points, you still have the same variability.

CTT: Test-Retest Reliability
Practice effects are a type of carryover effect: some skills improve with practice (manual dexterity, ingenuity or creativity).
Practice effects may not benefit everybody in the same way.
Carryover and practice effects are more of a problem with short inter-test intervals (ITIs). But longer ITIs have other problems: developmental change, maturation, exposure to historical events.

CTT: Parallel Forms Reliability
Evaluates the error associated with selecting a particular set of items (item sampling error).
How to: develop a large pool of items (i.e., the domain) of varying difficulty. Choose equal distributions of difficult and easy items to produce multiple forms of the same test. Give both forms close in time, then calculate r for the two administrations.

CTT: Parallel Forms Reliability
Also known as alternative forms or equivalent forms.
Parallel forms can be given at different points in time; this produces error estimates of both time and item sampling.
One of the most rigorous assessments of reliability currently in use.
Infrequently used in practice, because developing two tests is too expensive.

CTT: Parallel Forms Reliability
Assume two parallel tests, X and X'. The correlation between the two parallel forms is the reliability:
$r_{XX'}$

CTT: Split-Half Reliability
What if we treat the halves of one test as parallel forms (a single test as the whole domain)? That is what split-half reliability does.
This tests for internal consistency: scores on one half of a test are correlated with scores on the other half.
Big question: how to split? First half vs. last half, odd vs. even items, or item groups called testlets.

CTT: Split-Half Reliability
How to: compute scores for the two halves of a single test and calculate r.
Problem: considering domain sampling theory, what is wrong with this approach? A 20-item test cut in half is two 10-item tests; what does that do to the reliability?
If only we could correct for that...

Spearman-Brown Formula
Estimates the reliability of the entire test from the split-half correlation.
Can also be used to estimate the effect that changing the number of items on a test has on its reliability:
$r^* = \frac{j\,r}{1 + (j - 1)\,r}$
where r* is the estimated reliability, r is the correlation between the halves (more generally, the reliability at the current length), and j is the new length as a proportion of the old length.

Spearman-Brown Formula
For a split-half it would be
$r^* = \frac{2r}{1 + r}$
since the full length of the test is twice the length of each half (j = 2).

Spearman-Brown Formula
Example 1: a 30-item test with a split-half reliability of .65:
$r^* = \frac{2(.65)}{1 + .65} = .79$
The .79 is a much better reliability than the .65.

Spearman-Brown Formula
Example 2: a 30-item test with a test-retest reliability of .65 is lengthened to 90 items (j = 3):
$r^* = \frac{3(.65)}{1 + 2(.65)} = .85$
Example 3: a 30-item test with a test-retest reliability of .65 is cut to 15 items (j = .5):
$r^* = \frac{.5(.65)}{1 - .5(.65)} = .48$

Detour 1: Variance Sum Law
Often multiple items are combined in order to create a composite score.
The variance of the composite is a combination of the variances and covariances of the items that create it.
The general variance sum law states that if X and Y are random variables:
$\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y \pm 2\sigma_{XY}$

Detour 1: Variance Sum Law
Given multiple variables, we can create a variance/covariance matrix. For 3 items:
$\begin{pmatrix} \sigma^2_1 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma^2_2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma^2_3 \end{pmatrix}$

Detour 1: Variance Sum Law
Example: variables X, Y and Z. By the variance sum law, the composite variance is the sum of every entry in their covariance matrix:
$\sigma^2_{X+Y+Z} = \sigma^2_X + \sigma^2_Y + \sigma^2_Z + 2\sigma_{XY} + 2\sigma_{XZ} + 2\sigma_{YZ}$

CTT: Internal Consistency Reliability
If items are measuring the same construct, they should elicit similar if not identical responses.
Coefficient alpha, or Cronbach's alpha, is a widely used measure of internal consistency for continuous data.
Knowing that the variance of a composite is a sum of the item variances and covariances, we can assess consistency by how much covariance exists between the items relative to the total variance.

CTT: Internal Consistency Reliability
Coefficient alpha is defined as:
$\alpha = \frac{k}{k - 1} \cdot \frac{\sum_{i \ne j} \sigma_{ij}}{\sigma^2_X}$
where $\sigma^2_X$ is the composite variance (if the items were summed), $\sigma_{ij}$ is the covariance between the ith and jth items (i ≠ j), and k is the number of items.

CTT: Internal Consistency Reliability
Using the same continuous items X, Y and Z: the total variance is 254.41 and the sum of all the covariances is 152.03, so
$\alpha = \frac{3}{2} \cdot \frac{152.03}{254.41} = .8964$

CTT: Internal Consistency Reliability
Coefficient alpha can also be defined as:
$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_i \sigma^2_i}{\sigma^2_X}\right)$
where $\sigma^2_X$ is the composite variance (if the items were summed), $\sigma^2_i$ is the variance of each item, and k is the number of items.

CTT: Internal Consistency Reliability
Using the same continuous items X, Y and Z: the total variance is 254.41 and the sum of all the variances is 102.38, so
$\alpha = \frac{3}{2}\left(1 - \frac{102.38}{254.41}\right) = .8964$

CTT: Internal Consistency Reliability
From SPSS (Method 1, space saver, is used for this analysis):
RELIABILITY ANALYSIS - SCALE (ALPHA)
Reliability Coefficients
N of Cases = 100.0    N of Items = 3
Alpha = .8964

CTT: Internal Consistency Reliability
Coefficient alpha is considered a lower-bound estimate of the reliability of continuous items.
It was developed by Cronbach in the 1950s, but is based on an earlier formula by Kuder and Richardson in the 1930s that tackled internal consistency for dichotomous (yes/no, right/wrong) items.

Detour 2: Dichotomous Items
If Y is a dichotomous item:
P = proportion of successes (items answered correctly)
Q = proportion of failures (items answered incorrectly), so Q = 1 - P
$\bar{Y} = P$, the observed proportion of successes
$s^2_Y = PQ$

CTT: Internal Consistency Reliability
Kuder and Richardson developed the KR20, which is defined as:
$KR_{20} = \frac{k}{k - 1}\left(1 - \frac{\sum_i p_i q_i}{\sigma^2_X}\right)$
where $p_i q_i$ is the variance of each dichotomous item.
The KR21 is a quick-and-dirty estimate of the KR20.

CTT: Reliability of Observations
What if you are not using a test, but are instead observing individuals' behaviors as a psychological assessment tool?
How can we tell if the judges (assessors) are reliable?

CTT: Reliability of Observations
Typically, a set of criteria is established for judging the behavior, and the judge is trained on the criteria.
Then, to establish the reliability of both the set of criteria and the judge, multiple judges rate the same series of behaviors.
The correlation between the judges is the typical measure of reliability.
But couldn't they agree by accident, especially on dichotomous or ordinal scales?

CTT: Reliability of Observations
Kappa is a measure of inter-rater reliability that controls for chance agreement.
Values range from -1 (less agreement than expected by chance) to +1 (perfect agreement).
.75 and above: “excellent”; .40 to .75: “fair to good”; below .40: “poor”.

Standard Error of Measurement
So far we have talked about the standard error of measurement as the error associated with trying to estimate a true score from a specific test.
This error can come from many sources.
We can calculate its size by:
$s_m = s\sqrt{1 - r}$
where s is the standard deviation and r is the reliability.

Standard Error of Measurement
Using the same continuous items X, Y and Z: the total variance is 254.41, so s = √254.41 = 15.95, and r = .8964:
$s_m = 15.95\sqrt{1 - .8964} = 5.13$

CTT: The Prophecy Formula
How much reliability do we want? Typically we want values above .80.
What if we don't have them? The Spearman-Brown formula can be algebraically manipulated to give
$j = \frac{r_d (1 - r_o)}{r_o (1 - r_d)}$
where j is the number of tests at the current length, $r_d$ is the desired reliability, and $r_o$ is the observed reliability.

CTT: The Prophecy Formula
Using the same continuous items X, Y and Z, $r_o$ = .8964. What if we want a .95 reliability?
$j = \frac{.95(1 - .8964)}{.8964(1 - .95)} = 2.2$
We need a test that is 2.2 times longer than the original: 3 × 2.2 = 6.6, so nearly 7 items are needed to achieve .95 reliability.

CTT: Attenuation
Correlations are typically sought at the true-score level, but the presence of measurement error can cloud (attenuate) the size of the relationship.
We can correct the size of a correlation for the low reliability of the items. This is called the correction for attenuation.

CTT: Attenuation
The correction for attenuation is calculated as:
$\hat{r}_{XY} = \frac{r_{XY}}{\sqrt{r_{XX}\, r_{YY}}}$
where $\hat{r}_{XY}$ is the corrected correlation, $r_{XY}$ is the uncorrected correlation, and $r_{XX}$ and $r_{YY}$ are the reliabilities of the tests.

CTT: Attenuation
For example, X and Y are correlated at .45, X has a reliability of .8, and Y has a reliability of .6. The corrected correlation is
$\hat{r}_{XY} = \frac{.45}{\sqrt{.8 \times .6}} = .65$
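The reliability formulas in these slides are short enough to check by hand, but a few helper functions make the worked numbers reproducible. The Python sketch below is one possible implementation, not part of the lecture. The function names are hypothetical. The alpha helpers accept either the summary totals quoted in the slides (254.41, 102.38, 152.03) or a full covariance matrix. `cohen_kappa` assumes that the kappa mentioned in the slides is Cohen's kappa for two raters, (p_o - p_e) / (1 - p_e); the slides do not spell this formula out.

```python
import math


def spearman_brown(r, j):
    """Reliability after changing test length by factor j (new length / old length)."""
    return j * r / (1 + (j - 1) * r)


def prophecy_length(r_observed, r_desired):
    """Spearman-Brown solved for j: how many times longer the test must be."""
    return r_desired * (1 - r_observed) / (r_observed * (1 - r_desired))


def composite_variance(cov):
    """Variance sum law: variance of a summed composite = sum of every covariance-matrix entry."""
    return sum(sum(row) for row in cov)


def alpha_from_cov(cov):
    """Coefficient alpha from a k x k item covariance matrix (list of lists)."""
    k = len(cov)
    item_vars = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - item_vars / composite_variance(cov))


def alpha_from_variances(k, total_var, sum_item_vars):
    """Alpha from summary totals: 1 minus the share of variance on the diagonal."""
    return k / (k - 1) * (1 - sum_item_vars / total_var)


def alpha_from_covariances(k, total_var, sum_off_diag_cov):
    """Equivalent form: share of composite variance that is inter-item covariance (i != j)."""
    return k / (k - 1) * (sum_off_diag_cov / total_var)


def kr20(p_values, total_var):
    """KR20 for dichotomous items; each item's variance is p * q with q = 1 - p."""
    k = len(p_values)
    return k / (k - 1) * (1 - sum(p * (1 - p) for p in p_values) / total_var)


def cohen_kappa(table):
    """Cohen's kappa from a square table of counts (rows: rater 1, columns: rater 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row_p[i] * col_p[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)


def sem(sd, r):
    """Standard error of measurement: s * sqrt(1 - r)."""
    return sd * math.sqrt(1 - r)


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Disattenuated correlation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    print(round(spearman_brown(0.65, 2), 2))    # Example 1 (split-half): 0.79
    print(round(spearman_brown(0.65, 3), 2))    # Example 2 (30 -> 90 items): 0.85
    print(round(spearman_brown(0.65, 0.5), 2))  # Example 3 (30 -> 15 items): 0.48
    a = alpha_from_variances(3, 254.41, 102.38)
    print(round(a, 4))                                          # 0.8964
    print(round(alpha_from_covariances(3, 254.41, 152.03), 4))  # 0.8964
    print(round(prophecy_length(a, 0.95), 1))                   # 2.2
    print(round(sem(math.sqrt(254.41), a), 2))                  # 5.13
    print(round(correct_for_attenuation(0.45, 0.8, 0.6), 2))    # 0.65
```

The sketch passes summary totals rather than raw scores because the slides report only the totals. With raw item scores, you would first build the covariance matrix and then call `alpha_from_cov`.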
