Reliability
Test and Measurement
Question | Answer |
---|---|
Reliability | The consistency or stability of scores. |
Classical Test Theory | The theory of reliability can be demonstrated with mathematical proofs. X=T+E. |
Random Measurement Error | All measurement is susceptible to error. Any factor that introduces error into the measurement process affects reliability. |
Random measurement error is also referred to as what? | Unsystematic Error |
How do we increase our confidence in scores? | We try to detect, understand, and minimize random measurement error. |
Content Sampling Error | A major source of random measurement error. Results from differences between the sample of items (i.e., the test) and the domain of items (i.e., all possible items). |
Time Sampling Error/Temporal Instability | Random and transient. Situation-Centered influences (e.g., lighting & noise) and Person-Centered influences (e.g., fatigue, illness) |
Other Sources of Error | Administration errors (e.g., incorrect instructions, inaccurate timing) and Scoring errors (e.g., subjective scoring, clerical errors). |
Classical Test Theory | X = T + E |
X | Obtained or observed score (fallible) |
T | True score (reflects stable characteristics) |
E | Error score (reflects random error) |
Test-Retest Reliability | reflects the temporal stability of a measure. Most applicable with tests administered more than once and/or with constructs that are viewed as stable. It is important to consider length of interval. |
Test-Retest Reliability is subject to what? | “Carry-over effects.” Test-retest reliability is appropriate for tests that are not appreciably impacted by carry-over effects. |
Alternate Form Reliability | Involves the administration of two “parallel” forms. |
Delayed Administration for Alternate Form Reliability | reflects error due to temporal stability and content sampling. |
Simultaneous Administration for Alternate Form Reliability | reflects only error due to content sampling. |
Alternate Form Reliability limitations | Reduces, but may not eliminate carry-over effects. Relatively few tests have alternate forms. |
Internal Consistency Reliability | Estimates of reliability that are based on the relationship between items within a test and are derived from a single administration of the test (e.g., split-half reliability). |
Split-Half Reliability reflects what type of error? | error due to content sampling. |
Coefficient Alpha & Kuder-Richardson (KR 20) | Reflect error due to content sampling. Sensitive to the heterogeneity of the test content (item homogeneity). Coefficient Alpha is the mathematical average of all possible split-half coefficients. |
Reliability of Speed Tests | For speed tests, reliability estimates derived from a single administration of a test are inappropriate. For speed tests, test-retest and alternate-form reliability are appropriate, but split-half, Coefficient Alpha and KR 20 should be avoided. |
Inter-Rater Reliability | Reflects differences due to the individuals scoring the test. Important when scoring requires subjective judgement by the scorer. |
Reliability of Difference Scores | Difference scores are calculated when comparing performance on two tests. The reliability of the difference between two test scores is generally lower than the reliabilities of the two tests because the error of both tests accumulates in the difference, while the true-score variance the tests share is removed. |
Reliability of Composite Scores | When there are multiple scores available for an individual, one can calculate composite scores (e.g., assigning grades in a class). The key issue is that the reliability of a composite increases with the number of component scores and with the correlations among them. |
Standards for Reliability in making important decisions | If a test score is used to make important decisions that will significantly impact individuals, the reliability should be very high |
Standards for Reliability if part of an assessment | If a test is interpreted independently but as part of a larger assessment process (e.g., personality test), most set the standard as .80 or greater. |
Standards for Reliability in research or composite score | If a test is used only in group research or is used as a part of a composite (e.g., classroom tests), lower reliability estimates may be acceptable (e.g., .70s). |
Improving Reliability | Increase the number of items (i.e., better domain sampling). Use multiple measurements (i.e., composite scores). Use “item analysis” procedures to select the best items. Increase standardization of the test (e.g., administration and scoring). |
Standard Error of Measurement (SEM) | When comparing the reliability of tests, the reliability coefficient is the statistic of choice. When interpreting individual scores, the SEM generally proves to be the most useful statistic. |
The SEM is an index of what? | the average amount of error in test scores. Technically, the SEM is the standard deviation of error scores around the true score. |
How is the SEM calculated? | Using the reliability coefficient and the standard deviation of the test: SEM = SD × √(1 − rxx). |
What is the relationship between rxx and SEM? | Since the test’s reliability coefficient is used in calculating the SEM, the two are inversely related: as the reliability of a test increases, the SEM decreases; as reliability decreases, the SEM increases. |
Confidence Intervals | Reflect a range of scores that will contain the test taker’s true score with a prescribed probability. They are calculated using the obtained score and the SEM. |
A major advantage of confidence intervals | is that they remind us that measurement error is present in all scores and we should interpret scores cautiously. |
Confidence intervals are interpreted as | “The range within which a person’s true score is expected to fall a specified percentage of the time (e.g., 95%).” |
What is the relationship between SEM and confidence intervals? | Since the SEM is used in calculating confidence intervals, there is a direct relationship between the SEM and confidence intervals. The size of confidence intervals increases as the SEM increases and decreases as the reliability of the test increases. |
A Primer on Generalizability Theory | An extension of Classical Test Theory. Classical Theory - all error is random. Generalizability Theory - recognizes sources of systematic error. |
When are Classical Test Theory and Generalizability Theory mathematically identical? | If there is no opportunity for systematic error to enter the model, the two theories are mathematically identical. |
Classical Test Theory is most useful when | objective tests are administered under standardized conditions (e.g., SAT or GRE). |
When is Generalizability Theory useful? | If the conditions favoring Classical Test Theory are not met, the principles of Generalizability Theory may be useful (e.g., essay or projective tests). |
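
The Classical Test Theory card above (X = T + E) can be illustrated with a small simulation. This is an illustrative sketch, not part of the original cards; all names and the chosen score distributions are hypothetical. It shows that when observed scores are built as true score plus random error, reliability emerges as the proportion of observed-score variance that is true-score variance.

```python
# Sketch of Classical Test Theory: X = T + E.
import numpy as np

rng = np.random.default_rng(42)
n_examinees = 10_000

true_scores = rng.normal(loc=100, scale=15, size=n_examinees)  # T: stable characteristic
error_scores = rng.normal(loc=0, scale=5, size=n_examinees)    # E: random error, mean 0
observed_scores = true_scores + error_scores                   # X = T + E

# Under CTT, reliability is the proportion of observed-score variance
# that is true-score variance: rxx = var(T) / var(X).
reliability = true_scores.var() / observed_scores.var()
print(f"estimated rxx = {reliability:.2f}")  # close to 15**2 / (15**2 + 5**2) = 0.90
```

Note that the error scores average to roughly zero, which is why random error washes out across many measurements but still lowers the reliability of any single score.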
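
The Coefficient Alpha card can also be made concrete. The sketch below, with a made-up item-response matrix, computes alpha as k/(k−1) × (1 − Σ item variances / total-score variance); for dichotomous (0/1) items this same formula is KR 20.

```python
# Sketch of Coefficient Alpha for a small (hypothetical) item-response matrix:
# rows = examinees, columns = items, entries = item scores.
import numpy as np

scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total test scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"alpha = {alpha:.3f}")
```

Because alpha is sensitive to item homogeneity, adding items that measure the same construct raises it, while heterogeneous items lower it.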
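
The SEM and confidence-interval cards fit together in one short calculation. This hedged sketch uses the standard formula SEM = SD × √(1 − rxx) with made-up values (SD = 15, rxx = 0.91, obtained score = 110) and a z of 1.96 for a 95% interval.

```python
# Sketch: SEM and a 95% confidence interval for an individual score.
import math

sd = 15         # test standard deviation (hypothetical)
rxx = 0.91      # reliability coefficient (hypothetical)
obtained = 110  # an examinee's obtained score (hypothetical)

sem = sd * math.sqrt(1 - rxx)   # 15 * sqrt(0.09) = 4.5
lower = obtained - 1.96 * sem   # 95% CI uses z = 1.96
upper = obtained + 1.96 * sem
print(f"SEM = {sem:.1f}, 95% CI = [{lower:.1f}, {upper:.1f}]")
```

This makes the inverse rxx–SEM relationship visible: raising rxx toward 1.0 shrinks the SEM toward 0, and the confidence interval tightens with it.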
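
The "increase the number of items" advice on the Improving Reliability card is usually quantified with the Spearman-Brown prophecy formula, which is also the correction applied to split-half coefficients. The function below is a hypothetical sketch of that formula.

```python
# Sketch of the Spearman-Brown prophecy formula: predicted reliability
# when test length is multiplied by a factor n with comparable items.
def spearman_brown(rxx: float, n: float) -> float:
    """Predicted reliability of a test lengthened by a factor of n."""
    return (n * rxx) / (1 + (n - 1) * rxx)

# Doubling a test with reliability .70:
print(round(spearman_brown(0.70, 2), 3))  # 1.4 / 1.7, about 0.824
```

With n = 2 this is the classic split-half correction (2r / (1 + r)), which is why a half-test correlation understates the full test's reliability.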