Glossary of EPPP Test Construction 2

Reliability Coefficient
Proportion of obtained-score variance that reflects true ability
-Interpret directly (0.70 means 70% of score variance is true ability, 30% is error)
-A good test should have a coefficient of at least 0.70
Classical Test Theory
Results are:
1. True Score (Ability)
-true variance
2. Some Error (Fatigue...)
-error variance
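As a rough illustration of how true variance and error variance combine into a reliability coefficient, the sketch below treats reliability as true variance divided by total (true plus error) variance. The function name and the example numbers are made up for illustration.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Proportion of total score variance that is true-score variance."""
    total_variance = true_variance + error_variance
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance


# Hypothetical test: 70 units of true variance, 30 units of error variance.
r = reliability_coefficient(true_variance=70.0, error_variance=30.0)
print(r)  # 0.7
```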
Reliability
-Establish reliability first (Test can be reliable but not valid.)
-Consistency
Validity
-Accuracy
Validity cannot exceed...
the square root of reliability
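A small sketch of that ceiling, assuming the rule is applied to a single reliability coefficient between 0 and 1. The function name is hypothetical.

```python
import math


def max_validity(reliability: float) -> float:
    """Upper limit on a validity coefficient for a given reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return math.sqrt(reliability)


# A test with reliability 0.81 cannot have validity above about 0.90.
print(round(max_validity(0.81), 2))  # 0.9
```

Note that the ceiling is higher than the reliability itself: a test with reliability 0.49 could, in principle, have a validity coefficient as high as about 0.70.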
Types of Reliability
1. Test-retest reliability (Coefficient of stability)
2. Alternate Forms (Considered the best but least used)
3. Internal Consistency (Compares test against itself)
Types of Internal Consistency Reliability
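1. Split-Half (split test into two halves and correlate them; problem is that each half is a shorter test, which lowers the coefficient)
-can use Spearman-Brown Prophecy Formula to estimate the reliability of the full-length test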
2. Inter-Item Consistency (compare items on one test one against the other in a systematic way)
-can use Cronbach's Alpha (compare items on test individually against all others systematically) or Kuder Richardson Formula 20 (special version of Cronbach, use when you have true/false or yes/no dichotomous test items)
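The two internal consistency approaches above can be sketched as follows. This is a minimal illustration, not a full psychometrics routine: the function names and the small 0/1 response table are made up, and rows are assumed to be test takers with columns as items. With dichotomous (0/1) items, this alpha calculation gives the same value as KR-20.

```python
from statistics import variance
from typing import Sequence


def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Estimate reliability of a test lengthened by length_factor.

    With length_factor=2 this steps a split-half correlation up to
    the full-length test.
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for scores[person][item]."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 test takers")
    # Variance of each item across test takers.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total scores.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical split-half correlation of 0.60 stepped up to full length.
print(round(spearman_brown(0.60), 2))  # 0.75

# Hypothetical true/false responses: 5 test takers, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))  # 0.8
```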
Kappa Coefficient
Inter-rater reliability (agreement between raters, corrected for chance agreement)
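A minimal sketch of Cohen's kappa for two raters assigning categories to the same cases. The function name and the example ratings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases where the raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses from two raters for the same 10 cases.
a = ["yes"] * 5 + ["no"] * 5
b = ["yes"] * 4 + ["no"] * 5 + ["yes"]
print(round(cohen_kappa(a, b), 2))  # 0.6
```

Here the raters agree on 8 of 10 cases (0.80), but 0.50 agreement would be expected by chance, so kappa is lower than the raw agreement rate.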
Standard Error of Measurement
-Based on reliability coefficient
-Try to get an idea of what a person's true ability is
-Based on a person's single score but has properties of a normal curve
-the more reliable the test, the smaller the standard error of measurement
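A sketch of the usual calculation, assuming the standard formula SEM = SD x sqrt(1 - reliability) and a normal-curve band around the obtained score. The function names and the example values (SD of 15, reliability of 0.91) are illustrative only.

```python
import math
from typing import Tuple


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM from the test's standard deviation and reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed: float, sem: float, z: float = 1.96) -> Tuple[float, float]:
    """Band around an obtained score; z=1.96 gives roughly 95%."""
    return observed - z * sem, observed + z * sem


sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
print(round(sem, 2))  # 4.5

low, high = confidence_band(observed=100.0, sem=sem)
print(round(low, 1), round(high, 1))
```

Raising the reliability in the example shrinks the SEM and narrows the band around the obtained score.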
Standard Error of Mean
-How well does the sample mean represent the population mean?
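For comparison with the standard error of measurement, a minimal sketch of the standard error of the mean, assuming the usual formula SD / sqrt(N). The function name and numbers are illustrative.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Expected spread of sample means around the population mean."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Hypothetical sample of 100 people on a scale with SD = 15.
print(standard_error_of_mean(sd=15.0, n=100))  # 1.5
```

Larger samples give a smaller standard error of the mean, so the sample mean is a better estimate of the population mean.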
It is best to have ___________ items and _____________ test takers for a test to be most reliable.
Homogeneous
Heterogeneous
Content Validity
Based on expert judgement
Criterion-Related Validity
Outcome
-look at relationship between predictor and outcome
-used most often in personnel psych (predicting job performance, etc)
-two types are predictive validity (who will become schizophrenic?, predicts future behavior) and concurrent validity (who is schizophrenic now?, test results NOW)
Construct Validity
The construct cannot be directly defined
-Two types are convergent (compare new test with established test that measures same construct) and divergent (discriminant validity - you want your test to have nothing in common with another test of a different construct)
Multitrait-Multimethod Matrix
If it's a single trait measured by different methods (monotrait-heteromethod), need a HIGH coefficient to establish convergent validity
-If it's different traits (heterotrait), need a LOW coefficient to establish divergent validity
Face Validity
Does the test make sense to the people who are taking it?
Cross Validation
Give test instrument again to a new sample to re-check its validity
-Shrinkage may occur (validity coefficient will shrink slightly when you initially cross validate instruments)
Incremental Validity
Can we increase the number of correct decisions we are already making?
Three things to establish Incremental Validity
1. base rate - moderate (number of decisions you are already making correctly)
2. selection ratio - need low selection ratio (number of jobs available to number of applicants)
3. validity coefficient - high correlation between predictor and criterion
Criterion-Referenced Scores
-Do not compare score to anyone else, just meeting a standard
Norm-Referenced Scores
-Score is compared to other individuals
-Two types: percentile ranks (not used as much now) and standard scores (transformed scores that allow you to compare)
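A short sketch of the two kinds of norm-referenced scores. It assumes z-scores and T-scores (mean 50, SD 10) as the standard scores, and defines percentile rank as the percent of the norm group scoring below the raw score; other definitions of percentile rank exist. The function names and data are hypothetical.

```python
from typing import Sequence


def z_score(raw: float, group_mean: float, group_sd: float) -> float:
    """Distance from the group mean in standard deviation units."""
    return (raw - group_mean) / group_sd


def t_score(z: float) -> float:
    """Standard score rescaled to mean 50, SD 10."""
    return 50.0 + 10.0 * z


def percentile_rank(raw: float, norm_scores: Sequence[float]) -> float:
    """Percent of the norm group scoring below the raw score."""
    below = sum(s < raw for s in norm_scores)
    return 100.0 * below / len(norm_scores)


# Hypothetical raw score of 115 on a scale with mean 100 and SD 15.
z = z_score(115.0, group_mean=100.0, group_sd=15.0)
print(z, t_score(z))  # 1.0 60.0

# Hypothetical norm group of five scores.
print(percentile_rank(110, [85, 90, 100, 110, 115]))  # 60.0
```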
Floor Effect
-bunch of test takers at bottom of test range
-need to have enough easy items
Ceiling Effect
-need to have enough difficult items to discriminate between best test takers