Test Construction 2
Terms
- reliability
-consistency; a test is reliable to the degree it provides repeatable, consistent results
-types: test-retest, alternate forms, internal consistency
- validity
-a test is valid to the degree it measures what it purports to measure
- tests of maximum performance vs. typical performance
- maximum-performance tests tap the examinee's best possible performance (e.g., achievement tests); typical-performance tests tap what the examinee usually does or feels (e.g., personality tests)
- power vs. mastery test
- a power test assesses the level of difficulty a person can attain (no time limits); a mastery test assesses whether the examinee attains a pre-established level of acceptable performance (pass/fail; often used for basic skills tests)
- ipsative vs. normative measures
- an ipsative measure uses the self as the frame of reference (comparing attributes within the examinee), while a normative measure reports the strength of each attribute compared to other examinees
- true score vs. measurement error
-classical test theory: obtained score = true score + measurement error
-a test is reliable to the degree that scores reflect true score rather than error
-there is always some degree of error; no test is perfectly reliable
- reliability coefficient
-method of estimating a test's reliability
-ranges from 0.0 to +1.0
-e.g., .90 means 90% of the variability in test scores is due to true score differences among examinees and 10% represents measurement error
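A minimal sketch (not from the source deck) of the idea above: under classical test theory the reliability coefficient is the ratio of true-score variance to observed-score variance, so r = .90 splits observed variance 90/10 between true differences and error. The variance values below are hypothetical.

# Hypothetical variance components illustrating a reliability coefficient of .90
true_score_variance = 90.0   # variance due to real differences among examinees
error_variance = 10.0        # variance due to measurement error
observed_variance = true_score_variance + error_variance

reliability = true_score_variance / observed_variance
print(reliability)  # 0.9 -> 90% true-score variance, 10% error variance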
- test-retest reliability
-a.k.a. coefficient of stability
-administering the same test to the same group on two occasions and correlating the two sets of scores
-affected by time-related factors, which become sources of error
-not typically recommended as the sole estimate of reliability
- alternate forms reliability
-a.k.a. coefficient of equivalence
-administering two equivalent forms of the test to the same group of examinees, then correlating the two sets of scores
-considered the best method to use, although the coefficient tends to be lower than test-retest (due to content differences as well as the passage of time)
-costly to develop
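A sketch of how both test-retest and alternate-forms coefficients are obtained: correlate the two sets of scores from the same examinees. The scores below are invented.

import numpy as np

# Hypothetical scores for 5 examinees on two administrations (or two forms)
form_a = np.array([85, 92, 78, 60, 71])
form_b = np.array([88, 90, 75, 64, 70])

# Pearson correlation between the two score sets serves as the reliability estimate
r = np.corrcoef(form_a, form_b)[0, 1]
print(round(r, 2))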
- internal consistency reliability
-a.k.a. coefficient of internal consistency
-obtained from correlations among individual items
-methods: split-half, Cronbach's alpha, Kuder-Richardson
- split-half reliability
-dividing the test in two and correlating the halves as if they were two shorter tests
-Spearman-Brown formula corrects for the shorter length of each half (shorter tests are less reliable), estimating the reliability of the full-length test
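A sketch of the Spearman-Brown correction: it projects the half-test correlation up to the reliability of the full-length test. The half-test value is hypothetical.

def spearman_brown(r_half, n=2):
    """Spearman-Brown prophecy formula: reliability of a test n times as long."""
    return n * r_half / (1 + (n - 1) * r_half)

r_half = 0.70                  # hypothetical correlation between the two halves
print(spearman_brown(r_half))  # ~0.82, the estimated full-length reliability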
- Cronbach's coefficient alpha vs. Kuder-Richardson
-both recommended over split-half
-indicate the average degree of inter-item consistency
-Cronbach's alpha is used for tests with multi-point item scoring (e.g., Likert scales)
-Kuder-Richardson (KR-20) is used when test items are dichotomously scored (yes/no, T/F)
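A sketch of Cronbach's alpha computed from an item-by-examinee score matrix (the Likert responses below are made up); with dichotomous 0/1 items the same computation reduces to KR-20.

import numpy as np

# Hypothetical data: rows = examinees, columns = items (Likert 1-5)
scores = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
])

k = scores.shape[1]                              # number of items
item_variances = scores.var(axis=0, ddof=1)      # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)  # variance of examinees' total scores

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(round(alpha, 2))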
- major source of measurement error for internal consistency reliability coefficients?
-content sampling or item heterogeneity: the degree to which items differ in the content they sample
- measures of internal consistency: good for assessing what and bad for assessing what?
-good: unstable traits (retesting an unstable trait would confound true change with error)
-bad: speed tests (the coefficient is inflated); use test-retest or alternate forms instead
- interscorer (inter-rater) reliability
-calculating the correlation coefficient between the scores assigned by two different raters
-kappa coefficient: measure of agreement between two judges who rate a set of objects using nominal scales
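A sketch of the kappa coefficient: agreement between two raters corrected for the agreement expected by chance. The ratings below are invented.

from collections import Counter

# Hypothetical nominal ratings of 10 objects by two judges
rater_1 = ["A", "A", "B", "B", "A", "C", "C", "B", "A", "C"]
rater_2 = ["A", "B", "B", "B", "A", "C", "B", "B", "A", "C"]

n = len(rater_1)
observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n  # observed agreement

# Chance agreement: product of each category's marginal proportions, summed
counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
expected = sum((counts_1[c] / n) * (counts_2[c] / n) for c in counts_1)

kappa = (observed - expected) / (1 - expected)
print(round(kappa, 2))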
- standard error of measurement
-indicates how much error an individual test score can be expected to contain
-used to construct confidence intervals around obtained scores
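A sketch of the standard error of measurement, computed from the test's standard deviation and its reliability coefficient (both values hypothetical).

import math

sd = 15.0           # hypothetical standard deviation of test scores
reliability = 0.91  # hypothetical reliability coefficient

sem = sd * math.sqrt(1 - reliability)
print(round(sem, 1))  # 4.5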
- confidence intervals
-68% probability the true score falls within +- 1 standard error of measurement of the obtained score
-95%: within +- 2 standard errors of measurement
-99% (more precisely, 99.7%): within +- 3 standard errors of measurement
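A sketch of building confidence intervals around an obtained score using the SEM (the score and SEM are hypothetical).

obtained_score = 100
sem = 4.5  # hypothetical standard error of measurement

# 68% / 95% / ~99.7% confidence intervals: +- 1, 2, and 3 SEM
for n_sem, level in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = obtained_score - n_sem * sem, obtained_score + n_sem * sem
    print(f"{level}: {low} to {high}")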
- factors affecting reliability
-length of test (longer test > higher reliability)
-the more homogeneous the group taking the test (restricted range of scores), the lower the reliability (see the simulation sketch after this list)
-floor/ceiling effects decrease reliability
-true/false items < multiple choice: the easier it is to guess correctly, the lower the reliability
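An illustrative simulation (all numbers invented) of why a more homogeneous group lowers reliability: simulate true scores plus error, then compare the test-retest correlation in the full group with that in a restricted subgroup whose true scores cluster near the mean.

import numpy as np

rng = np.random.default_rng(0)

true_scores = rng.normal(100, 15, size=2000)  # heterogeneous group of examinees
error_1 = rng.normal(0, 5, size=2000)
error_2 = rng.normal(0, 5, size=2000)
test = true_scores + error_1
retest = true_scores + error_2

full_r = np.corrcoef(test, retest)[0, 1]

# Restrict to a homogeneous subgroup (true scores near the mean)
mask = (true_scores > 95) & (true_scores < 105)
restricted_r = np.corrcoef(test[mask], retest[mask])[0, 1]

print(round(full_r, 2), round(restricted_r, 2))  # restricted r is noticeably lower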