## Glossary of EPPP - Test Construction

- Alternate Forms Reliability
- Reliability based on the degree of correlation between results on alternate forms of a test. Involves administering equivalent forms of a test to the same group of examinees.

- Coefficient Alpha
- Reliability coefficient based on an assessment of inter-item correlation, or the extent to which various test items correlate with one another.
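As an illustration, coefficient alpha can be computed from a small examinee-by-item score matrix. This is a minimal sketch assuming the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the function name and the sample data are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of examinee rows, each a list of item scores."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across examinees
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of examinees' total scores
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 examinees, 4 items scored 0 (wrong) or 1 (right)
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = coefficient_alpha(scores)
print(round(alpha, 3))  # 0.8
```

Population variances are used for both the items and the totals; using sample variances for both would give the same ratio.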

- Concurrent Validity
- Form of criterion-related validity in which test scores are correlated with currently-available criterion scores. Important when the purpose of a test is to estimate the individual's current status on the criterion.

- Confidence Intervals
- A range of values around an obtained value within which the true value is likely to fall at a specified probability level (e.g., 95%), given that the measurement is subject to error.
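For test scores, such an interval is commonly built around the obtained score using the standard error of measurement. The sketch below assumes the classical formula SEM = SD * sqrt(1 - reliability) and a normal-curve multiplier (z = 1.96 for roughly 95%); the function names and example values are illustrative, not part of the glossary.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(obtained, sd, reliability, z=1.96):
    """Interval around an obtained score; z=1.96 gives roughly 95% confidence."""
    sem = standard_error_of_measurement(sd, reliability)
    return obtained - z * sem, obtained + z * sem


# Hypothetical test: SD = 15, reliability = .91, obtained score = 110
low, high = confidence_interval(obtained=110, sd=15, reliability=0.91)
print(round(low, 2), round(high, 2))  # roughly 101.18 and 118.82
```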

- Construct Validity
- The degree to which a test measures the underlying hypothetical construct or trait it claims to measure. Various methods (e.g., factor analysis, multitrait-multimethod matrix) are used to determine a test's construct validity.

- Content Validity
- The degree to which a test's items adequately sample the content domain the test is supposed to measure.

- Convergent Validity
- The degree to which a test has a high correlation with other measures that purportedly assess the same trait. A type of construct validity.

- Criterion Contamination
- Artificial inflation of the criterion-related validity coefficient that occurs when the criterion measure is scored subjectively by raters, and the raters know how ratees scored on the predictor. Raters let their knowledge of predictor performance bias th

- Criterion-Referenced Tests
- Tests that assess how a person performed in relation to a defined external criterion. Unlike in norm-referenced tests, the person's performance is not compared to that of others.

- Criterion Variable
- The outcome variable in correlational research that one tries to predict.

- Criterion-Related Validity
- A method of determining a test's validity by correlating scores on the test with some outside criterion. For example, a test of computer literacy would be shown to have criterion-related validity if people who score high do well on a job working with

- Differential Validity
- Exists when the validity coefficient of a predictor is significantly different for one subgroup than for another subgroup; e.g., the validity coefficient is significantly lower for males than for females.

- Discriminant (Divergent) Validity
- The degree to which a test has a low correlation with another measure that purportedly assesses a different construct. A type of construct validity.

- Discriminant Function Analysis
- A multivariate technique that finds the weighted combination of predictors (e.g., test items or scores) that best distinguishes two or more groups of people, and that can be used to classify individuals into those groups.

- Empirical-Criterion Keying
- Method of constructing tests in which items are selected for inclusion on the test, from a larger item pool, on the basis of their relationship to an external criterion measure. For example, items on the MMPI-2 were selected because they were empirically

- Face Validity
- Extent to which test items appear to measure the attribute(s) measured by the test. From a technical standpoint, not truly a type of validity; however, it can influence an examinee's motivation to complete a test accurately.

- Factor Analysis
- A procedure to identify common, underlying constructs being measured by a set of tests, indicating the degree to which scores on all the tests can be accounted for by one or more of the same statistical factors (i.e., attributes or traits).

- False Negative
- An examinee whom a predictor incorrectly identifies as not having the particular characteristic being measured; e.g., a job applicant who would have been good on the job but is not hired because he or she did not "pass" on the predictor.

- False Positive
- An examinee whom a predictor incorrectly identifies as having a particular characteristic; e.g., a job applicant who is hired because he or she did well on the predictor but is not actually good on the job.

- Inter-Rater Reliability
- The degree of consistency among test scores when the test is subjectively scored by different raters.

- Item Discrimination
- The extent to which an item on an exam discriminates between examinees who obtain high and low scores on a test or on an external criterion. An item discrimination index, or the difference between the percentage of high-scoring and low-scoring group indiv
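One common form of the index, often written D, is the proportion of the high-scoring group answering the item correctly minus the proportion of the low-scoring group doing so. The following is a minimal sketch under that assumption; the function name and the data are hypothetical.

```python
def item_discrimination(upper_correct, lower_correct):
    """D index: proportion correct in the upper group minus the lower group.

    Each argument is a list of 1/0 flags, one per examinee in that group,
    marking whether the examinee answered the item correctly.
    """
    p_upper = sum(upper_correct) / len(upper_correct)
    p_lower = sum(lower_correct) / len(lower_correct)
    return p_upper - p_lower


# Hypothetical item: 3 of 4 high scorers and 1 of 4 low scorers answered correctly
d = item_discrimination([1, 1, 1, 0], [1, 0, 0, 0])
print(d)  # 0.5
```

Values of D range from -1 to +1; a positive value means high scorers were more likely than low scorers to answer the item correctly.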

- Mastery Test
- A test designed to assess whether an examinee has reached a predetermined level of proficiency in a particular content domain.

- Moderator Variables
- Any variable which moderates, or influences, the relationship between two other variables. For instance, the relationship between a job selection test and actual performance might be moderated by experience people had with that type of work.

- Multitrait-Multimethod Matrix
- Method of assessing a test's convergent and divergent validity in which at least two different traits or constructs are each measured by at least two different methods.

- Norm-Referenced Test
- Test yielding scores (e.g., percentile ranks, z-scores) that report examinees' performance in terms of a comparison to a normative group of others who have taken the same test.

- Oblique Rotation
- In factor analysis, a rotation of the factor matrix that yields correlated factors.

- Obtained Score
- The score someone gets on a test. According to classical test theory, any obtained score is a reflection of both the "true" score and random error.

- Orthogonal Rotation
- In factor analysis, a rotation of the factor matrix that yields uncorrelated factors.

- Power Test
- A test of progressively more difficult items with either no time limit or a generous one. A vocabulary subtest from an intelligence test is an example.

- Predictive Validity
- Type of criterion-related validity determined by correlating applicant predictor scores with criterion scores that are obtained at a later time.

- Predictor Variables
- Those variables which relate to one or more criterion variable(s). Predictors (e.g., SAT score) are used to estimate scores or status on a criterion variable (e.g., college grades).

- Reliability
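- The degree to which a test or experiment provides consistent results, i.e., the extent to which obtained scores reflect "true" scores rather than random error. Reliability is inversely related to the standard error of measurement: the higher the reliability, the smaller the standard error.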

- Shrinkage
- Tendency of validity coefficients to decrease in magnitude on cross-validation.

- Speeded Tests
- Tests composed of easy items administered under a strict time limit, so that scores reflect speed of response. A typing test is an example of a speeded test.

- Split-Half Reliability
- Reliability obtained by splitting a test in half and correlating scores on one half of the test with scores on the other half of the test.
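A minimal sketch of the procedure, assuming an odd/even split of the items. Because each half is only half as long as the full test, the half-test correlation is usually stepped up with the Spearman-Brown formula, r_full = 2r / (1 + r); that correction is not mentioned in the definition above and is included here as an assumption about common practice. Function names and data are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, ...
    r_half = pearson_r(odd_half, even_half)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical data: 5 examinees, 4 items scored 0 or 1
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(scores)
print(round(r_half, 3), round(r_full, 3))
```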

- Standard Error Of Measurement
- An indication of how close the obtained score on a test is to a true score on the test. The standard error of measurement is a function of the test's reliability and the standard deviation of test scores. If reliability of the test is low and the standard

- Standard Error Of The Estimate
- An indication of how much error can be expected when a predictor equation is used. No one measure can predict scores or status on another measure with perfect accuracy, and the standard error of estimate indicates the expected discrepancy between a predic
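A minimal sketch, assuming the usual formula for the standard error of estimate, SEE = SD_y * sqrt(1 - r_xy^2), where SD_y is the standard deviation of the criterion scores and r_xy is the validity coefficient. The function name and example values are hypothetical.

```python
import math


def standard_error_of_estimate(sd_criterion, validity):
    """SEE from the criterion's standard deviation and the validity coefficient."""
    return sd_criterion * math.sqrt(1 - validity ** 2)


# Hypothetical predictor with validity .60 and a criterion SD of 10
see = standard_error_of_estimate(sd_criterion=10, validity=0.6)
print(round(see, 2))  # 8.0
```

Under this formula, a validity coefficient of 0 leaves the error equal to the criterion's standard deviation, and a coefficient of 1 or -1 reduces it to 0.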

- Standard Score
- Score that reports performance on a test in terms of the number of standard deviation units a raw score is from the mean of the distribution. Includes z scores, T scores, stanines, and deviation IQ scores.
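The conversions among these scales can be sketched as follows. The sketch assumes the conventional parameters (T scores: mean 50, SD 10; deviation IQ: mean 100, SD 15) and, for stanines, the common approximation of rounding 2z + 5 and limiting the result to the range 1 to 9. Function names and the example values are hypothetical.

```python
import math


def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (raw - mean) / sd


def t_score(z):
    return 50 + 10 * z


def deviation_iq(z):
    return 100 + 15 * z


def stanine(z):
    """Approximate stanine: round 2z + 5 (halves up), then clamp to 1..9."""
    return max(1, min(9, math.floor(2 * z + 5 + 0.5)))


# Hypothetical raw score of 65 on a test with mean 50 and SD 10
z = z_score(65, mean=50, sd=10)
print(z, t_score(z), deviation_iq(z), stanine(z))  # 1.5 65.0 122.5 8
```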

- Stanine Scores
- A standard score scale that divides a distribution into nine bands, with 1 the lowest and 9 the highest. Each band except the two extremes is one-half standard deviation wide, so the bands do not contain equal proportions of examinees. Stanine scores have a mean of 5 and a standard deviation of about 2.

- Test-Retest Reliability
- Test reliability established by administering the same test twice to a single group of examinees and correlating the two sets of test scores.

- True Negative
- An examinee whom a predictor correctly identifies as not meeting the minimum level on a criterion measure. For instance, someone who does not succeed on a job selection measure and would not have been successful on the job.

- True Positive
- An examinee whom a predictor correctly identifies as meeting the minimum level on a criterion measure. For instance, someone who succeeds on a job selection test and is successful on the job.
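The four decision outcomes (true positive, false positive, true negative, false negative) can be sketched as a comparison of two cutoffs. This is a minimal sketch; the function name, cutoff values, and scores are hypothetical.

```python
def classify_outcome(predictor_score, criterion_score,
                     predictor_cutoff, criterion_cutoff):
    """Label a selection decision by comparing predictor and criterion results."""
    predicted_positive = predictor_score >= predictor_cutoff  # e.g., "passed" the test
    actual_positive = criterion_score >= criterion_cutoff     # e.g., succeeded on the job
    if predicted_positive and actual_positive:
        return "true positive"
    if predicted_positive and not actual_positive:
        return "false positive"
    if not predicted_positive and actual_positive:
        return "false negative"
    return "true negative"


# Hypothetical cutoffs: 70 on the selection test, 3.0 on a job performance rating
print(classify_outcome(82, 3.6, 70, 3.0))  # true positive
print(classify_outcome(82, 2.1, 70, 3.0))  # false positive
print(classify_outcome(55, 3.6, 70, 3.0))  # false negative
print(classify_outcome(55, 2.1, 70, 3.0))  # true negative
```

Raising the predictor cutoff moves borderline examinees out of the "positive" categories, which tends to reduce false positives at the cost of more false negatives.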

- True Score
- A hypothetical notion of what a person's actual score would be if there were absolutely no error in the measurement. The true score can never be exactly measured.

- T-Score
- A type of standard score with a mean of 50 and a standard deviation of 10. Scores on a number of personality inventories, including the MMPI-2, are reported in terms of T scores.

- Validity
- The extent to which a test measures what it is supposed to measure; i.e., the usefulness of a test. Types of validity include content validity, construct validity, and criterion-related validity.