Glossary of Measurement and Appraisal

- Assessment
- collecting info

- evaluating
- making a value judgment

-good/bad

-strong/poor

- measurement
- attaching a number to an assessment

- Formal vs. Informal assessment
- Formal: specific rules and directions to follow

Informal: no set rules

- supply vs. fixed (selection)
- Supply: open-ended; you write the answer (short answer or essay)

Fixed: the answer is already there and you pick it (multiple choice)

- Max performance vs. typical
- max: best you can do

typical: your performance on a regular average day

ex. pop quiz

- individual vs. group
- individual: one-on-one test

group: multiple people can take their own test at the same time

- speed vs. power
- speed: how quickly you answer on the test matters (TIME MATTERS)

power: there may be a time limit, but time is not a factor in scoring

- objective vs. subjective
- objective: right/wrong factual (true/false)

subjective: interpretation required (short answer/essay)

- Norm vs. Criterion
- Norm: based on how others did on the test

Criterion: based on some kind of standard or criterion (e.g., a rubric)

- Criterion Referenced Tests
- standards-based: scores are compared to a set standard, not to other people

Mastery: mastered completely

Minimal: minimally competent

Absolute:

A 90-100

B 80-89

C 70-79
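
The absolute scale above is a fixed lookup from percent to letter grade. A minimal Python sketch, assuming a hypothetical function name `absolute_grade` and assuming fractional percents (e.g., 89.5) stay in the lower band; the notes give no cutoffs below C, so the function returns `None` there:

```python
def absolute_grade(percent):
    """Map a percent score to a letter using the fixed cutoffs in the notes."""
    if 90 <= percent <= 100:
        return "A"
    if 80 <= percent < 90:
        return "B"
    if 70 <= percent < 80:
        return "C"
    # The notes list no cutoffs below C, so nothing is assumed here.
    return None

for pct in (95, 85, 70, 65):
    print(pct, absolute_grade(pct))
```

The grade depends only on the fixed standard, never on how anyone else scored.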

- Norm Referenced Tests
- all scores are interpreted relative to how others did

- grading on a curve, proportional

- grade equivalents/age equivalents
- Grade equivalents: average performance of students in that grade

Age equivalents: average performance of students of that age level

- percentile ranks
- percentage of people in the norm group who fall at or below a given score (on a normal curve it can be found from how many standard deviations the score is from the mean)
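
A percentile rank can be counted directly from a norm group. A minimal Python sketch, assuming a hypothetical helper `percentile_rank` and made-up norm-group scores:

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that fall at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

# Made-up norm group of ten raw scores.
norm_group = [55, 60, 62, 65, 70, 70, 75, 80, 85, 90]
print(percentile_rank(70, norm_group))  # 60.0
```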

- z score formula
- Number of standard deviations your score is from the mean

z = (x - mean) / SD
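
A worked example of the z score: subtract the mean first, then divide by the standard deviation. A minimal Python sketch with made-up numbers; the function name `z_score` is only illustrative:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Made-up test: mean 70, SD 10, raw score 85.
print(z_score(85, 70, 10))  # 1.5
```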

- T score formula
- mean is 50, standard deviation is 10

T = 10z + 50

- IQ score formula
- IQ = 15z + 100

- CEEB formula
- CEEB = 100z + 500 (GRE)

- NCE
- NCE = 21.06z + 50 (normal curve equivalent)
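
The T, IQ, CEEB, and NCE formulas are all the same move: multiply z by the new standard deviation and add the new mean. A minimal Python sketch using the constants from the notes; the function names are illustrative, not from any library:

```python
def t_score(z):
    return 10 * z + 50       # mean 50, SD 10

def iq_score(z):
    return 15 * z + 100      # mean 100, SD 15

def ceeb_score(z):
    return 100 * z + 500     # mean 500, SD 100

def nce_score(z):
    return 21.06 * z + 50    # mean 50, SD 21.06

z = 1.0  # one standard deviation above the mean
print(t_score(z), iq_score(z), ceeb_score(z), round(nce_score(z), 2))
```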

- stanine
- scores run from 1 to 9 (mean 5); each stanine is 1/2 standard deviation wide (1 and 9 are open-ended)
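
Because each stanine is half a standard deviation wide, a z score can be binned into a stanine directly. A minimal Python sketch, assuming the usual convention that stanine 5 is centered on the mean (z from -0.25 up to 0.25) and that the scale is clipped to 1 through 9; this conversion is not in the notes:

```python
import math

def stanine(z):
    """Bin a z score into stanines 1..9, each 1/2 SD wide, 5 centered on the mean."""
    raw = math.floor(2 * z + 5.5)
    return max(1, min(9, raw))   # 1 and 9 absorb the tails

for z in (-2.0, 0.0, 0.3, 3.0):
    print(z, stanine(z))
```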

- Nature of Scores
- Norms

-race

-ethnicity

-gender

-geography

-sample size

- Nature of Test
- what is on the test, what do you do to take it?

- Nature of Environment
- Environment

-where and when was the test given

- Purpose for test
- What was the test designed for?

- Reliability
- gives you the same score every time

-CONSISTENT

- Validity
- It measures what it is supposed to measure

- Can you have reliability without validity?
- You CAN have a reliable test that is not valid

You CANNOT have a valid test that is not reliable

- Standard error of measurement
- X (observed score) = T (true score) + E (error)

-Measures how much error is in the scores

-Tells you how far the observed score is likely to be from your true score
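
The notes do not give a formula for the standard error of measurement. The usual textbook one is SEM = SD * sqrt(1 - reliability). A minimal Python sketch under that assumption, with made-up numbers:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Textbook SEM formula (assumed, not from the notes): SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

# Made-up test: SD of 15 and a reliability coefficient of .91.
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
```

Higher reliability means a smaller SEM; a perfectly reliable test (r = 1) would have an SEM of 0.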

- confidence interval
- a range of scores in which you have a certain level of confidence that your true score falls

68% (about ±1 standard error of measurement)

95% (about ±1.96)

99% (about ±2.58)
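
A confidence interval is the observed score plus or minus some number of standard errors of measurement. A minimal Python sketch, assuming the standard normal multipliers 1, 1.96, and 2.58 for the 68%, 95%, and 99% levels and a made-up SEM of 5:

```python
def confidence_interval(observed, sem, level=95):
    """Range around the observed score for a 68, 95, or 99 percent level."""
    multipliers = {68: 1.0, 95: 1.96, 99: 2.58}  # standard normal values
    k = multipliers[level]
    return (observed - k * sem, observed + k * sem)

# Made-up observed score of 100 with an SEM of 5.
print(confidence_interval(100, 5, level=68))  # (95.0, 105.0)
```

The more confidence you want, the wider the range gets.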

- Test-Retest
- do I get the same score over time?

- Constant error
- the error is the same every time

you keep getting the same score, but it is not the true score

- parallel forms
- give version A of test then version B and correlate scores
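
"Correlate scores" here means computing a correlation coefficient between the two sets of scores; that coefficient is the reliability estimate. A minimal Python sketch of the Pearson correlation, with made-up scores for five people (the helper name `pearson_r` is illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up scores for the same five people on version A and version B.
form_a = [70, 82, 65, 90, 77]
form_b = [72, 80, 68, 88, 75]
print(round(pearson_r(form_a, form_b), 3))
```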

- Test Retest w/parallel forms
- Give version A, then two weeks later give version B

More error -> lower reliability coefficient

- Internal consistency
- a way to get a reliability coefficient from giving the test only once

- split half reliability
- grade each half of the test separately and correlate the two half scores

ex: grade odds, grade evens and compare scores
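
The odd/even example can be sketched in Python: total the odd-numbered and even-numbered items for each person, then correlate the two totals. The item data below is made up (1 = correct, 0 = incorrect) and the function names are illustrative:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def split_half_reliability(item_scores):
    """Correlate each person's odd-item total with their even-item total."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)

# Made-up data: one row per person, one column per item.
items = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(round(split_half_reliability(items), 3))
```

This correlation describes a test half as long as the real one; textbooks usually adjust it upward with the Spearman-Brown formula, which is not covered in these notes.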

- Coefficient Alpha
- average of all possible split halves
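
In practice alpha is computed from item variances rather than by literally averaging every split. A minimal Python sketch, assuming the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), which is not written in the notes; the item data is made up:

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Standard coefficient alpha formula (assumed, not from the notes)."""
    k = len(item_scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up data: one row per person, one column per item (1 = correct).
items = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(round(coefficient_alpha(items), 3))
```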

- Types of validity: face validity
- does it look like what I think it's supposed to look like?

- Empirical validity
- (types you can calculate)

- predictive
- does my test relate to another measure in the FUTURE

- Concurrent
- Does my test score relate to another test score at THE SAME POINT IN TIME

- Cut score
- where you set the passing score

- hit rate
- how many people are actually capable?

how many of them passed?
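
The notes do not pin down a formula for hit rate. One common reading is the proportion of people the cut score classifies correctly: capable people who pass plus not-capable people who fail. A minimal Python sketch under that assumption, with made-up data and an illustrative function name:

```python
def hit_rate(scores, capable, cut_score):
    """Proportion of pass/fail decisions that match who is actually capable."""
    hits = sum(
        1 for score, is_capable in zip(scores, capable)
        if (score >= cut_score) == is_capable
    )
    return hits / len(scores)

# Made-up data: test scores and whether each person is actually capable.
scores = [50, 60, 70, 80]
capable = [False, True, True, True]
print(hit_rate(scores, capable, cut_score=65))  # 0.75
```

Moving the cut score changes who passes, so it also changes the hit rate.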

- construct validity
- a test is valid if it measures the construct as I define it

- construct underrepresentation
- missing questions on part of your construct

- Construct-irrelevant variance
- have questions not relevant to the construct

- Test Bias
- something other than what the test is measuring makes you more or less likely to answer correctly

Ex: farm silo. Some people may know what a silo is and some may not

- differential item functioning
- on a particular item, people from one group do not do as well as people of the same ability in another group

- generalization
- take a test, try to show it is valid for different groups, then combine that information

- Interpreting Reliabilities for individual scores
- Standard error of measurement gives us a confidence interval: a range of scores in which there is a certain level of confidence that your true score falls

- Interpreting reliabilities for Tests
- acceptable reliability coefficients depend on the kind of test

-academic: .8 or .9

- What affects reliability
- variability of the group

-difficulty of items for a group

-length of the test

-method used

- Testing Process
- Economy:

-computer scoring (rubrics)

-computerized test interpretation (gives and explains scores)

Test administration:

-instructions

-number of subtests (how many tests)

-test format

-materials