## Glossary of Measurement and Appraisal

Created by erikasalls

- Assessment
- collecting information

- Evaluating
- making a value judgment

-good/bad

-strong/poor

- Measurement
- attaching a number to an assessment

- Formal vs. Informal Assessment
- Formal:

specific rules and directions to follow

Informal:

no set rules

- Supply vs. Fixed (Selection)
- Supply:

open-ended; short answer or essay

Fixed:

answers are provided and you pick one (multiple choice)

- Max Performance vs. Typical
- Max: the best you can do

Typical: your performance on a regular, average day (ex. pop quiz)

- Individual vs. Group
- Individual: a one-on-one test

Group: multiple people take their own test at the same time

- Speed vs. Power
- Speed:

how quickly you answer matters (TIME MATTERS)

Power: there may be a time limit, but time is not a factor in scoring

- Objective vs. Subjective
- Objective: right/wrong, factual (true/false)

Subjective: interpretation required (short answer/essay)

- Norm vs. Criterion
- Norm:

based on how others did on the test

Criterion: based on some kind of standard, criteria, or rubric

- Criterion-Referenced Tests
- decisions based on standards

Mastery: mastered completely

Minimal: minimally competent

Absolute: fixed grade cutoffs, e.g.

A: 90-100

B: 80-89

C: 70-79

- Norm-Referenced Tests
- all scores are comparative to others

-grading on a curve, proportional

- Grade Equivalents/Age Equivalents
- Grade Equivalents:

average performance of students in that grade

Age Equivalents:

average performance of students of that age level

- Percentile Ranks
- the percentage of people who fall at or below a given score (for a normal distribution, it can be found from the standard deviation)
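
A percentile rank can be computed directly from a list of scores; a minimal Python sketch (the function name and sample scores are mine, for illustration):

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 80))  # 6 of 10 scores are at or below 80 -> 60.0
```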

- Z Score Formula
- the number of standard deviations your score is from the mean

z = (x - mean) / SD

- T Score Formula
- mean is 50, standard deviation is 10

T = 10z + 50

- IQ Score Formula
- IQ = 15z + 100 (mean 100, SD 15)

- CEEB Formula
- CEEB = 100z + 500 (mean 500, SD 100; used for the GRE)

- NCE (Normal Curve Equivalent)
- NCE = 21.06z + 50 (mean 50, SD 21.06)

- Stanine
- nine score bands, each 1/2 standard deviation wide (mean 5, SD 2)
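
The standard-score formulas above all rescale the same z score; a minimal Python sketch (function names and the example raw score are mine, not from the deck):

```python
def z_score(x, mean, sd):
    """Number of standard deviations x is from the mean."""
    return (x - mean) / sd

def to_t(z):    return 10 * z + 50      # T score: mean 50, SD 10
def to_iq(z):   return 15 * z + 100     # IQ: mean 100, SD 15
def to_ceeb(z): return 100 * z + 500    # CEEB: mean 500, SD 100
def to_nce(z):  return 21.06 * z + 50   # NCE: mean 50, SD 21.06

z = z_score(85, mean=70, sd=10)  # raw score 85 on a test with mean 70, SD 10
print(z)           # 1.5
print(to_t(z))     # 65.0
print(to_iq(z))    # 122.5
print(to_ceeb(z))  # 650.0
```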

- Nature of Scores
- Norms

-race

-ethnicity

-gender

-geography

-sample size

- Nature of Test
- what is on the test, and what do you do to take it?

- Nature of Environment
- Environment:

-where and when was the test given?

- Purpose for test
- What was the test designed for?

- Reliability
- gives you the same score every time

-CONSISTENT

- Validity
- It measures what it is supposed to measure

- Can you have reliability without validity?
- You CAN have a reliable test that is not valid

You CANNOT have a valid test that is not reliable

- Standard Error of Measurement
- X (observed score) = T (true score) + E (error)

-measures how much error there is in a score

-tells you how far the observed score may be from the true score

- Confidence Interval
- gives the range of scores in which you have a certain level of confidence that your true score falls

68%

95%

99%
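
A common way to get the interval: estimate the standard error of measurement as SD x sqrt(1 - reliability), then go out 1, 1.96, or 2.58 SEMs for 68%, 95%, or 99% confidence. A minimal sketch, assuming that standard SEM formula (the function names and sample numbers are mine):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Range in which the true score falls with the given confidence.
    z = 1.00 -> 68%, 1.96 -> 95%, 2.58 -> 99%."""
    e = z * sem(sd, reliability)
    return observed - e, observed + e

# observed score 80 on a test with SD 10 and reliability .91:
# SEM = 10 * sqrt(.09) = 3.0, so the 95% interval is about 80 +/- 5.88
low, high = confidence_interval(observed=80, sd=10, reliability=0.91)
print(round(low, 2), round(high, 2))
```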

- Test-Retest
- do I get the same score over time?

- Constant Error
- the error is the same each time

the same score every time, but not the true score

- parallel forms
- give version A of test then version B and correlate scores

- Test-Retest w/ Parallel Forms
- give version A; two weeks later, give version B

more error --> lower reliability coefficient
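
Correlating the two sets of scores is what produces the reliability coefficient; a pure-Python sketch using a Pearson correlation (the function name and the sample scores are mine, for illustration):

```python
def pearson_r(xs, ys):
    """Pearson correlation between two sets of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

form_a = [70, 75, 80, 85, 90]  # hypothetical scores on version A
form_b = [72, 74, 79, 88, 91]  # the same students on version B
print(round(pearson_r(form_a, form_b), 2))  # -> 0.98, a high reliability coefficient
```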

- Internal Consistency
- a way to get a reliability coefficient by giving the test only once

- Split-Half Reliability
- grade each half of the test separately and correlate the scores

ex: grade the odds, grade the evens, and compare the scores

- Coefficient Alpha
- the average of all possible split-half reliabilities
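
Coefficient alpha (Cronbach's alpha) is usually computed from item variances rather than by averaging split halves directly; a minimal sketch using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), with hypothetical item scores:

```python
def cronbach_alpha(item_scores):
    """item_scores: one list per item, each with one score per person."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # total score per person
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))

# hypothetical data: 3 items, 4 test takers
items = [[1, 2, 3, 4], [2, 2, 3, 4], [1, 3, 3, 4]]
print(round(cronbach_alpha(items), 2))  # -> 0.95
```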

- Criterion Validities: Face Validity
- does the test look like it measures what it is supposed to measure?

- Empirical Validity
- validity types you can calculate

- predictive
- does my test relate to another measure in the FUTURE

- Concurrent
- Does my test score relate to another test score at THE SAME POINT IN TIME

- Cut Score
- where you are setting the passing score

- Hit Rate
- the proportion of people correctly classified by the cut score: capable people who passed, and not-capable people who failed

- construct validity
- a test is valid if it measures the construct as I define it

- Construct Underrepresentation
- the test is missing questions covering part of the construct

- Construct-Irrelevant Variance
- the test has questions that are not relevant to the construct

- Test Bias
- something other than what the test is measuring affects whether you answer correctly

ex: a question about a farm silo: some people may know what a silo is and some may not

- Differential Item Functioning
- people from one group do not do as well on an item as people from another group

- Generalization
- give the test, try to show it is valid for different groups, then combine that information

- Interpreting Reliabilities for Individual Scores
- the standard error of measurement gives us a confidence interval, because it gives you a range of scores in which there is a certain level of confidence that your true score falls

- Interpreting Reliabilities for Tests
- reliability coefficients have expected ranges depending on the kind of test

-academic: .8 or .9

- What Affects Reliability
- variability of the group

-difficulty of the items for the group

-length of the test

-method used

- Testing Process
- Economy:

-computer scoring (rubrics)

-computerized test interpretation (gives and explains scores)

Test administration:

-instructions

-# of subtests (how many tests)

-test format

-materials