Reliability
Terms
 What is the significance of Norms?
 Norms provide a standard against which test results can be compared.
 How are Norms created?
 Norms are created by defining the population we wish to compare results with. Ex. for the SAT, juniors and seniors.
 How are Norms developed?
 Norms are developed by administering a test to a representative sample.
 How are the distributions of scores related to norms?
 The distribution of scores from the sample is converted into what are called the norms for test takers.
 Define Standardizing a Test.
 The process of administering a test to a representative sample in order to establish norms.
 Define Standardized Test.
 A test with standardized procedures for administration, for scoring, and for interpreting test data; it includes all normative data.
 Why standardized tests?
 Standardized tests are used to establish widespread consistency.
 Define representative sample.
 A smaller group drawn from the entire universe of a population that reflects that population's characteristics.
 T or F: The fewer people in the sample, the better, on some occasions. Why?
 False. The more the better, for less measurement error.
 Define stratified sample
 One that includes all the characteristics of the population of interest, for instance Black, white, Catholic, single.
 What is good about increasing the sample size (stratified sample)?
 It increases our ability to interpret the test and helps to prevent bias (examining all characteristics).
 What's hot on the plate of Psych Corp with regard to sample stratification?
 They will reexamine the Wechsler scale for adults to make sure it represents the entire population.
 Define stratified random sample.
 People randomly chosen to take the test (some people opt out). A stratified sample is better.
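A minimal sketch of how random selection and stratification can be combined: people are grouped by a characteristic, then the same fraction is drawn at random from each group. The function name, the 10% fraction, and the population of 80 single and 20 married test takers are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_random_sample(people, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)  # group people by characteristic
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one person per stratum
        sample.extend(rng.sample(members, k))       # random draw within the stratum
    return sample

# Hypothetical population: 80 single and 20 married test takers.
population = [("single", i) for i in range(80)] + [("married", i) for i in range(20)]
sample = stratified_random_sample(population, lambda p: p[0], fraction=0.1)
counts = {label: sum(1 for p in sample if p[0] == label) for label in ("single", "married")}
print(counts)
```

Because each stratum is sampled in proportion to its size, the sample keeps the 80/20 split of the population, which a purely random draw of 10 people would not guarantee.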
 Define purposive sample.
 Arbitrary selection of people (consumer research...not stratified in any way).
 Define incidental sample.
 A sample of whoever is available, as in research psychology where you grab warm bodies (not scientific...mostly experimental psych).
 Describe the next step after a test is developed.
 They develop a table of norms. The two major areas are AGE NORMS and GRADE NORMS.
 Define Age Norms.
 Norms that compare the results of the test by age to establish an average performance at each age, Ex. for 9-year-olds.
 Define Grade Norms.
 Norms that compare a group of students with others in the same grade, Ex. fifth graders with other fifth graders.
 How are funds awarded?
 Grade Norms are established and reported.
 Define a Fixed Reference Group.
 Not a test developed for everyone in the country but for specific individuals for specific purposes, Ex. GRE, SAT.
 Define Normed Reference Group (falls under Fixed Reference Group).
 This one does go across whole country like IQ scores.
 How does Normed Reference Group help us to derive meaning from test scores?
 It allows us to make interpretations.
 More on Normed Reference Group scores.
 They allow us to assess this person's rank (ordinal scales), to be able to interpret the student's test score.
 Define Criterion Referenced Groups (criterion is established for you).
 Comparisons based on developmental norms that allow us to make predictions (personality tests).
 T or F. Personality is not as stable across time as IQ is.
 True
 What is the difference between IQ and Criterion related tests?
 IQ measures ability. Criterion related tests give information about what people can do.
 What type of Criterion Referenced tests make it hard to define what is normal?
 Personality Tests.
 Define correlation.
 The strength and direction of the relationship between two things.
 T or F: Correlation and causation are interchangeable terms.
 False. Correlation does not indicate causation.
 Describe an inverse correlation.
 An inverse correlation is seen when, as one score goes up, the other goes down.
 Describe a positive correlation.
 A positive correlation is when two things go up or down together. Ex. Height and weight.
 "Less studying brings grades down" is an example of what kind of correlation?
 A positive correlation.
 As miles go up, value goes down in cars. What type of correlation is this?
 A negative correlation.
 As anxiety goes up, performance goes down. What type of correlation is this?
 A negative correlation.
 When two things go up or down together there's what type of a correlation?
 Positive correlation.
 Define Level of Significance.
 The level of chance: the probability that a result occurred by chance. Ex. flip a coin 100 times to determine what the chance is that you'll have 50 heads.
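The coin example can be worked out exactly. The sketch below assumes a fair coin and asks for exactly 50 heads in 100 flips; the function name is made up for illustration.

```python
from math import comb

def prob_exact_heads(flips, heads):
    """Probability of exactly `heads` heads in `flips` flips of a fair coin."""
    # Number of ways to choose which flips are heads, over all equally likely outcomes.
    return comb(flips, heads) / 2 ** flips

p = prob_exact_heads(100, 50)
print(round(p, 4))
```

The result is roughly 0.08, so even the single most likely outcome happens only about 8 times in 100. That is why chance levels have to be calculated rather than guessed.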
 Why do we try to minimize the level of chance?
 We minimize the level of chance to come up with a level of significance. IQ tests are based on this.
 Describe the term "level of chance" with regard to an IQ test.
 .01 means there is only 1 chance out of 100 that the result is due to chance, so we have a reasonable degree of psychological certainty that the IQ will fall in that range.
 What is most widely used for correlation studies?
 Pearson r.
 How are correlation studies represented?
 By a graph or a scatter plot.
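A minimal sketch of how Pearson r could be computed by hand for two of the examples in this deck. The numbers for mileage, car value, height, and weight are invented for illustration, and `pearson_r` is a hypothetical helper rather than a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)  # sum of squares for x
    ss_y = sum((y - mean_y) ** 2 for y in ys)  # sum of squares for y
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: as miles go up, car value goes down.
miles = [10, 30, 50, 70, 90]   # thousands of miles
value = [20, 17, 13, 10, 6]    # thousands of dollars
# Hypothetical data: height and weight go up together.
height = [60, 64, 66, 70, 72]       # inches
weight = [110, 130, 140, 165, 180]  # pounds

r_cars = pearson_r(miles, value)    # negative: close to -1
r_body = pearson_r(height, weight)  # positive: close to +1
print(round(r_cars, 2), round(r_body, 2))
```

The sign of r gives the direction (negative for cars, positive for height and weight) and its size gives the strength. Plotting each pair of lists as a scatter plot would show the same pattern visually.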
 Define reliability.
 Reliability refers to consistency in measurement. It is expressed as a coefficient.
 Define reliability coefficient.
 An index of how consistent the test is; it also refers to the relationship between true score variance and total variance.
 Give a sample coefficient reliability.
 A test on IQ over period of time...today...two years...five years...the variability among scores.
 Show variability scores over time.
 Time 1, 90; time 2, 95; time 3, 92. Take the differences between scores (3 pts. and 2 pts.) to see how consistent the test is.
 What are the results of assessing variability over time?
 The narrower the bands among scores, the stronger the reliability coefficient will be. More testing...higher reliability.
 How to tell if test is reliable?
 Give same test to everyone. Minimize variance in directions. Use a stratified sample. Make environment conducive to testing.
 Describe Test-Retest. (Afterward you run it through a reliability coefficient.)
 The test is given once, then again after a specified amount of time. You examine how scores vary from one administration to the next.
 Describe the split-half method (another test for reliability).
 A 30-item vocabulary test is constructed and split into two 15-item halves. Scores on one half are compared with scores on the other half.
 T or F. The more a test increases in size, the greater the reliability.
 True
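One way to put numbers on this is the Spearman-Brown formula, which is named elsewhere in this deck. The sketch below uses its general form and assumes the added items are comparable to the existing ones; the starting reliability of .70 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made `length_factor` times as long.

    Assumes the added items are equivalent in quality to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

original = 0.70                        # hypothetical reliability of the current test
doubled = spearman_brown(original, 2)  # test made twice as long
tripled = spearman_brown(original, 3)  # test made three times as long
print(round(doubled, 3), round(tripled, 3))
```

Under these assumptions the predicted reliability rises from .70 to about .82 when the test is doubled and to about .88 when it is tripled, so each added block of items helps a little less than the last.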
 Name the most reliable test.
 Spearman-Brown correlation. See figure 3.3 p. 47 to calculate.
 Define Parallel tests. Hint: won't ever happen because we'll never have a parallel form of a test. Level of difficulty must increase on both.
 Basically a test that you develop that hopefully will come up with the exact same Mean and SD.
 Describe alternate form of test.
 It is just a different form of the same test (done so teachers have data to see how well they are doing).
 Achievement based tests. Ex. Achievement based reading, spelling, math.
 Two of each developed with level of difficulty increased on both.
 Explain why reading needs two alternate forms of testing.
 One tests word recognition and the other tests comprehension.
 Name some other methods for rating test consistency.
 Item analysis and evaluation (item consistency to determine what you are evaluating and the level of difficulty).
 What does homogeneous in nature mean?
 Measures a single trait.
 What does heterogeneous mean?
 Measures a variety of traits.
 What is the general rule for a homogeneous test?
 The more homogeneous a test is, the more internal consistency there will be because you are measuring the same trait.
 Give examples of homogeneous and heterogeneous tests.
 IQ tests have both...block design subtest, vocabulary. Each subtest alone is homogeneous; taken together they are heterogeneous.
 Why do we need to be intimately involved with all the traits as well as things like working memory and verbal reasoning?
 We not only have to understand what we're evaluating, but also the concepts we use, as well as how to interpret variables when they are all put together.
 Name the key factor on reliability.
 Consistency.
 What is the purpose of a reliability coefficient?
 It can be calculated to provide an indication of the amount of measurement error in an instrument, which helps determine the instrument's appropriateness.
 Define measurement error.
 The amount of fluctuation in scores due to errors in measurement.
 Name one way a reliability coefficient is used.
 They can be useful in selecting instruments with the most consistency and the least amount of measurement error.
 Name another way a reliability coefficient is used.
 They can be used when interpreting the results of an assessment to a client (involves the use of standard error of measurement).
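A sketch of how the standard error of measurement (SEM) is commonly computed from a reliability coefficient, using the usual formula SEM = SD x sqrt(1 - reliability). The SD of 15, the reliability of .91, and the observed score of 100 are hypothetical values picked to keep the arithmetic simple.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

# Hypothetical instrument: SD of 15 and a reliability coefficient of .91.
sem = standard_error_of_measurement(sd=15, reliability=0.91)

# Band of one SEM on either side of a hypothetical observed score of 100.
observed = 100
band = (observed - sem, observed + sem)
print(round(sem, 2), tuple(round(b, 1) for b in band))
```

With these numbers the SEM is 4.5 points, so the result could be explained to a client as a range of about 95.5 to 104.5 rather than as a single exact score. A more reliable instrument would give a narrower band.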
 Explain unsystematic error with regard to reliability.
 Reliability is an estimate of the proportion of total variance that is true variance and the proportion that is error variance. Error variance, in this case, is only unsystematic error.
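The proportion described above can be written out in the usual classical test theory notation. The symbols are assumptions added here, not taken from the deck: X is the observed score, T the true score, E the unsystematic error, and r_xx the reliability coefficient.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

As a made-up example, if the total variance is 100 and the error variance is 20, the true variance is 80 and the reliability is 80/100 = .80.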
 How do systematic and unsystematic errors compare?
 Systematic errors affect every test taker, such as a typo in a test question that everyone reads. An unsystematic error would be an error (blank page, typo) on one person's test.
 Give an example of unsystematic error.
 A test administrator of an instrument at one school turns the heat up too high. It results in discomfort to test takers and unsystematic error at only one school.
 Describe a correlation coefficient with regard to reliability.
 Reliability is typically calculated based on the amount of consistency between two sets of scores. When individuals want to examine consistency, they use the statistical technique of correlation.
 Name the most common method for calculating correlations.
 The Pearson Product-Moment Correlation Coefficient, p. 46 (+ x + = +; + x - = -; small x small = small; big x big = big).
 What is the difference between consistent scores and inconsistent scores with regard to the correlation coefficient?
 Consistent scores contribute to a larger cc (toward either +1.00 or -1.00). Inconsistent scores reduce the size of the cc (toward .00).
 Describe Test-Retest.
 A common method for estimating the reliability of an instrument is to give the identical instrument twice to the same group.
 How is the Test-Retest calculated?
 A reliability coefficient is calculated by correlating the performance on the first test with the second. The variation between them would relate to unsystematic error because everything else would be the same.
 What assumptions must be met to administer the Test-Retest?
 See p. 49 middle to end.
 Explain alternate or parallel forms. Hint: Least preferred.
 Requires two forms of an instrument. The forms differ from one another, avoiding the difficulties of the Test-Retest method.
 Describe Split-half reliability.
 The instrument is given once and then split in half to determine the reliability. The first step involves dividing the instrument into equivalent halves. pp. 50-51
 Spearman-Brown formula?
 Adjusts the correlation between the two halves to estimate reliability at the original length of the instrument. p. 51
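The split-half steps can be sketched as follows. This assumes an odd/even item split as one way to get roughly equivalent halves (the deck does not say how the halves are formed) and uses a small invented set of right/wrong responses. The helper names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores (1 = right, 0 = wrong) per test taker."""
    # Step 1: split into halves. Assumption: items 1, 3, 5... vs items 2, 4, 6...
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    # Step 2: correlate the two half-test scores.
    r_half = pearson_r(odd_totals, even_totals)
    # Step 3: Spearman-Brown correction back up to the full test length.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical responses: 5 test takers, 6 items, given in a single sitting.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(responses)
print(round(r_half, 2), round(r_full, 2))
```

The corrected value is higher than the half-test correlation because each half is only half as long as the instrument that was actually given.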
 Reliability summary?
 pp. 60-61