Methods ch. 1-4 Multiple Choice

Terms

The earliest recorded use of procedures resembling psychological testing is:
A) United States, circa 1850 AD
B) Rome, circa 200 BC
C) China, circa 200 BC
D) Incan Empire, circa 1400 AD
China, circa 200 BC
The first intelligence test capable of measuring a general intelligence level was:
A) Binet-Simon
B) Hermann Ebbinghaus
C) Emil Kraepelin
D) Weber-Fechner
Binet-Simon
What is the definition of the term “battery?”
A) a group of items that pertains to a single variable, arranged in order of difficulty or intensity
B) a group of several tests or subtests that are administered at one time to one person
a group of several tests or subtests that are administered at one time to one person
Who created the first laboratory dedicated to research of a purely psychological nature?
A) James McKeen Cattell
B) Emil Kraepelin
C) Sir Francis Galton
D) Wilhelm Wundt
Wilhelm Wundt
*According to Testing Standards, “the ultimate responsibility for appropriate test use and interpretation” is primarily assigned to:
A) test authors and developers
B) test publishers
C) test score interpreters
D) test user
test user
*The Woodworth Personal Data Sheet, the first personality test, was used to screen World War I recruits who might suffer from:
A) Dyslexia
B) Attention Deficit Disorder
C) Mental illness
D) Fear of heights
Mental illness
The 1905 _________ was a series of 30 tests or tasks, varied in content and difficulty, designed mostly to assess judgment and reasoning ability irrespective of school learning.
A) Ebbinghaus Completion Test
B) Stanford-Binet Intelligence Scale
Binet-Simon Scale
The primary reason for test misuse lies in the insufficient
A) publication of tests
B) knowledge of test users
C) instruments available to test users
D) use of the test manual by the examiner
knowledge of test users
A psychological assessment differs from a psychological test in that
A) psychological testing is more complex
B) psychological assessment is objective
C) psychological assessment is longer and more unique
D) psychological testing
psychological assessment is longer and more unique
A standardized or normative sample is:
A) a sample in which all the participants have at least one similar characteristic
B) a sample taken after a test has been completed to analyze the results of the test
C) a sample taken in order to ga
a sample taken in order to gauge the performance of others who will later take the test
Which characteristic made the Minnesota Multiphasic Personality Inventory (MMPI) more successful than previous personality inventories?
A) many of its items had no obvious reference to psychopathological tendencies
B) it included 116 statements
many of its items had no obvious reference to psychopathological tendencies
*Standardization of psychological tests refers to measurement based on ________ and ________.
A) normal curve & repetition of results
B) normal curve & unbiased analysis
C) normative samples & uniform procedure method
D)
normative samples & uniform procedure method
*A technique, based on correlation, for reducing a large number of variables to a small set of factors is:
A) scaling
B) kurtosis
C) factor analysis
D) sampling
factor analysis
*The Rorschach Test is a type of:
A) personality inventory
B) neuropsychological test
C) thematic apperceptual technique
D) projective technique
projective technique
What came into being following the realization that intelligence is not a unitary concept and that human abilities comprise a broad range of independent factors?
A) scholastic aptitude tests
B) neuropsychological tests
C) multiple aptitude
multiple aptitude batteries
Who was responsible for promoting the field of eugenics, discovering the phenomena of correlation and regression, and pioneering the twin study method?
A) Alfred Binet
B) Kurt Goldstein
C) Robert Yerkes
D) Francis Galton
Francis Galton
*What is the basic definition of a battery?
A) a process of arriving at the sequencing of items
B) a group of several tests administered at one time
C) a group of items that pertains to a single variable
D) a tool designed to elicit i
a group of several tests administered at one time
*Which is the primary use of psychological tests?
A) decision making
B) psychological research
C) self understanding and personal development
D) making predictions
decision making
*The SAT is based loosely on the historical model of the
A) Stanford-Binet intelligence scale
B) Woodworth Personal Data Sheet
C) Wechsler Intelligence Scale for Children
D) Army Alpha Test
Army Alpha Test
*A test whose scores can range from 0 to 100 and whose distribution has most scores in the 70s to 90s is said to be which of the following:
a. positively skewed
b. linearly distributed
c. normally distributed
d. negatively skewed
negatively skewed
*The interquartile range of a distribution is:
a. the top 25% of the distribution
b. the bottom 25% of the distribution
c. the bottom 50% of the distribution
d. the middle 50% of the distribution
the middle 50% of the distribution
If a test is written intending to measure achievement of college-level students and the distribution is negatively skewed, what should be done with the test?
a. the test should be made more difficult
b. the test should remain the same
c. t
the test should be made more difficult
*_________ refers to measures derived from sample data, while measures derived from population data are __________.
a. Parameters, statistics
b. Samples, parameters
c. Statistics, parameters
d. Constants, parameters
Statistics, parameters
*The ratio IQ (MA/CA) was not very effective for adolescents and adults because their intellectual development is far less __________ from year to year.
a. uniform
b. dynamic
c. irregular
d. measurable
uniform
What was the main problem with ratio IQ scores used with the original Stanford-Binet Intelligence Scale?
a. the math was too complicated for psychologists to compute
b. the ratio simply did not work for adolescents and adults
c. the measur
the ratio simply did not work for adolescents and adults
Through a study of 400 high school students, the College Board finds that 60% of high school students wish to attain a higher educational degree. This is an example of:
a. descriptive parameter
b. ordinal percentage
c. statistic
d. in
statistic
What is the interquartile range of a set of data?
a. the range of all four quarters of the data
b. four times the semi-interquartile data
c. the range of the middle two quarters of data
d. half of the semi-interquartile data
the range of the middle two quarters of data
*The distance between a value and the mean of a distribution, expressed in terms of the standard deviation is represented by:
a. Pearson r
b. Median
c. Z-score
d. Correlation coefficient
Z-score
A percentile score is an example of which type of scale
a. ratio
b. ordinal
c. interval
d. nominal
ordinal
*Which measure of central tendency is useful when dealing with qualitative or categorical variables?
a. mean
b. median
c. mode
d. range
mode
*A characteristic of the stanine scale is:
a. it is complex
b. it is expensive
c. it lacks precision
d. it is not time efficient
it lacks precision
Alternate forms, anchor tests, fixed reference groups and simultaneous norming are all types of:
a. nonlinear transformations
b. equating procedures
c. item response testing
d. performance assessment
equating procedures
Local norms are characterized by:
a. groups formed in terms of age, sex, ethnicity, or any other variable that may significantly impact test scores
b. reference groups drawn from members of a specific, more narrowly defined population or instit
reference groups drawn from members of a specific, more narrowly defined population or institution
If a fifth grader scores at the eighth grade level in arithmetic, it means that:
a. the student’s score is significantly above the average for fifth graders in arithmetic
b. the student has mastered eighth grade arithmetic
c. the same as
the student’s score is significantly above the average for fifth graders in arithmetic
Which of the following demonstrates the difference between percentiles and percentage scores?
a. percentiles reflect an individual’s number of correct responses, while percentage scores reflect the individual’s rank
b. the frame of referenc
percentiles reflect an individual’s rank in reference to other people, while percentage scores reflect an individual’s performance in reference to the entire test
*Which statement clearly distinguishes between three terms that are often used interchangeably?
a. “reference group” identifies a more specific group of test subjects than a “standardization sample” or a “normative sample”
b. “sta
“standardization sample” is the first group to receive the test, whereas the “normative sample” is any group from which norms are gathered
The changing of the reference group standards for the College Board’s SAT score scale is called:
a. anchor testing
b. variating
c. equivocating
d. re-centering
re-centering
*The higher level of performance typically seen in the normative groups of newer versions of general intelligence tests compared to their older counterparts is known as the:
a. deviation from the mean
b. Flynn effect
c. Standard deviation
Flynn effect
*The foremost requirement of the normative sample is:
a. to be sufficiently large enough to ensure stability of variables
b. to be recent
c. to have the demographic makeup of the nation’s population
d. to be representative of the in
to be representative of the individuals to be tested
Behavioral sequences:
a. can be converted into nominal scales
b. cannot be used normatively
c. depend on an orderly progression from one state to another
d. can only be based on chronological age
depend on an orderly progression from one state to another
*Deviation IQ’s were first introduced in 1939 for use in the:
a. Otis-Lennon School Ability Test
b. SAT
c. Wechsler-Bellevue
d. Kaufman Adolescent and Adult Intelligence Test
Wechsler-Bellevue
A ________ expresses the distance between a raw score and the mean of the reference group in terms of the standard deviation of the reference group.
a. t-score
b. z-score
c. percentile score
d. grade-equivalent score
z-score
*If a person scores lower than any of the people in the normative sample, the problem is one of:
a. insufficient test ceiling
b. the test was too easy for the individual
c. overly large normative sample
d. insufficient test floor
insufficient test floor
*Which is used when a score distribution approximates but does not quite match the normal distribution?
a. linear transformation
b. nonlinear transformation
c. normalized standard scores
d. stanines
normalized standard scores
If a test taker reaches the test ceiling on a test, then:
a. the test taker is labeled a genius
b. the test taker must retake the test
c. the test is insufficient
d. the test was wrongly scored
the test is insufficient
*The Gesell Developmental Schedules and the Infant-Toddler Developmental Assessment have this in common:
a. they were both developed by Arnold Gesell
b. both were tested and edited at Yale
c. they both use ordinal scaling
d. they deri
they both use ordinal scaling
*Reliability of scores _________ as the error component __________.
a. decreases; remains constant
b. remains constant; increases
c. increases; decreases
d. decreases; decreases
increases; decreases
What two things does reliability in measurement imply?
a. consistency and precision
b. consistency and relatedness
c. precision and relatedness
d. consistency and validity
consistency and precision
*Evidence of score reliability is __________ validity.
a. unrelated to
b. sufficient for
c. necessary and sufficient for
d. necessary but not sufficient for
necessary but not sufficient for
*Traits are considered ________ characteristics, while states are referred to as ________.
a. stable; enduring
b. temporary; static
c. stable; temporary
d. temporary; shortlived
stable; temporary
The Spearman-Brown formula is related to the idea that:
a. a larger number of observations yields more reliable results
b. reliable results do not rely on the number of observations
c. a smaller set of observations is quicker to make
a larger number of observations yields more reliable results
*The KR-20 or alpha coefficients are good indicators of _________ in a test.
a. homogeneity
b. spiral-omnibus formats
c. heterogeneity
d. alternate forms
homogeneity
True scores are:
a. equivalent to the test taker’s observed score
b. the observed score subtracted from the raw score
c. hypothetical scores that would result from error-free measurement
d. normative sample scores of the given distr
hypothetical scores that would result from error-free measurement
The test-retest reliability coefficient tells us:
a. extent to which scores will fluctuate as a result of time sampling error
b. extent to which scores will fluctuate as a result of scorer reliability
c. reliability of the interitem incons
extent to which scores will fluctuate as a result of time sampling error
Low reliability estimates suggest that:
a. the test is too short
b. the test is too long
c. not enough data from the normative sample was analyzed
d. the test is not very trustworthy
the test is not very trustworthy
*Theoretically, if an individual took the same test an infinite number of times, his/her mean score would represent his/her:
a. true score
b. reliability coefficient
c. error component
d. observed score
true score
Reliability and error component are:
a. not related at all
b. positively related
c. inversely related
d. negatively related
inversely related
_________ is used in determining the consistency of mental tests, that is, the repeatability of their results. It evaluates sources of error and the sizes of those errors.
a. Measurement error
b. The reliability coefficient
c. The true sco
The reliability coefficient
The phrase “all other things being equal” should serve to:
a. alert the reader to the possibility that several other things do need to be considered besides the specific concept in question
b. show that all the aspects of a certain concept
alert the reader to the possibility that several other things do need to be considered besides the specific concept in question
Three typical sources of measurement error considered in reliability estimation include:
a. time sampling error, inconsistency, alternate form
b. homogeneity, time sampling error, off balancing
c. content sampling error, performance error, group diversity
interscorer difficulties, time sampling error, content sampling error
If all the test score variance were true variance, score reliability would be:
a. 1.00
b. -1.00
c. 100
d. -100
1.00
*What is one of the most frequently used formulas to calculate interitem consistency?
a. Cronbach’s Alpha
b. Pearson R formula
c. Spearman-Brown formula
d. Standard Error of Measurement formula
Cronbach’s Alpha
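As a rough illustration of the card above, coefficient alpha can be computed from a table of item scores. This is only a sketch: the function name `cronbach_alpha` and the tiny data set are made up for the example, and population variances are assumed throughout.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha for rows of per-person item scores (one column per item)."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # regroup scores by item
    item_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: three people, three items that rank them identically.
scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(scores), 3))  # 1.0
```

Items that order people identically give the maximum alpha of 1.0; less consistent items pull the value down.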
If an alternate form of a test is given shortly after taking the original form, there is likely to be:
a. heightened reliability
b. test-retest reliability
c. significant practice effects
d. the Flynn effect
significant practice effects
*The standard error of measurement (SEM) of Test A is 3 and the SEM of Test B is 5. If you wanted to compare these scores, the standard error of the difference would be:
a. 34
b. √34
c. more than 34
d. √8
e. 15
√34
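The answer above follows from the usual formula for the standard error of the difference, the square root of the sum of the two squared SEMs. A minimal sketch, assuming both tests report scores on the same scale (the function name `se_difference` is just for illustration):

```python
import math

def se_difference(sem_a: float, sem_b: float) -> float:
    """Standard error of the difference between two scores, given each test's SEM."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

# Test A has SEM 3 and Test B has SEM 5: sqrt(9 + 25) = sqrt(34)
print(round(se_difference(3, 5), 2))  # 5.83
```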
Item 1: 2 x 8 = ____
Item 2: 5 x 6 = ____
Item 3: 4 x 10 = ____

This problem set can be described as:
a. a low coefficient alpha
b. a low interitem consistency
c. very heterogeneous
d. very homogeneous
very homogeneous
The most appropriate measure used to estimate error for tests scored with a degree of subjectivity would be:
a. scorer reliability
b. test-retest reliability
c. alternate form reliability
d. delayed alternate form reliability
scorer reliability
Which of the following allows for the evaluation of the interaction effects from different types of error sources?
a. heterogeneity theory
b. internal conflict theory
c. standard error theory
d. generalizability theory
generalizability theory
Which of the following is true of a reliability coefficient?
a. the higher the coefficient the better
b. test users must have a coefficient of .85 or higher
c. test users must have a coefficient of .65 or higher
d. there is a minimum
the higher the coefficient the better
Which of the following methods for estimating score reliability is prone to practice effects?
a. split half technique
b. Cronbach’s alpha
c. Alternate form reliability
d. Interval methods
Alternate form reliability
When a test is purposefully designed to include items that are diverse in terms of one or more dimensions, KR-20 and coefficient alpha will __________.
a. underestimate content sampling error
b. overestimate content sampling error
c. round
overestimate content sampling error
In test score data obtained from a large sample under standardized conditions, measurement error:
a. is eliminated and is no longer an issue
b. is assumed to be distributed at random
c. is more likely to influence scores in a positive dire
is assumed to be distributed at random
*If the class’s scores on a reading comprehension test varied due to individual familiarity of some passages, the most useful procedure for estimating this error would be:
a. scorer reliability
b. test-retest reliability
c. alternate for
alternate form reliability
Which of the following best demonstrates the benefits of delayed alternate form reliability?
a. it eliminates the confounding variable of practice effects that are problematic with coefficient alpha
b. it yields the same results, regardless of
it provides a good method for estimating time and content sampling error with a single coefficient
Which of the following is a true statement about the evaluation of reliability data?
a. small differences in the magnitude of coefficients of different tests are greatly significant
b. reliability estimates above 0.50 suggest that the scores de
estimates of error may or may not generalize to groups of test takers other than the original sample
________ is a statistic that represents the standard deviation of the hypothetical distribution if a subject were to take a test an infinite number of times.
a. standard error of the mean
b. standard error of measurement
c. standard error
standard error of measurement
Score reliability is considered to be a necessary, but not sufficient, condition for:
a. validity
b. accuracy
c. recency
d. significance
validity
A major disadvantage of G theory is:
a. it is more comprehensive and thus less accurate
b. it is overly used by the psychological testing population today
c. it requires multiple observations from the same group
d. it does not all
it requires multiple observations from the same group
_________ are hypotheticals and do not really exist.
a. True scores
b. Observed scores
c. Error score components
d. Raw scores
True scores
The IRT model emphasizes the ____________.
a. use for small scale testing
b. use of the whole test for reliability of error measurement
c. less precise responses by test takers
d. use of the individual test items for reliability and e
use of the individual test items for reliability and error measurement
Which of the following formulas is based on the idea that larger samples yield more reliable results, and is applied to r_hh to obtain an estimate for the full portion of a split half test?
a. Cronbach’s alpha
b. Kuder-Richardson formula
Spearman-Brown formula
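The Spearman-Brown correction named on this card can be sketched as follows. The half-test correlation of 0.70 is a made-up value, and `spearman_brown` is an illustrative name; with n = 2 the general formula reduces to the split-half case, 2r / (1 + r).

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Projected reliability when a test is lengthened by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical correlation between the two halves of a test
r_hh = 0.70
print(round(spearman_brown(r_hh), 3))  # 0.824
```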
Time sampling error is the most likely to occur when measuring:
a. verbal ability
b. personality traits
c. personality states
d. psychological constructs related to ability
personality states
Which is true about Generalizability Theory?
a. it is often applied to developing new instruments
b. it requires multiple observations of the same group
c. it is not often used because it does not evaluate interaction effects of different
it requires multiple observations of the same group
If the scoring of a test involves subjective judgment:
a. an estimate of time sampling error is essential
b. the availability of alternate forms of the test is necessary
c. test selection decisions must be made on a case by case basis
scorer reliability must be taken into account
*With item response theory methods, reliability and measurement error are approached from the point of view of:
a. information function of individual test items
b. the test as a whole
c. the trait assessed by the test
d. the standard
information function of individual test items
*Delayed alternate form reliability coefficients can be used to evaluate ______ and ______ reliability.
a. interitem; content
b. content; time
c. time; test-retest
d. interscorer; interitem
content; time
To avoid administering the same test twice or developing alternate forms, ________ reliability can be used to test content consistency.
a. delayed alternate form
b. interrater
c. split half
d. test-retest
split half
Statistically significant differences may not necessarily be:
a. reliable
b. representative of true scores
c. psychologically significant
d. valid
psychologically significant
*When comparing a computer adaptive test (CAT) with a traditional test using item response theory:
a. the CAT is less reliable than a traditional test
b. the CAT can be shorter than the traditional test while still remaining reliable
c. a
the CAT can be shorter than the traditional test while still remaining reliable
What is the major complaint about the WISC-III as a revision of the WISC-R?
a. The changes were mostly cosmetic and did not reorganize the test theoretically.
b. The changes departed too dramatically from the WISC-R making the WISC-III nearly u
The changes were mostly cosmetic and did not reorganize the test theoretically.
Which of the following was not used to assess the reliability of the WIAT-II?
a. inter-rater reliability
b. test-retest reliability
c. parallel forms reliability
d. internal consistency reliability
parallel forms reliability
*A small standard error of measurement for the mathematics subtests of the WIAT-II implies:
a. smaller confidence intervals and lower reliability
b. smaller confidence intervals and higher reliability
c. larger confidence intervals and lower r
smaller confidence intervals and higher reliability
Luria’s main focus was to:
a. distinguish between the three blocks
b. show the functions that can be divided into the three blocks
c. show the integration and interdependence of the three blocks
d. create a one-to-one mapping of the
show the integration and interdependence of the three blocks
How does the Luria model test children from different ethnicities?
a. Uses three blocks to map the brain
b. Excludes tests of acquired knowledge
c. Has many subtests
d. Includes tests of acquired knowledge
Excludes tests of acquired knowledge
*Which of the following KABC-II scales would be used to test a four year old who is deaf?
a. Knowledge
b. Planning
c. Learning
d. CHC
Learning
*What two types of tests can’t have their reliability calculated by split-half procedures?
a. spelling of sounds and punctuation
b. punctuation and compatibility
c. speeded tests and multiple point scored items
d. language and math
speeded tests and multiple point scored items
Which measure on the WJ-III Tests of Achievement requires the examinee, within a three-minute period, to read and comprehend simple sentences and then decide if the answer is true or false?
a. Reading Comprehension
b. Letter-Word Identification
Reading Fluency
The WIAT-II test Math Reasoning evaluates the ability to
a. use nonverbal reasoning skills to solve abstract visual problems
b. solve single and multi-step math word problems
c. solve problems involving basic operations
d. complete
solve single and multi-step math word problems
*A distinctive feature of the Stanford-Binet 5th edition is the addition of a
a. age-graded norms
b. non-verbal routing test
c. deviation IQ
d. composite scores for each subtest
non-verbal routing test
The first edition/model of the Stanford-Binet to introduce the new form L-M was the

a. SB 3rd Edition
b. SB 4th Edition
c. SB 5th Edition
d. Revision of Terman’s Scale in 1937
SB 3rd Edition
The SB5 is the first intellectual battery to

a. use a deviation IQ
b. use a routing method
c. cover 5 cognitive factors
d. use the point-scale format
cover 5 cognitive factors
*David Wechsler’s main focus in creating the Wechsler Bellevue Intelligence Scale in 1939 was to

a. provide an extensive psychological test for adults entering the military during WW II.
b. design an instrument that would eva
go beyond global IQ scores to interpret more specific aspects of an individual’s cognitive capabilities through the analysis of subtest scaled scores.
Of the following, which is not one of the broad cognitive areas that tests and clusters of the Woodcock-Johnson-III Cognitive Tests are grouped into?
a. cognitive efficiency
b. thinking ability
c. written expression
d. verbal ability
written expression
Auditory Processing in the Woodcock-Johnson-III Cognitive Tests refers to:
a. measures of processing speed of auditory stimuli
b. the ability to discriminate between similar sounding words
c. the ability to analyze, synthesize, and discrim
the ability to analyze, synthesize, and discriminate auditory stimuli
Which of the following is not included in measures of auditory processing in the Woodcock-Johnson-III?
a. the ability to process distorted speech sounds
b. the time it takes to translate individual phonemes into whole words
c. phonetic cod
the time it takes to translate individual phonemes into whole words
Which of the following would be used to determine the reliability of speeded tests and tests with multiple-point scoring systems?
a. split-half procedures
b. Spearman-Brown formula
c. Rasch analysis
d. Standard error of measurement fo
Rasch analysis
Mean score & SD for T
m=50
sd=10
Mean score & SD for z
m=0
sd=1
Mean score & SD for CEEB
m=500
sd=100
Mean score & SD for IQ
m=100
sd=15
Mean score & SD for subtest scaled scores
m=10
sd=3
How to convert to a z score
(score - mean) divided by SD

z = (X - Xbar) / SD
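The z-score conversion, together with the scale means and SDs listed on the cards above, can be tried out with a short script. This is a sketch only: the raw score, test mean, and test SD are made-up numbers, and the names `SCALES`, `to_z`, and `from_z` are illustrative.

```python
# (mean, SD) for each standard score scale listed on the cards above
SCALES = {
    "z": (0, 1),
    "T": (50, 10),
    "CEEB": (500, 100),
    "IQ": (100, 15),
    "subtest": (10, 3),
}

def to_z(raw: float, mean: float, sd: float) -> float:
    """z = (X - Xbar) / SD"""
    return (raw - mean) / sd

def from_z(z: float, scale: str) -> float:
    """Re-express a z score on another standard score scale."""
    scale_mean, scale_sd = SCALES[scale]
    return scale_mean + z * scale_sd

# Hypothetical test: raw score 44, mean 40, SD 4
z = to_z(44, 40, 4)
print(z)                     # 1.0
print(from_z(z, "T"))        # 60.0
print(from_z(z, "CEEB"))     # 600.0
print(from_z(z, "IQ"))       # 115.0
print(from_z(z, "subtest"))  # 13.0
```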
Explain percentiles
look up
Explain confidence intervals
look up
Explain SEM
look up
Explain percentages under a curve
look up
*Any errors that occur when measuring a discrete variable are due to _____.

a. measurement error
b. bias on the part of the administrator of test
c. an incorrect sample size
d. inaccurate counting
inaccurate counting
*Which of the following is true of correlation?

a. sign of the coefficient indicates the degree of relationship between 2 variables
b. high correlation implies a causal relationship between 2 variables
c. high correlation allows us
high correlation allows us to make inferences about variables' shared variance
*A raw score by itself:

a. can convey meaning
b. does not convey any meaning
c. can be used to make inferences about a construct
d. can give a percentile score
does not convey any meaning
*If an 8th grader scores at the 6th grade level on a reading achievement test, what does this mean? The student:

a. scored lower on the test than everyone else in his/her 8th grade class
b. scored within the same range as most of the 8th
got a score that matches the average performance of the 6th graders in the standardization sample
*Michael's score on a test is 60. The standard error of measurement of the test is three points. From this information, one may conclude that chances are about

a. 1 in 2 that his true score is included by the range of scores from 54 to 66
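about 19 in 20 (95%) that his true score is included by the range of scores from 54 to 66, since that range is 60 ± 2 SEM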
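The range in this card can be checked numerically. The sketch below assumes measurement error is normally distributed around the observed score; the helper names `true_score_band` and `band_probability` are made up for the example.

```python
from statistics import NormalDist

def true_score_band(observed: float, sem: float, n_sem: float):
    """Range of scores within n_sem standard errors of the observed score."""
    return observed - n_sem * sem, observed + n_sem * sem

def band_probability(n_sem: float) -> float:
    """Normal-curve area within +/- n_sem standard errors."""
    std = NormalDist()
    return std.cdf(n_sem) - std.cdf(-n_sem)

# Michael: observed score 60, SEM 3, band of 2 SEM on each side
low, high = true_score_band(60, 3, 2)
print(low, high)                      # 54 66
print(round(band_probability(2), 3))  # 0.954
print(round(band_probability(1), 3))  # 0.683
```

So 54 to 66 is the 2-SEM band (roughly 95 chances in 100), while the 1-SEM band of 57 to 63 covers roughly 68 in 100.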
*In a normal distribution, a score is one standard deviation above the mean. What is its approximate percentile rank?

a. 50
b. 75
c. 84
d. 95
e. 99
84
*What percentage of z-scores fall between -3.0 and 3.0 standard deviations?

a. 50%
b. 75%
c. 95%
d. 99%
99% (more precisely, about 99.7%)
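Both normal-curve cards above can be verified with the standard normal distribution in Python's `statistics` module. A small sketch (the helper names `percentile_rank` and `percent_between` are illustrative):

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, SD 1

def percentile_rank(z: float) -> float:
    """Percentage of the distribution falling at or below z."""
    return 100 * std.cdf(z)

def percent_between(z_low: float, z_high: float) -> float:
    """Percentage of the distribution falling between two z scores."""
    return 100 * (std.cdf(z_high) - std.cdf(z_low))

print(round(percentile_rank(1.0)))           # 84
print(round(percent_between(-3.0, 3.0), 1))  # 99.7
```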
*The mean of a test is 39. You get a 44 and find it is equivalent to a T score of 65. What is the standard deviation of the test?

a. 2
b. 4
c. 6
d. 10
e. 15
4 (nearest option; the exact value is 5 / 1.5 ≈ 3.3)
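The arithmetic behind this card can be worked backwards from the T score: a T of 65 is z = (65 - 50) / 10 = 1.5, and the SD is the raw-score distance from the mean divided by z. A minimal sketch, with `sd_from_t` as an illustrative name:

```python
def sd_from_t(raw: float, mean: float, t_score: float) -> float:
    """Recover a test's SD from one raw score and its equivalent T score."""
    z = (t_score - 50) / 10   # T scores have mean 50 and SD 10
    return (raw - mean) / z

print(round(sd_from_t(44, 39, 65), 2))  # 3.33
```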
*Restrictions of range results in _____ correlation coefficients.

a. no effect on
b. higher
c. lower
d. higher weights for
lower
*On the Wechsler intelligence tests, at each age level, approximately 68% of those tested will have IQ scores between

a. 90 and 110
b. 85 and 115
c. 70 and 130
d. 55 and 145
85 and 115
