Research Design & Statistics, Test Construction


Terms


A researcher is interested in determining the effects of a new behavior modification program and a new drug in increasing the vocabulary of mentally retarded children. The various groups of subjects will receive one of four dosages of the drug (placebo, 1: IVs: Behavior modification (levels = program and attention only) and drug (levels = placebo, 10 mg, 20 mg, and 30 mg)

DV: Scores on the WISC-III vocabulary subtest
A researcher wants to know if high school and college students differ in their attitudes toward affirmative action. He divides both groups of students into African-American and Caucasian groups and administers the measure to each of the four groups.: IVs: School level (high school and college) and race (Caucasian and African-American)

DV: Attitudes toward affirmative action
A researcher wants to know if elderly individuals diagnosed with Alzheimer's disease differ from non-Alzheimer's elderly individuals in terms of a variety of physiological measures, including pulse rate, blood pressure, EEG patterns, knee reflex, and wh: IV: Diagnosis (Alzheimer's and no Alzheimer's)

DVs: Pulse rate, blood pressure, EEG patterns, knee reflex, and white blood cell count
A therapist devises a new form of psychotherapy. She obtains separate samples of depressed, anxious, psychotic, and personality-disordered clients. She then randomly assigns subjects into groups that will receive either her new form of therapy, tradition: IVs: Pathology (depressed, anxious, psychotic, personality-disordered), therapy (new, psychoanalysis, cognitive, humanistic), and income (high, medium, and low)

DVs: WAIS full-scale IQ scores, BDI, BAI, and MMPI Paranoia and Schizophrenia scale scores.
What is the threat to internal validity that is most salient? In a research study testing the effects of a new strategy for increasing short-term memory, subjects have to wait three hours in a classroom before the study begins. As a result, th: Maturation
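What is the threat to internal validity that is most salient? A drug designed to improve emotional and psychosocial functioning is administered to severely depressed individuals.: Statistical regression (subjects selected for extreme scores tend to score closer to the mean when retested)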
A film designed to increase the racial awareness of white college students is shown the day after a leading civil rights activist spoke at the college.: history
A new diet designed to help obese subjects lose weight is studied. Subjects are to be weighed before and after they go on the diet. Before they are weighed for the second time, the scale breaks.: instrumentation
A study is conducted to test the effectiveness of Academic Review's workshops. Most of the subjects in the study have taken the psychology licensing exam before.: testing
In a study conducted to test the effectiveness of a new reading strategy on reading comprehension, the first 20 subjects who sign up are assigned to the experimental group and the second 20 subjects who sign up are assigned to the control group.: selection
Cues in the experimental setting that allow subjects to guess the research hypothesis.: demand characteristics
The effect that an experimenter's expectancy has on the results of a research study.: Rosenthal Effect
A procedure designed to ensure that all subjects in the population of interest have an equal chance of being chosen to participate in a research study.: random sampling
The tendency of subjects' behavior to change due to the attention received in a research setting.: Hawthorne Effect
A procedure designed to ensure that all subjects in a research study have an equal probability of ending up in each of the treatment groups.: Random assignment
What is a statistical method of controlling for the effects of an extraneous variable?: ANCOVA
What is a procedure that involves grouping subjects who are similar in terms of their status on an extraneous variable and then assigning the members of each group to different treatment groups?: matching
What is a procedure in which a population is divided into "sub-populations" and all members of each sub-population have an equal probability of being chosen to participate in the research study?: stratified random sampling
What is a study of the effects of aging on psychosocial adjustment that involves comparing older, middle-aged, and younger subjects at one point in time?: cross-sectional research
What is a study of the effects of aging on psychosocial adjustment that involves comparing older, middle-aged, and younger subjects at different points in time?: cross-sequential research
What is a single subject design in which the treatment is withdrawn to determine whether dependent variable scores revert to baseline levels?: reversal design
What is an in-depth study of a single individual, institution, group, or phenomenon?: case study
What involves manipulated variable(s) and random assignment?: true experimental research
What involves manipulated variables and non-random assignment?: quasi-experimental research
What is a study that involves obtaining dependent variable scores from a group of subjects on multiple occasions at regular intervals?: time-series design
What is a study of a single conduct-disturbed adolescent in which the effectiveness of a new behavior modification program is assessed first at school, then at home, and then in the community?: Multiple baseline design
What is a study assessing the association between SAT scores and college GPA?: correlational research
What is a study of the effects of aging on IQ scores in which one group of subjects is examined for 30 years?: longitudinal research
What is the greatest threat to validity when a mail survey is conducted? a. selection b. maturation c. randomization d. instrumentation: A. Selection. In a mail survey, subjects select themselves into the study: in deciding whether or not to mail the survey back, they determine who the study's participants will be. Thus, selection poses a threat to any mail survey's validity.
Which of the following statements regarding the different types of developmental research (longitudinal, cross-sectional, cross-sequential) is most true? a. Cross-sequential and longitudinal studies are particularly vulnerable to "cohort": D. A cross-sequential study, like a cross-sectional study, involves studying groups of subjects that are divided on the basis of age. And, like a longitudinal study, it involves examining the subjects for a period of time, though this period is shorter in cross-sequential studies. Cross-sequential studies, because they involve studying subjects over time, control for "cohort" effects, which often confound cross-sectional studies. And, because they are shorter than longitudinal studies, there is less cost in terms of time and money, and less subject drop-out.
The defining feature of true experimental designs is a. random selection of subjects from the population b. random assignment of subjects into experimental groups. c. the use of manipulated variables d. the use: B. In a true experiment, variables are manipulated and subjects are randomly assigned to treatment groups. Choice C is incorrect because other types of designs (e.g., quasi-experiments) also use manipulated variables.
Francis Galton's concept of regression to the mean is best expressed by which of the following statements? a. Individual variation within a species is unlimited. b. Short fathers have tall sons. c. Short fathers have tall: C. Regression to the mean refers to the tendency of extreme observations to be less extreme upon re-testing or re-observation. Francis Galton applied this concept to heredity. He concluded that due to regression to the mean, individual variation in the species is limited (the opposite of choice A). That is, since extreme individuals will likely have less extreme offspring (e.g., short fathers are likely to have taller sons), the characteristics of a species can only vary within a limited range. Note that you did not need to know anything about Galton to answer this question. You just needed to apply what you know about regression to the mean to a new situation.
All of the following are true of multiple baseline designs, except a. a treatment is sequentially applied. b. they may serve as a substitute when the ABAB design is unethical. c. They may involve studying the same treatme: D. A multiple baseline study is a single-subject study that involves the sequential application of a treatment across different baselines (i.e., behaviors, settings, or individual subjects). Unlike a reversal design, a multiple baseline design does not entail withdrawal of the treatment.
The major threat to the internal validity of a one-group time-series design is a. maturation b. regression to the mean c. history d. testing: C. A one-group time-series design involves administering multiple pretests and posttests to one group of subjects before and after a treatment is administered. The design controls for many threats to internal validity, such as maturation, testing, and statistical regression. The major threat to its internal validity is history, or an external event that occurs at about the same time the treatment is administered.
A major advantage of case studies is that they a. can be used to identify variables for future research. b. involve the study of only one individual c. allow one to draw conclusions about the causal relationship between two or: A. Although case studies cannot tenably be used to identify causal relationships between variables, they are often useful as pilot studies to identify variables and hypotheses for further investigation.
A major disadvantage of case studies is that they a. can never be used to identify variables for future research b. involve the study of only one individual c. do not permit conclusions to be made about causal relationshi: C. Do not permit conclusions to be made about the causal relationship between variables.
Data collected in research studies can be classified into four types:: nominal, ordinal, interval, ratio
A(n) _________ scale of measurement contains unordered categories; examples include gender, DSM diagnosis, and hair color.: nominal
______ data are quantified into ordered categories; however, with such data, it is impossible to determine the distance between data points. Examples include ranks and points on an attitude scale.: ordinal
____ data are continuous data in which the distance between successive data points is equal across the scale. However, there is no absolute zero point; as a result, multiplication and division (ratio statements such as "twice as much") are not meaningful with such data. Examples include IQ scores a: interval
Finally, _____ data are the same as interval data except that they include an absolute zero point, so multiplication and division can be performed.: ratio
In a ____ distribution, most observations (i.e., scores) fall in the middle of the distribution, with fewer and fewer cases as one moves farther away from the middle.: normal
In a _________ distribution, most scores fall at the high end of the distribution, with a few extreme scores falling at the low end.: negatively skewed
In a _______ distribution most scores are low and a few extreme scores are high.: positively skewed
In a _______ distribution, the mode is higher than the median, which is higher than the mean.: negatively skewed
In a _____ distribution, the mean is higher than the median, which is higher than the mode.: positively skewed
In a _____ distribution, the mean, the median, and the mode are all equal.: normal
The variance is a measure of the ______ of a distribution.: variability (or dispersion, or spread)
The standard deviation, a measure of the same property, is obtained by taking the ________ of the variance.: square root
A z-score is an individual score expressed in terms of standard deviation units above or below the mean. For example, in a distribution with a mean of 80 and a standard deviation of 2, a score of 76 would be equivalent to a z-score of ____, and a z-score of +3.0: -2.0; 86
The formula for a Z-score is ________, where X = ______, M = ________ and s.d. = _______.: (X-M)/s.d.
X = raw score
M = mean
s.d. = standard deviation
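The z-score formula can be sketched as a pair of small Python helpers. The function names `z_score` and `raw_from_z` are illustrative only, not taken from any library; the example reproduces the previous card (mean 80, s.d. 2).

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Raw score expressed as s.d. units above (+) or below (-) the mean: (X - M) / s.d."""
    return (x - mean) / sd

def raw_from_z(z: float, mean: float, sd: float) -> float:
    """Invert the formula to recover the raw score: X = M + z * s.d."""
    return mean + z * sd

print(z_score(76, 80, 2))      # -2.0
print(raw_from_z(3.0, 80, 2))  # 86.0
```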
A percentile rank is a transformed score that reflects the percentage of scores falling ______ the corresponding raw score. For example, a score with a PR of 80 is higher than ____% of the other scores in the distribution; it also could be said to be in the top ____%: below
80%
20%
By definition, percentile ranks have a _____ distribution; for example, in any distribution, the number of cases falling between percentile ranks of 10 and 20 is equivalent to the number of cases falling between percentile ranks of 80 and 90.: flat (or rectangular)
Therefore, almost all transformations of raw scores to percentile ranks would be termed ______ since they would involve a change of the original distribution's shape.: nonlinear
In a normal distribution, approximately ____% of scores fall between the z-scores of +1.0 and -1.0.: 68%
About ___% of scores fall between the z-scores of -2.0 and +2.0.: 95%;
Say that 1,000 people take the WAIS-III, on which the mean IQ is 100 and the standard deviation is 15. About 680, or ___%, will obtain z-scores between ____ and ____; i.e., they will obtain IQs between ____ and ____. And about 950, or ____%, will obtain z: 68% -1.0 +1.0
85 and 115
95% -2.0, +2.0
70 and 130
In a normal distribution, it is possible to determine the z-score equivalents of given percentile rank points. For example, a z-score of +1.0 is equivalent to a percentile rank of about _____, and a percentile rank of 98 is approximately equivalent to a: 84; +2.0
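Assuming a normal distribution, the standard-library `statistics.NormalDist` class (Python 3.8+) can stand in for the normal-curve table. A minimal sketch; the helper names are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def percentile_rank(z):
    """Percentage of a normal distribution falling below z."""
    return 100 * standard_normal.cdf(z)

def z_from_percentile(pr):
    """z-score below which `pr` percent of a normal distribution falls."""
    return standard_normal.inv_cdf(pr / 100)

print(round(percentile_rank(1.0)))                      # 84
print(round(z_from_percentile(98), 2))                  # 2.05 (about +2.0)
print(round(percentile_rank(1) - percentile_rank(-1)))  # 68
print(round(percentile_rank(2) - percentile_rank(-2)))  # 95
```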
If you had a test with a mean of 25 and a standard deviation of 5, you would set the cutoff score at ____ if you wanted to select the top 16% of examinees and at ____ if you wanted to select the top 2% of examinees.: 30
35
A person receives a score of 90 on a test with a mean of 100 and a standard deviation of 5. The corresponding z-score is ____, the corresponding T-score is _____, and the corresponding stanine score is approximately ____.: -2.0
30
1 (a z-score of -2.0 falls in the lowest stanine, which covers z-scores below about -1.75)
In addition, if the distribution is normal, we would know that the corresponding percentile rank is around ____.: 2 (about 2% of scores fall below a z-score of -2.0)
If the score were converted to a WAIS-III IQ score (mean = 100, s.d. = 15), the new transformed score would be ____. And if the score were converted to an ETS score (i.e., SAT and GRE score, mean = 500 and s.d. = 100), the new transformed score would be: 70
300
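A minimal sketch of the conversions in the last few cards, assuming the usual scale definitions (T: mean 50, s.d. 10; WAIS IQ: mean 100, s.d. 15; ETS: mean 500, s.d. 100). The stanine line uses the common approximation stanine ≈ 2z + 5, rounded and clipped to 1 to 9; that rule is an assumption, not something stated on the cards, and `standard_scores` is a hypothetical name.

```python
def standard_scores(x, mean, sd):
    """Convert one raw score to several standard-score scales via its z-score."""
    z = (x - mean) / sd
    # Approximate stanine: mean 5, s.d. about 2, clipped to the 1-9 range.
    stanine = min(9, max(1, round(2 * z + 5)))
    return {
        "z": z,
        "T": 50 + 10 * z,
        "IQ": 100 + 15 * z,
        "ETS": 500 + 100 * z,
        "stanine": stanine,
    }

print(standard_scores(90, 100, 5))
# {'z': -2.0, 'T': 30.0, 'IQ': 70.0, 'ETS': 300.0, 'stanine': 1}
```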
If you convert raw scores to z-scores you would be conducting a a. linear transformation because the shape of the distribution changes b. linear transformation because the shape of the distribution does not change: b. When raw scores are converted to z-scores, the shape of the distribution does not change. For instance, if the distribution of raw scores is normal, the distribution of the corresponding z-scores will also be normal. When transformed scores retain the same shape as the original distribution, the transformation is said to be "linear."
Eight students take a math test and obtain the following scores: 80, 53, 39, 32, 45, 72, 28, 49. The median score of this distribution is:: To answer this question, first arrange the numbers in numerical order: 28, 32, 39, 45, 49, 53, 72, 80. With an even number of scores, the median is the mean of the two middle scores (45 and 49), which is 47.
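The median procedure from the card, written as a short function (illustrative; the standard library's `statistics.median` does the same job):

```python
def median(scores):
    """Middle score; with an even count, the mean of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([80, 53, 39, 32, 45, 72, 28, 49]))  # 47.0
```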
One thousand people take a job selection test that has a mean of 60 and a standard deviation of 5. An industrial psychologist wants to select the top 150 scorers. Assuming a normal distribution of scores, she would set the cutoff score at approximately:: First, recognize that the "top 150" is equivalent to the top 15% (150/1,000 = 15%). Then, remember that, in a normal distribution, about 16% of all scores fall at or above a z-score of +1.0. Finally, convert a z-score of +1.0 back to a raw score: one standard deviation above the mean is 60 + 5 = 65, so the cutoff is approximately 65.

You might have been thrown by the fact that you were looking for the top 15% even though the standard deviation curve only allows you to identify the cutoff score for the top 16%. If so, remind yourself to work on the exam with rounded-off numbers (you'll have no choice, since no calculators are allowed). Fifteen percent is close enough to 16% for you to use the standard deviation curve.
Judy and Johnny are students in a school district that administered a standardized mathematics test. Judy scores in the 48th percentile on the test, while Johnny scores in the 93rd percentile. Scores on the test are normally distributed. A few weeks af: The answer to this question is related to the fact that, in a normal distribution, there are more scores in the middle of the distribution than at either extreme. As a result, a given raw-score interval in the middle of the distribution spans many more percentile ranks than the same interval at either end. Thus, a change to a raw score in the middle of the distribution produces a greater percentile rank change than the same raw score change at the distribution's extremes. In this case, Judy originally scored at the 48th percentile, which is near the middle of the distribution, while Johnny scored at the 93rd percentile, at the high end of the distribution. Therefore, adding three points to their raw scores will result in a greater increase in Judy's percentile rank than in Johnny's: Judy will "jump over" a greater percentage of other students than Johnny will.
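To see how close the shortcut is, here is a sketch, assuming normally distributed scores, that compares the exact cutoff for the top 15% (via `statistics.NormalDist.inv_cdf`) with the z = +1.0 shortcut used on the exam.

```python
from statistics import NormalDist

scores = NormalDist(mu=60, sigma=5)             # the card's test, assumed normal
exact_cutoff = scores.inv_cdf(1 - 150 / 1000)   # score with 85% of examinees below it
shortcut_cutoff = 60 + 1.0 * 5                  # exam shortcut: z = +1.0 marks the top ~16%
print(round(exact_cutoff, 1), shortcut_cutoff)  # 65.2 65.0
```

The two answers differ by only a fraction of a point, which is why rounding to the nearest standard deviation mark is safe here.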
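The card's question is truncated, so the test's mean and standard deviation are unknown. The sketch below assumes a hypothetical mean of 50 and s.d. of 10 and a 3-point gain for both students, only to illustrate the size of the effect.

```python
from statistics import NormalDist

# Hypothetical test parameters (not given on the card): mean 50, s.d. 10.
math_test = NormalDist(mu=50, sigma=10)

def percentile_after_gain(start_pr, points):
    """Percentile rank after adding `points` to the raw score at `start_pr`."""
    raw = math_test.inv_cdf(start_pr / 100)
    return 100 * math_test.cdf(raw + points)

judy_gain = percentile_after_gain(48, 3) - 48
johnny_gain = percentile_after_gain(93, 3) - 93
print(f"Judy: +{judy_gain:.1f} percentile points, Johnny: +{johnny_gain:.1f}")
```

Under these assumptions Judy's percentile gain is several times larger than Johnny's, even though both raw scores rose by the same amount.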
The deviation of a sample statistic from a parameter of the population from which the sample was drawn?: sampling error
The probability of rejecting a true null hypothesis?: alpha
The probability of retaining a false null hypothesis?: beta
The probability of rejecting a false null hypothesis?: power
A researcher hypothesizes that students who sleep with their textbooks under their pillow score higher on the GRE than students who don't. He obtains a sample of 20 students and assigns 10 to the "books under pillow" group and 10 to the "n: Type I error
A researcher hypothesizes that cognitive therapy is superior to other forms of therapy in the treatment of anxiety. She fails to find any evidence that cognitive therapy is superior. However, in reality, cognitive therapy is the superior treatment. What error did she make?: Type II error: she retained a false null hypothesis.
In statistical hypothesis testing, because we cannot study the entire population, sample values are used to estimate population values (a value obtained from a sample is referred to as a(n) _____, while a value obtained from a population is referred to as: statistic; parameter
The discrepancy between a sample value and the corresponding population value is referred to as ______.: sampling error
The mean is one example of a population value that is estimated on the basis of sample data. The expected discrepancy between a sample mean and a population mean is referred to as the _____.: standard error of the mean
The formula for the standard error of the mean is s.d./square root of N, where s.d. equals ____ and N equals ____.: standard deviation
sample size
The ______ hypothesis of most research studies posits that there is no relationship between the independent variable(s) and the dependent variable(s).: null
The null hypothesis is usually stated in terms of population ____; an example would be "the mean of population A is equal to the mean of population B.": parameters
The ______hypothesis usually posits that there is a relationship between the independent variables and the dependent variables. This hypothesis can either be _______ (e.g., one pop mean is diff from the other) or ______ (e.g., one pop mean is greater tha: alternative
nondirectional
directional
In statistical decision-making, four outcomes are possible: two are correct decisions, and two are errors. One type of correct decision would be to ____ a true null hypothesis. A second type of correct decision, the goal of research, would be to ___ a fa: retain
reject
power
One of the incorrect decisions would be to retain a ______ null hypothesis. This is referred to as a(n) ____ error, and the probability of making it is known as ______.: false
Type II
beta
The other incorrect decision would be to reject a(n) _____ null hypothesis. This is referred to as a(n) _____ error and the probability of making it is known as ____: true
Type I
alpha
_____________statistical tests are used to test statistical hypotheses when the dependent variable is measured on an interval or ratio scale. Such tests make two assumptions: 1)_____ and 2) ______________.: parametric
normal distribution of data
homogeneity of variance
Methods designed to test statistical hypotheses when the dependent variable is measured on a nominal or ordinal scale are referred to as ________. These tests don't make the same assumptions as _____ tests. However, tests in both categories do assume th: nonparametric
parametric
representative
population
In a research study with 400 subjects, the standard deviation of scores on the dependent variable is 20. In this case, the standard error of the mean is:: B. The standard error of the mean is equal to the standard deviation divided by the square root of the sample size. The square root of 400 is 20. Thus the standard error of the mean in this case is 20/20, or 1.
The standard error of the mean is a. directly proportional to the standard deviation and inversely proportional to the sample size. b. directly proportional to the standard deviation and directly proportional to the sample size.: A. As the population standard deviation increases, the standard error of the mean increases; in other words, the standard error of the mean is directly proportional to the standard deviation. And as the sample size increases, the standard error of the mean decreases; strictly, it is inversely proportional to the square root of the sample size, so quadrupling N halves it.
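A short sketch of the standard error formula, reproducing the 400-subject example and showing how it scales with s.d. and N (the function name is illustrative):

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean: s.d. divided by the square root of N."""
    return sd / math.sqrt(n)

print(standard_error_of_mean(20, 400))   # 1.0  (the card's example)
print(standard_error_of_mean(40, 400))   # 2.0  doubling the s.d. doubles the SEM
print(standard_error_of_mean(20, 1600))  # 0.5  quadrupling N only halves the SEM
```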
Which of the following assumptions is shared by both parametric and nonparametric tests? a. normal distribution of data b. homogeneity of variance c. random assignment of subjects to experimental groups d. rand: d. Both parametric and nonparametric tests are inferential statistical methods. This means that they are used to draw conclusions about a population on the basis of information derived from a sample. For these conclusions to be unbiased and accurate, a sample must be representative of the population from which it is drawn. The best way to ensure that a sample is representative is to randomly select subjects from the population of interest.
When a statistical test lacks power, this means that a. the probability of making a Type I error will be high. b. the probability of making a Type II error will be low. c. the probability of obtaining statistical significance will be low.: c. When a statistical test lacks power, there is a high probability of a Type II error, or that a false null hypothesis will be retained; i.e., the test will be unable to detect a true effect of an independent variable on a dependent variable. Put another way, the test will not yield statistical significance (a finding of an effect) when it should.
Alpha can be defined as a. the probability of rejecting the null hypothesis when the null hypothesis is true. b. the probability of retaining the null hypothesis when the null hypothesis is true. c. the probability of rejecting the null hypot: A. Alpha is the probability of making a Type I error, which is defined by choice A. In plain English, this means that alpha is the probability that a statistical test will falsely tell you that your independent variable has an effect when, in the population, it does not.
Which of the following would have the least meaning? a. retaining the null hypothesis when power is low. b. rejecting the null hypothesis when power is low. c. retaining the null hypothesis when power is high.: A. When power is low, a statistical test is unlikely to detect an effect of an independent variable, even when one is present in the population. In other words, the null hypothesis (the hypothesis of no effect) is likely to be retained. In such cases, retaining the null does not necessarily mean you have done so correctly; the test may simply have lacked the power to reject the null (i.e., to detect a true effect). So retaining the null with low power tells you very little.
Subjects take the BDI before and after a six-week trial period on the drug. Stat test?: t-test for correlated samples
Instead of taking the BDI, subjects are classified by raters as either "treatment successes" or "treatment failures." Stat test?: chi-square
For control and experimental subjects, scores on the MMPI's Depression scale are obtained in addition to those from the BDI. Stat test?: MANOVA
Subjects are randomly assigned to either the control (no-drug) or the experimental (drug) group. Stat test?: t-test for independent samples
The researcher is interested in determining if the effects of the drug are different at different levels of symptom severity (highly depressed, moderately depressed, and not depressed). Stat test?: factorial ANOVA
Subjects are randomly assigned to either the control or the experimental group, and scores on the BDI are converted to ranks. Stat test?: Mann-Whitney U
The mean score of subjects who take the drug is compared to the pop mean for depressives on the BDI. Stat test?: t-test for single sample
Subjects are assigned to one of four groups: high dosage, moderate dosage, low dosage, and control. Stat test?: one-way ANOVA
Scores on the BDI are adjusted so that variability accounted for by the subjects' scores on a test of self-esteem is removed. Stat test?: ANCOVA
The magnitude of the F-ratio for a one-way ANOVA depends on the ratio between two sources of variance in a set of dependent variable scores. If _____ variance significantly exceeds ______ variance, then the F ratio will be high and the null hypothesis will: between group
within group
rejected
If the ______ variance equals or exceeds _____ variance, then the F ratio will be low, and the null hypothesis will be (rejected/retained). The F-ratio is a fraction with _____, a measure of ________ variance in the numerator. And this fraction has _____: within group
between group
retained
MSB
between group
MSW
within group
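A minimal sketch of how MSB and MSW combine into F for a one-way ANOVA. The data sets are made up (hypothetical) so the arithmetic is easy to check by hand; the function name is illustrative.

```python
def one_way_f(groups):
    """F = MSB / MSW for a one-way ANOVA on a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    k, n_total = len(groups), len(scores)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: how far scores sit from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ss_between / (k - 1)       # numerator: between-group variance
    msw = ss_within / (n_total - k)  # denominator: within-group (error) variance
    return msb / msw

print(one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0: between far exceeds within
print(one_way_f([[4, 5, 6], [5, 6, 4], [6, 4, 5]]))  # 0.0: identical group means
```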
In studies with more than one independent variable, a(n) _____ effect occurs when the effects of one independent variable do not generalize to all the _____ of one of the other independent variables. A _____ ANOVA provides an indication of the strength of: interaction
levels
factorial
main
The nonparametric alternative to a t-test for independent samples is the a. Kruskal-Wallis b. Wilcoxon matched pairs c. Mann-Whitney U d. t-test for correlated samples: C. Mann-Whitney U. When a study involves a comparison of two independent groups and interval or ratio data, the t-test for independent samples would be used to compare the means of the two groups. If the assumptions of a parametric test are violated, the data would be converted to ranks and the Mann-Whitney U would be used. The Mann-Whitney U is the nonparametric alternative to the t-test for independent samples.
An advantage of using a MANOVA instead of multiple one-way ANOVAs is that a. a MANOVA is computationally simpler b. multiple ANOVAs cannot be used when a study involves more than one dependent variable c. the probability of ma: C. Running a separate one-way ANOVA for each dependent variable inflates the overall (experimentwise) probability of making a Type I error; analyzing the dependent variables together in a single MANOVA helps control that probability.
The use of which of the following post hoc tests results in the greatest probability of making a Type II error? a. Tukey b. Scheffe c. Fisher's LSD d. Newman-Keuls: Scheffe. Of all the post hoc tests, the Scheffe is the most conservative, which means that it provides the greatest protection against a Type I error. However, since there is a trade-off between Type I and Type II errors, this also means that its use results in the greatest probability of making a Type II error (i.e., missing an effect).
When a factorial ANOVA yields a significant main effect and a significant interaction effect, a. the main effect should be ignored b. the main effect should be interpreted in light of the interaction effect c. the interaction effect s: B. Whenever both a main effect and an interaction effect exist, the main effect must be interpreted in light of the interaction effect. This is because the interaction means that the main effect does not hold true in all cases (i.e., at all levels of another independent variable).
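To make the rank-based logic concrete, here is a sketch that computes U by counting, for every pair of scores drawn from the two groups, which one is higher (ties count one half). This pair-counting form gives the same U as the rank-sum formula; the function name and data are illustrative. In practice, `scipy.stats.mannwhitneyu` computes U along with a p-value.

```python
def mann_whitney_u(a, b):
    """Smaller of the two U statistics for independent samples a and b."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1      # a-score outranks the b-score
            elif x == y:
                u_a += 0.5    # tied ranks split the credit
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

print(mann_whitney_u([12, 15, 18], [10, 11, 14]))  # 1.0
```

A small U means one group's scores almost always outrank the other's, which is evidence against the null hypothesis.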
A researcher is interested in the correlation between gender and homeownership. Which correlation coefficient?: phi coefficient
A researcher is interested in the correlation between gender and scores on the BDI.: point-biserial coefficient
a researcher is interested in the correlation between scores on the BDI and IQ scores on the WAIS. Scores of 20 or above on the Beck are reported as "depressed," whereas scores below 20 are reported as "not depressed.": biserial coefficient
A researcher is interested in the correlation between DSM diagnostic category and political party.: contingency coefficient
A researcher is interested in the correlation between motivation and scores on a professional licensing exam. She wishes to statistically remove the effects of IQ on this relationship.: partial correlation
A procedure designed to assess the causal interrelationships among three or more variables.: path analysis
A researcher is interested in using annual income in dollars to predict scores on a measure of happiness in the elderly.: simple regression
A researcher is interested in using income, an index of support system adequacy, and an index of overall health to predict scores on a measure of happiness in the elderly.: multiple regression
A researcher is interested in the degree to which the combination of income, scores on an index of support system adequacy, and scores on an index of overall health is related to the combination of scores on three measures of happiness.: canonical correlation
A personnel department will reject all applicants who do not demonstrate a minimum level of proficiency on five tests of aptitude.: multiple cutoff
A gambler is interested in the correlation between racehorses' finishes in their first and second races.: Spearman's rho
The term "least squares criterion" describes the principle that underlies a. calculating a Pearson r correlation coefficient. b. constructing a regression line. c. determining whether multicollinearity in a multiple regress: B. The regression line is placed at the location in the scattergram that results in the lowest possible sum of squared vertical deviations (prediction errors) of points from the line. This principle is known as the "least squares criterion."
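A sketch of the least squares criterion for simple regression: the closed-form slope and intercept, plus a check that nudging the line raises the sum of squared errors. The data are hypothetical and the function names illustrative.

```python
def least_squares_line(x, y):
    """Slope and intercept that minimize the sum of squared vertical deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x  # line passes through (mean_x, mean_y)

def sum_squared_errors(x, y, slope, intercept):
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4]   # hypothetical predictor scores
y = [2, 4, 5, 8]   # hypothetical criterion scores
b, a = least_squares_line(x, y)
best = sum_squared_errors(x, y, b, a)
other = sum_squared_errors(x, y, b + 0.2, a)  # a slightly steeper line
print(round(b, 2), best < other)  # 1.9 True
```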
A researcher is interested in the correlation between scores on a standardized intelligence test and elementary school grades. For her research, she has access to students in a local elementary school. To obtain the highest possible correlation coefficie: C. A correlation coefficient will be lowered if one uses only a restricted range of scores on any of the variables involved. In other words, it is best to utilize the full range of scores, which can be obtained from a random sample of students.
When using multiple regression, a researcher would be best advised to choose predictors that a. high a high correlation with each other and a high correlation with the criterion. b. have a low correlation with each other and a low correla: D. In a multiple regression equation, a migh correlation between the predictors and the criterion is necessary; otherwise, it would be impossible to use the predictors to estimate scores on the criterion. And low intercorrelations among predictors are desirable, so that the predictors are not providing redundant information.
A researcher is interested in the relationship between three predictors and a criterion. One of the predictors has a correlation of .55 with the criterion. which of the following statements is true of the multiple correlation coefficnet (multiple R) for: A. A multiple correlation coefficient can be no lower than any of the individual correlations between a predictor in the equation and the criterion.
The correlation between psychosis and IQ scores would best be assessed using which of the following correlation coefficients?: B. To measure the correlation between an artificial dichotomy and a variable measured with interval or ratio data, one would use the biserial correlation coefficient.
Test A has a correlation of .60 with Test B and a correlation of .30 with Test C. Test A accounts for ____ as much variability in Test B as it does in Test C. a. twice b. three times c. four times d. eight times: C. To determine how much variability in one measure is explained by variability in another, one squares the correlation coefficient. The square of .60 is .36, and the square of .30 is .09. C is therefore correct because .36 is four times as large as .09.
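The squaring step can be checked with a short Python sketch. The function name `shared_variance` is only an illustrative label.

```python
def shared_variance(r):
    """Proportion of variance in one measure accounted for by the other (r squared)."""
    return r ** 2


r_ab, r_ac = 0.60, 0.30
ratio = shared_variance(r_ab) / shared_variance(r_ac)
print(round(shared_variance(r_ab), 2), round(shared_variance(r_ac), 2), round(ratio, 2))  # 0.36 0.09 4.0
```

Comparing the raw correlations would suggest "twice," which is the tempting wrong answer. Only the squared values express variance accounted for.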
Which of the following describes a correlation of 0.0 between "X" and "Y"? a. The variability of Y scores at each X value is lower than the total variability of Y. b. The variability of Y scores is different at different levels: C. To answer this question, you have to look closely at the wording of each choice and translate each into everyday English. Choice C says that the range of "Y" at every individual "X" score will be equal to the entire range of "Y." For example, say that scores on both "X" and "Y" can range from 1 to 10. People who get a score of 1 on "X" score anywhere from 1 to 10 on "Y," those who get a score of 2 on "X" also score anywhere from 1 to 10 on "Y," and so on, for all values of "X." One's score on "X" provides no information about "Y," which means that the correlation is 0. If you go through the other choices and try to make sense of them, choice A is the converse of choice C and therefore describes a correlation whose magnitude is greater than 0. Choice B describes heteroscedasticity, and choice D describes homoscedasticity.
According to the central limit theorem, a. as sample size increases, the shape of the sampling distribution of means will approach a normal shape only if the underlying population distribution is normal b. as sample size increases, the shape of a sampling distribution of means wi: B. According to the central limit theorem, the shape of a sampling distribution of means will approach normality as sample size increases. This is true regardless of the shape of the distribution of values in the underlying population.
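The central limit theorem can be illustrated with a small Python simulation. The sketch assumes a strongly right-skewed exponential population (mean 1, SD 1, skewness 2) and a fixed random seed. It draws 2,000 samples of 30 scores each and looks at the distribution of their means. The function names are illustrative only.

```python
import random
import statistics


def sample_means(draw, n, reps, rng):
    """Return the means of `reps` random samples, each of size n, drawn with draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def exponential_score(rng):
    """One score from a skewed population (exponential: mean 1, SD 1, skewness 2)."""
    return rng.expovariate(1.0)


rng = random.Random(2024)  # fixed seed so the run is repeatable
means = sample_means(exponential_score, n=30, reps=2000, rng=rng)
print(statistics.fmean(means))   # close to the population mean of 1
print(statistics.pstdev(means))  # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.18
print(skewness(means))           # far below the population's skewness of 2
```

The parent population is clearly non-normal, yet the sample means cluster symmetrically around 1. Their spread matches the standard error of the mean.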
The standard deviation of the sampling distribution of means is also known as a. the standard error of estimate b. the standard error of measurement c. the standard error of the mean d. the standard error of the day: C. This is the definition of the standard error of the mean.
A difference between meta-analysis and a literature review is that a. meta-analysis involves calculation of an "effect size." b. a literature review is likely to include fewer studies than a meta-analysis c. a literature review has: A. Unlike a traditional literature review, a meta-analysis involves calculating an effect size. This allows one to estimate the overall effect, across many studies, of a particular treatment or independent variable.
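Cohen's d is one common effect size: the difference between two means divided by the pooled standard deviation. The Python sketch below computes d for two made-up studies and then averages the results. Real meta-analyses usually weight each study by its sample size or precision, so the unweighted mean here is a simplification.

```python
import statistics


def cohens_d(treatment, control):
    """Standardized mean difference: (M_treatment - M_control) / pooled SD."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.fmean(treatment) - statistics.fmean(control)) / pooled_sd


# Two hypothetical studies of the same treatment: (treatment scores, control scores)
studies = [
    ([5, 6, 7], [3, 4, 5]),
    ([10, 12, 14], [9, 11, 13]),
]
effect_sizes = [cohens_d(t, c) for t, c in studies]
print(effect_sizes)                    # [2.0, 0.5]
print(statistics.fmean(effect_sizes))  # 1.25 (unweighted average across studies)
```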
A one-way ANOVA would be most robust if a. the shape of the underlying population data is skewed b. there are many levels of the independent variable c. sample size is small d. sample size is large: D. A statistical test is said to be robust when its results tend to be accurate even in the face of moderate violations of its assumptions about the population data. The larger the sample size, the more robust statistical tests tend to be, especially with regard to the assumption that the data are normally distributed.
A researcher conducts a study using a time-series design, consisting of a pretest phase, in which the same test is administered five times; a treatment; and a posttest phase, in which the test is administered five more times. The researcher analyzes his results: C. Due to autocorrelation, standard parametric tests such as the t-test cannot be used in the analysis of time-series data. Instead, one must use special techniques designed for time-series analysis.
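Autocorrelation means that each observation in a series is correlated with the one before it. That violates the independence assumption behind tests such as the t-test. The Python sketch below computes a lag-1 autocorrelation coefficient on two made-up series. The function name is illustrative.

```python
def lag1_autocorrelation(series):
    """Correlation of each observation with the next one in the series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    numerator = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


trend = list(range(1, 11))   # steadily rising scores across ten sessions
alternating = [1, -1] * 5    # scores that flip up and down every session
print(lag1_autocorrelation(trend))        # 0.7: neighbouring scores are strongly related
print(lag1_autocorrelation(alternating))  # -0.9: strong negative dependence
```

A set of independent observations would give a value near 0. Values like these show why the scores cannot be treated as independent.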
In a normal distribution of scores, a T-score of 60 is approximately equal to a percentile rank of a. 60 b. 68 c. 84 d. 95: C. T is a standard score with a mean of 50 and a standard deviation of 10. Thus, a T-score of 60 is 1 standard deviation above the mean. In a normal distribution, this is equivalent to the 84th percentile.
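The conversion can be reproduced with Python's standard-library `statistics.NormalDist`. The sketch assumes normally distributed scores. `t_score_percentile` is a hypothetical helper name.

```python
from statistics import NormalDist


def t_score_percentile(t, mean=50.0, sd=10.0):
    """Percentile rank of a T-score, assuming a normal distribution of scores."""
    z = (t - mean) / sd
    return 100 * NormalDist().cdf(z)


print(round(t_score_percentile(60)))  # 84
print(round(t_score_percentile(70)))  # 98
```

The same helper shows that a T-score of 70 (two standard deviations above the mean) falls at about the 98th percentile.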
The results of an experiment indicated no significant differences at the .05 level. This means that a. the null hypothesis is not rejected b. the null hypothesis is rejected c. the alternative hypothesis is accepted: A. When the results are not significant, you do not reject the null hypothesis. That is, you cannot conclude that the IV had an effect.
3. Assuming a normal distribution, how many people would score between 400 and 600 on a standardized test with a mean of 500 and a standard deviation of 100 (N = 1,000)?: B. First convert the scores to standard deviation units (i.e., z-scores). A score of 400 is equivalent here to z = -1, and a score of 600 to z = +1. Then remember that about 68% of cases fall between z = -1 and z = +1 in a normal distribution. Finally, take 68% of 1,000, which is 680.
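The same calculation can be done in Python with `statistics.NormalDist`, assuming the scores are exactly normal.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)
proportion = scores.cdf(600) - scores.cdf(400)  # area between z = -1 and z = +1
expected_count = proportion * 1000
print(round(proportion, 4), round(expected_count))  # 0.6827 683
```

The exact area is about 68.3%, which gives roughly 683 people. The rounded "68%" rule of thumb gives the answer-key value of 680.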
4. Which of the following correlations is the highest? a. -.50 b. .05 c. .41 d. .23: A. When determining which correlation is larger, you ignore the sign and just look for the bigger number.
5. If two variables are positively correlated, this means that a. as one goes up the other goes down. b. as one goes up the other stays the same c. as one goes up the other goes up d. their means are equal: C. A positive correlation between two variables means that both move in the same direction.
5. In the F ratio, within-group variance, as measured by MSW, reflects a. variance accounted for by random and irrelevant factors b. the difference between the sample and the population means c. variance due to the effect of the inde: A. In the F ratio, within-group variance is error variance (in fact, MSW, the index of within-group variance, is sometimes referred to as the "error term"). This means that it measures variability due to irrelevant random factors such as pre-existing individual differences between subjects.
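Here is a minimal Python sketch of how the F ratio is built from between-group and within-group mean squares. It uses three made-up groups of three scores each. The function name is illustrative.

```python
def one_way_f(groups):
    """Return (F, MS_between, MS_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Between-group variability: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how far each score sits from its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)  # the "error term"
    return ms_between / ms_within, ms_between, ms_within


print(one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 27.0, 1.0)
```

The scores within each group differ only slightly (MSW = 1), while the group means are far apart. That pattern is what produces a large F.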
7. For a given population, which of the following score distributions will likely have the least variability? a. the population distribution b. a distribution of a sample of means from the population c. a distribution of a sample of 10 scores from the population d. a distribution of a sample of: B. A sample of means from a population always has less variability than the population or any one individual sample does. This is illustrated pictorially in the Appendix on Advanced Statistics.
8. The statement most true of nonparametric tests is that they a. require data scaled on an interval or ratio basis b. are more powerful than parametric tests c. rely on population parameters to draw conclusions about sample statistics d. are use: D. Unlike parametric tests, nonparametric tests do not require any assumptions about the shape of the population distribution.
10. In a study in which a one-way ANOVA is used, the null hypothesis would be that a. sample variances are equal b. population variances are equal c. sample means are equal d. population means are equal: D. An ANOVA is designed to test the hypothesis that the group means were drawn from the same population, i.e., that the means are equal in the population.
An experimenter is testing the hypothesis that there is no difference between teaching methods with regard to the grades obtained by students on an arithmetic test. His design calls for two groups: traditional teaching method vs. programmed self-instruction. He us: B. If the results are significant at the .01 level, then you reject the null hypothesis and conclude that the alternative hypothesis is supported.
12. If a sample of 400 is taken from a population, and you find that the mean of this sample on some standard test is 50 and the standard deviation is 10, the standard error of the mean would be a. 20.0 b. 10.0 c. .50 d. 5.0: C. To get this one correct, you'll need to know the formula for the standard error of the mean: the standard deviation of the sample divided by the square root of the sample size. In this case, you'd divide 10 (the standard deviation) by 20 (the square root of 400), which gives .50.
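Written as a small Python function, the formula looks like this. The function name is illustrative.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean: sample SD divided by the square root of sample size."""
    return sd / math.sqrt(n)


print(standard_error_of_mean(10, 400))  # 0.5
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.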
13. All of the following are true of path analysis, except a. an a priori path is drawn connecting two or more variables in a causative direction. b. the magnitude of the relationship between variables is determined by their correlation c: D. Path analysis is a method designed to determine or confirm causal relationships among variables via correlations. Hence, you wouldn't actually manipulate variables; you'd only measure their degree of relationship.
14. In a normal distribution of scores, the number of cases falling between a percentile rank of 11 and 20 will be _____ the number of cases falling between a percentile rank of 41 and 50.: A. The distribution of percentile ranks is, by definition, flat. This means that the same number of scores will fall within equal intervals. In this case, roughly 10% of scores will fall within each of the ranges identified.
15. The phenomenon whereby an experimenter's expectancies influence subjects' responses on a dependent variable in the direction predicted is known as a. the Hawthorne effect b. demand characteristics c. the carryover effect d. the Rosenth: D. It's called the Rosenthal effect because it was first reported by Robert Rosenthal.
16. In a study that includes one group that is tested on an intervally scaled dependent variable before and after it receives treatment, what statistical test would be used to compare the obtained means? a. t-test for single sample b. t-test for correl: B. To compare two means obtained from correlated samples (e.g., the same group tested twice), one would use the t-test for correlated samples.
If a study such as the one described had 40 subjects, degrees of freedom would be equal to a. 19 b. 38 c. 39 d. 78: C. In the t-test for correlated samples, the degrees of freedom equal N - 1, where N is the number of pairs of scores. Since there are 40 subjects, there will be 40 pairs of before-and-after scores, so df = 40 - 1 = 39.
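Here is a minimal Python sketch of the t-test for correlated samples. It works on the before-after difference scores, so the degrees of freedom count pairs rather than individual scores. The data and the function name are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """t-test for correlated samples, computed on the before/after difference scores."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)  # number of pairs
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1  # degrees of freedom = number of pairs - 1


# Hypothetical pretest/posttest scores for 40 subjects
before = list(range(40))
after = [score + 1 + score % 3 for score in before]
t, df = paired_t(before, after)
print(df)  # 39
```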
A one-group pretest/posttest design is susceptible to many threats to internal validity, including... a. history b. maturation c. statistical regression d. all of the above: D. History, maturation, and statistical regression
19. Why might the use of a factorial ANOVA be preferred over the use of separate one-way ANOVAs? a. The use of a factorial ANOVA reduces the probability of making a Type II error. b. A factorial ANOVA allows one to assess for interaction effects: B. If you have multiple independent variables, you can use either multiple one-way ANOVAs or one factorial ANOVA. An advantage of the latter is that it allows you to measure interaction effects.
20. A mall owner is interested in determining whether shoppers are equally likely to use the east, north, south, and west entrances to the mall. Which of the following statistical tests would be most helpful? a. chi-square b. one-way ANOVA c. fac: A. In this case, the data will consist of frequencies of observations within categories. The chi-square test is the appropriate test for this kind of frequency data.
21. If the mall owner in the above question sampled 100 customers, the expected frequency in each cell under the null hypothesis would be a. 20 b. 25 c. 50 d. more than 25 but less than 50.: B. If the null hypothesis is true, the four entrances are used with equal frequency. Thus, if 100 customers are sampled, 25 would be expected to use each entrance.
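The Python sketch below computes the expected frequencies and the chi-square goodness-of-fit statistic for the mall example. The observed counts are made up for illustration.

```python
def chi_square_equal_expected(observed):
    """Chi-square goodness of fit when every category has the same expected frequency."""
    n = sum(observed)
    expected = n / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1
    return chi2, expected, df


# Hypothetical counts for 100 shoppers at the east, north, south, and west entrances
chi2, expected, df = chi_square_equal_expected([30, 20, 25, 25])
print(expected, chi2, df)  # 25.0 2.0 3
```

The obtained statistic would then be compared with the critical chi-square value for 3 degrees of freedom.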
A job applicant takes five tests. His performance is considered excellent on four of the tests but slightly inadequate on the fifth. If the procedure known as multiple cutoff were used to make hiring decisions, this company would a. place the app in: D. When the multiple cutoff procedure is used, an examinee must demonstrate the minimum level of proficiency on every predictor that is administered. Because this applicant falls short on the fifth test, he is not selected.
A researcher is interested in the association between IQ and happiness. He uses multiple measures of both of these attributes. What statistical analysis is the researcher likely to use? a. multiple regression b. path analysis c. canonical correlation d. partial correlation: C. Canonical correlation is the appropriate statistical method for correlating multiple predictors with multiple criterion measures.
24. In a study involving three groups, the variability in scores of the groups differs. The robustness of the parametric statistical test used to analyze the data from this study would be enhanced if a. alpha is set at a high level b. the group with the: C. A statistical test is said to be robust when its results tend to be accurate even in the face of moderate violations of its assumptions about the population data. In this case, the homogeneity of variance assumption is violated. When this assumption is violated, the test tends to remain robust as long as the groups' sample sizes are equal.
25. All of the following statements are true of forward stepwise multiple regression analysis, except: a. the technique is useful in dealing with the problem of redundancy in a set of predictors b. the technique allows a researcher to add predictor: D. Forward stepwise regression is a technique that allows a researcher to choose a smaller set of predictors out of a larger set. When the technique is used, the predictor with the highest correlation with the criterion is the first one added to the equation. Choices A, B, and C are true statements about forward stepwise regression.
26. Bayes' theorem is associated with a. sample size and inferential statistics b. conditional probability and base rates c. the normality assumption in the central limit theorem d. meta-analysis and effect sizes: B. Bayes' theorem is used to revise conditional probabilities based on base rates.
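Here is a minimal Python sketch of Bayes' theorem applied to a positive test result. The hit rate, false-alarm rate and base rates are made-up values chosen to show how strongly the base rate drives the revised probability.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(condition | positive result) from Bayes' theorem."""
    p_positive = hit_rate * base_rate + false_alarm_rate * (1 - base_rate)
    return hit_rate * base_rate / p_positive


print(round(posterior(0.01, 0.90, 0.10), 3))  # 0.083 when the condition is rare
print(round(posterior(0.50, 0.90, 0.10), 3))  # 0.9 when the condition is common
```

The test is equally accurate in both calls. A positive result still means very different things depending on how common the condition is.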
27. A T-score of 70 corresponds to a. the 70th percentile b. the 90th percentile c. the 98th percentile d. 3 standard deviations above the mean: C. A T-score of 70 is two standard deviations above the mean (the mean of a T-score distribution is 50; the standard deviation is 10). When a score is two standard deviations above the mean, about 98 percent of a normal distribution falls below it. So about 98 percent of scores are below a T-score of 70; in other words, it is at the 98th percentile.
1. A psychological test can be defined as a(n) _____ and _____ measure of behavior.: objective
standardized
The process of _____ involves ensuring uniformity of administration and scoring of the test. This process includes obtaining ______, which represent the scores of a larger representative sample of the population for which the test is intended.: standardization
norms
Interpreting a test score by comparing it to _____ allows us to determine how a given score compares to the scores of others from the same population who have taken the test.: norms
A good test will be ______, which means that it will provide repeatable, consistent results. It will also be _____, which means that it will measure what it purports to measure.: reliable; valid
A(n) _____ test is one in which the examinee's response rate is assessed. A(n) _____ test is one that assesses the level of difficulty an examinee can attain.: speed; power
A(n) _____ test uses the examinee himself as the frame of reference in score interpretation. It indicates which attributes are weakest and strongest within the individual.: ipsative
A(n) ____ effect occurs when a test is unusually difficult, and many test-takers score at or near the bottom of the scale.: floor
The defining characteristic of an objective test is a. the existence of norms b. a standardized set of scoring and administration procedures c. examiner discretion in scoring and interpreting items. d. reliability and validity: B. An objective test is one that is independent of the subjective judgment of the particular examiner. This means that administration and scoring procedures are uniform, or the same for all examiners.
A test developer administers an intelligence test to a group of examinees on Oct. 1st and then administers the same test to the same group of examinees on Nov. 1st. Most likely the examiner is interested in a. assessing the test's reliability.: A. A test is reliable if it provides repeatable, consistent results. Giving the same test to the same group of examinees at different points in time is one way to assess a test's reliability.
A drawback of norm-referenced interpretation is that a. a person's performance is compared to the performance of other examinees b. it does not permit comparisons of individual examinees' scores on different tests c. it does not indicate w: D. Norm-referenced interpretation involves comparing an examinee's score to the scores of others who have taken the same test. A drawback of this type of interpretation is that it does not provide absolute standards of good or poor performance; the examinee's score must be interpreted in light of the performance of the norm group as a whole.
a. According to classical test theory, an examinee's obtained test score consists of two components: ______, or the portion of variability among examinees that is due to whatever attribute is being measured by the test, and ____, or the portion of varian: truth (or true score variance)
error (or measurement error, or error variance)
______, by definition, is _________, which means that it is due to factors that affect different examinees in different ways.: error; random
If a test is ______, it will be free from ______ and yield information about examinees' _____.: reliable
error
true scores
2. The reliability coefficient, unlike other correlation coefficients, is interpreted ______. This means, e.g., that for a test with a reliability coefficient of .70, _____% of observed score variance is true variance. In other words, unlike as with othe: directly
70
square
3. Obtaining a(n) ____ reliability coefficient involves administering the same test to the same group of people, and then correlating scores on the first and second administrations.: test-retest
The sources of measurement error for this type of reliability include factors related to _____.: the passage of time
This coefficient is not appropriate to use for test that measure ______ and those on which scores are affected by ______.: unstable attributes
repeated administration
4. Obtaining a(n) ______ reliability coefficient involves administering two forms of a test to the same group of examinees, and then obtaining the correlation between the two sets of scores. Sources of measurement error for this reliability coefficient us: alternate forms
the passage of time
content
5. There are a number of measures of _____ reliability, all of which indicate the magnitude of correlation among individual items.: internal consistency
For instance, obtaining a(n) ____ reliability coefficient involves dividing a test in two and obtaining a correlation between the halves as if they were two shorter tests. When this coefficient is used, the ____ is usually used to correct for the effects: split-half
Spearman-Brown Formula
Kuder-Richardson Formula
coefficient alpha
speed tests
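Two of the formulas named above can be sketched briefly in Python. The function names and the item data are hypothetical.

- The Spearman-Brown formula projects reliability when test length is multiplied by some factor. A factor of 2 is the usual split-half correction.
- Coefficient alpha compares the sum of item variances with the variance of total scores. For items scored 0/1, alpha gives the same value as KR-20.

```python
import statistics


def spearman_brown(r, factor=2.0):
    """Projected reliability when test length is multiplied by `factor` (2 = split-half correction)."""
    return factor * r / (1 + (factor - 1) * r)


def coefficient_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] is examinee j's score on item i."""
    k = len(item_scores)
    sum_item_vars = sum(statistics.pvariance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    return k / (k - 1) * (1 - sum_item_vars / statistics.pvariance(totals))


print(round(spearman_brown(0.60), 2))  # 0.75: half-test r of .60 corrected to full length

# Hypothetical right/wrong (1/0) answers: 3 items x 4 examinees
items = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
print(coefficient_alpha(items))  # 0.75
```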
The _____ is used to construct confidence intervals that indicate the range in which an examinee's _____ test score is likely to fall, given his _____.: standard error of measurement
true
obtained test score
For example, there is a _____% probability that the examinee's _____ score lies within one ______ of the _______ score, and a ____% probability that the examinee's _____ score lies within approximately two _____ of the _____ score.: 68
true
standard error of measurement
obtained
95
true
standard error of measurement
obtained
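A minimal Python sketch of these confidence intervals, assuming a hypothetical examinee with an obtained score of 100 and a standard error of measurement of 5. The function name is illustrative.

```python
def true_score_interval(obtained, sem, z):
    """Band of obtained +/- z * SEM that is likely to contain the examinee's true score."""
    return obtained - z * sem, obtained + z * sem


print(true_score_interval(100, 5, z=1))  # (95, 105): about 68% confidence
print(true_score_interval(100, 5, z=2))  # (90, 110): about 95% confidence
```

Using z = 1.96 instead of 2 gives the more exact 95% band of 90.2 to 109.8.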
All other things being equal, a short test will have a(n) ____ reliability coefficient than a longer test, a fill-in-the-blank test will have a(n) _____ reliability coefficient than a true/false test, and a very easy test will have a(n) ______ reliabilit: lower;
higher;
lower
1. You would not use the Kuder-Richardson Formula 20 to assess the reliability of a a. test that is dichotomously scored b. test that measures an unstable attribute c. speed test d. psychological test: C. Internal consistency reliability coefficients (e.g., KR-20, coefficient alpha, split-half) should not be used to assess the reliability of speed tests. This is because on a speed test, all attempted items are expected to be answered correctly; thus, any coefficient of internal consistency will yield a spuriously high estimate of the test's reliability.
One way to improve the inter-rater reliability of a behavior observation scale would be to use a. mutually exclusive rating categories b. non-exhaustive rating categories c. highly valid rating categories d. empirically derived rating: A. Inter-rater reliability is strengthened when mutually exclusive and exhaustive rating categories are used. This means that categories are defined clearly enough that no behavior will belong under overlapping categories (mutually exclusive), and that all observed behaviors can be placed into a category (exhaustive).
The standard error of measurement is a. inversely related to the reliability coefficient and inversely related to the standard deviation of test scores. b. positively related to the reliability coefficient and positively related to the stand: D. The standard error of measurement is inversely related to the reliability coefficient and positively related to the standard deviation of test scores. This means that the standard error of measurement increases as reliability decreases and as the standard deviation increases. This can be seen from the formula for the standard error of measurement.
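The formula is usually written as follows, where SD_x is the standard deviation of test scores and r_xx is the test's reliability coefficient. The second line is a made-up worked example with SD_x = 15 and r_xx = .91.

```latex
\mathrm{SEM} = SD_x \sqrt{1 - r_{xx}}
\qquad
\mathrm{SEM} = 15\sqrt{1 - .91} = 15(.30) = 4.5
```

If r_xx = 1.0 the SEM falls to 0. If r_xx = 0 the SEM equals the full standard deviation of test scores.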
When practical, it is most advisable to use a(n) a. alternate-forms reliability coefficient b. test-retest reliability coefficient c. internal consistency reliability coefficient d. interscorer reliability coefficient: A. Although this opinion is not universally shared, it is what many experts believe. The words "when practical" are a good clue, since it is often very impractical to obtain an alternate forms reliability coefficient.
According to classical test theory, an observed test score reflects a. true score variance plus systematic error variance b. true score variance plus random error variance c. true score variance plus random and systematic error varian: B. According to classical test theory, a given test score reflects both "truth" (whatever is being measured by the test) and measurement error (factors that are irrelevant to whatever the test is measuring). Measurement error, which occurs because no test is perfectly reliable, is random by definition.
Which of the following methods of recording behavior is most useful when the target behavior has no fixed beginning or end? a. interval b. continuous frequency d. duration: A. In interval recording, a rater records whether or not an individual is engaging in a target behavior during a given interval. During this interval, the rater only has to decide whether the behavior is occurring, not when it begins or ends. This is why interval recording is the best method of recording behaviors that have no fixed beginning or end.
A test has content validity if it __________.: adequately samples the content domain it is supposed to measure knowledge of;
Content validity is a concern when ___________ tests are being developed.: educational (or achievement, or work sample)
To determine if a test has content validity, we rely primarily on ________.: expert judgment
If a test has criterion-related validity, there would be a high ______ between the _____ and the _____.: correlation; predictor; criterion.
A(n) ______ measure is a direct and independent measure of that which the predictor test is designed to predict; it can be thought of as that which is being predicted. For example, if an industrial psychologist were interested in using scores on an aptit: criterion; predictor; criterion
When _________ validation procedures are used to validate a predictor test, predictor and criterion data are collected at or about the same time.: Concurrent
When _______ validation procedures are used, predictor data is collected first, and criterion data are collected at a future point.: predictive
The former type of validation is more appropriate for predictors that measure _____; the latter type is more appropriate for tests designed to measure _______: current status on a criterion
future status on a criterion
Since ______ validation is less costly than _______ validation, the former is often used as a substitute for the latter.: concurrent
predictive
The ______ is a statistic used to construct a range in which an examinee's ______ criterion score is likely to fall, given his or her ________ criterion score.: standard error of estimate
actual (or true)
predicted
Say a person takes a short aptitude test that is being used as a predictor of IQ score. Say that on the basis of his score on the aptitude test, his IQ score is predicted to be 100. If the ____ were equal to 5, there would be a 68% probability that his _: 95
actual
predicted
standard error of estimate
standard error of estimate
And there would be about a 95% probability that his _____ intelligence is between _____ and _____.: actual
90; 110
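The Python sketch below shows the standard error of estimate and the bands built around a predicted criterion score. It assumes the usual formula SEE = SD_y × sqrt(1 − r²), where SD_y is the standard deviation of criterion scores and r is the validity coefficient. Function names and numbers are hypothetical.

```python
import math


def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD of criterion scores * sqrt(1 - validity ** 2)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)


def criterion_interval(predicted, see, z):
    """Band of predicted +/- z * SEE that is likely to contain the actual criterion score."""
    return predicted - z * see, predicted + z * see


print(standard_error_of_estimate(15, 0.6))   # about 12
print(standard_error_of_estimate(15, -1.0))  # 0.0: perfect validity leaves no error of estimate
print(criterion_interval(100, 5, z=1))       # (95, 105): about 68% confidence
print(criterion_interval(100, 5, z=2))       # (90, 110): about 95% confidence
```

Because the validity coefficient is squared, a coefficient of -1.0 reduces the error to zero just as +1.0 does.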
Often, a predictor is used for classification purposes, i.e., to predict to which of two _____ groups a person belongs. When this is the case, the predictor is administered to examinees, and those scoring above the predictor ____ would be expected to scor: criterion
cutoff
cutoff
For example, a job selection test might be used to predict whether or not a person will be successful at a particular occupation. Individuals who are predicted to be successful by the test and in fact do turn out to be successful would be called ______.: true positives
And those whom the predictor correctly identified as unsuccessful would be called ______.: true negatives
Those who are classified by the test into the unsuccessful group but turn out to be successful on the job would be called______.: false negatives
And finally, those whom the test predicts to be successful but in fact turn out to be unsuccessful would be called _____.: false positives
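The four outcomes can be tallied with a short Python sketch. It assumes that a score at or above a cutoff counts as "passing." The applicant data and the function name are made up.

```python
def classify(predictor, criterion, predictor_cutoff, criterion_cutoff):
    """Tally selection outcomes; scores at or above a cutoff count as passing."""
    counts = {"true positive": 0, "false positive": 0, "true negative": 0, "false negative": 0}
    for p, c in zip(predictor, criterion):
        predicted_success = p >= predictor_cutoff
        actual_success = c >= criterion_cutoff
        if predicted_success and actual_success:
            counts["true positive"] += 1
        elif predicted_success:
            counts["false positive"] += 1   # selected, but fails on the job
        elif actual_success:
            counts["false negative"] += 1   # rejected, but would have succeeded
        else:
            counts["true negative"] += 1
    return counts


# Hypothetical applicants: selection-test scores and later job-performance ratings
test_scores = [80, 75, 60, 55, 90]
job_ratings = [4, 2, 4, 1, 5]
print(classify(test_scores, job_ratings, predictor_cutoff=70, criterion_cutoff=3))
print(classify(test_scores, job_ratings, predictor_cutoff=78, criterion_cutoff=3))
```

In the second call the predictor cutoff is raised. The one false positive (score 75, rating 2) becomes a true negative.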
A validity coefficient would be lowered if there was a(n) ____ range of scores on either the ____ or the ______.: restricted
predictor
criterion
7. After constructing and validating a test, a test developer will likely want to re-validate it using a second sample of individuals. This process is referred to as ______. In such cases, the validity coefficient obtained on the second sample is likely to: cross-validation; lower; shrinkage
8. A test has ______ when its validity coefficient for one subgroup is higher than its coefficient for another subgroup. For example, an IQ test may be a valid predictor of job performance for whites, but a completely invalid predictor of performance for blac: differential validity;
moderator variable
Construct validity is a concern in developing tests that measure ______.: hypothetical constructs or traits
Two types of construct validity are the following 1) ________ validity, which is present when a test has a(n) _______ correlation with another test that measures the same trait, and 2) _____ validity, which is present when a test has a(n) ______ correlat: convergent
high
discriminant (or divergent)
low
A(n) ______ matrix provides a method of assessing the construct validity of two or more tests. On this matrix, if the ______ coefficient (the correlation between two tests that measure the same construct using different methods) is ______, evidence of __: multitrait-multimethod
monotrait-heteromethod
high
convergent
And if the ______ coefficient (the correlation between two tests using the same method to measure different constructs) is _____, evidence of _____ validity is provided.: heterotrait-monomethod
low
discriminant (or divergent)
______ is a procedure designed to determine the degree to which a large set of variables or tests are measuring the same underlying construct or constructs.: Factor analysis
The procedure yields a(n) _____, which indicates each test's correlation with each factor identified in the analysis (a correlation between a test and a factor is referred to as a(n) _____.): factor matrix
factor loading
To facilitate interpretation of a factor analysis, a(n) ______ is usually performed, and there are two types: _____ and ______.: rotation
orthogonal
oblique
When a(n) ______ is conducted, uncorrelated factors are derived, and when a(n) _____ is conducted, correlated factors are derived.: orthogonal rotation
oblique rotation
If, in a factor analysis, factors are ______, the ______ of a test can be obtained by squaring and summing the ______.: orthogonal
communality
factor loadings
For example, imagine a factor analysis of six tests which yielded two significant factors. Imagine that Test A has a .60 correlation with Factor I and a .20 correlation with Factor II. By squaring and summing these ______ (assuming the rotation is ______: factor loadings
orthogonal
.40
the two factors
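The arithmetic for Test A's communality can be checked with a one-function Python sketch. It assumes orthogonal factors, as the card does. The function name is illustrative.

```python
def communality(loadings):
    """Sum of squared factor loadings; valid as a communality only when factors are orthogonal."""
    return sum(loading ** 2 for loading in loadings)


print(round(communality([0.60, 0.20]), 2))  # 0.4, i.e. .36 + .04
```

So 40% of Test A's variance is shared with the two factors, and the remaining 60% is unique to the test or error.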
If a test is highly reliable, it (will be/may be/will not be) valid. If a test is very valid, it (will be/may be/will not be) reliable.: may be
will be
In other words, reliability is a(n) ______ but not a(n) ______ condition for validity to be present. The ____ formula would be used to determine how ____ a test would be if it had _____ reliability.: necessary
sufficient
correction for attenuation
valid
perfect
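Here is a minimal Python sketch of the correction for attenuation. The observed validity coefficient is divided by the square root of the reliabilities. The criterion reliability defaults to 1.0, so only the predictor is corrected, which matches the card's wording. The name and numbers are hypothetical.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Estimated validity if the predictor (and, optionally, the criterion) were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)


print(round(correct_for_attenuation(0.40, 0.64), 2))        # 0.5: predictor corrected only
print(round(correct_for_attenuation(0.40, 0.64, 0.81), 2))  # 0.56: both measures corrected
```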
1. Which of the following is the lowest validity coefficient? a. .80 b. .50 c. .10 d. -.15: C. Like any other correlation coefficient, the magnitude of the validity coefficient is determined by its absolute value rather than its direction (i.e., positive or negative). To answer this question, look at the numbers and ignore any negative signs. Since .10 is the lowest number, it is, of the choices listed, the lowest validity coefficient.
2. If an industrial psychologist were concerned about reducing the number of false positives yielded by a job selection test, he could a. raise the predictor cutoff score and/or raise the criterion cutoff score. b. raise the predictor cut: B. False positives can be reduced by raising the predictor cutoff score. If the selection test becomes more difficult to pass, fewer individuals will "pass," and those who do are less likely to be unqualified. Lowering the criterion cutoff score will also result in fewer false positives. Lowering the criterion cutoff is equivalent to relaxing the definition of acceptable performance. This means that it will be easier to be considered adequate; therefore, those who do "pass" the selection test will be more likely to meet this easier criterion standard.
5. Some would argue that, in conducting a factor analysis, an oblique rotation is usually preferable to an orthogonal rotation because a. few factors are uncorrelated b. most factor analyses identify distinct and unrelated traits c.: A. By definition, an oblique rotation produces correlated factors. In other words, if you believe that the traits represented by the factors are correlated, it makes theoretical sense to use an oblique rotation. And if you believe that few factors or traits are ever uncorrelated, you might argue that oblique rotations should always be used.
If a test has a reliability coefficient of .90, we can conclude that a. the highest validity coefficient the test could have is .81. b. its validity coefficient is equal to the square root of .90. c. the test is probably very valid.: D. Knowing that the test's reliability coefficient is .90 tells us that the upper limit of the validity coefficient is the square root of .90 (not .81, which is the square of .90). This means that the test's validity coefficient is lower than or equal to the square root of .90 (about .95). The test may be highly valid, moderately valid, or completely invalid.
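The ceiling can be computed with a small Python sketch. In general the maximum validity is the square root of the product of predictor and criterion reliabilities. With a perfectly reliable criterion (the default below) it reduces to the square root of the test's reliability. The function name is hypothetical.

```python
import math


def max_validity(r_xx, r_yy=1.0):
    """Upper limit on a validity coefficient given predictor (and criterion) reliability."""
    return math.sqrt(r_xx * r_yy)


print(round(max_validity(0.90), 3))  # 0.949
```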
If a test's validity coefficient were -1.0, the standard error of estimate would be equal to a. 0.0 b. 1.0 c. the standard deviation of criterion scores d. cannot be determined: A. This makes sense. If a test has perfect validity (a validity coefficient of 1.0 or -1.0), there is no error of estimate, i.e., no error when the test is used to predict scores on a criterion measure. The answer can also be derived from the formula for the standard error of estimate: because the validity coefficient is squared in that formula, a coefficient of -1.0 always results in a standard error of estimate of 0.
9. Criterion contamination has the effect of a. increasing the validity coefficient b. decreasing the validity coefficient c. increasing examinees' criterion score d. decreasing examinees' criterion scores: A. Criterion contamination occurs when raters assigning criterion scores have knowledge of the ratees' predictor scores, and this knowledge affects scores on the criterion. If a supervisor knows that an employee got a low score on a predictor, he might rate the employee lower on the criterion than he otherwise would have. This results in artificially high consistency between predictor and criterion scores and inflates the validity coefficient.
10. Following a principal components analysis of a set of variables, four eigenvectors, symbolized in order as v1, v2, v3, and v4, are derived. Which of the following statements is true? a. v1 will account for more variance in the variables: A. In a principal components analysis, eigenvectors (which are also called factors or principal components) represent underlying traits or constructs that are being measured by some or all of the variables being analyzed. In principal components analysis, the first factor accounts for a higher percentage of variance than any of the other factors. This just means that the variables in the analysis measure the first factor more than they measure any of the other factors.
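A two-variable illustration, assuming standardized variables analyzed from their correlation matrix: each component's eigenvalue is the variance it accounts for, the eigenvalues sum to the number of variables, and components come out in descending order of eigenvalue. For a 2 x 2 correlation matrix with correlation r the eigenvalues are 1 + r and 1 - r, so the closed form below suffices (the correlation of .60 is hypothetical).

```python
import math


def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mean = (a + d) / 2
    spread = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean + spread, mean - spread


# Hypothetical correlation of .60 between two standardized variables.
first, second = eigenvalues_2x2_symmetric(1.0, 0.60, 1.0)
total_variance = first + second  # equals 2, the number of variables
print(round(first / total_variance, 2), round(second / total_variance, 2))  # 0.8 0.2
```

The first component (v1 in the card's notation) captures 80% of the total variance here, and the second only the remainder.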
By definition, an oblique rotation produces ________ factors.: Correlated
If you believe that the traits represented by the factors are correlated, it makes sense to use an ________.: oblique rotation
If you believe that few factors or traits are ever uncorrelated, you might argue that _______ ______ should always be used.: oblique rotations
A ________ _______ is an examinee who is identified by a predictor as not meeting a criterion but, in reality, does meet it.: false negative
Usually, when a test is developed, a large pool of _____ are written, and _____ is used to determine which items will be retained for the final version of the test.: items
item analysis
A test item's difficulty level (p) is equal to the ________.: percentage of examinees who answer the item correctly.
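A minimal sketch of computing p from scored responses (1 = correct, 0 = incorrect; the responses and the name `item_difficulty` are hypothetical). Here p is expressed as a proportion; multiply by 100 for a percentage. A higher p means an easier item.

```python
def item_difficulty(responses):
    """p = proportion of examinees who answered the item correctly."""
    return sum(responses) / len(responses)


# Hypothetical responses from 10 examinees to one item.
print(item_difficulty([1, 1, 0, 1, 0, 1, 1, 0, 1, 1]))  # 0.7
```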
On most tests, the optimal average difficulty level is ______; this level is associated with maximum ____ and ______.: .50
reliability
differentiation or discriminability (score variability would also be correct)
However, the optimal difficulty level depends on _______. For example, if the test is designed to select only a few highly qualified individuals, one should set the average p value at a relatively (high/low) level. It's important to remember that the higher th: purpose of testing (or the probability that items can be guessed)
low
less difficult
A test item's discrimination refers to the degree to which the item ______.: differentiates among examinees in terms of what the test measures;
One way to assess an item's discrimination is to correlate each item with either ______ or _______.: the total test score
an external criterion
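A sketch of the item-total approach with invented data: correlating a 0/1 item with total test scores via the ordinary Pearson formula (which, for a dichotomous item, is the point-biserial correlation). The data and helper name are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 item this is the point-biserial correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical: the three highest scorers answered the item correctly.
ITEM = [1, 1, 1, 0, 0, 0]
TOTAL = [48, 45, 40, 30, 28, 25]
print(round(pearson_r(ITEM, TOTAL), 2))  # 0.95
```

Because the item here is part of the total score, this correlation runs a little high. Correlating with an external criterion avoids that overlap.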
An item's discrimination index (D) is equal to the percentage of _____ (U) minus the percentage of ________ (L); a value of ______ represents maximum discriminability.: examinees in the high-scoring group
examinees in the low-scoring group
+100 (a value of -100 is equally extreme but indicates discrimination in the wrong direction)
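A minimal sketch of D = U - L computed from counts in the high- and low-scoring groups (the counts are hypothetical):

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = % correct in the high-scoring group (U) minus % correct in the low-scoring group (L)."""
    return 100 * upper_correct / upper_n - 100 * lower_correct / lower_n


print(discrimination_index(18, 20, 6, 20))  # 60.0: 90% of U vs. 30% of L
print(discrimination_index(20, 20, 0, 20))  # 100.0: maximum discrimination
```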
Higher levels of discrimination are associated with _____ levels of difficulty.: moderate
The item difficulty level associated with the maximum level of differentiation among examinees is a. .10 b. .50 c. .75 d. 1.0: b. An item is most likely to differentiate among examinees (e.g., between high and low scorers) when half the examinees answer it correctly and half answer it incorrectly, that is, when the item difficulty level (p) is .50.
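One way to see why .50 is optimal: a dichotomously scored item's variance is p(1 - p), and that product peaks at p = .50, where the item separates the most examinees from one another. The answer options serve as test values below.

```python
def item_variance(p):
    """Variance of a 0/1 item with difficulty p."""
    return p * (1 - p)


for p in (0.10, 0.50, 0.75, 1.0):
    print(p, round(item_variance(p), 4))
# 0.1 0.09
# 0.5 0.25
# 0.75 0.1875
# 1.0 0.0
```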
The optimal average item difficulty level for a true-false test would be a. .10 b. .50 c. .75 d. 1.0: C. For most tests, the optimal item difficulty level is .50. However, the optimal difficulty level is affected by the probability that examinees can select the correct answer by chance alone. When considering the effects of chance, the rule of thumb is that the average difficulty level of test items should be about halfway between 1.0 and the level of success expected by chance alone. On a true-false test, the probability of getting an item correct by chance alone is 50%. Therefore, the optimal item difficulty level would be midway between 50% and 100%, or 75%.
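The rule of thumb reduces to (1.0 + chance level) / 2. The four-option multiple-choice case below is an extra illustration, not part of the card.

```python
def optimal_difficulty(chance_level):
    """Rule of thumb: halfway between chance-level success and 1.0."""
    return (1.0 + chance_level) / 2


print(optimal_difficulty(0.50))  # true-false: 0.75
print(optimal_difficulty(0.25))  # four-option multiple choice: 0.625
```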
A test item's difficulty level is most affected by a. the test's length b. the test's validity c. the nature of the testing process d. the characteristics of the individuals taking the test.: D. A test item's difficulty is measured in terms of the percentage of examinees who answer the item correctly. Therefore, the characteristics of the individuals taking the test will influence the observed difficulty level. For example, if all examinees taking an intelligence test are highly gifted, the difficulty index (p) for items will be inflated. That is, test items will be estimated to be easier than they actually are.
Which of the following statements is least true of item response theory? a. It is based on the notion that items analyzed measure a latent trait such as cognitive ability. b. It allows for the ability levels of different groups of people to be compa: d. One assumption of item response theory is that item parameters (characteristics of items such as difficulty level and discrimination) will be the same regardless of the sample of individuals taking the test. The other statements are true of item response theory.
An examinee's _______ test score is not that meaningful unless a frame of reference is provided for score interpretation.: raw
Two types of scores which provide this frame of reference are _______ scores, which provide a comparison of an examinee's score to that of others who have taken the same test, and ______ scores, which provide a comparison to an external, pre-established: norm-referenced;
criterion-referenced
2. Norm-referenced scores include ______ scores, which indicate how far along the normal path of development an examinee is.: developmental
A(n) _____ IQ score is an example of such a score.: ratio
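The ratio IQ is mental age divided by chronological age, times 100, which is what makes it a developmental score. A minimal sketch with hypothetical ages:

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = (mental age / chronological age) * 100."""
    return mental_age / chronological_age * 100


print(ratio_iq(10, 8))  # 125.0: an 8-year-old performing like a typical 10-year-old
```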
They also include within-group norms such as _______, which indicate the percentage of scores that fall below a given raw score, and _______, which indicate where a given score stands, in standard deviation units, in relation to the mean.: percentile ranks
standard scores
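A sketch of a percentile rank using the card's definition (percentage of norm-group scores falling below the raw score); the norm group is invented. Some test manuals also credit half of any tied scores, which this version does not.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that fall below the given raw score."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


NORMS = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]  # hypothetical norm group
print(percentile_rank(27, NORMS))  # 60.0
```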
There are a number of different types of ______, including z-scores, _______, _______, and ______, all of which provide essentially the same information.: standard scores
T-scores
stanines
deviation IQ scores
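Since these standard scores carry the same information, each is a linear rescaling of z. This sketch assumes T = 50 + 10z, a deviation IQ with mean 100 and SD 15 (as on the Wechsler scales), and stanines approximated as 5 + 2z rounded and limited to 1 through 9. Published stanine tables define the bands by percentages, so treat that last conversion as an approximation.

```python
def t_score(z):
    """T-score: mean 50, SD 10."""
    return 50 + 10 * z


def deviation_iq(z):
    """Deviation IQ: mean 100, SD 15 (assumed, Wechsler-style)."""
    return 100 + 15 * z


def stanine(z):
    """Approximate stanine: mean 5, SD 2, rounded and limited to 1-9."""
    return max(1, min(9, round(5 + 2 * z)))


for z in (-1.0, 0.0, 1.0):
    print(z, t_score(z), deviation_iq(z), stanine(z))
# -1.0 40.0 85.0 3
# 0.0 50.0 100.0 5
# 1.0 60.0 115.0 7
```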
1. Percentile ranks and T-scores have which of the following in common? a. They are both standard scores b. They are both norm-referenced scores. c. They are both developmental scores. d. They are both criterion-referenced score: B. Percentile ranks and T-scores are both norm-referenced scores; that is, they are both interpreted in terms of a comparison to the scores of those in a normative group. A T-score (but not a percentile rank) is also a standard score, which is a norm-referenced score that is interpreted in terms of distance, in standard deviation units, from the mean of a normative group.
You work as an assistant for a psychology professor at a university and have administered and scored a mid-term exam for him. You report students' scores on the exam as z-scores. The professor tells you that "This makes no sense; I'm used to the MMPI and: C. T-scores and z-scores are both standard scores, which means they are both interpreted in terms of distance from the mean in standard deviation units. For example, a z-score of 1.0 and a T-score of 60 are equivalent: they both indicate that the score is one standard deviation above the mean.
3. The formula "(X - M)/SD" is the formula for a a. standard score b. percentile rank c. criterion-referenced score d. T-score: A. This is the formula for a z-score, which is a type of standard score.
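The formula in code; the parentheses matter, since X - M must be computed before dividing by the SD. The exam mean and SD are hypothetical.

```python
def z_score(x, mean, sd):
    """z = (X - M) / SD: distance from the mean in standard deviation units."""
    return (x - mean) / sd


# Hypothetical mid-term: mean 70, SD 8.
print(z_score(78, 70, 8))  # 1.0
print(z_score(66, 70, 8))  # -0.5
```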
4. The advantage of a deviation IQ score, as compared to a ratio IQ score, is that it a. provides an index of an examinee's absolute level of intelligence b. indicates an examinee's mental age c. allows scores of individuals wh: D. A deviation IQ score is a standard score, which means that it tells you how many standard deviation units an examinee's score falls above or below the mean. An advantage of a standard score is that scores of individuals from different populations (and on different tests) can be compared. A 9-year-old's deviation IQ score can be meaningfully compared to that of a 30-year-old.
Number of cards 250