# KEY TERMS: Research/Stats/Test Construction

## Terms

- IV vs DV
- INDEPENDENT VARIABLE: what is being compared, can be manipulated or non-manipulated (pre-existing); DEPENDENT VARIABLE: outcome measure selected, can be nominal, ordinal, interval, or ratio
- True Experiment vs. Quasi-Experiment vs. Observational
- TRUE EXPERIMENT: at least one IV is manipulated and subjects are randomly assigned; QUASI-EXPERIMENTAL: at least one IV is manipulated but there is non-random assignment (subjects already in pre-existing groups); OBSERVATIONAL: no intervention or manipulation, passive, non-experimental (e.g., study extent of cigarette smoking between adolescent boys and girls)
- Group vs. Single Subject Design
- GROUP (NOMOTHETIC): Between groups (compares independent groups), Within subjects (correlated groups or subjects repeatedly measured), Mixed Design (groups that are independent and correlated); SINGLE SUBJECT (IDIOGRAPHIC): one or a few subjects measured repeatedly; AB design, ABAB design, Multiple Baseline, Simultaneous Tx Design
- AB vs ABAB vs Multiple Baseline vs. Simultaneous Tx vs. Changing Criterion
- AB (baseline followed by tx; threat of hx); ABAB DESIGN (baseline, tx, baseline, tx; drawbacks: failure to return to baseline, ethics of withdrawing tx); MULTIPLE BASELINE (sequential or consecutive tx across subjects, situations, bx; time consuming and expensive); CHANGING CRITERION (attempt to change bx in increments to match changing criterion, gradual reduction)
- Simple Random Sampling vs. Stratified Random Sampling vs. Proportional Sampling vs. Cluster Sampling vs. Systematic Sampling
- SIMPLE RANDOM SAMP: every member of pop has equal chance of being randomly selected; STRATIFIED RANDOM SAMP: pop divided into strata and then random samp of equal size from each stratum selected; PROPORTIONAL SAMP: individuals randomly selected in proportion to their rep in gen pop; CLUSTER SAMP: i.d. naturally occurring groups of subjects and randomly select certain clusters, then survey all members of cluster; SYSTEMATIC SAMP: selecting every kth element after a random start
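
A minimal sketch of three of the sampling schemes above, using only the standard `random` module. The population (subject IDs 1 to 100), the two strata, and the function names are hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of being selected."""
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, rng):
    """Draw a random sample of equal size from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, k, rng):
    """Select every kth element after a random start within the first k."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical subject IDs
strata = {"boys": list(range(1, 61)), "girls": list(range(61, 101))}

print(simple_random_sample(population, 10, rng))
print(stratified_random_sample(strata, 5, rng))
print(systematic_sample(population, 10, rng))
```

Proportional sampling would differ from the stratified version only in drawing from each stratum in proportion to its share of the population (here 60% and 40%) instead of an equal number.
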
- Interval Recording vs. Event Sampling
- INTERVAL RECORDING: time sampling, momentary or whole-interval, good when bx is not discrete and thus has no distinct beginning or end; EVENT SAMPLING: tallying # of times that target bx occurred, good when bx is discrete and occurs relatively infrequently
- Threats to Internal Validity
- History (occurrence of external, related event), Maturation, Testing (practice effects), Instrumentation (change in observer or instrument calibration), Statistical Regression (from extreme twds the mean), Selection Bias (non-random assignment), Attrition, Diffusion (no-tx group gets some of the tx by mistake)
- Threats to Construct Validity
- Attention and Contact w/Clients, Experimenter Expectancies (Rosenthal Effect, cues given to subjects inadvertently), Demand Characteristics (things in procedure that suggest how subject should behave)
- Threats to External Validity
- Sample Characteristics (dfncs between sample and population), Stimulus Characteristics (artificial research arrangements, non-generalizable to real world), Contextual Characteristics (reactivity--bx based on being observed, e.g., Hawthorne effect)
- Threats to Statistical Conclusion Validity
- Low Power (diminished ability to find significant results, e.g., small sample size, inadequate intervention), Unreliability of Measures, Unreliability of Procedures (inconsistency in tx procedures), Subject Heterogeneity
- Nominal vs. Ordinal vs. Interval vs. Ratio Data
- NOMINAL: tallying people to find non-ordered category; no inherent order; no group mean (e.g., 100 subjects, tally based on gender, race); ORDINAL: tallying people to find ordered category; no group mean (e.g., 100 subjects, tally on attitude re: abortion); INTERVAL: obtain numerical scores for each person where score values have equal intervals; no zero score or zero isn't absolute (e.g., IQ score, t-score), can calculate group mean; RATIO: obtain numerical scores for each person where score values have equal intervals and absolute zero (e.g., savings in bank, EPPP score); can calculate group mean
- Descriptive vs. Inferential Statistics
- DESCRIPTIVE STATS: data simply described; INFERENTIAL STATS: goal is to make inferences about population from the sample
- Mean vs. Mode vs. Median
- MEAN: arithmetic average of group of data, add up all scores and divide by # of scores; MODE: most frequently occurring score; MEDIAN: the score at the 50th percentile
- Standard Deviation vs. Range vs. Variance
- STANDARD DEVIATION: measure of avg deviation (spread) from mean in given set of scores; VARIANCE: standard deviation squared; RANGE: crudest measure of variability, the difference between the lowest and highest value obtained
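
The central tendency and variability measures above can be checked on a small made-up data set. This sketch uses Python's `statistics` module and treats the scores as a whole population (`pvariance`, `pstdev`); a sample estimate would divide by n - 1 instead (`variance`, `stdev`). The scores are hypothetical.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores

mean = statistics.mean(scores)          # sum of scores / number of scores
median = statistics.median(scores)      # score at the 50th percentile
mode = statistics.mode(scores)          # most frequently occurring score
score_range = max(scores) - min(scores) # highest minus lowest value
variance = statistics.pvariance(scores) # population variance
sd = statistics.pstdev(scores)          # square root of the variance

print(mean, median, mode, score_range, variance, sd)
```

For these scores the mean is 5, the median 4.5, the mode 4, the range 7, the variance 4, and the SD 2.
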
- Criterion-Referenced vs. Norm-Referenced Scores
- CRITERION-REFERENCED SCORE (aka Domain-Referenced Score): Percentage correct; NORM-REFERENCED SCORE (aka Standard Score): Z-score, t score, IQ score, percentile rank
- Z Scores
- most basic standard score; correspond directly to SD units; have mean of zero and SD of one; shape of z-score distribution will always be identical to shape of raw score distribution; Z = (score - mean)/SD
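
As a rough illustration of the z-score formula, the sketch below converts a hypothetical set of raw scores and confirms that the result has a mean of zero and an SD of one. The population SD is assumed here; the function name is made up for the example.

```python
import statistics

def z_scores(raw):
    """Convert raw scores to z-scores: (score - mean) / SD."""
    mean = statistics.mean(raw)
    sd = statistics.pstdev(raw)  # population SD (an assumption of this sketch)
    return [(x - mean) / sd for x in raw]

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores, mean 5, SD 2
z = z_scores(raw)

print(z)
print(statistics.mean(z), statistics.pstdev(z))
```

A raw score of 9 is two SDs above the mean, so its z-score is 2.0; a raw score of 2 gives -1.5.
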
- Standard Error of the Mean and Central Limit Theorem
- STANDARD ERROR OF THE MEAN: the standard deviation of the sampling distribution of means, i.e., the average amount that plotted sample means deviate from the population mean; CENTRAL LIMIT THEOREM: assuming an infinite number of equal-sized samples are drawn from the population and their means plotted, a normally distributed distribution of means will result
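
A small simulation can make the sampling distribution of means concrete. This sketch assumes a skewed (exponential) population with mean 1 and SD 1, draws 2000 samples of size 30, and compares the SD of the sample means with the usual formula SD / sqrt(n). The population, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is repeatable
n = 30                    # size of each sample
num_samples = 2000        # stands in for "an infinite number" of samples
population_sd = 1.0       # SD of an exponential distribution with rate 1

# Mean of each sample drawn from the skewed population
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_sem = statistics.stdev(sample_means)
theoretical_sem = population_sd / math.sqrt(n)

print(round(observed_sem, 3), round(theoretical_sem, 3))
```

The two values should land close to each other, and a histogram of `sample_means` would look roughly normal even though the population itself is skewed.
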
- Null Hypothesis vs. Alternative Hypothesis
- NULL HYPOTHESIS: there are no differences between groups, researcher hopes to reject this statement; ALTERNATIVE HYPOTHESIS: there are differences between groups
- Rejection vs. Retention Region
- REJECTION REGION: at the tail end of the curve, aka region of unlikely values, size of region corresponds to ALPHA level, when obtained values fall in this region, the null hypothesis is rejected and it is concluded that there were tx effects; RETENTION REGION: aka acceptance region, when obtained values fall in this region, the null hypothesis is accepted and it is concluded that there were no tx effects
- Alpha vs. Beta
- ALPHA=size of rejection region, the greater the size, the higher likelihood of Type I error (incorrectly rejecting the null hypothesis/differences erroneously found); BETA= the probability of making a Type II error, where the null is incorrectly accepted (no dfnces found where they really did exist); THERE IS AN INVERSE RELATIONSHIP BTWN ALPHA & BETA
- Type I vs. Type II Error
- TYPE I ERROR: occurs when null is incorrectly rejected (related to size of ALPHA); TYPE II ERROR: occurs when null is incorrectly accepted (BETA)
- Power
- The ability to correctly reject the null; increased when sample size is large, the magnitude of the intervention is large, random error is small, stats test is parametric and one-tailed; INVERSE rltnshp with BETA (POWER=1-beta); DIRECT rltnshp with ALPHA (the more alpha, the more power)
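
The relationships among alpha, beta, and power can be sketched for the simplest case, a one-tailed, one-sample z-test. The function below is an illustrative assumption, not a general power calculator; `effect_size` is the true mean difference expressed in SD units.

```python
from statistics import NormalDist

def power_one_tailed_z(effect_size, n, alpha):
    """Power of a one-tailed, one-sample z-test (illustrative sketch).

    effect_size: true mean difference in SD units (hypothetical input)
    n: sample size
    alpha: size of the rejection region
    """
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)  # start of rejection region
    beta = std_normal.cdf(critical_z - effect_size * n ** 0.5)  # Type II rate
    return 1 - beta  # POWER = 1 - beta

for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(power_one_tailed_z(0.5, 25, alpha), 3))
```

Raising alpha, the sample size, or the effect size each raises power in this sketch, and with an effect size of zero the "power" collapses to alpha itself.
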
- t-test vs. ANOVA
- Parametric tests used when the DV is interval or ratio: T-TEST=One IV only (e.g., type of tx), only one or two groups compared (e.g., effectiveness of CBT vs med for depression); ANOVA: One or More IVs (e.g., type of tx) and two or more groups are compared (e.g., effectiveness of CBT vs med vs combined tx for depression)
- One-Way ANOVA vs. Factorial ANOVA
- ONE-WAY ANOVA: One IV only (e.g., type of tx), two or more groups are compared (e.g., effectiveness of CBT vs med vs combined tx for depression); FACTORIAL ANOVA: Two or more IVs (e.g., type of tx and sex), data for each IV is independent (e.g., effectiveness of CBT vs med and dfncs between men and women in tx of depression)
- Split-plot ANOVA vs. Randomized block ANOVA vs. Repeated measures ANOVA
- SPLIT PLOT ANOVA: Two or more IVs (e.g., type of tx and time), data for at least one IV are independent and for at least one IV are correlated (e.g., effectiveness of CBT vs med for tx of depression msrd before, during, and after tx); RANDOMIZED BLOCK ANOVA: 2 IVs, 2 groups or more per IV, both IVS=group independent, one blocked; REPEATED MEASURES FACTORIAL ANOVA: 2 IVS, 2 groups or more per IV, groups correlated
- MANOVA vs. ANCOVA
- MANOVA: 2 or more DVs, 2 groups or more per IV, ind and/or corr groups, typically run when there is more than one DV (outcome measure) b/c of lower Type I error likelihood (than separate ANOVAs); ANCOVA: 2 IVS, 2 groups or more per IV, with covariate, ind and/or corr groups
- Main vs. Interaction Effects
- MAIN EFFECTS (e.g., whether there were differences btwn ethnic groups or between different tx in reducing anxiety); INTERACTION EFFECTS (e.g., whether one's ethnicity affects one's response to type of tx)
- Trend Analysis
- Used with quantitative IV (e.g., dosage of drug, hours of food deprivation) where the outcome is nonlinear--we are then less interested in differences between the groups and more in the trend of the data, or the ups and downs; an extension of the ANOVA
- Bivariate vs. Multivariate Correlation
- BIVARIATE CORRELATION: involves two variables, X (predictor) and Y (criterion), where neither is an IV in the truest sense and what is being looked at is the relationship between the two variables, 3 basic assumptions: linear rltnshp btwn X and Y, homoscedasticity, and unrestricted range of scores on X and Y; MULTIVARIATE CORRELATION: correlation between two or more IVs (X) and one DV (Y), where Y is always interval or ratio data and at least one X is interval or ratio data
- Least Squares Criterion
- Pearson r vs. Eta vs. Biserial Correlation
- PEARSON R: both variables are continuous; ETA: curvilinear relationship between X and Y ("Old Aunt Eta" with curved back); BISERIAL: one variable is artificial dichotomy and other variable is continuous ("Buy Cereal" with artificial sweeteners)
- Zero order vs. Partial vs. Semipartial Correlation
- ZERO-ORDER CORRELATION: simple correlation between X and Y with no extraneous variables removed (assumes none is affecting the rltnshp); PARTIAL CORRELATION: looking at relationship between two variables with 3rd variable removed, aka First Order correlation (e.g., correlating SAT and GPA with the effect of parental education removed); SEMIPARTIAL CORRELATION: aka Part, looking at rltnshp between two variables where influence of the 3rd variable is removed from only one of the variables
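
The difference between partial and semipartial correlation shows up clearly in the standard first-order formulas, sketched below. The three zero-order correlations (e.g., X = SAT, Y = GPA, Z = parental education) are made-up values, and the function names are hypothetical.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z removed from BOTH variables."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z removed from X ONLY."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_xz ** 2)

# Hypothetical zero-order correlations
r_xy, r_xz, r_yz = 0.50, 0.40, 0.30

print(round(partial_r(r_xy, r_xz, r_yz), 3))
print(round(semipartial_r(r_xy, r_xz, r_yz), 3))
```

When Z is uncorrelated with both X and Y, both formulas reduce to the zero-order correlation.
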
- Multiple R vs. Canonical vs. Discriminant vs. Loglinear Correlation
- MULTIPLE R (aka Multiple Correlation): correlation btwn 2 or more IVs (X) and one DV (Y) where Y interval/ratio and at least one X is interval/ratio; CANONICAL R: correlation between 2 or more IVs and 2 or more DVs; DISCRIMINANT FUNCTION ANALYSIS: 2 or more IVs and one DV (Y) but Y is nominal rather than interval/ratio; LOGLINEAR ANALYSIS: used to predict a categorical criterion (Y) based on categorical predictors (X); **Compensatory and cannot be used to infer causal relationships**
- Correlation vs. Regression
- CORRELATION: statistics that depict relationships between variables; REGRESSIONS: aka analyses, statistics that predict
- Path Analysis vs. LISREL
- PATH ANALYSIS: statistical procedure that allows for testing a model specifying the causal links among variables by applying multiple regression techniques, straight arrows->causal rltnshps (paths), curved arrows->correlational rltnshps; LISREL: computer program used for solving path diagrams, a type of structural equation modeling, a way to determine whether or not a given model of relationships among variables is correct
- Orthogonal vs. Oblique
- ORTHOGONAL ROTATIONS: axes remain perpendicular and result in factors with NO correlation w/one another (communality=related term); OBLIQUE ROTATIONS: angle between axes is non-perpendicular and factors ARE correlated
- Factor Analysis vs. Cluster Analysis
- FACTOR ANALYSIS: operates by extracting as many significant factors from the data as is possible, in order of decreasing strength (e.g., an attractiveness scale that measures 17 dfnt elements of attractiveness can be factor analyzed for whether the scale is measuring one vs several dimensions underlying attractiveness); CLUSTER ANALYSIS: involves gathering data on a variety of dependent variables and statistically looking for naturally occurring subgroups in the data w/o any a priori hypotheses (e.g., entering MMPI-2 data for 100s of police and finding 3 basic profile groups)
- Reliability vs. Validity
- RELIABILITY: consistency, repeatability, dependability, in scores obtained with a given test; VALIDITY: meaningfulness, usefulness, or accuracy of the test measuring what it is supposed to be measuring
- Test-Retest vs. Parallel Form vs. Internal Consistency vs. Interrater Reliability
- TEST-RETEST RELIABILITY: aka the coefficient of stability, involves correlating pairs of scores from the same sample of people tested 2x w/identical test; PARALLEL FORMS REL: aka the coefficient of equivalence, admin 2 roughly equivalent tests to same group of people at 2 dfnt points in time; INTERNAL CONSISTENCY REL: looks at consistency of scores w/in the test, admin only 1x to one group of people--split test in half or use Kuder-Richardson or Cronbach's coefficient alpha; INTERRATER REL: looks at degree of rel between 2 or more scorers when test is subjectively scored
- Spearman-Brown Prophecy Formula
- Tells us how much more reliable the test would be if it were longer
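
The usual form of the Spearman-Brown formula can be sketched as follows. Here `length_factor` is how many times longer the new test is (2 means doubled), and the reliability of 0.60 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability of a test lengthened by `length_factor`."""
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)

current_reliability = 0.60  # hypothetical reliability of the existing test
for k in (1, 2, 3):
    print(k, round(spearman_brown(current_reliability, k), 3))
```

Doubling a test with reliability 0.60 gives a predicted reliability of 0.75. The same formula with k = 2 is commonly used to correct a split-half coefficient, since each half is only half the length of the full test.
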
- Split-half vs. Coefficient alpha and Kuder-Richardson
- SPLIT-HALF RELIABILITY: calculated by splitting test in half and correlating scores on 2 halves with one another, for each person who is taking the test; KUDER-RICHARDSON and CRONBACH'S COEFFICIENT ALPHA: very sophisticated forms of internal consistency reliability that essentially involve the correlation of each item with every other item in the test
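
Coefficient alpha is usually computed from item and total-score variances rather than by literally correlating every pair of items. The sketch below uses that standard variance-based formula on a hypothetical data set of 5 respondents by 3 items; Kuder-Richardson 20 is the same idea applied to items scored 0/1.

```python
import statistics

def cronbach_alpha(item_scores):
    """Coefficient alpha from a respondents-by-items table.

    item_scores: one list per respondent, one entry per item.
    """
    k = len(item_scores[0])               # number of items
    columns = list(zip(*item_scores))     # scores regrouped by item
    item_variances = [statistics.variance(col) for col in columns]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings: 5 respondents, 3 items
ratings = [
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
    [2, 1, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

When every respondent gives the same answer to all items (perfectly consistent items), alpha comes out at 1.
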
- Standard Error of Measurement
- The standard deviation of a theoretically normal distribution of test scores obtained by one individual on equivalent tests; when a test is totally unreliable, the stand error of msrmt would be equal to the SD of the test; b/c of st er of msrmt we report scores using confidence bands/intervals
- Calculating confidence intervals
- Create bell shaped distribution, plot the person's score in the middle, and use standard error of measurement to label the values at the z-scores (from -3 to + 3), scores are reported in terms of 3 possible confidence intervals: 68% (from -1 to +1), 95% (from -2 to +2) and 99% (from -3 to +3)
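
The confidence bands described above can be worked through with the common formula SEM = SD x sqrt(1 - reliability). The test SD of 15, reliability of 0.91, and obtained score of 100 are assumptions for the example, and the function names are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); equals SD when reliability is 0."""
    return sd * sqrt(1 - reliability)

def confidence_bands(score, sem):
    """Bands at +/-1, +/-2, and +/-3 SEM around the obtained score."""
    return {
        level: (score - z * sem, score + z * sem)
        for level, z in (("68%", 1), ("95%", 2), ("99%", 3))
    }

sem = standard_error_of_measurement(sd=15, reliability=0.91)
bands = confidence_bands(score=100, sem=sem)

for level, (low, high) in bands.items():
    print(level, round(low, 1), round(high, 1))
```

With these values the SEM is 4.5, so the 68% band runs from 95.5 to 104.5, the 95% band from 91 to 109, and the 99% band from 86.5 to 113.5.
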
- Content vs. Criterion-related vs. Construct Validity
- CONTENT VALIDITY: how adequately a test samples a particular content area; CRITERION-RELATED VALIDITY: how adequately a test score can be used to infer, predict, or estimate criterion outcome (e.g., SAT scores-> GPA); CONSTRUCT VALIDITY: how adequately a test measures a construct or trait (a hypothetical concept that typically cannot be measured directly)
- Concurrent vs. Predictive validity
- CONCURRENT VALIDITY: the predictor and criterion are measured and correlated at about the same time (e.g., post test and EPPP given w/in a few days of each other and correlated); PREDICTIVE VALIDITY: there is a delay between the measurement of the predictor and the criterion (e.g., using SAT scores to predict college GPA)
- Standard Error of Estimate
- Amount of error in a predictor test's criterion-related validity; the standard deviation of a theoretically normal distribution of criterion scores obtained by one person measured repeatedly
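
The standard error of estimate is commonly computed as SD of the criterion x sqrt(1 - validity squared). The sketch below assumes a hypothetical criterion (college GPA) with an SD of 0.5 and a predictor-criterion correlation of 0.60.

```python
from math import sqrt

def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD_y * sqrt(1 - r_xy ** 2)."""
    return sd_criterion * sqrt(1 - validity ** 2)

see = standard_error_of_estimate(sd_criterion=0.5, validity=0.60)
print(round(see, 3))
```

Here the SEE is 0.4. With a validity of zero the SEE equals the criterion SD (the predictor adds nothing), and with perfect validity it drops to zero.
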
- Taylor-Russell tables
- A complete set of tables that numerically describe the amount of improvement in our selection decisions that will result from using a predictor test vs. no test at all: Base rate (the rate of selecting successful employees w/o any predictor test), Selection ratio (the proportion of available openings to available applicants), Incremental Validity (the amount of improvement in success rate that results from using a predictor test)
- False positives vs. True positives vs. False negatives vs. True negatives
- FALSE POSITIVES: those incorrectly id as having what is being measured (not successful); TRUE POSITIVES: those correctly id as having what is being measured (successful); FALSE NEGATIVES: incorrectly id as not possessing what is being measured (successful); TRUE NEGATIVES: correctly identified as not possessing what is being measured (not successful)
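
The four decision outcomes can be tallied by applying one cutoff to the predictor and one to the criterion. The applicant data and both cutoffs below are invented for illustration, and "incremental validity" is computed here as the success rate among those selected minus the base rate, which is one common reading of the definition given earlier.

```python
from collections import Counter

def classify(predictor, criterion, predictor_cutoff, criterion_cutoff):
    """Label one person by predicted status vs. actual outcome."""
    selected = predictor >= predictor_cutoff    # test says "positive"
    successful = criterion >= criterion_cutoff  # actually successful
    if selected:
        return "true positive" if successful else "false positive"
    return "false negative" if successful else "true negative"

# Hypothetical (predictor score, criterion score) pairs
applicants = [(80, 4.0), (75, 2.0), (60, 3.5), (55, 1.5),
              (90, 4.5), (40, 2.5), (70, 3.8), (65, 3.2)]

counts = Counter(classify(p, c, 70, 3.0) for p, c in applicants)

total = len(applicants)
base_rate = (counts["true positive"] + counts["false negative"]) / total
selected = counts["true positive"] + counts["false positive"]
positive_hit_rate = counts["true positive"] / selected
incremental_validity = positive_hit_rate - base_rate

print(dict(counts))
print(base_rate, positive_hit_rate, incremental_validity)
```

For this data there are 3 true positives, 1 false positive, 2 false negatives, and 2 true negatives, giving a base rate of 0.625, a success rate of 0.75 among those selected, and an improvement of 0.125 from using the test.
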
- Multi-trait Multi-method matrix
- A table that allows us to determine whether our test has both convergent and divergent/discriminant validity, both of which are necessary for construct validity
- Convergent vs. Divergent (discriminant) validity
- TYPES OF CONSTRUCT VALIDITY=CONVERGENT VALIDITY: the correlation between scores on the new test with other available measures of the same construct, the expected correlation should be moderate to high; DIVERGENT/DISCRIMINANT VALIDITY: the correlation between scores on the new test with scores on another test that measures a divergent construct, the expected correlation should be low
- Classical test theory vs. Item response theory (item characteristics curve)
- ITEM RESPONSE THEORY: it is assumed that item performance is related to the amount of the respondent's latent trait (plot of relationship between item performance and total score)