
Psyc 306


Validity Types:
Statistical validity: how accurate is the conclusion you draw from a statistical test? We hope to be able to draw conclusions based on reliable dependent measures. We want our statistical assumptions (pop distrib and var) to be met.
p < .05
The probability of obtaining a test statistic as extreme as or more extreme than a stated value, usu one you've computed. If there is less than a 5% chance that we could get a test statistic this extreme when the null hyp is true, we decide to reject the null hyp b/c the occurrence of the statistic is so unusual. This is the risk of error we are willing to accept; setting it too loosely leaves statistical validity unprotected.
Construct validity: do the findings support the theory over competing theories? We hope our theory explains the nature of the data we analyze, but we must submit our theory to scrutiny.
External validity: can we generalize these results to other conditions? We need to be assured that our findings can be detected in other contexts besides our controlled lab setting or this particular set of partic.
Internal validity: did the IV cause a change in the DV? We need to eliminate confounds which subtract from the potential explanatory power of the IV.
Confounds: Maturation
DV changes occur solely as a result of partic growing older or more experienced. Particularly relevant to longitudinal studies.
Confounds: History
DV changes due to variation in events outside the study's IV effects. Worst with large amount of time btwn pre and post tests. Testing: partic show dv changes due to practice in repeated measurements.
Confounds: Instrumentation. Regression to the mean.
DV changes are due to the measuring instrument's variability. Ex, changing observational criteria. Regression: initially high scorers show score reductions and initially low scorers show score increases; scores move from more extreme to less extreme.
Selection. Attrition.
Selection: the result of partic not being randomly selected and assigned to groups; the groups are nonequivalent. Attrition: partic drop out differentially across groups, causing biased results.
Confounds: Diffusion of treatment
Sequence effects
Partic in diff exper conditions share info about their manip and reduce exper diff across groups. Seq effects: in repeated measures designs or w/in-subject designs, partic are exposed to diff exper conditions. If the order is constant, changes in the DV may be caused by the order of presentation and not by the condition.
Subject effects
Deals w/ partic behaviors in a social setting like: curiosity, motivation, expectation and bias. Unnatural beh exhibited, acting good. Hawthorne effect: inc productivity after any special attention.
Subj effects: demand characteristics. Placebo effects
This is the term given to those cues given to partic, usu unintentionally, by the researcher, that urge the partic to behave unnaturally. Placebo effects: a type of demand char. Expectations of change or improvement w/out any manip.
Experimenter effects
Deal w/ the experimenter influencing partic toward an expected outcome, via manip or cueing of some type, leading to bias. Ethical violations: another experimenter effect involves reporting only data which support the hypoth, or mishandling outliers.
General control procedures
Prepare lab setting. Lighting, temp, sound. Make setting similar for all. Use same experimenter whenever possible.
Exact vs systematic
Identical in every way vs similar but w/ procedural changes.
General Control Procedures…
Use stimuli with good reliability and validity

Reliability = stability or “repeatability” of results from test, scale, or stimulus

Validity = stimulus, test, or scale should give us confidence that we’re measuring the “thing” we think we’re measuring; our test should correlate with other, similar measures of the same behavior
Replication: Systematic and Exact
Exact = identical in every way
Systematic = Similar, but with procedural changes, e.g.,

“When I read something, I find I must read it again”
nearly always
Conceptual Replication
Different studies are generated from the same problem statement

More common than systematic or exact replication

E.g., Study 1 on self-efficacy: q-aires and real-life problem solving situation (observational)

Study 2 on self-efficacy: experimental manipulation and artificial but controlled problem-solving (experimental)
Control over Subject and Experimenter Effects
Single and double-blind procedures
Single=RA is blind to manipulation/condition
Double=RA and participants are blind to manipulation/condition

Particularly important to have double-blind studies in social psychology experiments
Single-blind studies are important for protecting against experimenter bias
Standardization of the study’s instructions using a computer, or video/audio tape

Computer-aided scoring and recording of responses tremendously reduces clerical errors in data entry

Many studies now use PDAs to automate their data collection
Reduces bias, errors, saves money, reduces waste
Use Objective Measures
Use easily agreed-upon responses if observational measures are taken
“Does mother gaze at the infant, face-to-face, for at least 10 seconds at a time?”
Use Multiple Observers
At least two (more is better) observers rate the behaviors being measured, and compare ratings with some “% agreement” (e.g., 90%).
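The "% agreement" figure can be sketched in a few lines of Python. This is a minimal illustration, assuming two observers coded the same intervals with simple yes/no codes; the function name and the ratings are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of observations on which two observers gave the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)

# Hypothetical yes/no codes (1 = behavior seen) for 10 observation intervals
observer_1 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
observer_2 = [1, 1, 0, 1, 0, 1, 1, 1, 1, 0]

agreement = percent_agreement(observer_1, observer_2)
print(agreement)  # 90.0 (the observers disagree on one interval out of ten)
```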
Deception: deliberately withholding information from participants (and misleading them to believe the study’s purposes are very different from what they actually are)
Use only when necessary
Always debrief participants at end of study
E.g., “helping behavior and altruism” Batson study
Control Through Subject Selection and Assignment
Subject selection
Issues: proper sampling helps us increase generalizability of results

E.g., The Nun Study on Alzheimer’s Disease

Good: careful selection; no smokers, no sex or reproduction; little/no drinking; similar vocations (teachers); group living
Bad: generalizability to whom???
Subject Selection…
Broad Terms:
Population — some larger group of interest
Sample — some smaller group drawn from a population; should be representative
Specific Terms:
General population --- all persons
Target population --- specific population of interest (e.g., Alzheimer’s patients)
Accessible population --- subset of population available to the researcher (e.g., 100 Alzheimer’s patients from N. Virginia)
Subject Selection
Then, for our sample, of N=1000 people, we should randomly select:
540 who work in universities
180 in hospitals
120 in private practice
100 in industry
Good, but we still have the issue of study volunteers being unrepresentative of others (selection bias)
Subject Selection
3. Ad hoc samples
samples drawn from accessible populations
The sample’s characteristics define the population to which we can generalize
This type of sample is used most often in psychological research
E.g., “100 female psychology students, aged 18-20, from a southern university”…gather background information on the participants, and report it in our research article
Subject Assignment
Placing persons in experimental conditions
More important than random selection…especially for internal validity…5 different ways…
1. Random assignment
Use a random # table or generator
Random is not equal to haphazard
We may not end up with equal groups, but we can say the sources of bias have an equal likelihood of being spread across groups
2. Matched Random Assignment
Makes n’s equal and balanced across conditions
E.g., “memory in college students” study;
2-group design; one group gets mnemonic training;
we believe GPA could be a confound if allowed to run loose
3. Eliminate the variable
Elimination involves removing persons beyond a certain level of a variable

E.g., Aging study of visual perception---mental transformation of a visual object

Persons having visual acuity of 20/20, 20/30, 20/40, and 20/50 are acceptable; those persons having acuity worse than 20/50 are eliminated from the study
4. Hold a variable constant
Constancy involves admitting to the study only persons who have a given level of a variable

Same example of visual perception--- in this case, we allow only persons with 20/20 acuity to be in the study---the result gives us constancy
5. Put a variable in study’s design
Build this variable into the design and examine the effect of the levels on the DV
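The first two assignment methods above (random assignment and matched random assignment) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the student IDs, and the GPA values are hypothetical, and matching is done on a single variable by ranking and blocking, which is one common way to do it.

```python
import random

def random_assignment(participants, conditions, seed=None):
    """Simple random assignment: each person gets an independently drawn condition.

    Group sizes may come out unequal, as the notes point out.
    """
    rng = random.Random(seed)
    return {person: rng.choice(conditions) for person in participants}

def matched_random_assignment(matching_scores, conditions, seed=None):
    """Matched random assignment on one matching variable (e.g., GPA).

    Rank people on the matching variable, cut the ranking into blocks with
    one person per condition, then shuffle the conditions within each block.
    People who do not fill a complete block are returned unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(matching_scores, key=matching_scores.get)
    k = len(conditions)
    full = len(ranked) - len(ranked) % k
    assignment = {}
    for start in range(0, full, k):
        block = ranked[start:start + k]
        shuffled = list(conditions)
        rng.shuffle(shuffled)
        for person, condition in zip(block, shuffled):
            assignment[person] = condition
    return assignment, ranked[full:]

# Hypothetical GPAs for eight students in the memory study
gpas = {"s1": 3.9, "s2": 2.1, "s3": 3.4, "s4": 2.8,
        "s5": 3.7, "s6": 2.5, "s7": 3.1, "s8": 2.2}
conditions = ["mnemonic_training", "control"]

simple = random_assignment(list(gpas), conditions, seed=1)
matched, leftover = matched_random_assignment(gpas, conditions, seed=1)
print(simple)
print(matched, leftover)
```

With matching, each adjacent pair in the GPA ranking is split across the two conditions, so the n's come out equal and GPA is balanced across groups.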
IV. Control through experimental design
Goal is to reduce threats to internal validity
We want to be able to say “The IV manipulation caused a change in the DV” and nothing else…
The best way to have control in an experiment...Select participants in an unbiased way
e.g., stratified random sampling or ad hoc samples
Assign participants to conditions in an unbiased way
e.g., random assignment, or matched random assignment
Control through experimental design…
4. Use strong procedures for testing causal relationships

E.g., use a control group, use pilot testing, use prior research to support work

5. Use specific control procedures to reduce internal validity threats

E.g., random assignment, automation, objective measures, placebo, deception
Hypothesis testing: General steps
We want to draw conclusions about our results by examining the probability of getting our results if the opposite of what we are predicting were true. Make questions into H0 and H1 about populations
Ex. Step 1
2. Determine characteristics of the comparison distribution
We believe the Population 1 adults do not remember more words than Population 2 adults (hence, Population 1’s mean is less than or equal to Population 2’s mean).
2. Next, we ask, “What is the probability of obtaining a particular test statistic if the null hypothesis is true?”

We assume the sample (here, 1 person) was selected from a distribution representing a true null hypothesis, approximately normal in form, with a mean of 8 and a standard deviation of 3.
The distribution to which you compare your sample (when the null hypothesis is true) is the comparison distribution
3. Determine the cut-off score to reject the null hypothesis
If the null hypothesis is true, then witnessing an adult remembering 14 words is very unlikely…that’s TWO standard deviations above the mean…an extreme value associated with only about 2% probability of occurrence…

Performing 2 standard deviations above the mean is rare, regardless of experimental training… The cut-off score, here, is set in terms of normal Z-units
If the Z-statistic we compute is more extreme than +2 (in our example), we can reject the null hypothesis
Typically, we use a cut-off score that corresponds to a probability of 5%…typically called alpha (α = .05).
4. Determine the sample’s score on comparison distribution
So, in our example, we need Z > +2 to reject the null hypothesis
So, we’ve conducted our study on N = 1, and we see that our adult was able to remember 15 words
Now we’ll compute Z and determine its standing on the comparison distribution. Our Z of + 2.33 indicates that the adult who remembered 15 words is 2.33 standard deviations above the population mean.
5. Reject null hypothesis or not?
We know we needed a Z-score in excess of +2 to reject the null hypothesis…we got +2.33…so we can reject the null hypothesis.

Therefore, our research hypothesis was supported…Pop 1 adults who received memory training remembered more words than Pop 2 adults who didn’t receive training. If we had not obtained a statistically significant test statistic, we’d simply say our results were inconclusive, and that we failed to reject the null hypothesis…
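The worked example can be checked with a few lines of Python. The numbers (comparison distribution mean of 8, standard deviation of 3, cut-off of Z = +2, observed score of 15) come from the cards above; the variable names are only illustrative.

```python
# Comparison distribution when the null hypothesis is true (from the example)
pop_mean = 8
pop_sd = 3

cutoff_z = 2.0   # cut-off score chosen in step 3
score = 15       # words remembered by the one adult tested (N = 1)

# Step 4: the sample's score on the comparison distribution
z = (score - pop_mean) / pop_sd

# Step 5: reject the null hypothesis only if Z is more extreme than the cut-off
reject_null = z > cutoff_z

print(round(z, 2), reject_null)  # 2.33 True
```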
Ways of framing hypotheses testing questions
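One-tailed tests and directional hypotheses: the researcher predicts which population has the higher mean. Two-tailed tests and nondirectional hypotheses: the researcher does not have an idea of which population has the higher mean, only that they’ll be different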
Hypothesis Testing Introduction
1. Make questions into H0 and H1
2. Determine characteristics of the comparison distribution
3. Determine cutoff score to reject H0
4. Determine sample’s score on the comparison distribution
5. Reject H0 or not?
Hyp Testing Popul 2
Pop. 2: Adults who do not receive such special memory training
Hypothesis testing using a distribution of means
1. Mean of a distribution of means = pop mean.
2. Variance of a distribution of means = pop variance, divided by # of scores in each sample
3. Shape of a distribution of means is bell-shaped and unimodal
Three kinds of distributions
1. Distribution of a population of persons
2. Distribution of a sample drawn from a population
3. Distribution of means of all possible samples of a particular size taken from that distribution
The distribution of means
We want to compare this mean to something similar---a distribution of means. The Central Limit Theorem underlies several rules we’ll address. Rule 1: As the number of samples increases, the mean of the distribution of means approximates the population mean, because high and low means cancel each other out. Rule 2: The variance of the distribution of means is the variance of the distribution of scores divided by the number of scores in each sample.
Rule 2
The standard deviation of a distribution of means is simply the square root of the variance of the distribution of means. **Remember this! This standard error is the “average amount we expect our sample mean to differ from the mean of the distribution of means”
Rule 3
The shape of a distribution of means tends to be unimodal and bell-shaped. It is at least approximately normal if there are at least 30 scores in each sample, OR if the population of scores is normal
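The three rules can be illustrated with a small simulation. This is a sketch only: the population (10,000 uniform scores), the sample size of 30, the number of samples, and the seed are all arbitrary assumptions chosen for the demonstration.

```python
import random
import statistics

rng = random.Random(306)  # arbitrary seed so the run is repeatable

# A deliberately non-normal (flat) population of scores
population = [rng.uniform(0, 100) for _ in range(10_000)]
n = 30  # scores in each sample

# Draw many samples and keep each sample's mean
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(5_000)]

pop_mean = statistics.mean(population)
pop_var = statistics.pvariance(population)
mean_of_means = statistics.mean(sample_means)
var_of_means = statistics.pvariance(sample_means)
standard_error = var_of_means ** 0.5

# Rule 1: mean of the distribution of means is close to the population mean
print(pop_mean, mean_of_means)
# Rule 2: its variance is close to the population variance divided by n
print(pop_var / n, var_of_means)
# The standard error is the square root of that variance
print(standard_error)
```

Plotting a histogram of `sample_means` would show the roughly bell-shaped, unimodal form described in Rule 3, even though the population itself is flat.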
Single sample t-test
Compare a single mean to a population mean when the population standard deviation (σ) is not known
Dependent means t-test
Compare two linked means when the population standard deviation (σ) is not known
Single sample t-test…
We don’t know the σ of the population, only μ
SO! We will estimate σ of the population using our sample information
If the sample is randomly selected, we can compute an unbiased estimate of the population variance in a way that boosts the sample’s variance a bit; we do this by putting N – 1 in the denominator of the variance equation
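A minimal sketch of the single sample t-test, using the N – 1 estimate of the population variance. The function name and the five scores are hypothetical; `statistics.variance` is the standard-library function that divides by N – 1.

```python
import math
import statistics

def single_sample_t(scores, pop_mean):
    """t statistic and df for one sample mean against a known population mean."""
    n = len(scores)
    sample_mean = statistics.mean(scores)
    est_pop_var = statistics.variance(scores)  # sum of squares / (N - 1)
    std_error = math.sqrt(est_pop_var / n)     # SD of the distribution of means
    return (sample_mean - pop_mean) / std_error, n - 1

# Hypothetical words-remembered scores for five adults; population mean is 8
scores = [10, 12, 9, 11, 13]
t, df = single_sample_t(scores, pop_mean=8)
print(round(t, 2), df)  # 4.24 4
```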
t-test for dependent means
Here, two scores from each person are compared (repeated measures designs, or within-subjects designs), OR
two scores from matched persons are compared
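In practice, the dependent means test reduces each pair of scores to one difference score and then tests whether the mean difference departs from 0. A short sketch, with a hypothetical function name and made-up before/after scores:

```python
import math
import statistics

def dependent_means_t(before, after):
    """t statistic and df for paired scores, computed on difference scores."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    s = statistics.stdev(diffs)      # estimated population SD of differences (N - 1)
    std_error = s / math.sqrt(n)     # S_M, the SD of the distribution of means
    return (mean_diff - 0) / std_error, n - 1

# Hypothetical scores for four people measured twice
before = [5, 6, 4, 7]
after = [7, 9, 5, 9]
t, df = dependent_means_t(before, after)
print(round(t, 2), df)  # 4.9 3
```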
When independent means t-tests should be used:
when sigma (σ) is unknown for the population distributions
two separately sampled means are being compared. Important point: the comparison distribution is a distribution of differences between means. With equal N, the two estimates are averaged, and we call this the pooled est of popul variance. Also have: the weighted avg est of pop var.
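The pooled estimate and the resulting t can be sketched as below. This uses the df-weighted average of the two variance estimates, which equals the simple average when the two N's are equal. The function name and the scores are hypothetical.

```python
import math
import statistics

def independent_means_t(group_1, group_2):
    """t statistic, df, and pooled variance for two separately sampled groups."""
    n1, n2 = len(group_1), len(group_2)
    df1, df2 = n1 - 1, n2 - 1
    var1 = statistics.variance(group_1)  # each estimate uses N - 1
    var2 = statistics.variance(group_2)
    # Weighted average of the two estimates (weights are the degrees of freedom)
    pooled_var = (df1 * var1 + df2 * var2) / (df1 + df2)
    # SD of the distribution of differences between means
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / se_diff
    return t, df1 + df2, pooled_var

# Hypothetical words-remembered scores
trained = [12, 14, 11, 15]
untrained = [9, 10, 8, 13]
t, df, pooled_var = independent_means_t(trained, untrained)
print(round(t, 2), df, pooled_var)
```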
***Assumptions of the independent means t-test
B. Variances are equal across populations
---this is the “homogeneity of variance” assumption
---if “moderate” violations occur and N per sample is equal or nearly equal, then the t-test is robust to these violations. Examine the variances in your data.
---if “very large” differences in sample variances occur, the t-test could give inaccurate results. ((If both the normality and variance assumptions are not met, we may want to consider performing some non-parametric tests, or “distribution-free” tests))
Effect Size
d = .80 (large) means we expect (or found) 4/5 of a standard deviation between the two means
These values should be evaluated in their absolute sense; i.e., obtaining an effect size of -.80 is still considered large; their value is not bounded by –1 or +1
These effect sizes are symbolized by d, and are calculated as follows:
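d = M/S. **The denominator of d is relatively UNinfluenced by N, unlike the denominators of the t statistics for dep means ((M – 0)/SM) and indep means, which are standard errors that shrink as N grows.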
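The relation between d and t for dependent means can be written out as follows. This is a sketch using the cards' symbols (M for the mean of the difference scores, S for their estimated standard deviation, S_M for the standard error) and assuming the usual definition of the standard error.

```latex
d = \frac{M}{S}, \qquad
S_M = \frac{S}{\sqrt{N}}, \qquad
t = \frac{M - 0}{S_M} = \frac{M}{S}\,\sqrt{N} = d\,\sqrt{N}
```

So for a fixed effect size d, t grows with the square root of N, while d itself does not.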
What shapes the statistical power in our studies?
Effect size – the larger the effect size, the more power associated with it; it’s easier to find a big difference than a small one, relative to the standard deviation
Sample size – the larger the N, the more power we have; and, the smaller the standard error, which describes the spread of the distribution of means
Statistical significance level – the less extreme the α, the easier it will be to reject H0
1- or 2-tailed tests – 1-tailed tests provide less stringent cut-offs for rejecting H0
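These four influences can be seen in a small power calculation. The sketch below is an assumption-laden illustration, not the course's tables: it uses a single-sample Z test with known population standard deviation and a true effect in the predicted (positive) direction, and the function name is hypothetical.

```python
from statistics import NormalDist

def z_test_power(d, n, alpha=0.05, tails=1):
    """Power of a single-sample Z test for effect size d and sample size n."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / tails)  # cut-off on the comparison distribution
    shift = d * n ** 0.5                            # how far the true mean sits from H0, in SE units
    power = 1 - std_normal.cdf(z_crit - shift)      # chance of landing beyond the upper cut-off
    if tails == 2:
        power += std_normal.cdf(-z_crit - shift)    # tiny chance of landing beyond the lower cut-off
    return power

base = z_test_power(d=0.5, n=16)
print(round(base, 2))                                    # baseline
print(round(z_test_power(d=0.8, n=16), 2))               # larger effect size
print(round(z_test_power(d=0.5, n=32), 2))               # larger N
print(round(z_test_power(d=0.5, n=16, alpha=0.01), 2))   # more extreme alpha
print(round(z_test_power(d=0.5, n=16, tails=2), 2))      # two-tailed test
```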
Statistical Power and Effect Size
Please note from these tables:
As N increases, so does power
As effect size (d) increases, so does power
1-tailed tests have more power than their 2-tailed counterparts
For the same effect size, a dependent means t-test with half the sample size has more power than an independent means t-test – why?
(and, not shown in these tables…more extreme α levels will be more difficult to achieve, and will therefore have less power than a less extreme α level)
Hypothesis Testing Steps for CORRELATIONS
df for t = N – 2 = 20 – 2 = 18
Step 3: Get t-critical to reject H0
Step 4: Get sample’s score on the comparison distribution
Step 5: Compare scores and make decision
***t computed is more extreme than t critical which happens to be. . .
Write this part: As hyp, we found a stat sig pos Pearson correl btwn x and y, r = .70, p < .01. Underline.
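For step 4, the sample's score can be computed with the usual conversion of r to t. This is a sketch: r = .70 and N = 20 come from the card, while the conversion formula t = r√(N – 2)/√(1 – r²) is the standard one and is assumed to be the one intended here.

```python
import math

r = 0.70   # Pearson correlation from the example
n = 20     # number of people (pairs of scores)

df = n - 2
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(df, round(t, 2))  # 18 4.16
```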
