Psyc 306
Terms

Validity Types:
Statistical
How accurate is the conclusion you draw from a statistical test? We hope to draw conclusions based on reliable dependent measures, and we want our statistical assumptions (population distribution and variance) to be met.
p < .05
The probability, assuming the null hypothesis is true, of obtaining a test statistic as extreme as or more extreme than a stated value (usually the one you've computed). If there is less than a 5% chance of getting a test statistic this extreme, we reject the null hypothesis because such a statistic would be so unusual. The .05 level is the risk of error (rejecting a true null hypothesis) that we are willing to accept, i.e., the extent to which we risk leaving statistical validity unprotected.
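Here is a minimal Python sketch of the p < .05 decision rule. It assumes a one-tailed test on a Z statistic that follows the standard normal curve when H0 is true. The function names are hypothetical. The Z of 2.33 comes from the memory-training example in these notes. `statistics.NormalDist` needs Python 3.8 or later.

```python
from statistics import NormalDist

def one_tailed_p(z: float) -> float:
    """Probability of a Z at least this far above the mean if H0 is true."""
    return 1 - NormalDist().cdf(z)

def reject_null(z: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the one-tailed p-value falls below alpha."""
    return one_tailed_p(z) < alpha

print(round(one_tailed_p(2.33), 4))  # 0.0099
print(reject_null(2.33))             # True
print(reject_null(1.0))              # False: p is about .16
```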
Construct
Do the findings support the theory over competing theories? We hope our theory explains the nature of the data we analyze, but we must submit the theory to scrutiny.
External
Can we generalize these results to other conditions? We need assurance that our findings can be detected in contexts other than our controlled lab setting, and with participants other than this particular set.
Internal
Did the IV cause a change in the DV? We need to eliminate confounds, which reduce the explanatory power of the IV.
Confounds: Maturation
DV changes occur solely because participants grow older or more experienced. Particularly relevant to longitudinal studies.

Confounds: History. Testing.
History: DV changes are due to events outside the study rather than to the IV. This is worst when there is a long time between pretest and posttest. Testing: participants show DV changes due to practice from repeated measurement.
Confounds: Instrumentation. Regression to the mean.
Instrumentation: DV changes are due to variability in the measuring instrument, e.g., changing observational criteria. Regression: initially high scorers tend to show score reductions and initially low scorers tend to show score increases (extreme scores move toward less extreme ones).
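Regression to the mean can appear with no manipulation at all. This sketch assumes a hypothetical model in which each observed score is a stable true score plus independent random error on each testing. The initially high scorers drift back toward the mean on retest.

```python
import random
from statistics import mean

random.seed(306)  # fixed seed so the sketch is reproducible

# Hypothetical model: observed score = stable true score + fresh random error.
true_scores = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_scores]
test2 = [t + random.gauss(0, 10) for t in true_scores]

# Select the initially high scorers (top 10% on the first test).
cutoff = sorted(test1)[int(0.9 * len(test1))]
high = [i for i, score in enumerate(test1) if score >= cutoff]

first_mean = mean(test1[i] for i in high)   # extreme by selection
retest_mean = mean(test2[i] for i in high)  # closer to 100, with no treatment given
print(round(first_mean, 1), round(retest_mean, 1))
```

Nothing happened between the two tests, so the drop is purely regression. This is why a pretest-posttest change in an extreme group cannot by itself be credited to the IV.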
Selection. Attrition.
Selection: participants were not randomly selected and assigned to groups, so the groups are nonequivalent. Attrition: participants drop out differentially across groups, biasing the results.

Confounds: Diffusion of treatment. Sequence effects.
Diffusion of treatment: participants in different experimental conditions share information about their manipulation, reducing the experimental difference across groups. Sequence effects: in repeated-measures (within-subjects) designs, participants are exposed to different experimental conditions; if the order is constant, changes in the DV may be caused by the order of presentation and not by the condition.
Subject effects
Deal with participant behaviors in a social setting, such as curiosity, motivation, expectation, and bias. Participants exhibit unnatural behavior, such as acting "good." Hawthorne effect: increased productivity after any special attention.
Subject effects: Demand characteristics. Placebo effects.
Demand characteristics: cues, usually given unintentionally by the researcher, that prompt participants to behave unnaturally. Placebo effects: a form of demand characteristic; expectations of change produce improvement without the manipulation.
Experimenter effects
The experimenter influences participants toward an expected outcome, via the manipulation or cueing of some type, leading to bias. Ethical violations: another experimenter effect involves reporting only the data that support the hypothesis, or selectively handling outliers.
General control procedures
Prepare the lab setting (lighting, temperature, sound) and make it similar for all participants. Use the same experimenter whenever possible.
Exact vs. systematic replication
Identical in every way vs. similar but with procedural changes.
General Control Procedures…

Use stimuli with good reliability and validity
Reliability = stability or "repeatability" of results from a test, scale, or stimulus
Validity = a stimulus, test, or scale should give us confidence that we're measuring the "thing" we think we're measuring; our test should correlate with other, similar measures of the same behavior

Replication: Systematic and Exact

Exact = identical in every way
Systematic = Similar, but with procedural changes, e.g.,
"When I read something, I find I must read it again"
vs.
nearly always

Conceptual Replication

Different studies are generated from the same problem statement
More common than systematic or exact replication
E.g., Study 1 on self-efficacy: questionnaires and a real-life problem-solving situation (observational)
Study 2 on self-efficacy: experimental manipulation and artificial but controlled problem solving (experimental)

Control over Subject and Experimenter Effects

Single- and double-blind procedures
Single = RA is blind to manipulation/condition
Double = RA and participants are blind to manipulation/condition
Particularly important to have double-blind studies in social psychology experiments
Single-blind studies are important for protecting against experimenter bias

Automation

Standardization of the study's instructions using a computer, or video/audio tape
Computer-aided scoring and recording of responses tremendously reduces clerical errors in data entry
Many studies now use PDAs to automate their data collection
Reduces bias and errors, saves money, reduces waste

Use Objective Measures

Use easily agreed-upon responses if observational measures are taken
"Does mother gaze at the infant, face-to-face, for at least 10 seconds at a time?"

Use Multiple Observers
At least two (more is better) observers rate the behaviors being measured, and compare ratings with some "% agreement" (e.g., 90%).
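Simple percent agreement between two observers can be computed as below. The codes are hypothetical ratings of the mother-infant gaze behavior above. `percent_agreement` is an illustrative helper, not a library function.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of observations on which two observers gave the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100 * agreements / len(ratings_a)

# Hypothetical codes for 10 observation intervals:
# 1 = mother gazes at infant face-to-face for at least 10 s, 0 = does not.
observer_1 = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

agreement = percent_agreement(observer_1, observer_2)
print(agreement)        # 90.0 (the observers disagree only on interval 4)
print(agreement >= 90)  # True: meets the example criterion
```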
 Deception

Deliberately withholding information from participants (and misleading them to believe the study's purposes are very different from what they actually are)
Use only when necessary
Always debrief participants at the end of the study
E.g., "helping behavior and altruism" Batson study

Control Through Subject Selection and Assignment

Subject selection
Issues: proper sampling helps us increase the generalizability of results
E.g., The Nun Study on Alzheimer's Disease
Good: careful selection; no smokers, no sex or reproduction; little/no drinking; similar vocations (teachers); group living
Bad: generalizability to whom???

Subject Selection…

Broad Terms:
Population: some larger group of interest
Sample: some smaller group drawn from a population; should be representative
Specific Terms:
General population: all persons
Target population: specific population of interest (e.g., Alzheimer's patients)
Accessible population: subset of the population available to the researcher (e.g., 100 Alzheimer's patients from N. Virginia)

Subject Selection

Then, for our sample of N = 1000 people, we should randomly select:
540 who work in universities
180 in hospitals
120 in private practice
100 in industry
Good, but we still have the issue of study volunteers being unrepresentative of others (selection bias) 
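The allocation above (stratified random sampling) amounts to a separate simple random sample inside each stratum. This sketch assumes a hypothetical sampling frame with ten times as many people per stratum as we need. The four listed quotas total 940, so this draws 940 people. The helper name is illustrative.

```python
import random
from collections import Counter

def stratified_random_sample(population, quotas, seed=None):
    """Draw quotas[stratum] people at random from each stratum.

    population: list of (person_id, stratum) pairs (the sampling frame)
    quotas: dict mapping stratum -> number of people to select
    """
    rng = random.Random(seed)
    sample = []
    for stratum, n in quotas.items():
        members = [person for person in population if person[1] == stratum]
        sample.extend(rng.sample(members, n))  # simple random sample within this stratum
    return sample

# Quotas from the example above.
quotas = {"university": 540, "hospital": 180, "private practice": 120, "industry": 100}

# Hypothetical sampling frame with 10x as many people in each stratum.
frame = [(f"{s}-{i}", s) for s, n in quotas.items() for i in range(10 * n)]

sample = stratified_random_sample(frame, quotas, seed=1)
print(Counter(stratum for _, stratum in sample))
```

Stratifying guarantees the proportions. It does not fix the volunteer (selection bias) problem noted above.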
Subject Selection
3. Ad hoc samples
Samples drawn from accessible populations
The sample's characteristics define the population to which we can generalize
This type of sample is used most often in psychological research
E.g., "100 female psychology students, aged 18-20, from a southern university"… gather background information on the participants, and report it in our research article

Subject Assignment

Placing persons in experimental conditions
More important than random selection… especially for internal validity… 5 different ways…
1. Random assignment
Use a random # table or generator
Random is not equal to haphazard
We may not end up with equal groups, but we can say the sources of bias have an equal likelihood of being spread across groups

2. Matched Random Assignment

Makes n's equal and balanced across conditions
E.g., "memory in college students" study;
2-group design; one group gets mnemonic training;
we believe GPA could be a confound if allowed to run loose

3. Eliminate the variable

Elimination involves removing persons beyond a certain level of a variable
E.g., aging study of visual perception: mental transformation of a visual object
Persons having visual acuity of 20/20, 20/30, 20/40, and 20/50 are acceptable; those having acuity worse than 20/50 are eliminated from the study

4. Hold a variable constant

Constancy involves allowing into a study only persons having a given level of a variable
Same example of visual perception: in this case, we allow only persons with 20/20 acuity into the study; the result gives us constancy

5. Put a variable in the study's design
Build this variable into the design and examine the effect of its levels on the DV
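Random assignment and matched random assignment can be sketched as follows. The names, GPAs, and helper functions are hypothetical. Matching works in three steps:

- rank participants on the suspected confound (GPA);
- group neighbors into blocks the size of the number of conditions;
- randomly assign within each block.

```python
import random

def random_assignment(participants, conditions, seed=None):
    """Shuffle participants, then deal them into conditions in turn.
    Random (every ordering equally likely), not haphazard."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

def matched_random_assignment(participants, score, conditions, seed=None):
    """Rank on the matching variable, form blocks of similar people,
    then randomly assign within each block."""
    rng = random.Random(seed)
    ranked = sorted(participants, key=score)
    groups = {c: [] for c in conditions}
    k = len(conditions)
    # Leftovers (when N is not a multiple of k) are not assigned in this sketch.
    for start in range(0, len(ranked) - k + 1, k):
        block = ranked[start:start + k]
        rng.shuffle(block)  # random assignment within the matched block
        for condition, person in zip(conditions, block):
            groups[condition].append(person)
    return groups

# Hypothetical students and GPAs for the "memory in college students" example.
gpa = {"Ana": 3.9, "Ben": 2.1, "Cal": 3.1, "Dee": 2.0, "Eli": 3.8, "Fay": 3.0}
conditions = ["mnemonic training", "control"]

print(random_assignment(gpa, conditions, seed=306))
print(matched_random_assignment(gpa, gpa.get, conditions, seed=306))
```

In the matched version, each condition receives one member of every GPA-similar pair, so GPA is balanced across groups.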
 IV. Control through experimental design

Goal is to reduce threats to internal validity
We want to be able to say "The IV manipulation caused a change in the DV" and nothing else…
The best way to have control in an experiment:
Select participants in an unbiased way
e.g., stratified random sampling or ad hoc samples
Assign participants to conditions in an unbiased way
e.g., random assignment, or matched random assignment

Control through experimental design…

4. Use strong procedures for testing causal relationships
E.g., use a control group, use pilot testing, use prior research to support work
5. Use specific control procedures to reduce internal validity threats
E.g., random assignment, automation, objective measures, placebo, deception

Hypothesis testing: General steps
We want to draw conclusions about our results by examining the probability of getting our results if the opposite of what we are predicting were true.
1. Make questions into H0 and H1 about populations

Ex. Step 1
Null hypothesis (H0): Population 1 adults do not remember more words than Population 2 adults (hence, Population 1's mean is less than or equal to Population 2's mean).
2. Determine characteristics of the comparison distribution
Next, we ask, "What is the probability of obtaining a particular test statistic if the null hypothesis is true?"
We assume the sample (here, 1 person) was selected from a distribution representing a true null hypothesis, approximately normal in form, with a mean of 8 and a standard deviation of 3.
The distribution to which you compare your sample (when the null hypothesis is true) is the comparison distribution

3. Determine the cutoff score to reject the null hypothesis

If the null hypothesis is true, then witnessing an adult remembering 14 words is very unlikely… that's TWO standard deviations above the mean (14 = 8 + 2 × 3)… an extreme value associated with only about a 2% probability of occurrence (about 2.3%)…
Performing 2 standard deviations above the mean is rare, regardless of experimental training… The cutoff score, here, is set in terms of normal Z units
If the Z statistic we compute is more extreme than +2 (in our example), we can reject the null hypothesis
Typically, we use a cutoff score that corresponds to a probability of 5%… typically called alpha (α = .05).

4. Determine the sample's score on the comparison distribution

So, in our example, we need Z > +2 to reject the null hypothesis
So, we've conducted our study on N = 1, and we see that our adult was able to remember 15 words
Now we'll compute Z and determine its standing on the comparison distribution: Z = (15 - 8)/3 = +2.33, which indicates that the adult who remembered 15 words is 2.33 standard deviations above the population mean.

5. Reject null hypothesis or not?

We know we needed a Z score in excess of +2 to reject the null hypothesis… we got +2.33… so we can reject the null hypothesis.
Therefore, our research hypothesis was supported… Pop 1 adults who received memory training remembered more words than Pop 2 adults who didn't receive training. If we had not obtained a statistically significant test statistic, we'd simply say our results were inconclusive, and that we failed to reject the null hypothesis…

Ways of framing hypothesis-testing questions
One-tailed tests and directional hypotheses: the researcher predicts which population will have the higher mean. Two-tailed tests and nondirectional hypotheses: the researcher does not have an idea of which population has the higher mean, only that they'll be different
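The memory-training example and the one- vs. two-tailed distinction can be sketched as below. The sketch assumes the comparison distribution is exactly normal (mean 8, SD 3). The notes use a round cutoff of +2. At α = .05, the normal-curve cutoffs are about +1.64 (one-tailed) and ±1.96 (two-tailed). The helpers `z_score` and `cutoff` are hypothetical names.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def z_score(x, mu, sigma):
    """Standing of a single score on the comparison distribution."""
    return (x - mu) / sigma

def cutoff(alpha=0.05, tails=1):
    """Z cutoff for rejecting H0: alpha all in one tail, or split across two."""
    return std_normal.inv_cdf(1 - alpha / tails)

# Comparison distribution: M = 8, SD = 3; the trained adult recalled 15 words.
z = z_score(15, 8, 3)
print(round(z, 2))                # 2.33
print(round(cutoff(tails=1), 2))  # 1.64 (directional, one-tailed)
print(round(cutoff(tails=2), 2))  # 1.96 (nondirectional, two-tailed)
print(z > cutoff(tails=1), abs(z) > cutoff(tails=2))  # True True
```

The two-tailed test compares the absolute value of Z, because an extreme result in either direction counts against H0.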
 Hypothesis Testing Introduction

1. Make questions into H0 and H1
2. Determine characteristics of the comparison distribution
3. Determine cutoff score to reject H0
4. Determine sample's score on the comparison distribution
5. Reject H0 or not?

Hyp Testing Popul 2
Pop. 2: Adults who do not receive such special memory training
 Hypothesis testing using a distribution of means

1. Mean of a distribution of means = pop mean.
2. Variance of a distribution of means = pop variance, divided by # of scores in each sample
3. Shape of a distribution of means is bell-shaped and unimodal

Three kinds of distributions

1. Distribution of a population of persons
2. Distribution of a sample drawn from a population
3. Distribution of means of all possible samples of a particular size taken from that population

The distribution of means
We want to compare this mean to something similar: a distribution of means. The Central Limit Theorem underlies several rules we'll address. Rule 1: As the number of samples increases, the mean of the distribution of means approximates the population mean, because high and low means cancel each other out. Rule 2: The variance of the distribution of means is the variance of the distribution of scores divided by the number of scores in each sample.
Rule 2
The standard deviation of a distribution of means is simply the square root of the variance of the distribution of means. **Remember this! This standard error is the "average amount off we expect our sample mean to be from the mean of the distribution of means"
Rule 3
The shape of a distribution of means tends to be unimodal and bell-shaped. It is at least approximately normal if there are at least 30 scores in each sample, OR if the population of scores is normal
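Rules 1 and 2 translate directly into arithmetic. In this sketch, the population (mean 8, SD 3, so variance 9) is the recall population from the example above. The sample size of 9 and the sample mean of 10 are hypothetical.

```python
import math

def distribution_of_means(pop_mean, pop_variance, n):
    """Rules 1-2: mean, variance, and standard error of the distribution
    of means for samples of n scores."""
    variance_of_means = pop_variance / n
    standard_error = math.sqrt(variance_of_means)
    return pop_mean, variance_of_means, standard_error

mean_m, var_m, se = distribution_of_means(8, 9, 9)
print(mean_m, var_m, se)  # 8 1.0 1.0

sample_mean = 10  # hypothetical mean recall of 9 trained adults
print((sample_mean - mean_m) / se)  # 2.0: Z of the sample mean
```

A sample mean of 10 sits 2 standard errors above 8, even though a single score of 10 would sit less than 1 SD above it.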
Single sample t test
Compare a single mean to a population mean when the population standard deviation (σ) is not known
Dependent means t test
Compare two linked means when the population standard deviation (σ) is not known
Single sample t test…

We don't know the σ of the population, only μ
SO! We will estimate σ of the population using our sample information
If the sample is randomly selected, compute an unbiased estimate of the population variance in a way that boosts the sample's variance a bit; we'll do this by putting N - 1 in the denominator of the variance equation

t test for dependent means

Here, two scores from each person are compared (repeated-measures designs, or within-subjects designs), OR
two scores from matched persons are compared

When independent means t tests should be used:

when sigma (σ) is unknown for the population distributions
two separately sampled means are being compared. Important point: the comparison distribution is a distribution of differences between means. The two variance estimates are combined into the pooled estimate of population variance: with equal N they are simply averaged; in general it is a weighted average, with each estimate weighted by its degrees of freedom.

***Assumptions of the independent means t test

B. Variances are equal across populations
this is the "homogeneity of variance" assumption
if "moderate" violations occur and N per sample is equal or nearly equal, then the t test is robust to these violations. Examine the variances in your data.
if "very large" differences in sample variances occur, the t test could give inaccurate results. ((If both the normality and variance assumptions are not met, we may want to consider performing some nonparametric tests, or "distribution-free" tests))
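The three t tests above can be sketched in Python as follows:

- `statistics.variance` already divides by N - 1, giving the unbiased estimate S².
- The dependent-means test is run as a single-sample test on difference scores against 0.
- The independent-means test uses the df-weighted pooled variance.

The data are hypothetical. The sketch returns only t and df; the cutoff still comes from a t table, since Python's standard library has no t distribution.

```python
import math
from statistics import mean, variance  # variance() divides by N - 1

def single_sample_t(scores, pop_mean):
    """Compare one sample mean to a known population mean when sigma is unknown."""
    n = len(scores)
    s2 = variance(scores)       # S^2, estimated population variance
    se = math.sqrt(s2 / n)      # S_M, SD of the distribution of means
    return (mean(scores) - pop_mean) / se, n - 1   # (t, df)

def dependent_means_t(before, after):
    """Repeated measures or matched pairs: single-sample t on difference
    scores, where H0 says the population mean difference is 0."""
    diffs = [b - a for a, b in zip(before, after)]
    return single_sample_t(diffs, 0)

def independent_means_t(group1, group2):
    """Pooled-variance t test on the distribution of differences between means."""
    df1, df2 = len(group1) - 1, len(group2) - 1
    df_total = df1 + df2
    # df-weighted average of the two variance estimates (plain average if Ns are equal).
    s2_pooled = (df1 / df_total) * variance(group1) + (df2 / df_total) * variance(group2)
    s2_difference = s2_pooled / len(group1) + s2_pooled / len(group2)
    t = (mean(group1) - mean(group2)) / math.sqrt(s2_difference)
    return t, df_total

print(single_sample_t([10, 12, 9, 11, 13], pop_mean=8))               # t about 4.24, df = 4
print(dependent_means_t([5, 7, 6, 8], [7, 8, 9, 9]))                 # t about 3.66, df = 3
print(independent_means_t([12, 14, 11, 15, 13], [10, 11, 9, 12, 8]))  # (3.0, 8)
```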
Effect Size
(d = .80 (large) means we expect (or, found) 4/5 of a standard deviation between the two means) 
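These values should be evaluated in their absolute sense; i.e., obtaining an effect size of -.80 is still considered large; their value is not bounded by -1 or +1
These effect sizes are symbolized by d, and are calculated as follows:
d = M/S (dependent means: mean of the difference scores over their estimated SD; independent means: (M1 - M2)/S pooled). **The denominator of d is S, the SD of the scores, so d is relatively UNinfluenced by N. This is unlike the t statistics for dependent means, t = (M - 0)/S_M, and for independent means, whose standard-error denominators shrink as N grows.

What shapes the statistical power in our studies?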

Effect size: the larger the effect size, the more power associated with it; it's easier to find a big difference than a small one, relative to the standard deviation
Sample size: the larger the N, the more power we have, because the standard error, which describes the spread of the distribution of means, gets smaller
Statistical significance level: the less extreme the α, the easier it will be to reject H0
1- or 2-tailed tests: 1-tailed tests provide less stringent cutoffs for rejecting H0

Statistical Power and Effect Size

Please note from these tables:
As N increases, so does power
As effect size (d) increases, so does power
1-tailed tests have more power than their 2-tailed counterparts
For the same effect size, a dependent means t test with half the sample size of an independent means t test still has more power. Why?
(and, not shown in these tables… more extreme α levels will be more difficult to achieve, and will therefore have less power than a less extreme α level)

Hypothesis Testing Steps for CORRELATIONS
df for t = N - 2 = 20 - 2 = 18
Step 3: Get t critical to reject H0
Step 4: Get the sample's score on the comparison distribution
Step 5: Compare scores and make a decision ***t computed is more extreme than t critical which happens to be. . .
Write this part: As hypothesized, we found a statistically significant positive Pearson correlation between x and y, r = .70, p < .01. Underline.
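Steps 3-4 for a correlation can be sketched as below. The sketch assumes the usual conversion of r to a t score, t = r√(N - 2)/√(1 - r²), with df = N - 2; the notes themselves do not show the formula. It uses the write-up's r = .70 and N = 20. The helper names are hypothetical, and t critical still comes from a t table.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sum of cross-products of deviations over sqrt(SSx * SSy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ssx * ssy)

def t_for_r(r, n):
    """Step 4: the sample's score on the comparison t distribution, df = N - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df

print(round(pearson_r([1, 2, 3, 4], [2, 1, 4, 3]), 2))  # 0.6 (hypothetical data)

t, df = t_for_r(0.70, 20)
print(round(t, 2), df)  # 4.16 18
```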