# Biostats Lec 3

## Terms

Random
- every possible outcome has an equal likelihood of happening
Randomization
- Strive for randomization whenever possible.
- Random assignment prevents bias, because neither the investigator nor the participant can influence who ends up in which group.
Predictability and randomness
- if something is predictable, it is not random. The lottery is not predictable because it is random
Block randomization (see textbook)
- Suppose you have two groups, control and treatment, and 100 participants to split evenly. You assign each participant by coin toss (heads to treatment, tails to control), and after 96 tosses you have 50 heads and 46 tails. To end with 50 in each group, the last four participants WILL have to be assigned to the control group, which is not random.
- Instead, use block randomization: set a block size, usually about 4, 6, or 8. This is small enough that you never have to force the remaining participants into one group, and large enough that the block size is not easy to guess.
- Divide your study into blocks and then randomize within each block. If the study ends at a block boundary, you are guaranteed perfect balance between treatments and controls. If the study ends in the middle of a block, there might be some imbalance, but probably much less than with regular (unblocked) randomization.
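The block scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two equal arms, a fixed block size, and a hypothetical helper name `block_randomize`. None of these details come from the lecture or the textbook.

```python
import random

def block_randomize(n_participants, block_size=4, seed=None):
    """Assign participants to 'T' (treatment) or 'C' (control) in shuffled blocks.

    Hypothetical helper: assumes two arms of equal size within every block.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        # Each block holds equal numbers of T and C, in random order.
        block = ["T"] * (block_size // 2) + ["C"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    # Stopping mid-block can leave an imbalance of at most block_size // 2.
    return assignments[:n_participants]

schedule = block_randomize(100, block_size=4, seed=1)
# 100 is a multiple of 4, so the study ends on a block boundary: 50 T, 50 C.
print(schedule.count("T"), schedule.count("C"))
```

In practice the block size is often varied at random (for example, mixing 4, 6, and 8) so that the end of a block is harder to guess. The sketch keeps it fixed for simplicity.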
Ethical issues of randomization
- A doctor is supposed to give each patient the best treatment he or she thinks the patient can get.
- However, in randomized clinical trials it is important to randomize (often within blocks), so the treatment is chosen by chance rather than by the doctor's judgment.
Equipoise
- You have to assume that the new treatment and the control treatment are equally effective (equal efficacy) in order to remain ethical in using coin tosses to assign patients to groups.
- However, this is antithetical to the purpose of the study, because the study seeks to improve upon the extant (control) treatment.
What happens when you try to use simple random sampling (SRS) from a heterogeneous population?
- aspirin factory example
- three different shifts, each shift putting a different amount of ASA (acetylsalicylic acid) in the aspirin.
- SRS (as with random number tables) could end up picking more from one shift, resulting in a biased inference about the population.
- For example, if the morning shift uses the most ASA and more of your samples happen to come from the morning shift, you would infer that the ASA content is too high.
Covariates
- variables other than the one you are studying.
- shift would be a covariate when the variable you are trying to study is ASA content.
- when covariates are present, the population is heterogeneous with respect to these covariates: simple random sampling does not work when covariates are present.
Stratified Random Sampling
- One way to sample from heterogeneous populations
- separate the population into strata (groups) that are homogeneous within themselves but heterogeneous between one another
- e.g., Democrats and Republicans; male and female; old and young; etc.
- then do SRS within each stratum, because SRS needs a homogeneous population
Proportional Allocation
- Allocate sample size according to strata size, relative to the population.
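As a rough illustration of proportional allocation followed by SRS within each stratum, here is a Python sketch using the aspirin factory shifts as strata. The shift sizes (500, 300, 200 bottles), the sample size of 50, and the helper names are all assumptions made for the example, not figures from the lecture. Rounding uses a largest-remainder rule, which is one reasonable choice among several.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    raw = {name: sample_size * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    # Largest-remainder rounding so the allocations add up to sample_size.
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, sample_size, seed=None):
    """Do SRS (without replacement) inside each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    alloc = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}

# Hypothetical bottle IDs produced by each shift.
shifts = {
    "morning": list(range(0, 500)),
    "afternoon": list(range(500, 800)),
    "night": list(range(800, 1000)),
}
allocation = proportional_allocation({s: len(b) for s, b in shifts.items()}, 50)
print(allocation)  # {'morning': 25, 'afternoon': 15, 'night': 10}
sample = stratified_sample(shifts, 50, seed=0)
```

Because each shift contributes in proportion to its share of production, no single shift can dominate the sample by chance.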
Stratified Randomization
- Clinical Research Analog of Stratified Random Sampling
- Not everyone responds to treatment the same way (non-homogeneity)
- If men do worse with a treatment and more men are in the trial, then the sample is biased (in this case, gender is a covariate for the variable of interest, which is the response to treatment)
- Thus you would divide participants into males and females and perhaps old and young (so you have four possible strata: OM, OF, YM, YF) and then randomly select participants from each stratum, using proportional allocation.
- You must limit the number of covariates used for stratification to about 2 or 3, because otherwise you can keep making divisions until there is only 1 person in each group.
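One way to picture randomizing within the four strata (OM, OF, YM, YF) is the sketch below. It assumes participants arrive as `(id, age_group, sex)` tuples and that each stratum is simply shuffled and split in half. Both are illustrative assumptions, and the function name is hypothetical.

```python
import random
from collections import defaultdict

def stratified_randomize(participants, seed=None):
    """participants: list of (id, age_group, sex). Returns {id: 'T' or 'C'}."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pid, age_group, sex in participants:
        strata[(age_group, sex)].append(pid)
    arms = {}
    for members in strata.values():
        # Randomize separately inside each stratum.
        rng.shuffle(members)
        half = len(members) // 2
        for i, pid in enumerate(members):
            # With an odd stratum size, control receives the extra participant.
            arms[pid] = "T" if i < half else "C"
    return arms

strata_labels = [("old", "M"), ("old", "F"), ("young", "M"), ("young", "F")]
# 40 made-up participants, 10 in each stratum.
participants = [(i, *strata_labels[i % 4]) for i in range(40)]
arms = stratified_randomize(participants, seed=0)
```

Each stratum ends up with the same number of treated and control participants, so a covariate such as gender cannot be unevenly spread across the two arms.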
Clustered Random Sampling
- Stratified random sampling is expensive because you have to find people that fit into each stratum
- an alternative is to do clustered random sampling, in which you group people according to where they are (not what or who they are)
- Clusters are the opposite of strata: within a cluster you will have heterogeneity, and between clusters you will have homogeneity.
- If the aforementioned two assumptions are met, a clustered random sampling can be done.
- First, you do SRS of clusters. Then you do SRS from each selected cluster (these two steps together are called two-stage cluster sampling)
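The two stages can be sketched as follows. The cluster names, cluster sizes, and the function `two_stage_cluster_sample` are invented for illustration. The only part taken from the notes is the structure: SRS of clusters, then SRS within each chosen cluster.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """clusters: {cluster_name: [members]}. Returns {chosen_cluster: [sampled members]}."""
    rng = random.Random(seed)
    # Stage 1: SRS of whole clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: SRS of members inside each chosen cluster.
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical neighborhoods, each with 20 residents.
neighborhoods = {
    f"neighborhood_{n}": [f"{n}-resident-{r}" for r in range(20)]
    for n in "ABCDEF"
}
picked = two_stage_cluster_sample(neighborhoods, n_clusters=3, n_per_cluster=5, seed=0)
```

Only the three selected neighborhoods ever need to be visited, which is where the cost saving over stratified sampling comes from.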
Hawthorne Effect and solution
- People change their behavior when they realize that someone is watching them (loosely analogous to the observer effect in physics, where observing a system changes it; this is often attributed, imprecisely, to the Heisenberg uncertainty principle)
- Always present in research
- You address this by having controls and assuming that the Hawthorne effect is the same in the treatment and control groups, so that it cancels out and you can analyze the differences between the two groups.
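The cancellation argument can be written out under a simplifying assumption that the notes do not state explicitly: the Hawthorne effect adds the same amount $h$ to the observed outcome in both groups. The symbols $\mu_T$ and $\mu_C$ (the true mean outcomes under treatment and control) are introduced here only for illustration.

$$
(\mu_T + h) - (\mu_C + h) = \mu_T - \mu_C
$$

If the effect differed between the groups, a residual $h_T - h_C$ would remain in the comparison, which is why the equal-effect assumption matters.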
Cluster Randomization
- Clinical research analog of Clustered random sampling
- You take several clinics and randomly assign each one to give either the treatment or the control
- This takes care of the contamination effect: if two people in the same clinic received different treatments, one could be persuaded to follow the other's course of treatment if it seemed to be working better (e.g., the quit-smoking campaign at the VA)
- This is different from stratified randomization because the random process occurs at the level of the clinics, not at the level of the individuals.
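A small sketch of randomizing at the clinic level follows. The clinic and patient names are hypothetical, and splitting the clinics evenly between arms is an assumption of the example rather than something the notes specify.

```python
import random

def cluster_randomize(clinic_names, seed=None):
    """Randomly assign whole clinics to 'T' (treatment) or 'C' (control)."""
    rng = random.Random(seed)
    order = list(clinic_names)
    rng.shuffle(order)
    half = len(order) // 2
    return {clinic: ("T" if i < half else "C") for i, clinic in enumerate(order)}

# Hypothetical clinics with four patients each.
clinics = {
    f"clinic_{c}": [f"{c}-patient-{p}" for p in range(4)]
    for c in "ABCDEF"
}
clinic_arm = cluster_randomize(clinics, seed=0)
# Every patient inherits the arm of his or her clinic, so no clinic mixes arms.
patient_arm = {
    patient: clinic_arm[clinic]
    for clinic, patients in clinics.items()
    for patient in patients
}
```

The coin is effectively tossed once per clinic, not once per patient, which is what keeps patients in the same clinic from comparing different treatments.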
Systematic Random Sampling
- Use when you have a dynamic population: one that is moving, literally or figuratively.
- Ex. you are sitting in front of a conveyor belt and want to sample 25 aspirin bottles out of a population of 1000 bottles for their ASA content. Since the conveyor belt is moving with respect to you, it qualifies for systematic random sampling (other examples: thumbing through paper files or scrolling through computer files).
- 25/1000 corresponds to picking 1 bottle out of every 40 bottles.
- Picking the 1st, 41st, 81st, ... is not random, because factory workers could anticipate the pattern and make just those bottles with the right amount of aspirin.
- The random part is picking a starting number between 1 and 40, like 27.
- The systematic part is to pick 27, 67, 107, 147, ...
- The systematic part makes it easier than SRS.
- Will not work if your population has periodicity (a cyclic nature). For example, if the period were 40 bottles, every 40th bottle would have the same properties, so the whole sample would come from the same point in the cycle.
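The conveyor belt example translates directly into code. This sketch assumes the population size is an exact multiple of the sample size (as with 1000 and 25). The function name `systematic_sample` is made up for the example.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return 1-based positions: a random start, then every interval-th unit."""
    rng = random.Random(seed)
    interval = population_size // sample_size  # 1000 // 25 = 40
    start = rng.randint(1, interval)           # random part: a start in 1..40
    # Systematic part: start, start + 40, start + 80, ...
    return [start + k * interval for k in range(sample_size)]

picks = systematic_sample(1000, 25, seed=0)
print(picks[:4])
```

If the random start happened to be 27, the positions would be 27, 67, 107, 147, and so on up to 987, matching the sequence in the notes.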
