Learning theory for animal training (Work In Progress!)
Terms
- CC
- Classical Conditioning
- OC
- Operant Conditioning
- Learning/Performance distinction
-
The distinction between knowing and doing.
Without motivation, an animal may not perform behavior it knows.
A mouse that is not hungry won't run a maze for cheese. - Behavior vs Knowledge
- You are dealing with BEHAVIOR and infer knowledge
- Motivation
- Forces which act on or within animal to activate and direct behavior.
- Four stages of learning
-
Acquisition
Fluency (automatic!)
Generalization (application)
Maintenance (forever) - Acquisition
-
First step of learning.
Animal learns a basic skill.
Animal must learn what is expected.
Trainer focuses on accuracy. - Fluency
-
Second step of learning.
With repetition, behavior becomes fluid, automatic.
Trainer can focus on enhancing response - Generalization
-
Animal learns that what is being learned is relevant in various contexts.
Generalization is rarely automatic. - Maintenance
-
Fourth stage of learning.
Repetition.
Forever.
Periodic reinforcement required to preserve proficiency and prevent
extinction. - Ethology
- The study of animal behavior
- Clever Hans
-
A horse who appeared to be able to do math by tapping foot.
Horse could only solve problems the trainer could solve!
Trainer was raising eyebrows.
Pfungst discovered that people can unconsciously communicate information to others by subtle movements and that some animals can perceive these unconscious movements. - Parsimony
- Unless there is evidence to the contrary, you must account for a phenomenon with the simplest explanation available.
- Occam's Razor
-
The principle states that one should not make more assumptions than the minimum needed.
See Parsimony - Conditioning
- Learning
- Behavior/Response
-
Any action that can be observed and measured.
Barking, Sitting, Lying Down, etc. - Stimulus
- Any event that can be perceived by the animal.
- Consequence
-
An action or event that occurs AFTER a behavior.
A consequence can affect how often a behavior will occur in future. - Contingency
-
Depends on
If XXX then YYY
If my dog lies down, I'll throw the frisbee.
Throwing the frisbee is contingent upon lying down.
If LIE DOWN then FRISBEE
FRISBEE depends on LYING DOWN - Performance
-
Actual Behavior.
What you see is what you get.
Performance does NOT imply learning - Does performance imply learning?
- NO
- Appetitive/Positive
-
Good things.
Food
Sound
Play
Feelings
Dead Meat
Prey Drive - Aversive/Negative
-
Bad things
Pain
Annoyance
Uncomfortable
Loud
Leash jerk
Shock
Quick Movement - Theory
- An explanation
- Principles
- Principles are the rules outlined by a theory.
- Learning Principles
- The rules or laws governing learning
-
Classical Conditioning
(Defined) -
Associative learning
Pavlov
Animal learns to associate stimulus with a response.
See a FOO get a cookie.
Receiving a cookie is NOT contingent upon seeing a FOO -
Classical Conditioning
(Notation) -
CS (TONE) -> UCS (FOOD) -> UCR (SALIVA)
becomes
CS (TONE) -> CR (DROOL)
UCS (FOOD) -> UCR - CS
-
Conditioned Stimulus
This is the stimulus that brings on a particular response after being paired with an unconditioned stimulus. The flashing light played this role in the experiment. It had an important effect on the dog's behaviour, but only under a specific condition: it had been paired temporally with the tasting of food. - UCS
-
Unconditioned Stimulus
This is a stimulus that automatically elicits an unconditioned response. Pavlov's experiment had food as an unconditioned stimulus. - UCR
-
Unconditioned Response
It is the automatic response to an unconditioned stimulus. An example is the automatic salivation of the dog in response to the food. - CR
-
Conditioned Response
This refers to a response that the conditioned stimulus elicits, but only because it has previously been paired with the unconditioned stimulus. The salivation of the dog in response to the light is the conditioned response. - CS presentation
- BEFORE the UCS
- In classical conditioning, is CR required when the UCS is presented?
- No.
- Train a dog to sneeze, growl, snarl or dig
-
Use Classical Conditioning:
Find something that makes behavior occur and precede it with a neutral stimulus - Neutral Stimulus
-
Sometimes called an orienting stimulus.
A stimulus that does not elicit the response of interest: it is neutral because it does not elicit the Unconditioned (or reflexive) Response. - Orienting Stimulus
-
Dog pays attention when it is presented, but it does not yet mean anything.
Sometimes called Neutral Stimulus - Operant Conditioning
-
Instrumental Learning
An animal learns that behavior has consequences.
Things happen because we do things.
If you cook you eat.
Rolling over feels good.
Sitting when trainer says 'sit' gets a cookie.
Turn on the cookie machine. -
Operant Conditioning
(notation) - Sd -> R -> S
- Sd
-
Discriminative Stimulus
The context in which a response will grant a consequence.
Aside from an S(d), the events that occur are under the animal's control. - Thorndike Law of Effect
- If a consequence is pleasant the preceding behavior becomes more likely. If a consequence is unpleasant, the preceding behavior becomes less likely.
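The Law of Effect only states the direction in which a behavior's likelihood moves. As a rough illustration, the sketch below nudges a probability up after a pleasant consequence and down after an unpleasant one. The function name `update_likelihood`, the update rule, and the `rate` value are all assumptions made for this example, not part of Thorndike's statement.

```python
def update_likelihood(p: float, pleasant: bool, rate: float = 0.2) -> float:
    """Nudge the probability of a behavior after one consequence.

    The update rule and `rate` are illustrative assumptions; the Law of
    Effect itself only gives the direction of the change.
    """
    if pleasant:
        return p + rate * (1.0 - p)  # move toward 1: behavior more likely
    return p - rate * p              # move toward 0: behavior less likely


p_sit = 0.5  # hypothetical starting likelihood of a sit
after_treat = update_likelihood(p_sit, pleasant=True)
after_leash_jerk = update_likelihood(p_sit, pleasant=False)
print(after_treat, after_leash_jerk)
```

Whatever rule is chosen, the only property the card requires is that a pleasant consequence raises the likelihood and an unpleasant one lowers it.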
- A-B-C
-
Antecedent
Behavior
Consequence - Positive Reinforcement
-
R+
Present Something Good
Behavior More Likely - Positive Punishment
-
P+
Present Something Bad
Behavior Less Likely - Negative Punishment
-
P-
Take Away Something Good
Behavior is Less Likely - Negative Reinforcement
-
R-
Take Away Something Bad
Behavior is More Likely -
Four consequences
(the final S in Sd -> R -> S) -
R+ Positive Reinforcement
P+ Positive Punishment
R- Negative Reinforcement
P- Negative Punishment -
Positive Reinforcement
(Example) - I get a flashcard right, I get a raisin.
-
Negative Reinforcement
(Example) -
Do homework to avoid nagging.
Work late to avoid housework.
Dog heels to avoid yanking -
Positive Punishment
(Example) -
Pee on floor get hit
Drink and drive go to jail
Stop paying attention and dog bites -
Negative Punishment
(Example) -
Time out. (TO)
Dog plays rough. Play stops.
Drink/Drive. Lose License. - TO
- Time Out
- Reinforcement makes behavior (more or less) likely.
- More
- Punishment makes behavior (more or less) likely.
- Less
- Distinction between classical and operant behavior
-
Classical: UCS presented regardless of what animal does.
Operant: Some behavior or response is required for consequence. - Pizza example, CC vs OC
-
No matter how much money you get paid to NOT eat a pizza, you will not be able to stop drooling when you see and smell the pizza.
CC responses (UCRs) are involuntary or reflexive. - When CC is at odds with OC, who wins?
-
CC: Misbehavior of Organisms.
Reflexive behavior will get in the way of learning and what is intended as OC may elicit reflexive responses.
A squirrel or barking may prevent your dog from performing well conditioned behaviors on cue. - Habituation
- Learning not to react to stimuli
- Sensitization
- Becoming more sensitive to stimuli, especially with emotional reactions
- Habituation: Weak vs Intense stimulus
-
Weak stimulus best for habituation.
Usually. - Sensitization: Weak vs Intense stimulus
-
Intense stimuli lead to sensitization.
Usually. - Adaptation
-
Similar to habituation.
BUT - adaptation is a physical process of tiring.
Scent, Visual. - Learned Irrelevance
-
When a stimulus is repeatedly presented without consequence, the animal stops responding to it.
A dog will learn to ignore things that are of no importance, and attend to things that are.
Sit. Sit. Sit. Sit. Sit. becomes white noise.
May persist forever! - Spontaneous Recovery
- When a previously habituated stimulus again causes a reaction (doorbell)
- Does habituation have spontaneous recovery?
- Yes
- Does Learned Irrelevance have spontaneous recovery?
- No
- Factors impacting learning
-
Deprivation Level
Reward
Contrast Effect
Jackpots
Reinforcer Sampling - Deprivation Level
-
A reinforcer is likely to be more effective if the dog 'needs' it.
Attention
Food
Water
Play - Contrast Effect
-
A better reward may increase learning. (Kibble to Liver)
A lesser reward may decrease learning. (Liver to Kibble) - Quantity vs Reward Size
-
Several smaller treats are more effective than one large treat.
A dog can count but he can't weigh! - How can always using high value rewards impact training?
-
A mouse that always gets cheese will run mazes slower than one who gets kibble and random cheese awards.
A mouse accustomed to kibble who gets cheese will run maze faster. - Positive Behavior Contrast
- Getting a great reward will improve behavior.
- Negative Behavior Contrast
- Getting a lesser reward will reduce behavior.
- Jackpot
-
Reward for excellence.
NOT a noncontingent reward to get motivation - Reinforcement sampling
-
Let animal know what's coming.
A reason to perform well.
This is what you'll get when you eat your veggies. - Grandma's rule
-
PREMACK
Eat your veggies before dessert and finish your homework before moving on to the fun stuff - Jumpstart
-
Reward to motivate.
Contrast with Jackpot, reward for excellence. - Novelty
-
If a very familiar stimulus is used as the CS, the animal will learn much more slowly than if a novel stimulus is used.
Kibble sucks. - CS-Preexposure effect
-
Learned Irrelevance
If an animal has already been exposed to a stimulus and it has not been paired with anything meaningful, it becomes meaningless. -
Timing - Classical Conditioning
Inter-stimulus Interval -
CS -> UCS
CS MUST appear before UCS for learning to occur. - Timing - Operant Conditioning
-
Time between R -> S
Must be less than a second!
Primary/Secondary reinforcer
(Food/Click) can help make this less critical by presenting the Sr -
Primary Reinforcer
(Examples) -
Food
Water
Touch
Play
Drive -
Secondary Reinforcer
(Examples) -
Click
Yes!
Good!
Etc - Primary Reinforcer
-
Something an animal intrinsically likes.
Food, water, hugs. - Secondary Reinforcer
- Something that is meaningless to the dog that has become associated with a primary reinforcer and thus important.
- Establishing a Secondary Reinforcer
-
Repeat the following until dog turns head every time:
1). Click
2). Reward
Do NOT require any behavior beyond head turn/orientation -
Prey drive sequence
9 Steps -
Orient
Stare
Stalk
Chase
Grab
Bite
Kill
Dissect
Consume - CRF
-
Continuous Reinforcement Schedule
Reinforce every trial
Best for new behavior - PRF
-
Partial (Intermittent) Reinforcement Schedule
Behavior is reinforced after certain responses -
Intermittent Reinforcement Schedule (PRF)
Examples -
Fixed Ratio : FR
Variable Ratio: VR
Random Ratio: RR
Fixed Interval: FI
Variable Interval: VI - Differential Reinforcement Schedule
-
Certain rates of responding or certain types of responding are reinforced.
Differential Rate: Depends on how long after the preceding response.
DRH - Differential Reinforcement of high rates of behavior.
DRL - Differential Reinforcement of Low Rates of Behavior - DRH
-
Differential Reinforcement of High Rates of Behavior.
Animal is only reinforced if it responds BEFORE a certain interval.
NOT USEFUL - DRL
-
Differential reinforcement of low rates of behavior
Animal is only reinforced if it responds AFTER a certain interval.
NOT USEFUL - RR
-
Random Ratio
Increased drive perhaps due to frustration at expected reward.
Must be truly random. - Free Operant Behavior
-
Behavior that is not prompted but is rewarded.
Eye Contact during heeling. - FR
-
Fixed Ratio
Reinforce every N times
FR-5, for example.
Very high rate of performance except RIGHT AFTER receiving reinforcement
(post reinforcement pause/scallop) - post reinforcement pause
- After receiving reinforcement an animal may decrease performance a bit.
- VR
-
Variable Ratio reinforcement.
VR-5 means an average of one in five times gets reinforced.
Very effective.
Little post-reinforcement pause.
Slot Machine.
Sales Commission - Slot Machine
- Variable Reinforcement Schedule
- Ratio Strain
- On a variable reinforcement schedule, the animal starts to shut down when it is not reinforced often enough.
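The FR and VR cards above can be sketched in a few lines of Python. This is only an illustration: the helper names `fixed_ratio` and `variable_ratio` are made up for this example, and drawing the VR requirement uniformly from 1 to 2N - 1 is an assumption chosen so that the average works out to N.

```python
import random


def fixed_ratio(n: int):
    """Return a function that reinforces every n-th response (FR-n)."""
    count = 0

    def respond() -> bool:
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def variable_ratio(n: int, rng: random.Random):
    """Return a function that reinforces after a varying number of
    responses that averages n (VR-n).

    Assumption for this sketch: the requirement is drawn uniformly
    from 1..(2n - 1), whose mean is n.
    """
    count = 0
    required = rng.randint(1, 2 * n - 1)

    def respond() -> bool:
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * n - 1)
            return True
        return False

    return respond


fr5 = fixed_ratio(5)
fr_results = [fr5() for _ in range(20)]    # reinforced on responses 5, 10, 15, 20

vr5 = variable_ratio(5, random.Random(0))  # seeded so the run is repeatable
vr_results = [vr5() for _ in range(10_000)]
vr_rate = sum(vr_results) / len(vr_results)  # close to 1 in 5
```

On FR-5 the animal can predict exactly which response pays off. On VR-5 any response might, which is one common explanation for the smaller post-reinforcement pause.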
- FI
-
Fixed Interval Schedule
FI-5 reinforced for first response AFTER five seconds. - VI
-
Variable Interval
VI-5 - On average, response will be rewarded ??? - Limited Hold
-
The time interval during which a reinforcement is available.
Example, you have to eat lunch at the cafeteria from 12-1. - DRI
-
Differential Reinforcement of Incompatible Behavior
Response type schedule
Reward only incompatible behaviors.
EG: Reward sitting not jumping - DRO
-
Differential Reinforcement of Other Behavior
Reward ANY other behavior.
EG: Reward anything other than barking or lunging. - DRE
-
Differential Reinforcement of EXCELLENT behaviors.
Reinforce only the best.
Use during maintenance - Duration Schedule
-
Watch Me
Down Stay
Dog reinforced after a specified interval.
Fixed Interval
Random Interval - Teaching Stay or Wait, what is best schedule to use
- Slowly raise criteria on a duration schedule.
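The fixed interval and limited hold cards earlier in the deck can be sketched the same way. In this illustration the helper name `fixed_interval` is hypothetical, and restarting the clock when a limited-hold window is missed is an assumption made for the example.

```python
from typing import Optional


def fixed_interval(interval: float, limited_hold: Optional[float] = None):
    """Return a function that takes response times (seconds) on an FI schedule.

    A response is reinforced if it is the first one made after `interval`
    seconds have passed since the last reinforcement (or since the start).
    With `limited_hold`, the reinforcer is only available for that many
    seconds after the interval ends. Assumption: a response that misses
    the window goes unreinforced and restarts the clock.
    """
    last = 0.0

    def respond(t: float) -> bool:
        nonlocal last
        elapsed = t - last
        if elapsed < interval:
            return False  # too early: no reinforcement
        if limited_hold is not None and elapsed > interval + limited_hold:
            last = t      # window missed: restart the clock
            return False
        last = t          # reinforced: the next interval starts now
        return True

    return respond


fi5 = fixed_interval(5.0)
fi_results = [fi5(t) for t in (2.0, 4.0, 6.0, 8.0, 12.0)]

fi5_hold = fixed_interval(5.0, limited_hold=1.0)
hold_results = [fi5_hold(t) for t in (5.5, 12.0, 17.5)]
```

Raising criteria slowly on a duration schedule would correspond to calling `fixed_interval` with a gradually larger `interval` as the dog succeeds.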
- Best reinforcement schedule to use for basic behaviors
-
Start with CRF
Move to VR or RR - Best reinforcement for complex behaviors
-
DRE
DOGS VARY! - Best reinforcement schedule for problem behaviors
-
DRL, DRO, DRI
DOGS VARY! - Premack
-
Grandma's rule
The opportunity to engage in some activities may be reinforcing for others.
Juno likes to sit in a chair during rally obedience. - Best reinforcement schedule for classical conditioning is?
- CRF
- Stimulus Control
-
Generalization
Discrimination - Overshadowed
-
A more salient stimulus (squirrel) may overshadow
Less salient (hotdog)
or even less salient(pat) - Prevent blocking
-
By not presenting cues at the same time.
Present new cues FIRST, then old cues. - Say the command once (because...)
- Every time the dog hears sit and doesn't get rewarded, it degrades the significance of the Sd to the Sr+
- Preparedness
-
The tendency to associate certain types of stimuli more readily than others.
E.G. - Sound to Pain, Food to Illness
High pitch sounds to fast motion,
Low pitch sounds to slow motion - Learning Sets
-
When a dog learns the rules of the game.
For example, learning to match things that they just saw together vs things they did not??? - Experimental Neurosis
-
Asking a dog to do incompatible things. Can induce real problems.
The guard dog asked to stop attacking on a hand raise sees the thief raise a chair and shuts down. - Extinction
- Learning that a CS does not result in an UCS. Responding declines.
- Extinction Burst
- Increase in responding, with frustration, when the response no longer produces reinforcement.
- Spontaneous Recovery
- Recovery of a behavior after it has become 'extinct' .
- Partial Reinforcement Extinction Effect
-
PREE
In CRF - Extinction happens quickly.
In VRF Extinction happens slowly.
Does not happen on CC - PREE
- Partial Reinforcement Extinction Effect
- Does training transfer knowledge?
- No, it changes probabilities.
- What makes a trained dog sit when you say sit?
- A history of reinforcement for sitting in response to the stimulus 'sit'.
- Good thing starts
- Positive reinforcement
- Good thing ends
- Negative punishment
- Bad thing starts
- Positive Punishment
- Bad thing ends
- Negative reinforcement
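The four cards above reduce to two yes/no questions: is the thing good or bad, and does it start or end? A minimal sketch of that lookup follows; the function name `classify_consequence` and the example situations are made up for illustration.

```python
def classify_consequence(thing_is_good: bool, thing_starts: bool) -> str:
    """Name the quadrant for a consequence.

    'Positive' means something is presented (starts); 'negative' means
    something is taken away (ends). Reinforcement makes the behavior more
    likely; punishment makes it less likely.
    """
    if thing_is_good:
        return "R+" if thing_starts else "P-"
    return "P+" if thing_starts else "R-"


NAMES = {
    "R+": "Positive Reinforcement",
    "P-": "Negative Punishment",
    "P+": "Positive Punishment",
    "R-": "Negative Reinforcement",
}

# Hypothetical examples: (description, good thing?, starts?)
examples = [
    ("treat is given", True, True),
    ("play stops", True, False),
    ("leash jerk is applied", False, True),
    ("nagging stops", False, False),
]
for description, good, starts in examples:
    code = classify_consequence(good, starts)
    print(f"{description}: {code} ({NAMES[code]})")
```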
- Training changes probabilities not ___________
- knowledge
- Rules for Good Desensitization
- "1. Stay under threshold
- Blocking
- An already learned cue is attended to, so a new cue presented with it is not learned
- Overshadowing
- Only the more salient element in a compound is learned
- Good CER
- "1: Order of events,
- Spooky dogs
- Dogs that have been working dogs until recently: Working, Guard, Flock Guard, toy
- P Value
- Probability of seeing a difference between groups at least as large as the one observed if only chance were at work (no real effect). Usually assessed by comparing a control group with a studied group while controlling all untested variables. A p value of .05 means a difference this large would arise by chance about 5% of the time if there were no real effect.
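One way to estimate a p value without any statistics library is a permutation test: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The sketch below is an illustration only. The function name `permutation_p_value` and the data are invented for this example.

```python
import random


def permutation_p_value(control, treated, n_permutations=2000, seed=0):
    """Estimate a two-sided p value for a difference in group means.

    Shuffles group membership repeatedly and returns the share of shuffles
    whose mean difference is at least as large as the observed difference.
    """
    rng = random.Random(seed)  # seeded so the estimate is repeatable
    n_control = len(control)
    observed = abs(sum(treated) / len(treated) - sum(control) / n_control)
    pooled = list(control) + list(treated)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_control = pooled[:n_control]
        shuffled_treated = pooled[n_control:]
        diff = abs(
            sum(shuffled_treated) / len(shuffled_treated)
            - sum(shuffled_control) / n_control
        )
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_permutations


# Invented data: correct sits per session for two groups of dogs.
control_group = [3, 4, 5, 4, 3, 4]
studied_group = [9, 10, 11, 10, 9, 10]
p_separated = permutation_p_value(control_group, studied_group)
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
print(p_separated, p_identical)
```

A small result says the observed difference would be unusual if group membership made no difference. It does not say how likely it is that the effect is real.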
- inclusive fitness
- Genes are instructing her to save the copies of themselves stored in her kittens
- why are we here
- Each of us is descended from an unbroken line of successful reproducers
- meta communications
- Using behavior to indicate what the following behavior means: a play bow indicates the next ripping run is a play move.
- Four F's of behavior
- Food, Fear, Fight and Sex: Adaptive significance, can fuel other
- Group hunting's genetic legacy
- Impacts socially facilitated predation
- Aggression reason for being
- To displace individuals
- Fear .. Bred?
- Fear can be bred for, yes.
- Fear in puppies
- Genetics, Prenatal, Neonatal
- Fear in adults
- Genetics, Prenatal, Neonatal, socialization, sensitization
- Dog human aggression
- "Strangers,
- Dog Human Aggression treatment
- "1. Habituation;
- When to use flooding?
- Puppy Mill Rescues, may be only choice
- Desensitization
- Exposure at sub threshold level so no fear is evoked, gradually increased
- DRI Strangers/Dogs
- Sit/Watch
- DRI Guarding
- Retrieve
- DRI Handling
- Offering Body Part
- DRI Sofa Guarding
- Voluntarily vacating locations
- Differential OC/CC Fear/Aggression strategies
- "OC: DRI - Operant ⬦ dog reinforced if he gives correct response ;
- Difficult prognosis indicators aggression cases
- hard mouth, strangers, poor client compliance, explosive without a threat, large dog (>30 lbs)
- Good prognosis indicators aggression cases
- soft mouth, resource guarding, protracted warnings, committed owners, plastic dog, small dog (<30 lbs)
- Serious bite level #
- "IV - VII
- Pressure or puncture more serious
- Pressure
- Less serious bite levels
- "I-III :
- Stranger aggression is hard to fix because
- "1). Recruiting;
- Assessing bites: Bite History Incident
- "Victim Characteristics;
- Fear case prognosis
- Slow moving - months and years. Younger the better
- Dog-Dog reasons for problems
- "Undersocialized⬦
- Dog-Dog-Fix: Easy
- Tarzan, guarding - mild, bullying
- Dog-Dog-Fix: Good
- "Play Skill Deficit;
- Dog-Dog-Fix: Harder
- "Proximity Sensitive: Severe;
- Dog-Dog-Fix: Very hard
- "Compulsive;
- Predatory Drift
- Size mismatch, double team, panic
- ABI
- Acquired Bite Inhibition
- Family aggression usually manifests because
- "Resource guarding,