Learning theory for animal training (Work In Progress!)

Terms

CC: Classical Conditioning
OC: Operant Conditioning
Learning/Performance distinction: The distinction between knowing and doing.

Without motivation, an animal may not perform behavior it knows.

A mouse that is not hungry won't run a maze for cheese.
Behavior vs Knowledge: You are dealing with BEHAVIOR and can only infer knowledge.
Motivation: Forces which act on or within an animal to activate and direct behavior.
Four stages of learning: Acquisition

Fluency (automatic!)

Generalization (application)

Maintenance (forever)
Acquisition: First step of learning.

Animal learns a basic skill.

Animal must learn what is expected.

Trainer focuses on accuracy.
Fluency: Second step of learning.

With repetition, behavior becomes fluid, automatic.

Trainer can focus on enhancing response
Generalization: Animal learns that what is being learned is relevant in various contexts.

Generalization is rarely automatic.
Maintenance: Fourth stage of learning.

Repetition.

Forever.

Periodic reinforcement required to preserve proficiency and prevent extinction.
Ethology: The study of animal behavior
Clever Hans: A horse who appeared to be able to do math by tapping his foot.

The horse could only solve problems the trainer could!

Trainer was raising eyebrows.

Pfungst discovered that people can unconsciously communicate information to others by subtle movements and that some animals can perceive these unconscious movements.
Parsimony: Unless there is evidence to the contrary, you must account for a phenomenon with the simplest explanation available.
Occam's Razor: The principle states that one should not make more assumptions than the minimum needed.

See Parsimony
Conditioning: Learning
Behavior/Response: Any action that can be observed and measured.

Barking, Sitting, Lying Down, etc.
Stimulus: Any event that can be perceived by the animal.
Consequence: An action or event that occurs AFTER a behavior.

A consequence can affect how often a behavior will occur in future.
Contingency: Depends on

If XXX then YYY

If my dog lies down, I'll throw the frisbee.

Throwing the frisbee is contingent upon lying down.

If LIE DOWN then FRISBEE

FRISBEE depends on LYING DOWN
Performance: Actual Behavior.

What you see is what you get.

Performance does NOT imply learning
Does performance imply learning?: NO
Appetitive/Positive: Good things.

Food
Sound
Play
Feelings
Dead Meat
Prey Drive
Aversive/Negative: Bad things

Pain
Annoyance
Uncomfortable
Loud
Leash jerk
Shock
Quick Movement
Theory: An explanation
Principles: Principles are the rules outlined by a theory.
Learning Principles: The rules or laws governing learning
Classical Conditioning (Defined): Associative learning

Pavlov

Animal learns to associate stimulus with a response.

See a FOO get a cookie.

Receiving a cookie is NOT contingent upon seeing a FOO
Classical Conditioning (Notation): CS(TONE) -> UCS(FOOD) -> UCR(SALIVA)

becomes

CS(TONE) -> CR(DROOL)
UCS(FOOD) -> UCR(SALIVA)
CS: Conditioned Stimulus

This is the stimulus that brings on a particular response after being paired with an unconditioned stimulus. The flashing light played this role in the experiment. It had an important effect on the dog's behaviour, but only under a specific condition: it had been paired temporally with the tasting of food.
UCS: Unconditioned Stimulus

This is a stimulus that automatically elicits an unconditional response. Pavlov's experiment had food as an unconditional stimulus.
UCR: Unconditioned Response

It is the automatic response to an unconditional stimulus. An example of this is the automatic salivation of the dog in response to the food.
CR: Conditioned Response

This refers to a response that the conditioned stimulus elicits, but only because it has previously been paired with the unconditioned stimulus. An example of this was the salivation of the dog in response to the light.
CS presentation: BEFORE the UCS
In classical conditioning, is CR required when the UCS is presented?: No.
Train a dog to sneeze, growl, snarl or dig: Use Classical Conditioning:

Find something that makes behavior occur and precede it with a neutral stimulus
Neutral Stimulus: Sometimes called an orienting stimulus.

A stimulus that does not elicit the response of interest. It is neutral because it does not elicit the Unconditioned (or reflexive) Response.
Orienting Stimulus: The dog pays attention when it is presented, but it does not yet mean anything.

Sometimes called Neutral Stimulus
Operant Conditioning: Instrumental Learning

An animal learns that behavior has consequences.

Things happen because we do things.

If you cook you eat.

Rolling over feels good.

Sitting when trainer says 'sit' gets a cookie.

Turn on the cookie machine.
Operant Conditioning (notation): Sd -> R -> S
Sd: Discriminative Stimulus

The context in which a response will grant a consequence.

Aside from an S(d), the events that occur are under the animal's control.
Thorndike Law of Effect: If a consequence is pleasant the preceding behavior becomes more likely. If a consequence is unpleasant, the preceding behavior becomes less likely.
A-B-C: Antecedent
Behavior
Consequence
Positive Reinforcement: R+

Present Something Good
Behavior More Likely
Positive Punishment: P+

Present Something Bad
Behavior Less Likely
Negative Punishment: P-

Take Away Something Good
Behavior is Less Likely
Negative Reinforcement: R-

Take Away Something Bad
Behavior is More Likely
Four consequences (the final S in Sd -> R -> S): R+ Positive Reinforcement
P+ Positive Punishment
R- Negative Reinforcement
P- Negative Punishment
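
The four consequences come down to two yes/no questions: was something added or taken away, and was that something good or bad? A minimal sketch in Python, assuming hypothetical helper names (`classify_consequence`, `behavior_becomes`) that are not part of the deck:

```python
def classify_consequence(stimulus_added: bool, stimulus_is_good: bool) -> str:
    """Return the quadrant label (R+, P+, R-, P-) for one consequence.

    Hypothetical helper for illustration only.
    """
    if stimulus_added and stimulus_is_good:
        return "R+"  # good thing starts: positive reinforcement
    if stimulus_added and not stimulus_is_good:
        return "P+"  # bad thing starts: positive punishment
    if not stimulus_added and stimulus_is_good:
        return "P-"  # good thing ends: negative punishment
    return "R-"      # bad thing ends: negative reinforcement


def behavior_becomes(label: str) -> str:
    """Reinforcement makes behavior more likely, punishment less likely."""
    return "more likely" if label.startswith("R") else "less likely"
```

For example, a time out takes away something good, so `classify_consequence(False, True)` gives `"P-"` and `behavior_becomes("P-")` gives `"less likely"`.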
Positive Reinforcement (Example): I get a flashcard right, I get a raisin.
Negative Reinforcement (Example): Do homework to avoid nagging.

Work late to avoid housework.

Dog heels to avoid yanking
Positive Punishment (Example): Pee on floor get hit

Drink and drive go to jail

Stop paying attention and dog bites
Negative Punishment (Example): Time out. (TO)

Dog plays rough. Play stops.

Drink/Drive. Lose License.
TO: Time Out
Reinforcement makes behavior (more or less) likely.: More
Punishment makes behavior (more or less) likely.: Less
Distinction between classical and operant behavior: Classical: UCS presented regardless of what animal does.

Operant: Some behavior or response is required for consequence.
Pizza example, CC vs OC: No matter how much money you get paid to NOT eat a pizza, you will not be able to stop drooling when you see and smell the pizza.

CC responses are involuntary or reflexive.
When CC is at odds with OC, which wins?: CC: Misbehavior of Organisms.

Reflexive behavior will get in the way of learning and what is intended as OC may elicit reflexive responses.

A squirrel or barking may prevent your dog from performing well-conditioned behaviors on cue.
Habituation: Learning not to react to stimuli
Sensitization: Becoming more sensitive to stimuli, especially with emotional reactions
Habituation: Weak vs Intense stimulus: Weak stimulus best for habituation.

Usually.
Sensitization: Weak vs Intense stimulus: Intense Stimuli leads to sensitization.

Usually.
Adaptation: Similar to habituation.

BUT - adaptation is a physical process of tiring.

Scent, Visual.
Learned Irrelevance: When a stimulus is repeatedly presented without consequence, the animal stops responding to it.

A dog will learn to ignore things that are of no importance, and attend to things that are.

Sit. Sit. Sit. Sit. Sit. becomes white noise.

May persist forever!
Spontaneous Recovery: When a previously habituated stimulus again causes a reaction (doorbell)
Does habituation have spontaneous recovery?: Yes
Does Learned Irrelevance have spontaneous recovery?: No
Factors impacting learning: Deprivation Level

Reward

Contrast Effect

Jackpots

Reinforcer Sampling
Deprivation Level: A reinforcer is likely to be more effective if the dog 'needs' it.

Attention

Food

Water

Play
Contrast Effect: A better reward may increase learning. (Kibble to Liver)

A lesser reward may decrease learning. (Liver to Kibble)
Quantity vs Reward Size: Several smaller treats are more effective than one big one.

A dog can count but he can't weigh!
How can always using high value rewards impact training?: A mouse that always gets cheese will run mazes slower than one that gets kibble and random cheese rewards.

A mouse accustomed to kibble who gets cheese will run maze faster.
Positive Behavior Contrast: Getting a great reward will improve behavior.
Negative Behavior Contrast: Getting a lesser reward will reduce behavior.
Jackpot: Reward for excellence.

NOT a noncontingent reward to get motivation
Reinforcer Sampling: Let animal know what's coming.

A reason to perform well.

This is what you'll get when you eat your veggies.
Grandma's rule: PREMACK

Eat your veggies before dessert and finish your homework before moving on to the fun stuff
Jumpstart: Reward to motivate.

Contrast with Jackpot, reward for excellence.
Novelty: If a very familiar stimulus is used as the CS, the animal will learn much more slowly than if a novel stimulus is used.

Kibble sucks.
CS-Preexposure effect: Learned Irrelevance

If an animal has already been exposed to a stimulus and it has not been paired with anything meaningful, it becomes meaningless.
Timing - Classical Conditioning Inter-stimulus Interval: CS -> UCS

CS MUST appear before UCS for learning to occur.
Timing - Operant Conditioning: Time between R -> S

Must be less than a second!

Primary/Secondary reinforcer

(Food/Click) can help make this less critical by presenting the Sr
Primary Reinforcer (Examples): Food
Water
Touch
Play
Drive
Secondary Reinforcer (Examples): Click
Yes!
Good!
Etc
Primary Reinforcer: Something an animal intrinsically likes.

Food, water, hugs.
Secondary Reinforcer: Something that is meaningless to the dog that has become associated with a primary reinforcer and thus important.
Establishing a Secondary Reinforcer: Repeat the following until dog turns head every time:
1). Click
2). Reward

Do NOT require any behavior beyond head turn/orientation
Prey drive sequence 9 Steps: Orient
Stare
Stalk
Chase
Grab
Bite
Kill
Dissect
Consume
CRF: Continuous Reinforcement Schedule

Reinforce every trial

Best for new behavior
PRF: Partial (Intermittent) Reinforcement Schedule

Behavior is reinforced after only some responses
Intermittent Reinforcement Schedule (PRF) Examples: Fixed Ratio: FR
Variable Ratio: VR
Random Ratio: RR
Fixed Interval: FI
Variable Interval: VI
Differential Reinforcement Schedule: Certain rates of responding or certain types of responding are reinforced.

Differential Rate: Depends on how long after the preceding response.

DRH - Differential Reinforcement of high rates of behavior.

DRL - Differential Reinforcement of Low Rates of Behavior
DRH: Differential Reinforcement of High Rates of Behavior.

Animal is only reinforced if it responds BEFORE a certain interval.

NOT USEFUL
DRL: Differential reinforcement of low rates of behavior

Animal is only reinforced if it responds AFTER a certain interval.

NOT USEFUL
RR: Random Ratio

Increased drive perhaps due to frustration at expected reward.

Must be truly random.
Free Operant Behavior: Behavior that is not prompted but is rewarded.

Eye Contact during heeling.
FR: Fixed Ratio

Reinforce every N times

FR-5, for example.

Very high rate of performance except RIGHT AFTER receiving reinforcement

Post-reinforcement pause/scallop
post reinforcement pause: After receiving reinforcement an animal may decrease performance a bit.
VR: Variable Ratio

VR-5 means an average of one in five times gets reinforced.

Very effective.

Low post-reinforcement pause.

Slot Machine.

Sales Commission
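
The difference between FR and VR is easiest to see by listing which responses earn a reward. A rough sketch in Python, under stated assumptions: the function names are hypothetical, and the VR version draws each required count uniformly from 1 to 2N-1 (which averages N). Real trainers and lab protocols may vary the count differently.

```python
import random


def fixed_ratio(n: int, total_responses: int) -> list[int]:
    """Response numbers (1-based) reinforced on FR-n: every n-th response."""
    return [i for i in range(1, total_responses + 1) if i % n == 0]


def variable_ratio(mean_n: int, total_responses: int, seed: int = 0) -> list[int]:
    """Response numbers reinforced on a VR schedule averaging one in mean_n.

    Assumption for illustration: each required count is drawn uniformly
    from 1..(2 * mean_n - 1), whose mean is mean_n.
    """
    rng = random.Random(seed)
    reinforced = []
    next_target = rng.randint(1, 2 * mean_n - 1)
    for i in range(1, total_responses + 1):
        if i == next_target:
            reinforced.append(i)
            # Schedule the next reward an unpredictable number of responses away.
            next_target = i + rng.randint(1, 2 * mean_n - 1)
    return reinforced
```

`fixed_ratio(5, 20)` rewards responses 5, 10, 15 and 20, so the animal can predict when the reward is coming. `variable_ratio(5, 20)` rewards about as often, but the gaps vary, which is why there is little pause after a reward.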
Slot Machine: Variable Reinforcement Schedule
Ratio Strain: On a variable reinforcement schedule, when an animal starts to shut down because it is not reinforced often enough.
FI: Fixed Interval Schedule

FI-5 reinforced for first response AFTER five seconds.
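
A small sketch of the FI rule in Python, assuming a hypothetical function name and two simplifications that are not from the deck: the clock starts at time 0, and it restarts each time a response is reinforced.

```python
def fixed_interval(interval_s: float, response_times: list[float]) -> list[float]:
    """Times (in seconds) of the responses that get reinforced on an FI schedule.

    Assumption for illustration: the interval clock starts at t = 0 and
    restarts at each reinforced response.
    """
    reinforced = []
    clock_start = 0.0
    for t in sorted(response_times):
        # Only the first response once the interval has elapsed is rewarded;
        # earlier responses earn nothing.
        if t - clock_start >= interval_s:
            reinforced.append(t)
            clock_start = t
    return reinforced
```

On FI-5 with responses at 1, 2, 6, 7, 10, 11.5 and 20 seconds, only the responses at 6, 11.5 and 20 are reinforced. Responding faster during the interval does not bring the reward any sooner.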
VI: Variable Interval

VI-5 - On average, response will be rewarded ???
Limited Hold: The time interval during which a reinforcement is available.

Example, you have to eat lunch at the cafeteria from 12-1.
DRI: Differential Reinforcement of Incompatible Behavior

Response type schedule

Reward only incompatible behaviors.

EG: Reward sitting not jumping
DRO: Differential Reinforcement of Other Behavior

Reward ANY other behavior.

EG: Reward anything other than barking or lunging.
DRE: Differential Reinforcement of EXCELLENT behaviors.

Reinforce only the best.

Use during maintenance
Duration Schedule: Watch Me
Down Stay

Dog is reinforced during or after a specified interval.

Fixed Interval

Random Interval
Teaching Stay or Wait, what is the best schedule to use?: Slowly raise criteria on a duration schedule.
Best reinforcement schedule to use for basic behaviors: Start with CRF
Move to VR or RR
Best reinforcement for complex behaviors: DRE

DOGS VARY!
Best reinforcement schedule for problem behaviors: DRL, DRO, DRI

DOGS VARY!
Premack: Grandma's rule

The opportunity to engage in some activities may be reinforcing for others.

Juno likes to sit in a chair during rally obedience.
Best reinforcement schedule for classical conditioning is?: CRF
Stimulus Control: Generalization

Discrimination
Overshadowed: A more salient stimulus (squirrel) may overshadow

Less salient (hotdog)

or even less salient(pat)
Prevent blocking: By not presenting cues at the same time.

Present new cues FIRST, then old cues.
Say the command once (because...): Every time the dog hears sit and doesn't get rewarded, it degrades the significance of the Sd to the Sr+
Preparedness: The tendency to associate certain types of stimuli more readily than others.

E.G. - Sound to Pain, Food to Illness

High pitch sounds to fast motion,

Low pitch sounds to slow motion
Learning Sets: When a dog learns the rules of the game.

For example, learning to match things that they just saw together vs things they did not???
Experimental Neurosis: Asking a dog to do incompatible things. Can induce real problems.

The guard dog asked to stop attacking on hand raise sees thief raise chair and shuts down.
Extinction: Learning that a CS does not result in an UCS. Responding declines.
Extinction Burst: A temporary increase in responding, driven by frustration, when the response no longer produces the expected reinforcement.
Spontaneous Recovery: Recovery of a behavior after it has become 'extinct'.
Partial Reinforcement Extinction Effect: PREE

In CRF - Extinction happens quickly.

In VRF Extinction happens slowly.

Does not happen in CC
PREE: Partial Reinforcement Extinction Effect
Does training transfer knowledge?: No, it changes probabilities.
What makes a trained dog sit when you say sit?: A history of reinforcement for sitting in response to the stimulus 'sit'.
Good thing starts: Positive reinforcement
Good thing ends: Negative punishment
Bad thing starts: Positive Punishment
Bad thing ends: Negative reinforcement
Training changes probabilities not ___________: knowledge
Rules for Good Desensitization: 1. Stay under threshold
Blocking: An already learned cue is attended to, so a new cue presented with it is not learned
Overshadowing: Only the more salient element in a compound stimulus is learned
Good CER: 1: Order of events,
Spooky dogs: Dogs that have been working dogs until recently: Working, Guard, Flock Guard, toy
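P Value: The probability of seeing a difference between groups at least as large as the one observed if chance alone were at work. Usually computed by comparing a control group with a studied group while controlling all non-tested variables. A p value of .05 means that a difference this large would turn up by chance about 5% of the time when there is no real effect. It is not the probability that the observed difference is due to chance.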
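
One way to make the control-group vs studied-group comparison concrete is a permutation test: shuffle the group labels many times and count how often a difference at least as large as the observed one shows up by chance. The sketch below is only an illustration in Python. The function name and defaults are hypothetical, and a permutation test is just one of several ways a p value can be computed.

```python
import random


def permutation_p_value(control, treated, n_permutations=10000, seed=0):
    """Estimate a two-sided p value for the difference in group means.

    Hypothetical helper for illustration: shuffles the pooled scores and
    counts how often the shuffled difference is at least as large as the
    observed difference.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations
```

Two identical groups give a p value of 1.0, because every shuffle matches the observed difference of zero. Two groups with no overlap in scores give a very small p value.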
inclusive fitness: Genes are instructing her to save the copies of themselves stored in her kittens
why are we here: Each of us is descended from an unbroken line of successful reproducers
meta communications: Using behavior to indicate what the following behavior means: a play bow indicates the next ripping run is a play move.
Four F's of behavior: Food, Fear, Fight and Sex: Adaptive significance, can fuel other
Group hunting's genetic legacy: Impacts socially facilitated predation ...
Aggression reason for being: To displace individuals
Fear .. Bred?: Fear can be bred for, yes.
Fear in puppies: Genetics, Prenatal, Neonatal
Fear in adults: Genetics, Prenatal, Neonatal, socialization, sensitization
Dog human aggression: Strangers,
Dog Human Aggression treatment: 1. Habituation;
When to use flooding?: Puppy Mill Rescues, may be only choice ...
Desensitization: Exposure at sub threshold level so no fear is evoked, gradually increased
DRI Strangers/Dogs: Sit/Watch
DRI Guarding: Retrieve
DRI Handling: Offering Body Part
DRI Sofa Guarding: Voluntarily vacating locations
Differential OC/CC Fear/Aggression strategies: OC: DRI - Operant ... dog reinforced if he gives correct response;
Difficult prognosis indicators aggression cases: hard mouth, strangers, poor client compliance, explosive without a threat, large dog (>30 lbs)
Good prognosis indicators aggression cases: soft mouth, resource guarding, protracted warnings, committed owners, plastic dog, small dog (<30 lbs)
Serious bite level #: IV - VII
Pressure or puncture more serious: Pressure
Less serious bite levels: I-III:
Stranger aggression is hard to fix because: 1). Recruiting;
Assessing bites: Bite History Incident: Victim Characteristics;
Fear case prognosis: Slow moving - months and years ... Younger the better
Dog-Dog reasons for problems: Undersocialized...
Dog-Dog-Fix: Easy: Tarzan, guarding - mild, bullying
Dog-Dog-Fix: Good: Play Skill Deficit;
Dog-Dog-Fix: Harder: Proximity Sensitive: Severe;
Dog-Dog-Fix: Very hard: Compulsive;
Predatory Drift: Size mismatch, double team, panic
ABI: ??? Bite Inhibition
Family aggression usually manifests because: Resource guarding,

Deck Info

Number of cards 172