What is operant dog training?
The operant reflex differs from the classical conditioned reflex described by I.P. Pavlov because it rests on the animal's own active, purposeful behavior, driven by some need. The reinforcement is the result of that very active, purposeful behavior. In the classical conditioned reflex, by contrast, the reinforcement is the unconditioned stimulus, that is, simply the second stimulus in the pair.
Operant learning was discovered by the American scientist E.L. Thorndike, thanks to the intelligence of cats and dogs. Thorndike was studying animals' ability to learn, and for this he designed a special cage with a door fastened by a simple latch. He shut cats and dogs inside it and watched, with a scientist's healthy glee, as our animal friends learned to open the door. They learned by making various attempts, some successful and some not. That is why Thorndike called the form of learning he discovered "trial and error."
This form of learning was named a reflex only much later, by another well-known American scientist, B.F. Skinner, who devoted his entire scientific career to it. That is why Skinner is considered the principal one among the several "fathers" of the operant reflex. In fairness, however, we should note that training based on operant learning was described for the first time anywhere by our remarkable trainer Vladimir Durov, in his book "Animal Training. Psychological Observations on Animals Trained by My Method. 40 Years of Experience." So you can read about the Russian version of operant training in Vladimir Durov's book, while the American version is well described in "Don't Shoot the Dog!" by the psychologist and trainer Karen Pryor, which, by the way, I also recommend.
Skinner’s general method of operant training can be described in the following steps:
The deprivation stage. This is what Skinner called it in the 1930s. Today, however, it would be better named "the stage of choosing and creating a basic need."
Almost any need a dog has can be used to form an operant conditioned reflex, but Skinner most often used the need for food. The point of the deprivation stage was that Skinner either underfed the animals for a while or starved them. It was believed that food became significant to an animal, and effective as a reinforcer for learning, only once the animal had lost about 20% of its body weight. O tempora, o mores!
The stage of forming conditioned food reinforcement. In his research, Skinner used automatic feeders whose sound was meant to signal to the animal that a food pellet had appeared. This took time. The stage was considered complete when the rat ran to the feeder immediately on hearing it operate.
In essence, this stage forms a classical conditioned reflex to a sound, with food as the reinforcement. It is also the basis of so-called clicker training, a training method that uses a sound conditioned to food as positive reinforcement.
We have to admit that the operant school compares favorably with traditional Russian training in the attention it pays to reinforcement, especially positive and probabilistic reinforcement (reinforcement that follows only some of the correct responses).
The response-formation stage. As model behaviors, Skinner trained his rats to press a pedal and his pigeons to peck a key. The pedal-pressing response was formed in one of three ways: by trial and error (spontaneous formation), by directed, or successive, shaping, and by the target method.
In spontaneous formation, the animal wandered about the Skinner box, pressed the pedal by accident, and gradually associated pressing it with the automatic feeder switching on.
In directed shaping, the researcher switched on the automatic feeder to reinforce first any orientation toward the pedal, then approaching it, and finally pressing it. What is that, if not clicker training!
In the target method, a food pellet was glued onto the pedal (or key), so that the animal's attempts to pull it off made it press the pedal.
To initiate the desired behavior, modern operant training allows almost any known way of influencing the animal. Using aversive influences (those that cause pain or discomfort), however, is considered ineffective.
Bringing the behavior under stimulus control, or introducing a differentiating (discriminative) stimulus. In other words, introducing a conditioned stimulus, or command.
Skinner and his followers believed that forming an action and, in parallel, linking it to a conditioned stimulus (a command) are two different processes, and that learning two different things at once makes learning harder. That is why traditional operant trainers first form the behavior and only then introduce the command.
It should be stressed that in operant learning a differentiating stimulus is, by and large, not a command in our sense of the word. A command is an order, isn't it? That is how we usually understand it. A differentiating stimulus, by contrast, is information that right now performing the behavior is most effective, and indeed possible at all. So in operant training the "command" serves to permit, to allow, the behavior.
To make this clearer, let's look at how a light bulb is introduced into the experiment as a differentiating stimulus. The rat has learned to press the pedal and presses it whenever it wants to eat. The researcher switches the light on for a couple of seconds and sets things up so that pressing the pedal delivers food only while the light is on. When the light goes off, no matter how much the rat presses, it gets a big fat nothing! In other words, switching on the light creates, separates, distinguishes, differentiates two different sets of conditions. And the rat soon catches on. Since it really wants to eat (it has a food need!), as soon as it sees the light come on, it runs straight to the pedal and presses away! From the outside, it looks as if the light makes the rat, orders it, to press the pedal. But now you understand that this is not so. When the light comes on, it says: "Now you may press the pedal." That is all!
Reinforcing the behavior. The newly formed behavior is consolidated into a skill through repetition with probabilistic reinforcement. It also helps to draw on different needs at this stage and, accordingly, to use different reinforcers.
The Russian version of the operant method, which goes back to Vladimir Durov, differs only in that it lets you introduce the executive stimulus (the command, differentiating stimulus, or conditioned stimulus) right away. Practice shows that a skill forms no more slowly than with the imported technique, and since a whole stage is eliminated, time is saved. So it makes sense to support the domestic producer of training techniques!
24 September 2019
Updated: 26 March 2020