Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike. According to the law of effect, behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.
Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box” ( [link] ). A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.
Watch this brief video clip to learn more about operant conditioning: Skinner is interviewed, and operant conditioning of pigeons is demonstrated.
In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment ( [link] ).
| | Reinforcement | Punishment |
|---|---|---|
| Positive | Something is added to increase the likelihood of a behavior. | Something is added to decrease the likelihood of a behavior. |
| Negative | Something is removed to increase the likelihood of a behavior. | Something is removed to decrease the likelihood of a behavior. |
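The four combinations follow mechanically from two questions: is a stimulus added or removed, and does the behavior increase or decrease? A minimal Python sketch of that grid (the function name and labels are ours, purely illustrative):

```python
def classify_consequence(stimulus_change, behavior_change):
    """Name the operant procedure from its two defining features.

    stimulus_change: "added" (positive) or "removed" (negative)
    behavior_change: "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    effect = {"increases": "reinforcement", "decreases": "punishment"}[behavior_change]
    return f"{sign} {effect}"

print(classify_consequence("added", "increases"))    # positive reinforcement
print(classify_consequence("removed", "increases"))  # negative reinforcement
print(classify_consequence("added", "decreases"))    # positive punishment
print(classify_consequence("removed", "decreases"))  # negative punishment
```

Note that "positive" and "negative" describe only whether a stimulus is added or removed; whether the procedure is reinforcement or punishment depends entirely on the behavior's change.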
The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement, a desirable stimulus is added to increase a behavior.
For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).
In negative reinforcement, an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.
Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment, you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment, you remove a pleasant stimulus to decrease a behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior.
Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine that your four-year-old son, Brandon, hits his younger brother. You have Brandon write 100 times “I will not hit my brother” (positive punishment). Chances are he won’t repeat this behavior. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks of using physical punishment on children. First, punishment may teach fear. Brandon may stop hitting his brother, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, because you spank Brenda when you are angry with her for her misbehavior, she might start hitting her friends when they won’t share their toys.
While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward her for it.
In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping, we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps. The process works as follows: first, reinforce any response that resembles the desired behavior; then reinforce only responses that more closely resemble the desired behavior, no longer reinforcing the earlier response; continue reinforcing closer and closer approximations; and finally, reinforce only the desired behavior itself.
Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.
Here is a brief video of Skinner’s pigeons playing ping pong.
It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.
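The parents’ procedure is the general shaping loop: reinforce a response only when it meets the current criterion, then raise the criterion. A minimal Python sketch under assumed values (the criterion thresholds and daily toy counts below are hypothetical, loosely modeled on the room-cleaning example):

```python
def shape(performances, steps):
    """Reinforce successive approximations of a target behavior.

    performances: observed behavior on each attempt (e.g., toys cleaned up)
    steps: increasing criteria; meeting the current one earns a reinforcer
           and advances to the next (the bar is raised).
    Returns the reinforced performances and whether the final step was reached.
    """
    current = 0
    reinforced = []
    for p in performances:
        if current < len(steps) and p >= steps[current]:
            reinforced.append(p)   # deliver the reinforcer
            current += 1           # previously reinforced level no longer pays off
    return reinforced, current == len(steps)

# hypothetical toy-cleanup counts across days, criteria of 1, 5, 10, 18, then all 20 toys
history, mastered = shape([1, 3, 5, 10, 12, 18, 20], steps=[1, 5, 10, 18, 20])
print(history, mastered)  # [1, 5, 10, 18, 20] True
```

The key property is visible in the output: performances of 3 and 12 go unreinforced because they merely repeat an already-mastered level rather than approaching the target more closely.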
Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.
What would be a good reinforcer for humans? For your son Jerome, it was the promise of a toy if he cleaned his room. How about Joaquin, the soccer player? If you gave Joaquin a piece of candy every time he made a goal, you would be using a primary reinforcer. Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.
A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Joaquin made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.
Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of autistic school children. Autistic children tend to exhibit disruptive behaviors such as pinching and hitting. When the children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
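The bookkeeping behind a token economy like this can be sketched in a few lines of Python. This is an illustrative model only; the two-minutes-per-token exchange rate is our assumption, not a detail from Cangi and Daly (2013):

```python
class TokenEconomy:
    """Minimal token-economy ledger: earn tokens for appropriate behavior,
    lose them for inappropriate behavior, and trade them in for a backup
    reinforcer (here, minutes of playtime)."""

    def __init__(self, minutes_per_token=2):
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def appropriate(self):
        """e.g., a 'quiet hands' token for not hitting or pinching."""
        self.tokens += 1

    def inappropriate(self):
        """Response cost: losing a token is negative punishment."""
        self.tokens = max(0, self.tokens - 1)

    def exchange(self):
        """Trade all tokens for playtime minutes."""
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

# hypothetical session: three tokens earned, one lost
session = TokenEconomy()
for _ in range(3):
    session.appropriate()
session.inappropriate()
print(session.exchange())  # 2 tokens * 2 minutes = 4
```

Tokens work as secondary reinforcers precisely because of the `exchange` step: they have value only through their link to the backup reward.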
Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed ( [link] ). Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.
Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, she is removed from the desirable activity at hand ( [link] ). For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.
There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.
Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement. This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).
Watch this video clip where veterinarian Dr. Sophia Yin shapes a dog’s behavior using the steps outlined above.
Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement, also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules ( [link] ). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.
Reinforcement Schedule | Description | Result | Example |
---|---|---|---|
Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient uses patient-controlled, doctor-timed pain relief |
Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking Facebook |
Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework—factory worker getting paid for every x number of items manufactured |
Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
Now let’s combine these four terms. In a fixed interval reinforcement schedule, behavior is rewarded after a set amount of time has passed. For example, June undergoes major surgery in a hospital. During recovery, she is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Her doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and she receives a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.
With a variable interval reinforcement schedule, the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.
With a fixed ratio reinforcement schedule, there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care whether the person really needs the prescription sunglasses; Carla just wants her bonus. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule, the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
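The four partial schedules differ only in whether the trigger is time or response count, and whether that trigger is fixed or unpredictable. A minimal Python sketch of the decision rule (the 60-second and 10-response parameters are illustrative, not from the text, and the variable schedules are modeled crudely as per-event probabilities):

```python
import random

def should_reinforce(schedule, responses_since, seconds_since, rng=random.random):
    """Decide whether to deliver a reinforcer under one of the four
    partial reinforcement schedules. Parameters are illustrative."""
    if schedule == "fixed_interval":
        # first response after a set time (here, 60 s) pays off
        return seconds_since >= 60
    if schedule == "variable_interval":
        # crude model: roughly one payoff per 60 s, at unpredictable moments
        return seconds_since >= 1 and rng() < 1 / 60
    if schedule == "fixed_ratio":
        # every 10th response pays off
        return responses_since >= 10
    if schedule == "variable_ratio":
        # each response pays off with probability 1/10 (slot-machine style)
        return rng() < 1 / 10
    raise ValueError(f"unknown schedule: {schedule}")

print(should_reinforce("fixed_ratio", responses_since=10, seconds_since=0))  # True
```

The `rng` hook makes the unpredictable schedules testable: passing a deterministic function in place of `random.random` fixes the "luck" for a given trial.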
In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ( [link] ).
Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).
Skinner uses gambling as an example of the power and effectiveness of conditioning behavior based on a variable ratio reinforcement schedule. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). Beyond the power of variable ratio reinforcement, gambling seems to work on the brain in the same way as some addictive drugs. The Illinois Institute for Addiction Recovery (n.d.) reports evidence suggesting that pathological gambling is an addiction similar to a chemical addiction ( [link] ). Specifically, gambling may activate the reward centers of the brain, much like cocaine does. Research has shown that some pathological gamblers have lower levels of the neurotransmitter (brain chemical) known as norepinephrine than do normal gamblers (Roy et al., 1988). According to a study conducted by Alec Roy and colleagues, norepinephrine is secreted when a person feels stress, arousal, or thrill; pathological gamblers use gambling to increase their levels of this neurotransmitter. Another researcher, neuroscientist Hans Breiter, has done extensive research on gambling and its effects on the brain. Breiter (as cited in Franzen, 2001) reports that “Monetary reward in a gambling-like experiment produces brain activation very similar to that observed in a cocaine addict receiving an infusion of cocaine” (para. 1). Deficiencies in serotonin (another neurotransmitter) might also contribute to compulsive behavior, including a gambling addiction.
It may be that pathological gamblers’ brains are different from those of other people, and perhaps this difference somehow led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
Although strict behaviorists such as Skinner and Watson refused to believe that cognition (such as thoughts and expectations) plays a role in learning, another behaviorist, Edward C. Tolman, had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map: a mental picture of the layout of the maze ( [link] ). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they found their way through the maze just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning: learning that occurs but is not observable in behavior until there is a reason to demonstrate it.
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.
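One common way to model a cognitive map is as a graph of places and the paths between them: the layout is stored during unreinforced exploration, and once a goal appears, a route can be read off the stored map without new trial and error, which is the essence of latent learning. A minimal Python sketch (the maze layout below is hypothetical):

```python
from collections import deque

def shortest_route(maze, start, goal):
    """Breadth-first search over a cognitive map stored as an adjacency
    list; returns one shortest path of junctions, or None if unreachable."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in maze.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# hypothetical maze learned during exploration, before any food was present
maze = {"start": ["A", "B"], "A": ["C"], "B": ["C", "goal"], "C": ["goal"]}
print(shortest_route(maze, "start", "goal"))  # ['start', 'B', 'goal']
```

Building `maze` corresponds to the rats’ unrewarded exploration; calling `shortest_route` corresponds to the sudden, efficient performance once the food appears.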
Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.
Watch this video to learn more about Carlson’s studies on cognitive maps and navigation in buildings.
Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior depending on either a set or variable period of time.
________ is when you take away a pleasant stimulus to stop a behavior.
Which of the following is not an example of a primary reinforcer?
Rewarding successive approximations toward a target behavior is ________.
Slot machines reward gamblers with money according to which reinforcement schedule?
What is a Skinner box and what is its purpose?
A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.
What is the difference between negative reinforcement and punishment?
In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior).
What is shaping and how would you use shaping to teach a dog to roll over?
Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.
Explain the difference between negative reinforcement and punishment, and provide several examples of each based on your own experiences.
Think of a behavior that you have that you would like to change. How could you use behavior modification, specifically positive reinforcement, to change your behavior? What is your positive reinforcer?
Operant Conditioning Copyright © 2014 by OpenStaxCollege is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.
CogniFit Blog: Brain Health News
Operant conditioning might sound like something out of a dystopian novel. But it’s not. It’s a very real thing that was forged by a brilliant, yet quirky, psychologist. Today, we will take a quick look at his work as well as a few odd experiments that went with it…
There are few names in psychology more well-known than B. F. Skinner. First-year psychology students scribble endless lecture notes on him. Doctoral candidates cite his work in their dissertations as they test whether a rat’s behavior can be used to predict behavior in humans.
Skinner is one of the most well-known psychologists of our time, famous for his experiments on operant conditioning. But how did he become such a central figure of these Intro to Psych courses? And, how did he develop his theories and methodologies cited by those sleep-deprived Ph.D. students?
Skinner spent his life studying the way we behave and act. But, more importantly, how this behavior can be modified.
He viewed Ivan Pavlov’s classical model of behavioral conditioning as being “too simplistic a solution” to fully explain the complexities of human (and animal) behavior and learning. It was because of this that Skinner started to look for a better way to explain why we do things.
His early work was based on Edward Thorndike’s 1898 Law of Effect . Skinner went on to expand on the idea that most of our behavior is directly related to the consequences of said behavior. His expanded model of behavioral learning would be called operant conditioning. This centered around two things…
But, it’s important to note that the term “consequences” can be misleading. This is because there doesn’t need to be a causal relationship between the behavior and the operant. Skinner broke these responses down into three parts.
1. REINFORCERS – These give the organism a desirable stimulus and serve to increase the frequency of the behavior.
2. PUNISHERS – These are environmental responses that present an undesirable stimulus and serve to reduce the frequency of the behavior.
3. NEUTRAL OPERANTS – As the name suggests, these present stimuli that neither increase nor decrease the tested behavior.
Throughout his long and storied career, Skinner performed a number of strange experiments trying to test the limits of how punishment and reinforcement affect behavior.
Though Skinner was a professional through and through, he was also quite a quirky person. And, his unique ways of thinking are very clear in the strange and interesting experiments he performed while researching the properties of operant conditioning.
The Operant Conditioning Chamber, better known as the Skinner Box , is a device that B.F. Skinner used in many of his experiments. At its most basic, the Skinner Box is a chamber where a test subject, such as a rat or a pigeon, must ‘learn’ the desired behavior through trial and error.
B.F. Skinner used this device for several different experiments. One such experiment involves placing a hungry rat into a chamber with a lever and a slot where food is dispensed when the lever is pressed. Another variation involves placing a rat into an enclosure that is wired with a slight electric current on the floor. When the current is turned on, the rat must turn a wheel in order to turn off the current.
Though this is the most basic experiment in operant conditioning research, there is an infinite number of variations that can be created based on this simple idea.
Building on the basic ideas from his work with the Operant Conditioning Chamber, B. F. Skinner eventually began designing more and more complex experiments.
One of these experiments involved teaching a pigeon to read words presented to it in order to receive food. Skinner began by teaching the pigeon a simple task, namely, pecking a colored disk, in order to receive a reward. He then began adding additional environmental cues (in this case, they were words), which were paired with a specific behavior that was required in order to receive the reward.
Through this evolving process, Skinner was able to teach the pigeon to ‘read’ and respond to several unique commands.
Though the pigeon can’t actually read English, the fact that Skinner was able to teach a bird multiple behaviors, each one linked to a specific stimulus, by using operant conditioning shows us that this form of behavioral learning can be a powerful tool for teaching both animals and humans complex behaviors based on environmental cues.
But Skinner wasn’t only concerned with teaching pigeons how to read. It seems he also made sure they had time to play games as well. In one of his more whimsical experiments , B. F. Skinner taught a pair of common pigeons how to play a simplified version of table tennis.
The pigeons in this experiment were placed on either side of a box and were taught to peck the ball to the other bird’s side. If a pigeon was able to peck the ball across the table and past their opponent, they were rewarded with a small amount of food. This reward served to reinforce the behavior of pecking the ball past their opponent.
Though this may seem like a silly task to teach a bird, the ping-pong experiment shows that operant conditioning can be used not only for a specific, robot-like action but also to teach dynamic, goal-based behaviors.
Thought pigeons playing ping-pong was as strange as things could get? Skinner pushed the envelope even further with his work on pigeon-guided missiles.
While this may sound like the crazy experiment of a deluded mad scientist, B. F. Skinner did actually do work to train pigeons to control the flight paths of missiles for the U.S. Army during the second world war.
Skinner began by training the pigeons to peck at shapes on a screen. Once the pigeons reliably tracked these shapes, Skinner was able to use sensors to track whether the pigeon’s beak was in the center of the screen, to one side or the other, or towards the top or bottom of the screen. Based on the relative location of the pigeon’s beak, the tracking system could direct the missile towards the target location.
Though the system was never used in the field due in part to advances in other scientific areas, it highlights the unique applications that can be created using operant training for animal behaviors.
B. F. Skinner is one of the most recognizable names in modern psychology, and with good reason. Though many of his experiments seem outlandish, the science behind them continues to impact us in ways we rarely think about.
The most prominent example is in the way we train animals for tasks such as search and rescue, companion services for the blind and disabled, and even how we train our furry friends at home—but the benefits of his research go far beyond teaching Fido how to roll over.
Operant conditioning research has found its way into the way schools motivate and discipline students, how prisons rehabilitate inmates, and even in how governments handle geopolitical relationships .
“To say that a reinforcement is contingent upon a response may mean nothing more than that it follows the response. It may follow because of some mechanical connection or because of the mediation of another organism; but conditioning takes place presumably because of the temporal relation only, expressed in terms of the order and proximity of response and reinforcement. Whenever we present a state of affairs which is known to be reinforcing at a given drive, we must suppose that conditioning takes place, even though we have paid no attention to the behavior of the organism in making the presentation.”
– B.F. Skinner, “‘Superstition’ in the Pigeon” (p. 168)
In the 20th century, many of the images that came to mind when thinking about experimental psychology were tied to the work of Burrhus Frederic Skinner. The stereotype of a bespectacled experimenter in a white lab coat, engaged in shaping behavior through the operant conditioning of lab rats or pigeons in contraptions known as Skinner boxes comes directly from Skinner’s immeasurably influential research.
Although he originally intended to make a career as a writer, Skinner received his Ph.D. in psychology from Harvard in 1931, and stayed on as a researcher until 1936, when he departed to take academic posts at the University of Minnesota and Indiana University. He returned to Harvard in 1948 as a professor, and was the Edgar Pierce Professor of Psychology from 1958 until he retired in 1974.
Skinner was influenced by John B. Watson’s philosophy of psychology called behaviorism, which rejected not just the introspective method and the elaborate psychoanalytic theories of Freud and Jung, but any psychological explanation based on mental states or internal representations such as beliefs, desires, memories, and plans. The very idea of “mind” was dismissed as a pre-scientific superstition, not amenable to empirical investigation. Skinner argued that the goal of a science of psychology was to predict and control an organism’s behavior from its current stimulus situation and its history of reinforcement. In a utopian novel called Walden Two and a 1971 bestseller called Beyond Freedom and Dignity, he argued that human behavior was always controlled by its environment. According to Skinner, the future of humanity depended on abandoning the concepts of individual freedom and dignity and engineering the human environment so that behavior was controlled systematically and to desirable ends rather than haphazardly.
In the laboratory, Skinner refined the concept of operant conditioning and the Law of Effect. Among his contributions were a systematic exploration of intermittent schedules of reinforcement, the shaping of novel behavior through successive approximations, the chaining of complex behavioral sequences via secondary (learned) reinforcers, and “superstitious” (accidentally reinforced) behavior.
Skinner was also an inveterate inventor. Among his gadgets were the “Skinner box” for shaping and counting lever-pressing in rats and key-pecking in pigeons; the cumulative recorder, a mechanism for recording rates of behavior as a pen tracing; a World-War II-era missile guidance system (never deployed) in which a trained pigeon in the missile’s transparent nose cone continually pecked at the target; and “teaching machines” for “programmed learning,” in which students were presented a sentence at a time and then filled in the blank in a similar sentence, shown in a small window. He achieved notoriety for a mid-1950s Life magazine article showcasing his “air crib,” a temperature-controlled glass box in which his infant daughter would play. This led to the urban legend, occasionally heard to this day, that Skinner “experimented on his daughter” or “raised her in a box” and that she grew up embittered and maladjusted, all of which are false.
B.F. Skinner was ranked by the American Psychological Association as the 20th century’s most eminent psychologist.
B. F. Skinner. (1998). Public Broadcasting Service. Retrieved December 12, 2007, from: http://www.pbs.org/wgbh/aso/databank/entries/bhskin.html
Eminent psychologists of the 20th century. (July/August, 2002). Monitor on Psychology, 33(7), p.29.
Skinner, B. F. (1948). ‘Superstition’ in the pigeon. Journal of Experimental Psychology, 38, 168–172.
Skinner, B. F. (1959). Cumulative record. New York: Appleton-Century-Crofts.
Bjork, D. W. (1991). Burrhus Frederic Skinner: The contingencies of a life. In: Kimble, G. A. & Wertheimer, M. [Eds.] Portraits of Pioneers in Psychology.
Learning Objectives
By the end of this section, you will be able to:
The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning . In operant conditioning, organisms learn to associate a behaviour and its consequence ( Table L.1 ). A pleasant consequence makes that behaviour more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when Spirit’s trainer blows a whistle. The consequence is that Spirit gets a fish.
Table L.1 Classical and Operant Conditioning Compared

| | Classical Conditioning | Operant Conditioning |
| --- | --- | --- |
| Conditioning approach | An unconditioned stimulus (such as food) is paired with a neutral stimulus (such as a bell). The neutral stimulus eventually becomes the conditioned stimulus, which brings about the conditioned response (salivation). | The target behaviour is followed by reinforcement or punishment to either strengthen or weaken it, so that the learner is more likely to exhibit the desired behaviour in the future. |
| Stimulus timing | The stimulus occurs immediately before the response. | The stimulus (either reinforcement or punishment) occurs soon after the response. |
Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviours that are reflexively elicited, and it doesn’t account for new behaviours such as riding a bike. He proposed a theory about how such behaviours come about. Skinner believed that behaviour is motivated by the consequences we receive for the behaviour: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike. According to the law of effect, behaviours that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviours that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.
Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box” ( Figure L.10 ). A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviours. A recorder counts the number of responses made by the animal.
LINK TO LEARNING

Watch this brief video clip to learn more about operant conditioning: Skinner is interviewed, and operant conditioning of pigeons is demonstrated.
In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behaviour, and punishment means you are decreasing a behaviour. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioural response. All punishers (positive or negative) decrease the likelihood of a behavioural response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment ( Table L.2 ).
Table L.2 Positive and Negative Reinforcement and Punishment

| | Reinforcement | Punishment |
| --- | --- | --- |
| Positive | Something is added to increase the likelihood of a behaviour. | Something is added to decrease the likelihood of a behaviour. |
| Negative | Something is removed to increase the likelihood of a behaviour. | Something is removed to decrease the likelihood of a behaviour. |
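These four combinations can be sketched in a short Python snippet. This is purely illustrative: the function name and labels are mine, not terminology from the text, but the mapping follows the definitions above (positive/negative = added/removed; reinforcement/punishment = behaviour increases/decreases).

```python
def classify(stimulus_change, behavior_effect):
    """Label an operant consequence from two facts:
    - stimulus_change: "added" (positive) or "removed" (negative)
    - behavior_effect: "increases" (reinforcement) or "decreases" (punishment)
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{sign} {kind}"

# Examples drawn from the text:
print(classify("added", "increases"))    # toy for a clean room -> positive reinforcement
print(classify("removed", "increases"))  # seatbelt beep stops -> negative reinforcement
print(classify("added", "decreases"))    # scolding for texting -> positive punishment
print(classify("removed", "decreases"))  # toy taken away -> negative punishment
```

The point of the two-by-two structure is that the "positive/negative" axis and the "reinforcement/punishment" axis are independent choices, which is exactly why the everyday meanings of the words mislead.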
The most effective way to teach a person or animal a new behaviour is with positive reinforcement. In positive reinforcement , a desirable stimulus is added to increase a behaviour.
For example, you tell your five-year-old kid, Karson, that if they clean their room, they will get a toy. Karson quickly cleans their room because they want a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paycheques are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behaviour at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).
In negative reinforcement , an undesirable stimulus is removed to increase a behaviour. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behaviour, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behaviour, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.
Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behaviour. In contrast, punishment always decreases a behaviour. In positive punishment , you add an undesirable stimulus to decrease a behaviour. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behaviour (texting in class). In negative punishment , you remove a pleasant stimulus to decrease behaviour. For example, when a child misbehaves, a parent can take away a favourite toy. In this case, a stimulus (the toy) is removed in order to decrease the behaviour.
Punishment, especially when it is immediate, is one way to decrease undesirable behaviour. For example, imagine your four-year-old, Sasha, hit another kid. You have Sasha write 100 times “I will not hit other children” (positive punishment). Chances are Sasha won’t repeat this behaviour. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks in using physical punishment on children. First, punishment may teach fear. Sasha may stop hitting, but Sasha also may become fearful of the person who delivered the punishment—you, the parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behaviour and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behaviour when they become angry and frustrated. For example, because you spank Sasha when you are angry with them for misbehaving, Sasha might start hitting their friends when they won’t share their toys.
While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favour reinforcement over punishment—they recommend that you catch your child doing something good and reward them for it.
In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behaviour, in shaping, we reward successive approximations of a target behaviour. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behaviour. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviours spontaneously. In shaping, behaviours are broken down into many small, achievable steps. The specific steps used in the process are the following:

1. Reinforce any response that resembles the desired behaviour.
2. Then reinforce the response that more closely resembles the desired behaviour; stop reinforcing the previously reinforced response.
3. Next, begin to reinforce the response that even more closely resembles the desired behaviour.
4. Continue to reinforce closer and closer approximations of the desired behaviour.
5. Finally, only reinforce the desired behaviour.
Shaping is often used in teaching a complex behaviour or chain of behaviours. Skinner used shaping to teach pigeons not only such relatively simple behaviours as pecking a disk in a Skinner box, but also many unusual and entertaining behaviours, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behaviour.
It’s easy to see how shaping is effective in teaching behaviours to animals, but how does shaping work with humans? Let’s consider a parent whose goal is to have their child learn to clean their room. The parent uses shaping to help the child master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, the child cleans up one toy. Second, the child cleans up five toys. Third, the child chooses whether to pick up ten toys or put their books and clothes away. Fourth, the child cleans up everything except two toys. Finally, the child cleans their entire room.
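The logic of successive approximations can also be shown as a toy simulation. Everything below is invented for illustration (the numbers, the Gaussian "response" model, and the learning rule are mine, not from the text): a learner's responses vary around its current typical level, and we reinforce only responses slightly better than that level, so the typical response ratchets toward the target.

```python
import random

def shape(target=10.0, trials=2000, seed=1):
    """Toy shaping simulation (illustrative assumptions only).
    Responses vary around a mean; we reinforce only responses a bit
    better than the learner's current typical response, so the mean
    ratchets toward the target -- successive approximations."""
    rng = random.Random(seed)
    mean = 0.0  # the learner's current typical response
    for _ in range(trials):
        response = rng.gauss(mean, 1.0)
        if response > mean + 0.5 and mean < target:  # a closer approximation
            mean += 0.2 * (response - mean)          # learning from the reward
    return min(mean, target)

print(round(shape(), 1))  # the mean response climbs from 0 toward the target of 10
```

Note the design choice: the criterion for reward is relative to the learner's current performance, which is the essence of shaping. A fixed criterion at the target would almost never be met spontaneously, so no learning could begin.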
Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.
What would be a good reinforcer for humans? For your child Karson, it was the promise of a toy when they cleaned their room. How about Sydney, the soccer player? If you gave Sydney a piece of candy every time Sydney scored a goal, you would be using a primary reinforcer . Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.
A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Sydney made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behaviour chart? They also are secondary reinforcers.
Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behaviour management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behaviour in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that use of a token economy increased appropriate social behaviours and reduced inappropriate behaviours in a group of autistic school children. Autistic children tend to exhibit disruptive behaviours such as pinching and hitting. When the children in the study exhibited appropriate behaviour (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
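A token economy like the one just described can be sketched as a tiny bookkeeping class. This is a hypothetical sketch, not code from the Cangi and Daly (2013) study; the class name, the two-minutes-per-token exchange rate, and the method names are all invented for illustration.

```python
class TokenEconomy:
    """Minimal token-economy tracker (illustrative sketch).
    Appropriate behaviour earns a token, inappropriate behaviour
    costs one, and tokens are exchanged for minutes of a backup
    reinforcer such as playtime."""

    def __init__(self, minutes_per_token=2):  # exchange rate is an assumption
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def appropriate(self):
        """E.g., a 'quiet hands' token is earned."""
        self.tokens += 1

    def inappropriate(self):
        """E.g., hitting or pinching costs a token (never below zero)."""
        self.tokens = max(0, self.tokens - 1)

    def exchange(self):
        """Trade all tokens for playtime minutes."""
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

child = TokenEconomy()
for _ in range(5):
    child.appropriate()
child.inappropriate()
print(child.exchange())  # 4 tokens x 2 minutes = 8
```

The tokens themselves are secondary reinforcers: they only work because the exchange links them to the primary or established reinforcer (playtime).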
EVERYDAY CONNECTION
Parents and teachers often use behaviour modification to change a child’s behaviour. Behaviour modification uses the principles of operant conditioning to accomplish behaviour change so that undesirable behaviours are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviours are listed ( Figure L.11 ). Sticker charts are a form of token economies, as described in the text. Each time children perform the behaviour, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviours and decrease misbehaviour. Remember, it is best to reinforce desired behaviours, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviours, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behaviour chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behaviour modification to be effective, the reinforcement needs to be connected with the behaviour; the reinforcement must matter to the child and be done consistently.
Time-out is another popular technique used in behaviour modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behaviour, they are removed from the desirable activity at hand ( Figure L.12 ). For example, say that Paton and their sibling Bennet are playing with building blocks. Paton throws some blocks at Bennet, so you give Paton a warning that they will go to time-out if they do it again. A few minutes later, Paton throws more blocks at Bennet. You remove Paton from the room for a few minutes. When Paton comes back, they don’t throw blocks.
There are several important points that you should know if you plan to implement time-out as a behaviour modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age: a five-year-old, for example, sits in time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehaviour); and give the child a hug or a kind word when time-out is over.
Remember, the best way to teach a person or animal a behaviour is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behaviour, it is called continuous reinforcement . This reinforcement schedule is the quickest way to teach someone a behaviour, and it is especially effective in training a new behaviour. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time the dog sits, you give the dog a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after the dog sits, so that the dog can make an association between the target behaviour (sitting) and the consequence (getting a treat).
Once a behaviour is trained, researchers and trainers often turn to another type of reinforcement schedule— partial reinforcement. In partial reinforcement , also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behaviour. There are several different types of partial reinforcement schedules ( Table L.3 ). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.
Table L.3 Reinforcement Schedules

| Reinforcement Schedule | Description | Result | Example |
| --- | --- | --- | --- |
| Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient uses patient-controlled, doctor-timed pain relief |
| Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking Facebook |
| Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework—factory worker getting paid for every x number of items manufactured |
| Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
Now let’s combine these four terms. A fixed interval reinforcement schedule is when behaviour is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, June is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. June’s doctor sets a limit: one dose per hour. June pushes a button when the pain becomes difficult to tolerate, and they receive a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behaviour when it will not be rewarded.
With a variable interval reinforcement schedule , the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Tate is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Tate’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Tate never knows when the quality control person will show up, so they always try to keep the restaurant clean and ensure that their employees provide prompt and courteous service. Tate’s productivity regarding prompt service and keeping a clean restaurant is steady because Tate wants their crew to earn the bonus.
With a fixed ratio reinforcement schedule, there is a set number of responses that must occur before the behaviour is rewarded. Reed sells glasses at an eyeglass store and earns a commission every time they sell a pair of glasses. Reed always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so they can increase their commission. Reed does not care if the person really needs the prescription sunglasses; they just want the bonus. The quality of what Reed sells does not matter because Reed’s commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimizing the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule, the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Quinn—generally a smart, thrifty person—visits Las Vegas for the first time. Quinn is not a gambler, but out of curiosity they put a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, Quinn’s curiosity is fading, and they are just about to quit. But then, the machine lights up, bells go off, and Quinn gets 50 quarters back. That’s more like it! Quinn gets back to inserting quarters with renewed interest, and a few minutes later they have used up all the gains and are $10 in the hole. Now might be a sensible time to quit. And yet, Quinn keeps putting money into the slot machine because they never know when the next reinforcement is coming. Quinn keeps thinking that with the next quarter they could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
In operant conditioning, extinction of a reinforced behaviour occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time their doctor has approved, no medication is administered. June is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ( Figure L.13 ).
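The four contingencies above are mechanical enough to express as a short simulation. The sketch below is purely illustrative (the class names, parameters, and distributions are my own, not from the text); it models each schedule as a rule that decides whether a given response earns reinforcement:

```python
import random

class FixedRatio:
    """FR-n: reinforce every n-th response (e.g., a commission per pair of glasses sold)."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True  # reinforcement delivered
        return False

class VariableRatio:
    """VR-n: reinforce after an unpredictable number of responses averaging n (slot machine)."""
    def __init__(self, n, rng=random):
        self.n, self.rng, self.count = n, rng, 0
        self.required = rng.randint(1, 2 * n - 1)  # mean requirement = n

    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False

class FixedInterval:
    """FI-t: reinforce the first response after t time units have elapsed (June's hourly dose)."""
    def __init__(self, t):
        self.t, self.last = t, 0.0

    def respond(self, now):
        if now - self.last >= self.t:
            self.last = now
            return True
        return False

class VariableInterval:
    """VI-t: like FI, but the required wait varies unpredictably around t (surprise inspection)."""
    def __init__(self, t, rng=random):
        self.t, self.rng, self.last = t, rng, 0.0
        self.wait = rng.uniform(0, 2 * t)  # mean wait = t

    def respond(self, now):
        if now - self.last >= self.wait:
            self.last = now
            self.wait = self.rng.uniform(0, 2 * self.t)
            return True
        return False

# A pigeon on FR-5 that pecks 100 times earns exactly 20 food deliveries...
fr = FixedRatio(5)
print(sum(fr.respond() for _ in range(100)))           # 20

# ...but on FI-60, pecking faster doesn't help: only the first peck
# after each 60-unit interval pays off.
fi = FixedInterval(60)
print([fi.respond(t) for t in (10, 30, 61, 62, 130)])  # [False, False, True, False, True]
```

The contrast in the final two examples mirrors the table: on ratio schedules, payoff scales with how much the organism responds, which sustains high response rates; on interval schedules, extra responses within an interval earn nothing, so responding pauses after each reinforcement.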
CONNECT THE CONCEPTS
Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).
Skinner uses gambling as an example of the power of the variable-ratio reinforcement schedule for maintaining behaviour even during long periods without any reinforcement. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). It is indeed true that variable-ratio schedules keep behaviour quite persistent—just imagine the frequency of a child’s tantrums if a parent gives in even once to the behaviour. The occasional reward makes it almost impossible to stop the behaviour.
Recent research in rats has failed to support Skinner’s idea that training on variable-ratio schedules alone causes pathological gambling (Laskowski et al., 2019). However, other research suggests that gambling does seem to work on the brain in the same way as most addictive drugs, and so there may be some combination of brain chemistry and reinforcement schedule that could lead to problem gambling ( Figure L.14 ). Specifically, modern research shows the connection between gambling and the activation of the reward centres of the brain that use the neurotransmitter (brain chemical) dopamine (Murch & Clark, 2016). Interestingly, gamblers don’t even have to win to experience the “rush” of dopamine in the brain. “Near misses,” or almost winning but not actually winning, also have been shown to increase activity in the ventral striatum and other brain reward centres that use dopamine (Chase & Clark, 2010). These brain effects are almost identical to those produced by addictive drugs like cocaine and heroin (Murch & Clark, 2016). Based on the neuroscientific evidence showing these similarities, the DSM-5 now considers gambling an addiction, while earlier versions of the DSM classified gambling as an impulse control disorder.
In addition to dopamine, gambling also appears to involve other neurotransmitters, including norepinephrine and serotonin (Potenza, 2013). Norepinephrine is secreted when a person feels stress, arousal, or thrill. It may be that pathological gamblers use gambling to increase their levels of this neurotransmitter. Deficiencies in serotonin might also contribute to compulsive behaviour, including a gambling addiction (Potenza, 2013).
It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
TRICKY TOPIC: SCHEDULES OF REINFORCEMENT
Strict behaviourists like Watson and Skinner focused exclusively on studying behaviour rather than cognition (such as thoughts and expectations). In fact, Skinner was such a staunch believer that cognition didn’t matter that his ideas were considered radical behaviourism. Skinner considered the mind a “black box”—something completely unknowable—and, therefore, something not to be studied. However, another behaviourist, Edward C. Tolman, had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
Edward Tolman studied the behaviour of three groups of rats that were learning to navigate through mazes (Tolman & Honzik, 1930). The first group always received a reward of food at the end of the maze, but the second group never received any reward. The third group did not receive any reward for the first 10 days and then began receiving rewards on the 11th day of the experimental period. As you might expect when considering the principles of conditioning, the rats in the first group quickly learned to negotiate the maze, while the rats of the second group seemed to wander aimlessly through it. The rats in the third group, however, although they wandered aimlessly for the first 10 days, quickly learned to navigate to the end of the maze as soon as they received food on day 11. By the next day, the rats in the third group had caught up in their learning to the rats that had been rewarded from the beginning. Tolman argued that this was because as the unreinforced rats explored the maze, they developed a cognitive map : a mental picture of the layout of the maze ( Figure 6.15 ). As soon as the rats became aware of the food (beginning on the 11th day), they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning : learning that occurs but is not observable in behaviour until there is a reason to demonstrate it.
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Zan’s parent drives Zan to school every day. In this way, Zan learns the route from their house to their school, but Zan’s never driven there themselves, so they have not had a chance to demonstrate that they’ve learned the way. One morning Zan’s parent has to leave early for a meeting, so they can’t drive Zan to school. Instead, Zan follows the same route on their bike that Zan’s parent would have taken in the car. This demonstrates latent learning. Zan had learned the route to school, but had no need to demonstrate this knowledge earlier.
Introduction to Psychology & Neuroscience © 2020, edited by Leanne Stevens, is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
Audrey Watters
This is the transcript of the talk I gave at the Tech4Good event I'm at this weekend in Albuquerque, New Mexico. The complete slide deck is here.
I want to talk a little bit about a problem I see – or rather, a problem I see in the “solutions” that some scientists and technologists and engineers seem to gravitate towards. So I want to talk to you about pigeons, operant conditioning, and social control, which I recognize is a bit of a strange and academic title. I toyed with some others:
I spent last week at the Harvard University archives, going through the papers of Professor B. F. Skinner, arguably one of the most important psychologists of the twentieth century. (The other, of course, being Sigmund Freud.)
I don’t know how familiar this group is with Skinner – he’s certainly a name that those working in educational psychology have heard of. I’d make a joke here about software engineers having no background in the humanities or social sciences but I hear Mark Zuckerberg was actually a psych major at Harvard. (So that’s the joke.)
I actually want to make the case this morning that Skinner’s work – behavioral psychology in particular – has had profound influence on the development of computer science, particularly when it comes to the ways in which “programming” has become a kind of social engineering. I’m not sure this lineage is always explicitly considered – like I said, there’s that limited background in or appreciation for history thing your field seems to have got going on.
B. F. Skinner was a behaviorist. Indeed, almost all the American psychologists in the early twentieth century were. Unlike Freud, who was concerned with the subconscious mind, behaviorists like Skinner were interested in – well, as the name suggests – behaviors. Observable behaviors. Behaviors that could be conditioned or controlled.
Skinner’s early work was with animals. As a graduate student at Harvard, he devised the operant conditioning chamber – better known as the Skinner box – that was used to study animal behavior. The chamber provided some sort of response mechanism that the animal would be trained to use, typically by rewarding the animal with food.
During World War II, Skinner worked on a program called Project Pigeon – also known as Project Orcon, short for Organic Control – an experimental project to create pigeon-guided missiles.
The pigeons were trained by Skinner to peck at a target, and they were rewarded with food when they completed the task correctly. Skinner designed a missile that carried pigeons which could see the target through windows. The pigeons would peck at the target; a metal conductor connected to each bird’s beak transmitted the force of the pecking to the missile’s guidance system, which adjusted the tail fins to keep it on course. The pigeons’ accuracy, according to Skinner’s preliminary tests: nearly perfect.
As part of their training, Skinner also tested the tenacity of the pigeons – testing their psychological fitness, if you will, for battle. He fired a pistol next to their heads to see if loud noise would disrupt their pecking. He put the pigeons in a pressure chamber, setting the altitude at 10,000 feet. The pigeons were whirled around in a centrifuge meant to simulate massive G forces; they were exposed to bright flashes meant to simulate shell bursts. The pigeons kept pecking. They had been trained, conditioned to do so.
The military canceled and revived Project Pigeon a couple of times, but Skinner’s ideas were never used in combat. “Our problem,” Skinner admitted, “was no one would take us seriously.” And by 1953, the military had devised an electronic system for missile guidance, so animal-guided systems were no longer necessary (if they ever were).
This research was all classified, and when the American public were introduced to Skinner’s well-trained pigeons in the 1950s, there was no reference to their proposed war-time duties. Rather, the media talked about his pigeons that could play ping-pong and piano.
Admittedly, part of my interest in Skinner’s papers at Harvard involved finding out more about his research on pigeons. I use the pigeons as a visual metaphor throughout my work. And I could talk to you for an hour, easily, about the birds – indeed, I have given a keynote like that before. But I’m writing a book on the history of education technology, and B. F. Skinner is probably the name most closely associated with “teaching machines” – that is, programmed instruction (pre-computer).
Skinner’s work on educational technology – on teaching and learning with machines – is connected directly, explicitly to his work with animals. Hence my usage of the pigeon imagery. Skinner believed that there was not enough (if any) of the right kind of behavior modification undertaken in schools. He pointed out that students are punished when they do something wrong – that’s the behavioral reinforcement that they receive: aversion. But students are rarely rewarded when they do something right. And again, this isn’t simply about “classroom behavior” – the kind of thing you get a grade for “good citizenship” on (not talking in class or cutting in the lunch line). Learning, to Skinner, was a behavior – and a behavior that needed what he called “contingencies of reinforcement.” These should be positive. They should minimize the chances of doing something wrong – getting the wrong answer, for example. (That’s why Skinner didn’t like multiple choice tests.) The reinforcement should be immediate.
Skinner designed a teaching machine that he said would do all these things – allow the student to move at her own pace through the material. The student would know instantaneously if she had the answer right. (The reward was getting to move on to the next exciting question or concept.) And you can hear all this echoed in today’s education technology designers and developers and school reformers – from Sal Khan and Khan Academy to US Secretary of Education Betsy DeVos. It’s called “personalized learning.” But it’s essentially pigeon training with a snazzier interface.
“Once we have arranged the particular type of consequence called a reinforcement,” Skinner wrote in 1954 in “The Science of Learning and the Art of Teaching,” “our techniques permit us to shape the behavior of an organism almost at will. It has become a routine exercise to demonstrate this in classes in elementary psychology by conditioning such an organism as a pigeon.”
“…Such an organism as a pigeon.” We often speak of “lab rats” as shorthand for the animals used in scientific experiments. We use the phrase too to describe people who work in labs, who are completely absorbed in performing their tasks again and again and again. In education and in education technology, students are also the subjects of experimentation and conditioning. In Skinner’s framework, they are not “lab rats”; they are pigeons. As he wrote,
…Comparable results have been obtained with pigeons, rats, dogs, monkeys, human children… and psychotic subjects. In spite of great phylogenetic differences, all these organisms show amazingly similar properties of the learning process. It should be emphasized that this has been achieved by analyzing the effects of reinforcement and by designing techniques that manipulate reinforcement with considerable precision. Only in this way can the behavior of the individual be brought under such precise control.
If we do not bring students’ behavior under control, Skinner cautioned, we will find ourselves “losing our pigeon.” The animal will be beyond our control.
Like I said, I’m writing a book. So I can talk at great length about Skinner and teaching machines. But I want folks to consider how behaviorism hasn’t just found its way into education reform or education technology. Indeed, Skinner and many others envisioned the application of operant conditioning outside of the laboratory and outside of the classroom – the usage (past and present) of behavior modification for social engineering is at the heart of a lot of “fixes” that people think they’re doing “for the sake of the children,” or “for the good of the country,” or “to make the world a better place.”
Among the discoveries I made – new to me, not new to the world, to be clear: in the mid-1960s, B. F. Skinner was contacted by the Joseph P. Kennedy Jr. Foundation, a non-profit that funded various institutions and research projects that dealt with mental disabilities. Eunice Kennedy Shriver was apparently interested in his work on operant behavior and child-rearing, and her husband, Sargent Shriver, who’d been appointed by President Johnson to head the newly formed Office of Economic Opportunity, was also keen to find ways to use operant conditioning as part of the War on Poverty.
There was a meeting. Skinner filed a report. But as he wrote in his autobiography, nothing came of it. “A year later,” he added, “one of Shriver’s aides came to see me about motivating the peasants in Venezuela.”
Motivating pigeons or poor people or peasants (or motivating peasants and poor people as pigeons) – it’s all offered, quite earnestly no doubt, as the way in which science and scientific management will make the world better.
But if nothing else, the application of behavior modification to poverty implies that this is a psychological problem and not a structural one. Focus on the individual and their “mindset” – to use the language that education technology and educational psychology folks invoke these days – not on the larger, societal problems.
I recognize, of course, that you can say “it’s for their own good” – but it involves a great deal of hubris (and often historical and cultural ignorance, quite frankly) to assume that you know what “their own good” actually entails.
You’ll sometimes hear that B. F. Skinner’s theories are no longer in fashion – the behaviorist elements of psychology have given way to the cognitive turn. And with or without developments in cognitive and neuroscience, Skinner’s star had certainly lost some of its luster towards the end of his career, particularly, as many like to tell the story, after Noam Chomsky penned a brutal review of his book Beyond Freedom and Dignity in the December 1971 issue of The New York Review of Books . In the book, Skinner argues that our ideas of freedom and free will and human dignity stand in the way of a behavioral science that can better organize and optimize society.
“Skinner’s science of human behavior, being quite vacuous, is as congenial to the libertarian as to the fascist,” writes Chomsky, adding that “there is nothing in Skinner’s approach that is incompatible with a police state in which rigid laws are enforced by people who are themselves subject to them and the threat of dire punishment hangs over all.”
Skinner argues in Beyond Freedom and Dignity that the goal of behavioral technologies should be to “design a world in which behavior likely to be punished seldom or never occurs” – a world of “automatic goodness.” We should not be concerned with freedom, Skinner argues – that’s simply mysticism. We should pursue “effectiveness of techniques of control” which will “make the world safer.” Or make the world totalitarian, as Chomsky points out.
Building behavioral technologies is, of course, what many computer scientists now do (perhaps what some of you do cough FitBit) – most, I’d say, firmly believing that they’re also building a world of “automatic goodness.” “Persuasive technologies,” as Stanford professor B. J. Fogg calls them. And in true Silicon Valley fashion, Fogg erases the long history of behavioral psychology in doing so: “the earliest signs of persuasive technology appeared in the 1970s and 1980s when a few computing systems were designed to promote health and increase workplace productivity,” he writes in his textbook. His students at his Behavioral Design Lab at Stanford have included Mike Krieger, the co-founder of Instagram, and Tristan Harris, a former Googler, founder of the Center for Humane Technology, and the best-known figure in what I call the “tech regrets industry” – he’s into “ethical” persuasive technologies now, you see.
Behavior modification. Behavioral conditioning. Behavioral design. Gamification. Operant conditioning. All practices and products and machines that are perhaps so ubiquitous in technology that we don’t see them – we just feel the hook and the urge for the features that reward us for behaving like those Project Pigeon birds pecking away at their target – not really aware of why there’s a war or what’s at stake or that we’re going to suffer and die if this missile runs its course. But nobody asked the pigeons. And even with the best of intentions for pigeons – promising pigeons an end to poverty and illiteracy – nobody asked the pigeons. Folks just assumed that because the smart men at Harvard (or Stanford or Silicon Valley or the US government) were on it, it was surely the right “fix.”
Published 15 Jun 2018
The history of the future of education technology.
Takayuki Sakagami
Department of Psychology, Keio University, 2-15-45 Minato-ku, Tokyo, 108-8345 Japan
Department of Psychology, West Virginia University, Morgantown, WV 26506-6040 USA
We describe an early operant conditioning chamber fabricated by Harvard University instrument maker Ralph Gerbrands and shipped to Japan in 1952 in response to a request made of Professor B. F. Skinner by Japanese psychologists. It is a rare example, perhaps the earliest still physically existing, of such a chamber for use with pigeons. Although the overall structure and many of the components are similar to those of contemporary pigeon chambers, several differences are noted and contrasted with evolutionary changes in this most important laboratory tool in the experimental analysis of behavior. The chamber also is testimony to the early internationalization of behavior analysis.
In 1952, B. F. Skinner arranged with Ralph Gerbrands a shipment of operant conditioning apparatus to two universities in Japan: Tokyo University and Keio University (Skinner, 1983, p. 38). The two names in Japan most prominently mentioned in connection with the Keio shipment are M. Yokoyama and Takashi Ogawa, who were respectively professor and associate professor of psychology at Keio University at that time. Yokoyama traveled to the USA in 1950, supported by the Government Aid and Relief Fund in Occupied Areas (GARIOA) Program. During that trip he met, among others, E. G. Boring, who was in the Psychology Department at Harvard, and reportedly was impressed by American social psychology and animal psychology. He may have been introduced to operant conditioning apparatus during that visit, but this has not been corroborated. Asano and Lattal (2012) noted that both Skinner and Yokoyama thereafter attended an international congress in Stockholm (Skinner, 1951) and speculated that that encounter may have been part of the impetus for the apparatus arriving in Japan as well. Professor Masao Tachibana of the University of Tokyo did confirm that a rat chamber delivered to Tokyo University was purchased in April 1952 for 259,000 Japanese yen (about 717 US dollars in 1952, or 2,290 US dollars in 2016). He also indicated that the chamber was discarded in 2008 (Tachibana, personal communication, March 27, 2015), following the fate of much old research apparatus. Asano and Lattal (2012) described the cumulative recorder that was shipped to Keio University. Here the other shoe of that discovery drops: a description of the operant chamber for pigeons that accompanied the cumulative recorder.
Early physical examples of operant conditioning chambers for either rats or pigeons are rare. One pre-1950s example of a rat chamber of the type described by Heron and Skinner ( 1939 ) is held in the Department of Psychology at the University of Minnesota. Although Skinner started conducting experiments with pigeons at the University of Minnesota in the days of what has come to be known as Project Pelican (Skinner, 1960 ), other than the Project Pelican apparatus (which is a part of the Smithsonian Museum’s permanent collection) no examples of pre-1950s operant chambers for pigeons have been forthcoming, at the University of Minnesota or anywhere else. The early 1950s operant chamber for pigeons that was shipped to Japan as noted above was recently rediscovered in one of the operant behavior laboratories at Keio University in Tokyo. In the first part of this paper and in the Appendix we describe, in some detail for the historical record, this seminal apparatus for the experimental analysis of behavior. In the second part, we review some early uses of the chamber in Japan, putting it into the historical context of what would become behavior analysis in that country. In the third and final part, we discuss the chamber in the context of the broader history of operant chamber technology. We focus specifically on the differences between this chamber and ones that followed, and some of the implications of these differences for conducting an experimental analysis of behavior.
Figure 1 shows the exterior of the chamber. The shell is a J. C. Higgins™ ice chest. “J. C. Higgins” was a signature brand of sporting and outdoor equipment sold by the Sears and Roebuck Company of Chicago, IL between 1908 and the early 1960s. Then, as now, such ice chests were popular with picnickers, campers, and other outdoor enthusiasts, as well as with operant conditioners. Indeed, there are operant laboratories around the world that continue to use similar ice chests for chambers. This one is in excellent condition, although, as Fig. 1 shows, some of the brand-name decal has worn off. Its remaining part, however, still is readable. A label attached to the top of the chamber (Fig. 2) at some point in its past reads as follows: “Skinner box: It was sent from [donated by] Harvard University. Handle with care.” The chamber, when closed, is sealed except for a ventilation fan.
Exterior view of the operant conditioning chamber for pigeons. The cable resting atop the chamber makes the connection between the chamber work panel and the equipment used to program the contingencies to which the pigeon is exposed
Back side of the chamber showing the ventilation fan and the label indicating the chamber came from Harvard University (see text)
The interior of the chamber is shown in Fig. 3. The chamber is divided by a 3-mm thick aluminum panel into a work area, where the pigeon is placed, and a service area, where the response key, discriminative stimulus lights, reinforcement dispenser, and plugs to connect the chamber to the control equipment are located. Figure 4 is a pigeon’s-eye-level view of the face of the work panel. Its most salient features are a single response key located behind the circular opening near the top center of the chamber and a food magazine (also called a food hopper, feeder, or grain dispenser in laboratory jargon) located behind the square opening below the response key. The small dark circles are screw heads, and the dark bar across the middle of the panel is a brace.
Top view of chamber with the lid open. The left photograph shows the electrical components in the service area that translate the programs into specific experimental operations in the lower portion of the photograph, and the wooden floor in the work area in the top portion. For orientation purposes, the arrow marks the top of the food storage bin. The right photograph shows the work area, where the pigeon is stationed during an experimental session, in the lower part of the photograph (the wooden floor was removed from the work area in this photograph). The rubber insulation around the top is marked by two arrows, which are joined together on the foam piece atop the work panel
Front view of the work panel. The dark line across the middle of the photograph is a metal bar. The dark material at the top of the work panel is a foam rubber cushion that seals the front side of the chamber for the control equipment behind the work panel face. It is held in place on either side by two pieces of twine. The arrow denotes the electric plug described in the text
Figure Figure5 5 shows two views of the service area. The two most unique and technologically interesting components in this chamber are the response key and the food magazine. The location of the response key is described in the Fig. 5 caption and the key is shown in more detail in Fig. 6 . The center portion of the key, made of white plastic, is suspended such that when it is pecked through the circle on the work-area side (see Fig. 4 ), it closes a small switch, circumscribed by the rectangle in the right photograph of Fig. 6 . Details of the key’s operation are provided in the Appendix , along with further description of the stimulus lights.
The left photograph shows the rear of the work panel, with its electrical and mechanical components. On the left of the work panel is the ( black ) connection box, where the wires come in through the 12-prong male Jones plug ( arrow ) and then are connected to the other electrical components. To its right is the food magazine. Above the food magazine the two lights (housed in black plastic cylinders) used to transilluminate the key are visible, and in front of the lights mounted on the chamber wall and appearing as a partial rectangle is the response key. The photograph on the right shows a side view of the electrical components. The metal bars set at an angle are the support for the work panel. Behind the metal bar, the connection box is seen open, revealing two electromechanical relays and the electrical connections to other locations on the work panel. The male Jones plug protrudes from the lower part of the rear side of the connection box
The left photograph, showing a portion of the rear of the work panel, was taken from slightly above the plane of and to the left of the response key (the black and white partial rectangle is the response key). The two key-light sockets can be seen in the center foreground of the photograph ( shorter arrows ). The longer arrow points to an opaque shield that diffuses the light from the key lights. The right photograph shows the switch for the key, circumscribed by a rectangle, which closes to record a response when the key is operated from the other side of the work panel. The arrow points to the electrical contacts that close to define the response when the key is operated
The food magazine is shown in the left photograph of Fig. 7 . The magazine consists of a frame that supports a vertically mounted storage bin (hopper) emptying into a horizontally mounted tray. At one end of the tray is an opening in its top side and at the other end is a lead counterweight. Reinforcement consists of raising the tray to a small aperture located behind and just below the square opening on the work-area side of the work panel (see Fig. 4 ). This is accomplished by operating an electric motor that turns the cogwheel (medium-length arrows in the left and lower right photographs in Fig. 7 ). The tray lifts when the raised portion of the cogwheel pushes on a lever attached to the tray (longer arrow in the left and lower right photographs in Fig. 7 ) and lowers when the indented portion of the wheel releases the lever.
The left photograph shows the food magazine with its unique cogwheel (longer arrow) and lever (mid-length arrow) arrangement for raising the food tray to the level of the aperture, to which the pigeon has access. The key lights are marked in this photograph by the shorter arrows. The lower right photograph shows a closer view of the food-tray-raising mechanism. The lever is marked by the longer arrow and the cogwheel by the shorter one. The upper right photograph shows a slightly later Gerbrands design for a pigeon food magazine. Rather than using a cogwheel arrangement, the food magazine in the right figure uses a solenoid (longer arrow) attached to the food tray (mid-length arrow) by a spring (shorter arrow) to raise the tray to the aperture.
Professor (Emeritus) Toshiro Yoshida unpacked the parcel containing both the operant conditioning chamber and the cumulative recorder (Asano & Lattal, 2012) when it arrived at Keio from Harvard via sea mail. He told the authors that no detailed explanations or instruction manuals accompanied the apparatus, making it difficult to understand the operation of either piece of equipment. He recalled that the chamber was used first by Sukeo Sugimoto (at that time a doctoral student in psychology supervised by Associate Professor Ogawa), although Sugimoto did not publish the experiments he conducted with it. Both Professor Yoshida and retired Professor Satoko Ohinata (personal communication, 12 May 2015), who authored the experiment described below (and who also was supervised by Ogawa), noted that the box was difficult to use for flexible experimental conditions because the space within it was so limited. They also noted that the control equipment (probably electromechanical relays, timers, and counters) for presenting discriminative stimuli and operating the food magazine was so modest that the apparatus could be used only for relatively basic contingencies such as simple discrimination, reinforcement schedules, or extinction.
The first published experiment conducted with the chamber was that of Professor Ohinata (1955). The English version of the paper’s abstract reads in part:
The present study on the instrumental conditioning of color discrimination by pigeons was undertaken to determine whether the learning was based on absolute or on relative discrimination. It was assumed that if the learning was based upon relative discrimination, the luminance relation of the stimuli would be transferred regardless of their wavelength and, on the other hand, if it was based upon absolute discrimination, pigeons would respond to wavelengths without regards to luminance relations.
The paper otherwise is written in Japanese, with only a few words written using the Latin alphabet; however, the following description of the apparatus is included: “装置: Harvard大学製鳩用Skinner-Box.色光刺戟呈示のための附属装置は特に慶応義塾大学心理学研究室に於いて設備された。” (p. 313) [roughly, “Apparatus: a Skinner box for pigeons made at Harvard University. The auxiliary apparatus for presenting colored-light stimuli was specially installed in the psychology laboratory of Keio University.”]. The words in Latin letters describe the present chamber and its origin. Professor Ohinata (personal communication, 12 May 2015) reported that the chamber continued to be used for a number of years for various undergraduate, graduate, and faculty research projects at Keio. One of these, related to deprivation level and discrimination performance, was conducted and reported by Professor Masaya Sato (1963), the first president of the Association for Behavior Analysis International from outside the United States.
The concerns of Professors Ogawa and Ohinata with the limitations of the chamber, noted above, probably relate to Ogawa’s dedication to comparative psychology. His interests in discrimination and perception were in that context. Thus, the early research involving the chamber was focused not on “operant conditioning” in the sense characterized by the work of Ferster and Skinner (1957), but rather on experiments and problems related to comparative psychology. The chamber had evolved in the USA to meet the emphasis and special needs of operant conditioning, which was concerned largely with the basic contingencies mentioned above (e.g., Ferster, 1953; Ferster & Skinner, 1957). It is interesting to consider the possibility that this chamber, along with the cumulative recorder and rat chamber mentioned in the introduction, may have created an environment in which operant conditioning could be shaped in Japan. Given the kinds of research for which the chambers and recorder were developed, it seems feasible that, with this apparatus available, research related to problems more typical of the experimental analysis of behavior might have developed through successive approximations as subsequent Japanese psychologists with different research interests came into contact with the apparatus. We know that the cumulative recorder Skinner shipped to Japan became the model for a Japanese-manufactured version of the cumulative recorder (Asano & Lattal, 2008). We do not know whether, or to what degree, the chambers in this shipment became models for the construction of other chambers in Japan, but it is not hard to imagine that they did. As time passed there was increasing contact between Japanese and American psychologists of many theoretical orientations, but Skinner’s shipment of apparatus can be considered among the factors leading to the development of Japanese behavior analysis.
The most striking thing about the chamber is its “modernity,” given that it is more than 64 years old. With the exception of the response key and food magazine, described above, this chamber could be used as readily in any behavior analysis laboratory today as one built in the past year. Indeed, its functions are identical to those of its contemporary counterparts. This could be, and probably will be, taken by some as evidence that the research methods of the experimental analysis of behavior are too entrenched, stuck in the halcyon days of “operant conditioning” with only a dim future ahead. An alternative perspective, which we prefer, is that Skinner developed a powerful tool when he invented the operant conditioning chamber. Its utility persists, and we have only begun to exploit its potential to enhance our understanding of behavior.
The overarching function of an operant chamber, then and now, is to provide a more or less constant, distraction-free environment in which the interactions between organism and environment can be studied. Such isolation requires that the chamber be ventilated to maintain a constant, comfortable temperature for the animal (see Ferster, 1953). This was accomplished by the ventilation fan described above; however, the fan’s location resulted in it pulling air across the pigeon in the work area and then through the service area before exhausting through the fan. This design exposed the mechanical and electrical control and recording devices in the service area to more pigeon dust (the lubricant generated by the pigeon’s feathers) than would have occurred had the airflow been reversed. This problem was not always recognized even by later commercial manufacturers, who often placed the ventilation fan as it is in this chamber. A quick perusal of the pigeon chambers in two of the operant laboratories at West Virginia University uncovered four different commercial models of pigeon chambers, three manufactured in the 1960s and 1970s and one after 2010. Tellingly, all were vented as this chamber is. By contrast, all of the home-made chambers were vented such that the exhaust was in the work area rather than the service area.
Except for the fan, the chamber is completely isolated from the external environment when it is closed. One consequence of this is that there is no way, short of leaving the lid open, to observe the animal in the chamber. The early rat chamber housed at the University of Minnesota’s Department of Psychology is similarly isolated, because it too lacks any means by which the behaving animal can be seen when the chamber is closed. This is so even though Skinner developed many demonstrations in which the animal was placed in an open environment for all to see (one of these environments is shown in a popular photograph of him; see Skinner, 1979, photographic display between pages 184 and 185, labeled “Demonstrating operant conditioning of a pigeon, Indiana, 1948”). The balance between the risk of disturbing the animal while working and the need to see what the animal actually is doing was later resolved in the construction of pigeon and rat chambers by including a means for observing the animal while the experiment was in progress. In earlier days it might have been a glass- or plastic-covered aperture with a curtain over it that could be lifted to allow the experimenter to watch the pigeon, or a peep hole of the sort found in the entry doors of homes and apartments. Today it often is a miniature camera mounted in an unobtrusive spot inside the chamber. The absence of a means of seeing the subject directly in these early chambers made it impossible to observe behavior other than the recorded operant. This absence early in the history of operant conditioning may have contributed to an unfortunate behavioral precedent for some experimenters and laboratories of not only ignoring, but perhaps even dismissing, observational data as too subjective and of limited value.
Although there have been many demonstrations to the contrary (e.g., Laties, Weiss, Clark, & Reynolds, 1965 ; Staddon & Simmelhag, 1971 ), precedents sometimes are hard to undo. Perhaps our science would be further along had some of the time spent mesmerized by cumulative records been spent looking through observation ports and peep holes to see what was going on that was not always reflected in those cumulative records.
Cumulative records were created by routing the electrical impulses generated when a response key was activated to a cumulative recorder (Lattal, 2004). To do this, a switch closure was required; that switch was the response key. Close inspection of the actual switch circumscribed by the rectangle in the right photograph of Fig. 6 reveals it to be a normally open switch (arrow). This means that operation of the key created an electrical pulse that in turn could be translated to a standard duration (usually 50 ms) and then counted with an electromechanical counting device and/or routed to the cumulative recorder to “step” the response pen one unit with each switch operation (Lattal, 2004). The electrical response pulse also operated programming devices that delivered the reinforcer. It takes longer to close a switch than it does to open one. Skipping the electronic details, suffice it to say that normally open circuits soon were discovered to be simply not fast enough to accurately capture the pecks operating the key switch. So, at some point, normally open response keys like the one on the present chamber were replaced by response keys that operated when a circuit was broken rather than “made.” This change greatly improved the capture of responses by the electromechanical circuitry, making the obtained data more closely reflect the actual key pecking of the pigeon (though still not always with 100% accuracy). Contemporary use of touch screens to record the pecking responses of pigeons, when they work properly (a caveat for any item of equipment, of course), allows recording of the location of the response relative to the target area. So-called “off-key” pecks occurring when a switch-type response key is used are lost without special techniques to capture them (e.g., Dunham, Mariner, & Adams, 1969).
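The recording chain described above can be sketched in a few lines of Python. This is an illustration only, not part of the original apparatus description: switch operations briefer than the key's minimum closure time never register, and each detected operation is stretched to a standard 50-ms pulse before it steps the counter or cumulative recorder. All names and timing values here are hypothetical.

```python
# Illustrative sketch of electromechanical response recording.
# A peck registers only if the switch operates for at least the key's
# minimum closure time; each detected operation is then converted to a
# standard-duration pulse (here 50 ms) before reaching the recorder.

def detect_responses(events, min_closure_ms, pulse_ms=50):
    """events: list of (start_ms, duration_ms) switch operations.
    Returns the standardized pulses that reach the counter/recorder."""
    pulses = []
    for start, duration in events:
        if duration >= min_closure_ms:        # too-brief operations are lost
            pulses.append((start, pulse_ms))  # standardized pulse
    return pulses

# Rapid pecks produce brief switch operations. A sluggish normally open
# key (say, 15 ms needed to close) misses some; a break-circuit key that
# registers in 5 ms captures them all.
pecks = [(0, 20), (120, 8), (250, 25), (400, 10)]
print(len(detect_responses(pecks, min_closure_ms=15)))  # normally open: 2
print(len(detect_responses(pecks, min_closure_ms=5)))   # break-circuit: 4
```

The sketch shows why the switch from make-circuit to break-circuit keys mattered: the faster the switch registers, the more closely the recorded count tracks the pigeon's actual pecking.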
Modern response key technology also can allow for the possibility of capturing variations in response force (ordinary response keys that are switches require a minimum force and cannot differentiate between force requirements above or below that limit). The upshot of this is that the key in this chamber is truly a dinosaur, one of an earlier era that has been extinct for a very long time.
The other unique feature of this chamber noted above is the food-magazine operation system. By the time Ferster (1953) published his description of the methods of operant conditioning, food magazines of the sort on this box were, as far as we can tell, extinct. A typical food magazine of the mid-1950s is shown in Ferster and Skinner’s (1957) Fig. 2 and in the previously described upper right photograph of Fig. 7. The skeleton of the latter and the one on the work panel of this chamber are almost identical. Both consist of a food storage bin that releases grain into a food tray. The food tray is connected to a device that raises it to an aperture at the base of a chute through which the pigeon can stick its beak and obtain a few bits of food. The present food magazine accomplished the raising of the food tray by the cogwheel mechanism described above. The one shown in Ferster and Skinner’s Fig. 2 and in the upper right photograph of Fig. 7 accomplishes the raising by activating a solenoid attached to the food tray by a spring. This activation pulls the food tray up into position such that access to the food through the aperture is possible. We can only speculate as to the reasons for the demise of the cogwheel mechanism. One possibility is that it was too large. The motor that operates the cam is bulky and covers most of the area above the food tray, but it does not seem to obstruct anything. A second possibility is that the cogwheels did not operate reliably. There is no evidence one way or the other on this. The cogwheel on the present food magazine appears to be quite sturdy and, when operated manually, raised and lowered the food tray with precision. A third possibility is that the motor operation was not sufficiently loud to result in reliable eating; that is, the raising of the food magazine did not function as a conditioned reinforcer.
Iversen (personal communication, 2013) found with contemporary “silent operation” pellet dispensers for use with rats that the rat often left the food pellets in the tray after they were delivered. Only when he added a sound that occurred simultaneously with the dispenser operation did the rats rapidly approach the food cup and consume the pellet. A silent feeder could be an even greater problem with pigeons because, unlike a pellet that stays in the food cup after it is delivered, grain is available only so long as the food tray is raised. The effect of a silent motor, however, would be compensated for by the light above the food aperture that presumably operated when the magazine motor was operated. A fourth possibility is that changing the duration of the reinforcement cycle would have required changing the cogwheel, because the magazine is raised only so long as the upper portion of the cogwheel is in contact with the lever. Thus, reinforcement duration is fixed by the length of the upper portion of the cogwheel, such that changing reinforcement magnitude would require replacing the cogwheel with one configured another way. A final possibility is that solenoids were cheaper than the cogwheels, which, as noted, required an electric motor to operate. The solenoid, however, required an independent timing device external to the chamber to hold current on the solenoid throughout the reinforcement cycle. Thus, whatever monetary savings there were in using the solenoid may have been offset by the need for an external timer. It is difficult to assess after the fact which, if any, of these factors contributed to the switch to solenoids. Whatever the reason, the solenoid has been an enduring feature of pigeon food magazines from their first use in the 1950s to today.
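The geometric constraint behind the fourth possibility can be made concrete with a short Python sketch. All numbers here are hypothetical; the paper reports no arc or motor-speed measurements. With the cogwheel feeder, the tray stays up while the raised arc of the wheel presses the lever, so reinforcement duration is set by the wheel's geometry and the motor speed rather than by an external timer.

```python
# Illustrative sketch only: none of these values are measurements from the
# chamber. Reinforcement duration under the cogwheel mechanism equals the
# fraction of one rotation during which the raised arc holds the lever up.

def reinforcement_duration_s(raised_arc_deg, motor_rpm):
    """Seconds the food tray is raised during one wheel rotation."""
    seconds_per_revolution = 60.0 / motor_rpm
    return seconds_per_revolution * (raised_arc_deg / 360.0)

# A hypothetical 120-degree raised arc on a wheel turning at 10 rpm:
print(reinforcement_duration_s(120, 10))  # 2.0 seconds of hopper access
```

Changing the duration thus means machining or swapping the wheel itself, whereas a solenoid feeder simply holds current for however long the external timer specifies.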
Solenoids in early chambers typically were operated by 110-v AC current. Indeed, most of the early electromechanical programming equipment operated off this high voltage (Catania, 2002; Dinsmoor, 1990). These solenoids created no problems with most electromechanical circuitry of the era of this chamber, but when, beginning in the early 1960s, transistorized circuitry began replacing or complementing electromechanical equipment in operant conditioning laboratories, problems arose because of the electrical interference created by activating and deactivating these high-voltage solenoids. This interference caused transistors to operate at unscheduled times, thereby disrupting programming and recording equipment. These relatively high-voltage solenoids eventually were replaced by ones operated by 28-v DC current, which generally did not disrupt sensitive equipment. These remain the standard today.
We also should comment on the lights for transilluminating the response keys, which thus served as discriminative stimuli. We could not discern whether the lights were operated by 110-v alternating or 28-v direct current. Chambers of the era of Ferster and Skinner (1957) commonly were equipped with low-wattage 110-v AC Christmas tree lights as the discriminative stimuli. These lights were used because 28-v DC lights used to transilluminate response keys would flicker, due to voltage fluctuations, when the key was pecked and recorded or even when reinforcers were “set up” by the electromechanical equipment used to control the experiment. The result could be a reliable visual cue as to the availability of a reinforcer, with the effect of undermining the experiment. Low-voltage (28-v DC) bulbs came into use only when it was feasible to operate them from a second power supply that operated independently of the one controlling the relay programming apparatus. Enclosed “pilot lights” located directly behind the key—typically much closer than the distance between the lights in this chamber and the back of the response key—offered some protection of the lights from the fine covering of pigeon dust described above. The most contemporary device for presenting discriminative stimuli is a computer screen, which offers the investigator almost unlimited control over the type and location of these stimuli.
Over time, this pigeon chamber found its way to the back of a shelf in an operant laboratory at Keio University, where it lay fallow until it recently was retrieved by the first author after an inquiry by the second. It is a truly rare item and as such is an important part of the collective heritage of our discipline. Beyond its obvious significance in the history of the experimental analysis of behavior, the chamber also is testimony to the strong international connections between behavior analysts, exemplified by the one between the early Japanese behavior analysts and Skinner that helped bring mid-twentieth century cutting-edge behavioral research apparatus to Japan.
Details of the chamber omitted from the general description in the “The Chamber” section above appear in this Appendix.
The exterior dimensions of the aluminum chamber are 56 cm long by 41 cm high by 33 cm wide. There are latches at either end (part of the latch on the work-area end is missing), centered on the short sides of the chest. Except for the attachment of a ventilation fan and a single aperture to accommodate the electrical cable, the chest otherwise looks like any other of this product line. The connecting cable that protrudes from the work panel through the rear wall appears to be original. It is 180 cm long, excluding the male Jones plugs (12 prong) connected to either end. The connector at the end distal to the chamber was attached either directly to the programming apparatus or to a female connector, which in turn attached to another cable that connected to the programming apparatus that controlled the contingencies to which the pigeon in the chamber was to be exposed. The cable covering is of a heavy fabric, rather than the later plastic cable coverings. The ventilation fan (shown in Fig. 2 on the rear long side of the chamber) is powered from a plug that connects through the ice chest wall to another plug located on the back side of the work panel. The fan housing is 11.5 cm long by 8 cm high, and protrudes 9.5 cm from the outer wall. It is powered by a 110-v AC motor, manufactured by Fasco Industries of Rochester, NY (model number 507451N; the last letter is slightly marred, so it could be another letter). The opening for the fan on the inside of the chamber is 15.5 cm from the top and 9.5 cm from the rear wall of the service area. The hinges for the chamber lid are located on either end of the lid, as can be seen in the right photograph of Fig. 3. Attached around the inside perimeter of the lip of the ice chest is a rubber gasket (indicated by two arrows in the right photograph of Fig. 3), which has come unattached in several places but is not deteriorated.
The inside of the chamber is 50.5 cm long by 28 cm wide by 33 cm high. It is divided by an aluminum panel (hereafter, the work panel) into a work area (where the pigeon is placed), shown at the top of the left photograph and at the bottom of the right photograph of the chamber in Fig. 3, and a service area, shown at the bottom of the left photograph and the top of the right photograph in this figure. The work area measures 32.5 cm long by 28 cm wide by 33 cm high, and the service area 18 cm long by 28 cm wide by 33 cm high. The opening for the ventilation fan is on the right wall of the service area (when viewing the rear of the work panel from the service area), as shown in the left photograph of Fig. 3. The floor of the work area is covered by a piece of wood, raising the work area floor by 3.8 cm, but it could not be determined whether this was part of the original design or was added later. We speculate that it may have been added in Japan because Japanese pigeons may not have been as tall as the ones used in the USA; if so, without the platform it may have been more difficult for the pigeons to reach the response key.
The work panel, shown in Figs. 4 and 5, is 27.2 cm wide by 32 cm high. Figure 4 shows a piece of black foam rubber (somewhat deteriorated, and difficult to determine whether original or added later) across the top, such that it covers the small space that otherwise would exist at the top of the work panel between the work and service areas of the chamber (thus accounting for the difference between the chamber height and the work panel height). The foam is seen most clearly in the left photograph of Fig. 3, where the twine holding it onto the work panel also can be seen. The response key is located behind a 7.1-cm-diameter opening, the center of which is about 26.8 cm from the chamber floor (23 cm from the wooden platform floor), on the midline (13.6 cm from the left wall) of the work panel, shown in Fig. 4. Below it is the food magazine aperture through which grain can be accessed. This aperture is 5.2 cm high by 5.6 cm wide, with its center also on the midline of the panel (13.6 cm from the left wall) and about 9.7 cm from the chamber floor (5.9 cm from the floor of the wooden platform).
There is no means of providing general illumination through devices built into the panel; however, there is a small two-prong electric plug in the top right corner of the work panel (Fig. 4, arrow), with an unconnected wire attached to it. Inside the chamber, in the work area, are two candelabra-type 100-v lamp holders (one of which contained a 110-v bulb) placed unattached on the floor directly below the loose wires. These can be seen in Fig. 3, in the upper right corner of the work area shown in the right photograph. The wires connected to the two candelabra bases appear to be old, but it cannot be determined whether they and the bases are original. The insulation on the wires is not plastic; rather, it is of the same fabric material as the previously described cable, seemingly revealing something of its age. Whether these constituted a houselight for general illumination of the work area is not known.
Figure 5 shows rear (left photograph) and side (right photograph) views of the control side of the work panel. The single sheet of aluminum that comprises the panel is bent at a 90° angle such that its base covers the floor of the control area. The panel is braced by now-rusty iron bars set at an angle and attached to the side lip of the work panel and its base, apparently to prevent the panel from coming out of position in the chamber as the pigeon pecks the key (there are no grooves for holding the work panel or other means of stabilizing it in the chamber). A similar, but horizontal, iron bar braces the panel from the work-area side, as noted in the “The Chamber” section above. The wiring and solder connections on the control panel appear to be original, although it is difficult to determine whether some of the connections have been re-soldered. Many of the individual wires leading to the various components, however, are bundled together with a wire binder, and they appear to be unmodified over the years of the chamber’s residence at Keio. Located on the work panel are a connection box to which the cable connects, a response key, two stimulus lights used to transilluminate the response key, and a device for delivering mixed grain through the aforementioned aperture on the work-area side of the work panel.
A metal connection box is located in the lower left corner of the control side of the metal panel (viewing from the control area; see the left and right photographs of Fig. 5). The connector cord (connecting the box to the programming and recording equipment) is plugged into the box through a male Jones plug visible at the lower rear of the box. The box contains two 2-pole double-throw relays, function unknown; such relays sometimes were used to channel power to lights or food magazines. Some of the wires from the male 12-prong Jones plug go through these relays, but other wires go directly from the plug to the various components.
The response key, shown in Fig. 6, is composed of a 6.5-cm-square piece of thin black plastic on which is mounted a piece of white opaque plastic (6.4 cm high by 4 cm wide). The unit is located behind the circular opening in the work panel. The white plastic piece is unhinged and can move off its fixed location in four directions. A small spring attached to the bracket at the top of the key assembly appears to hold the white, moveable portion of the key in place and ensure the return of the key to its neutral position at the end of each peck. The face of the key (the pecking surface) is recessed about 3 mm from the face of the work panel. The force requirement of the key does not appear to be adjustable. The key’s operation is described in the “The Chamber” section.
As noted above, the two stimulus lamps are shown in the left photographs of both Figs. 6 and 7 as the two black cylinders (marked by the short arrows, above the food magazine in Fig. 7). The one on the right (from the rear of the work panel) is placed above the plane passing through the center of the key aperture, and the one on the left is placed below this plane. Their location is precise and they do not appear to have been added later, suggesting that this arrangement was part of the original chamber design, although some of the wires connected to the lights may have been cut and re-soldered. It was difficult to determine by visual inspection whether these jewel lamps (pilot lamps) were 24-v DC or 110-v AC. Twenty cm in front of the lamps is a piece of frosted glass (Fig. 6, left photograph, longer arrow), perhaps used to spread the light from the key lights evenly across the response key. The frosted glass is 65 mm behind the response key.
The food magazine is shown in the left photograph of Fig. 7. It is located behind the square aperture on the work-area side of the work panel. The cogwheel and lever are shown in the lower right photograph of Fig. 7. The details of its operation, and a comparison with the later Gerbrands model shown in the upper right photograph of Fig. 7, are described in the “The Chamber” section above. The cogwheel is rotated by a 110-v AC synchronous motor, which is not readily visible because of the cogwheel. We could not determine whether the motor operates from a single pulse and continues through a reinforcement cycle or whether continuous application of current to the motor is required to raise and lower the food tray. There is a light above the feeder aperture that presumably illuminates with the operation of the magazine.
Authors’ note
The second author’s participation in this project resulted from his receipt of a Global Professorship from Keio University.
Takayuki Sakagami, Email: [email protected] .
Kennon A. Lattal, Email: [email protected] .
The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning . In operant conditioning, organisms learn to associate a behavior and its consequence (Table 6.1). A pleasant consequence makes that behavior more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when her trainer blows a whistle. The consequence is that she gets a fish.
| | Classical Conditioning | Operant Conditioning |
|---|---|---|
| Conditioning approach | An unconditioned stimulus (such as food) is paired with a neutral stimulus (such as a bell). The neutral stimulus eventually becomes the conditioned stimulus, which brings about the conditioned response (salivation). | The target behavior is followed by reinforcement or punishment to either strengthen or weaken it, so that the learner is more likely to exhibit the desired behavior in the future. |
| Stimulus timing | The stimulus occurs immediately before the response. | The stimulus (either reinforcement or punishment) occurs soon after the response. |
Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn't account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike.
According to the law of effect, behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again; if it does something that does not bring about a desired result, it is less likely to do it again. Employment is an everyday example of the law of effect: one of the reasons (and often the main reason) we show up for work is that we get paid to do so. If we stop getting paid, we will likely stop showing up - even if we love our job.
Working with Thorndike's law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a "Skinner box" (Figure 6.10). A Skinner box contains a lever (for rats) or a disk (for pigeons) that the animal can press or peck to release a food reward from a dispenser. Speakers and lights can be associated with certain behaviors, and a recorder counts the number of responses the animal makes.
Figure 6.10 (a) B. F. Skinner developed operant conditioning for systematic study of how behaviors are strengthened or weakened according to their consequences. (b) In a Skinner box, a rat presses a lever in an operant conditioning chamber to receive a food reward. (credit a: modification of work by "Silly rabbit"/Wikimedia Commons)
In discussing operant conditioning, we use several everyday words - positive, negative, reinforcement, and punishment - in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let's combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment (Table 6.2).
| | Reinforcement | Punishment |
|---|---|---|
| Positive | Something is added to increase the likelihood of a behavior. | Something is added to decrease the likelihood of a behavior. |
| Negative | Something is removed to increase the likelihood of a behavior. | Something is removed to decrease the likelihood of a behavior. |
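The four cells of the table follow mechanically from the two distinctions just described: whether a stimulus is added or removed, and whether the behavior becomes more or less likely. As a quick sketch (the function name and string labels here are ours, not standard psychological notation):

```python
# A minimal sketch of the operant conditioning terminology:
# "positive"/"negative" = a stimulus is added/removed,
# "reinforcement"/"punishment" = the behavior increases/decreases.

def classify(stimulus_change: str, behavior_effect: str) -> str:
    """Map a (stimulus change, behavior effect) pair to its operant term."""
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{sign} {kind}"

print(classify("added", "increases"))    # positive reinforcement (e.g., a treat for sitting)
print(classify("removed", "increases"))  # negative reinforcement (e.g., seatbelt beeping stops)
print(classify("added", "decreases"))    # positive punishment (e.g., a scolding)
print(classify("removed", "decreases"))  # negative punishment (e.g., losing a privilege)
```

The point of the sketch is simply that the two dimensions are independent: "positive" and "negative" describe the stimulus operation, never whether the outcome is pleasant.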
The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement , a desirable stimulus is added to increase a behavior. For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let's pause for a moment. Some people might say, "Why should I reward my child for doing what is expected?" But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school.
Being praised for doing a good job and for passing a driver's test is also a reward. Positive reinforcement is an extremely effective learning tool. For example, one of the most effective ways to increase achievement in school districts with below-average reading scores has been to pay children to read: second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about it, and the result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think it was a great idea. He was a strong proponent of using operant conditioning principles to influence students' behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine, designed to reward small steps in learning (Skinner, 1961) - an early forerunner of computer-assisted learning. His teaching machine tested students' knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they received no reinforcement. The idea was that students would then spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).
In negative reinforcement , an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go "beep, beep, beep" until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Similarly, a driver might blast her horn when a light turns green and continue blasting it until the car in front moves: the driver in front escapes the honking by moving, so moving promptly is negatively reinforced. Negative reinforcement is also used frequently in horse training. Riders apply pressure - by pulling the reins or squeezing their legs - and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.
Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment , you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class: a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment , you remove a pleasant stimulus to decrease a behavior, as when a parent takes away a favorite toy or a desirable activity after a child misbehaves.

Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine your four-year-old son, Brandon, runs into the busy street to get his ball. You give him a time-out (positive punishment) and tell him never to go into the street again. Chances are he won't repeat this behavior. While strategies like time-outs are common today, in the past children were often subject to physical punishment, such as spanking. It's important to be aware of some of the drawbacks of using physical punishment on children. First, punishment may teach fear. Brandon may become fearful of the street, but he also may become fearful of the person who delivered the punishment - you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010); consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). Children see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, because you spank Brenda when you are angry with her for her misbehavior, she might start hitting her friends when they won't share their toys.

While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today's psychologists and parenting experts favor reinforcement over punishment - they recommend that you catch your child doing something good and reward them for it.
In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will spontaneously display anything but the simplest of behaviors. In shaping, behaviors are broken down into many small, achievable steps:

1. Reinforce any response that resembles the desired behavior.
2. Then reinforce only the response that more closely resembles the desired behavior; no longer reinforce the previously reinforced response.
3. Next, reinforce the response that even more closely resembles the desired behavior.
4. Continue reinforcing closer and closer approximations of the desired behavior.
5. Finally, reinforce only the desired behavior.

Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only relatively simple behaviors, such as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov's dogs - he trained them to respond to the tone of a bell and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.

Here is a brief video of Skinner's pigeons playing ping pong.

It's easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let's consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.
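The successive-approximation logic of the room-cleaning steps can be sketched as a small simulation: reinforce any response that meets the current criterion, then tighten the criterion. The step values and responses below are illustrative, not from an actual training protocol.

```python
# A simplified model of shaping: reinforce responses that meet the current
# criterion, then raise the criterion so only closer approximations count.
# Criteria and responses are illustrative (e.g., number of toys picked up).

def shape(responses, criteria):
    """Reinforce each response that meets the current criterion;
    advance to the next (stricter) criterion after each reinforcement."""
    log = []
    step = 0
    for r in responses:
        if step < len(criteria) and r >= criteria[step]:
            log.append((r, "reinforced"))
            step += 1  # tighten: the old approximation is no longer rewarded
        else:
            log.append((r, "ignored"))
    return log

# Criterion sequence: 1 toy, then 5 toys, then 10 toys.
print(shape([1, 2, 5, 3, 10], [1, 5, 10]))
```

Note how a response that would have earned a reward early on (picking up 2 toys) is ignored once the criterion has moved past it, which is exactly what distinguishes shaping from simple continuous reinforcement.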
Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let's go back to Skinner's rats: how did they learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food is an obvious reinforcer.

What would be a good reinforcer for humans? For your daughter Sydney, it was the promise of a toy if she cleaned her room. How about Joaquin, the soccer player? If you gave Joaquin a piece of candy every time he made a goal, you would be using a primary reinforcer . Primary reinforcers have innate reinforcing qualities; they are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers, and pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be innately reinforcing - the water cools the person off (a physical need) as well as provides pleasure.

A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you call out "Great shot!" every time Joaquin makes a goal. Another example, money, is only worth something when you can use it to buy other things - either things that satisfy basic needs (food, water, shelter - all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean with stacks of money, the money would not be useful if you could not spend it. What about the stickers on a behavior chart? They, too, are secondary reinforcers.

Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals.
For example, a study by Adibsereshki and Abkenar (2014) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of eighth-grade students. Similar studies show demonstrable gains in behavior and academic achievement for groups ranging from first grade to high school, and representing a wide array of abilities and disabilities. For example, in a study involving younger, autistic schoolchildren (Cangi & Daly, 2013), when children exhibited appropriate behavior (not hitting or pinching), they received a "quiet hands" token; when they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
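As a rough sketch, a token economy like the "quiet hands" system can be modeled as a simple ledger: appropriate behavior earns a token, inappropriate behavior costs one, and accumulated tokens are later exchanged for a backup reward such as playtime. The class name, exchange rate, and behavior log below are hypothetical.

```python
# A toy model of a token economy (names and exchange rate are hypothetical).
# Tokens are secondary reinforcers; they gain value only because they can be
# exchanged for something the child actually wants (minutes of playtime).

class TokenEconomy:
    def __init__(self, minutes_per_token: int = 2):
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def record(self, behavior_appropriate: bool) -> None:
        # Earn a token for appropriate behavior, lose one otherwise
        # (never dropping below zero).
        self.tokens = max(self.tokens + (1 if behavior_appropriate else -1), 0)

    def exchange(self) -> int:
        # Trade all tokens for minutes of playtime.
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

economy = TokenEconomy()
for appropriate in [True, True, False, True]:  # observed behaviors
    economy.record(appropriate)
print(economy.exchange())  # 2 tokens remaining -> 4 minutes of playtime
```

The design choice worth noticing is the token loss for inappropriate behavior: that is negative punishment (a valued item is removed to decrease a behavior) operating inside the same system that uses positive reinforcement to build the desired behavior.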
Parents and teachers often use behavior modification to change a child's behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed (Figure 6.11). Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior.
Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.
Figure 6.11 Sticker charts are a form of positive reinforcement and a tool for behavior modification. Once this child earns a certain number of stickers for demonstrating a desired behavior, she will be rewarded with a trip to the ice cream parlor.
Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, they are removed from the desirable activity at hand (Figure 6.12). For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn't throw blocks.
There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important.
The general rule of thumb is one minute for each year of the child's age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.
Figure 6.12 Time-out is a popular form of negative punishment used by caregivers. When a child misbehaves, they are removed from a desirable activity in an effort to decrease the unwanted behavior. For example, (a) a child might be playing on the playground with friends and push another child; (b) the child who misbehaved would then be removed from the activity for a short period of time.
Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement .
This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let's look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).
Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule - partial reinforcement. In partial reinforcement , also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules (Table 6.3). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.
| Reinforcement Schedule | Description | Result | Example |
|---|---|---|---|
| Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient using patient-controlled, doctor-timed pain relief |
| Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking social media |
| Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework: a factory worker paid for every x number of items manufactured |
| Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
Figure 6.13 The four reinforcement schedules yield different response patterns. The variable ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambler). A fixed ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass saleswoman). The variable interval schedule is unpredictable and produces a moderate, steady response rate (e.g., restaurant manager). The fixed interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., surgery patient).
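The contrast between fixed- and variable-ratio schedules can be made concrete with a small simulation. This is only a sketch with illustrative parameters: a fixed-ratio schedule reinforces every nth response, while a variable-ratio schedule redraws the required count around the same average each time.

```python
# Sketch of two partial reinforcement schedules (parameters illustrative).
import random

def fixed_ratio(n):
    """Reinforce exactly every nth response (predictable)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforcer delivered
        return False
    return respond

def variable_ratio(mean, rng):
    """Reinforce after an unpredictable number of responses, averaging `mean`."""
    target = rng.randint(1, 2 * mean - 1)
    count = 0
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean - 1)  # requirement redrawn
            return True
        return False
    return respond

fr = fixed_ratio(4)
rewards = [fr() for _ in range(12)]
print(rewards.count(True))  # exactly 3 reinforcers in 12 responses
```

Both schedules pay off about once every four responses on average, but only the fixed schedule lets the responder predict exactly when the next reinforcer is due, which is why fixed schedules produce post-reinforcement pauses while variable schedules sustain high, steady responding.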
Skinner (1953) stated, "If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron's money on a variable-ratio schedule."
Skinner uses gambling as an example of the power of the variable-ratio reinforcement schedule for maintaining behavior even during long periods without any reinforcement. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler. It is indeed true that variable-ratio schedules keep behavior quite persistent - just imagine the frequency of a child's tantrums if a parent gives in even once to the behavior. The occasional reward makes it almost impossible to stop the behavior.
Recent research in rats has failed to support Skinner's idea that training on variable-ratio schedules alone causes pathological gambling. However, other research suggests that gambling does seem to work on the brain in the same way as most addictive drugs, and so there may be some combination of brain chemistry and reinforcement schedule that could lead to problem gambling (Figure 6.14). Specifically, modern research shows the connection between gambling and the activation of the reward centers of the brain that use the neurotransmitter (brain chemical) dopamine. Interestingly, gamblers don't even have to win to experience the "rush" of dopamine in the brain.
"Near misses," or almost winning but not actually winning, also have been shown to increase activity in the ventral striatum and other brain reward centers that use dopamine. These brain effects are almost identical to those produced by addictive drugs like cocaine and heroin. Based on the neuroscientific evidence showing these similarities, the DSM-5 now considers gambling an addiction, while earlier versions of the DSM classified gambling as an impulse control disorder.
Figure 6.14 Some research suggests that pathological gamblers use gambling to compensate for abnormally low levels of the hormone norepinephrine, which is associated with stress and is secreted in moments of arousal and thrill.
In addition to dopamine, gambling also appears to involve other neurotransmitters, including norepinephrine and serotonin. Norepinephrine is secreted when a person feels stress, arousal, or thrill. It may be that pathological gamblers use gambling to increase their levels of this neurotransmitter. Deficiencies in serotonin might also contribute to compulsive behavior, including a gambling addiction.
It may be that pathological gamblers' brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest.
However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction - perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers' brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
Figure 6.15 Psychologist Edward Tolman found that rats use cognitive maps to navigate through a maze. Have you ever worked your way through various levels on a video game? You learned when to turn left or right, move up or down. In that case you were relying on a cognitive map, just like the rats in a maze.
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi's dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he's never driven there himself, so he has not had a chance to demonstrate that he's learned the way. One morning Ravi's dad has to leave early for a meeting, so he can't drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.
Have you ever gotten lost in a building and couldn't find your way back out? While that can be frustrating, you're not alone. At one time or another we've all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation - or cognitive map - of the location, as Tolman's rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight.
Because of this, it's often difficult to predict what's around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.
Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.
What would be a good reinforcer for humans? For your daughter Sydney, it was the promise of a toy if she cleaned her room. How about Joaquin, the soccer player? If you gave Joaquin a piece of candy every time he made a goal, you would be using a primary reinforcer . Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.
A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Joaquin made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.
Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of autistic school children. Autistic children tend to exhibit disruptive behaviors such as pinching and hitting. When the children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
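The mechanics of a token economy can be sketched in a few lines of Python. The class and method names here are illustrative, not taken from the Cangi and Daly (2013) study; only the "quiet hands" token and the exchange for playtime come from the example above.

```python
class TokenEconomy:
    """Minimal sketch of a token economy: tokens are secondary
    reinforcers earned for target behaviors, lost for problem
    behaviors, and exchanged for a backup reinforcer such as
    minutes of playtime."""

    def __init__(self, minutes_per_token=2):
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def quiet_hands(self):
        """Appropriate behavior earns a token."""
        self.tokens += 1

    def hit_or_pinch(self):
        """Problem behavior costs a token (never going below zero)."""
        self.tokens = max(0, self.tokens - 1)

    def exchange(self):
        """Trade all accumulated tokens for playtime minutes."""
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

child = TokenEconomy()
for _ in range(5):
    child.quiet_hands()
child.hit_or_pinch()
print(child.exchange())  # 4 tokens at 2 minutes each
```

The tokens themselves are worthless until exchanged, which is exactly what makes them secondary rather than primary reinforcers.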
Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed ( [link] ). Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.
Sticker charts are a form of positive reinforcement and a tool for behavior modification. Once this little girl earns a certain number of stickers for demonstrating a desired behavior, she will be rewarded with a trip to the ice cream parlor. (credit: Abigail Batchelder)
Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, she is removed from the desirable activity at hand ( [link] ). For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.
There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.
Time-out is a popular form of negative punishment used by caregivers. When a child misbehaves, he or she is removed from a desirable activity in an effort to decrease the unwanted behavior. For example, (a) a child might be playing on the playground with friends and push another child; (b) the child who misbehaved would then be removed from the activity for a short period of time. (credit a: modification of work by Simone Ramella; credit b: modification of work by “JefferyTurner”/Flickr)
Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement . This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).
Watch this video clip where veterinarian Dr. Sophia Yin shapes a dog’s behavior using the steps outlined above.
Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement , also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules ( [link] ). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.
| Reinforcement Schedule | Description | Result | Example |
|---|---|---|---|
| Fixed interval | Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes). | Moderate response rate with significant pauses after reinforcement | Hospital patient uses patient-controlled, doctor-timed pain relief |
| Variable interval | Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes). | Moderate yet steady response rate | Checking Facebook |
| Fixed ratio | Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses). | High response rate with pauses after reinforcement | Piecework—factory worker getting paid for every x number of items manufactured |
| Variable ratio | Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses). | High and steady response rate | Gambling |
Now let’s combine these four terms. In a fixed interval reinforcement schedule , behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, she is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Her doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and she receives a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.
With a variable interval reinforcement schedule , the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.
With a fixed ratio reinforcement schedule , there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care if the person really needs the prescription sunglasses; she just wants her bonus. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.
In a variable ratio reinforcement schedule , the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
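The difference between the two ratio schedules is easy to see in a short simulation. In the Python sketch below, each generator yields True whenever a response earns a reinforcer; the generator framing and all the numbers are illustrative assumptions, not experimental parameters.

```python
import itertools
import random

def fixed_ratio(n):
    """Reinforce every n-th response (like Carla's per-sale commission)."""
    for i in itertools.count(1):
        yield i % n == 0

def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses that
    averages mean_n (like a slot machine's payouts)."""
    count, needed = 0, rng.randint(1, 2 * mean_n - 1)
    while True:
        count += 1
        if count >= needed:
            # pay out, then draw a new, unpredictable requirement
            count, needed = 0, rng.randint(1, 2 * mean_n - 1)
            yield True
        else:
            yield False

rng = random.Random(42)
fr, vr = fixed_ratio(5), variable_ratio(5, rng)
fr_hits = sum(next(fr) for _ in range(1000))
vr_hits = sum(next(vr) for _ in range(1000))
print(f"fixed ratio 5:     {fr_hits} reinforcers in 1000 responses")
print(f"variable ratio ~5: {vr_hits} reinforcers in 1000 responses")
```

Both schedules pay out at about the same overall rate; what differs is predictability. Interval schedules work the same way, except the payout condition is elapsed time since the last reinforcer rather than a response count.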
In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ( [link] ).
The four reinforcement schedules yield different response patterns. The variable ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambler). A fixed ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass saleswoman). The variable interval schedule is unpredictable and produces a moderate, steady response rate (e.g., restaurant manager). The fixed interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., surgery patient).
Connect the Concepts: Gambling and the Brain
Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).
Skinner uses gambling as an example of the power and effectiveness of conditioning behavior based on a variable ratio reinforcement schedule. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). Beyond the power of variable ratio reinforcement, gambling seems to work on the brain in the same way as some addictive drugs. The Illinois Institute for Addiction Recovery (n.d.) reports evidence suggesting that pathological gambling is an addiction similar to a chemical addiction ( [link] ). Specifically, gambling may activate the reward centers of the brain, much like cocaine does. Research has shown that some pathological gamblers have lower levels of the neurotransmitter (brain chemical) known as norepinephrine than do normal gamblers (Roy, et al., 1988). According to a study conducted by Alec Roy and colleagues, norepinephrine is secreted when a person feels stress, arousal, or thrill; pathological gamblers use gambling to increase their levels of this neurotransmitter. Another researcher, neuroscientist Hans Breiter, has done extensive research on gambling and its effects on the brain. Breiter (as cited in Franzen, 2001) reports that “Monetary reward in a gambling-like experiment produces brain activation very similar to that observed in a cocaine addict receiving an infusion of cocaine” (para. 1). Deficiencies in serotonin (another neurotransmitter) might also contribute to compulsive behavior, including a gambling addiction.
It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.
Some research suggests that pathological gamblers use gambling to compensate for abnormally low levels of the hormone norepinephrine, which is associated with stress and is secreted in moments of arousal and thrill. (credit: Ted Murphy)
Although strict behaviorists such as Skinner and Watson refused to believe that cognition (such as thoughts and expectations) plays a role in learning, another behaviorist, Edward C. Tolman , had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.
In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map : a mental picture of the layout of the maze ( [link] ). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning : learning that occurs but is not observable in behavior until there is a reason to demonstrate it.
Psychologist Edward Tolman found that rats use cognitive maps to navigate through a maze. Have you ever worked your way through various levels on a video game? You learned when to turn left or right, move up or down. In that case you were relying on a cognitive map, just like the rats in a maze. (credit: modification of work by “FutUndBeidl”/Flickr)
Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.
Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.
Watch this video to learn more about Carlson’s studies on cognitive maps and navigation in buildings.
Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior depending on either a set or variable period of time.
Critical thinking questions.
1. What is a Skinner box and what is its purpose?
2. What is the difference between negative reinforcement and punishment?
3. What is shaping and how would you use shaping to teach a dog to roll over?
4. Explain the difference between negative reinforcement and punishment, and provide several examples of each based on your own experiences.
5. Think of a behavior that you have that you would like to change. How could you use behavior modification, specifically positive reinforcement, to change your behavior? What is your positive reinforcer?
1. A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.
2. In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior.)
3. Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.