One measure of intelligence is the ability to use past experience when one encounters new learning. Harlow (1949) referred to this as learning to learn. In a variation of this principle, Mackintosh, McGonigle, Holgate, and Vanderver (1968) trained rats on a simple discrimination and then repeatedly reversed that discrimination. They found that the more reversals that were trained, the faster the rats acquired them.

Rayburn-Reeves, Molet, and Zentall (2011; see also Cook & Rosen, 2010) trained pigeons on a version of the multiple-reversal task in which, on each session, the same stimulus (S1) is correct for the first half of the session and the other stimulus (S2) is correct for the last half. Following a large number of training sessions, several strategies could be used to perform this task near optimally. Animals could learn to count the number of trials to the reversal, but as most research has used an 80-trial session, that would be beyond the ability of most animals. Alternatively, one could choose S1 until it stops being correct (the first trial of the reversal) and choose S2 from there on. This strategy is sometimes referred to as win-stay/lose-shift (Graf, Bullock, & Bitterman, 1964; Rayburn-Reeves, Stagner, Kirk, & Zentall, 2013b) and would result in a single error per session. Rats, when presented with the midsession reversal task involving spatial stimuli, respond essentially in this manner (Rayburn-Reeves, Stagner, et al., 2013b; Smith, Pattison, & Zentall, 2016; but see McMillan, Kirk, & Roberts, 2014), as do human participants given the midsession reversal task with visual discriminative stimuli (Rayburn-Reeves et al., 2011). A third strategy would be to attempt to time when half the session is over, but timing should be relatively inaccurate, compared with win-stay/lose-shift, given the duration of most half sessions and the variability in the animal’s choice response.
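The claim that win-stay/lose-shift yields a single error per session can be checked with a minimal simulation. The sketch below is illustrative only: the function name, the assumption that the animal starts each session by choosing S1, and the convention that the reversal falls on Trial 41 of an 80-trial session are ours, not part of any cited procedure.

```python
def win_stay_lose_shift(n_trials=80, reversal_trial=41):
    """Return the trials on which an ideal win-stay/lose-shift chooser errs.

    Trials are numbered from 1. S1 is correct before `reversal_trial`;
    S2 is correct from `reversal_trial` onward.
    """
    choice = "S1"  # assumption: a well-trained animal starts the session on S1
    error_trials = []
    for trial in range(1, n_trials + 1):
        correct = "S1" if trial < reversal_trial else "S2"
        if choice != correct:
            error_trials.append(trial)
            # lose-shift: switch to the other stimulus for the next trial
            choice = "S2" if choice == "S1" else "S1"
        # win-stay: after a correct choice, keep the same stimulus
    return error_trials


print(win_stay_lose_shift())  # the only error is on the first reversal trial
```

Under these assumptions the only error falls on the first trial of the reversal, wherever in the session the reversal is placed.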

Pigeons, however, seem unable to adopt anything close to this optimal win-stay/lose-shift strategy (Rayburn-Reeves et al., 2011; Rayburn-Reeves, Qadri, Brooks, Keller, & Cook, 2017). They begin to choose S2 well before the reversal (making anticipatory errors) and continue to choose S1 well after the reversal (making perseverative errors). In fact, the resulting psychophysical function describing choice of S1 as a function of trials within a session suggests that the pigeons are attempting to time the duration of the session from the start of the session to the reversal (McMillan & Roberts, 2012; Smith, Beckmann, & Zentall, 2017; Stagner, Michler, Rayburn-Reeves, Laude, & Zentall, 2013). For example, Rayburn-Reeves et al. (2011) found that randomly changing the point in the session at which the reversal occurs does not eliminate the pigeons’ tendency to use time into the session as a cue to reverse. That is, when the reversal occurred early in a session, pigeons made many more perseverative errors, and when the reversal occurred late in the session, they made many more anticipatory errors.

Other evidence that the pigeons use time from the start of the session to the reversal as a cue to reverse comes from training the pigeons on one intertrial interval and then testing them on either an increase or a decrease in the intertrial interval (Laude, Stagner, Rayburn-Reeves, & Zentall, 2014; McMillan & Roberts, 2012). When the intertrial interval was shortened, there was a large increase in the number of perseverative errors, whereas when the intertrial interval was lengthened, it resulted in a large increase in the number of anticipatory errors.
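The direction of these effects follows from simple arithmetic if one assumes a pigeon that switches to S2 once a learned amount of time has elapsed since the start of the session. The sketch below is a hypothetical illustration, not a model from the cited studies: the fixed choice latency of 1 s, the specific intertrial intervals, and the function name are all assumptions.

```python
import math


def predicted_switch_trial(trained_iti_s, test_iti_s,
                           reversal_trial=41, choice_latency_s=1.0):
    """Trial on which a purely time-based chooser would first pick S2.

    Assumes each trial lasts one intertrial interval plus a fixed,
    hypothetical choice latency.
    """
    # Elapsed time at the start of the reversal trial during training
    trained_trial_s = trained_iti_s + choice_latency_s
    learned_time_s = (reversal_trial - 1) * trained_trial_s
    # First test trial that begins at or after the learned elapsed time
    test_trial_s = test_iti_s + choice_latency_s
    return math.ceil(learned_time_s / test_trial_s) + 1


same = predicted_switch_trial(5.0, 5.0)      # unchanged interval
shorter = predicted_switch_trial(5.0, 2.5)   # interval shortened at test
longer = predicted_switch_trial(5.0, 10.0)   # interval lengthened at test
print(same, shorter, longer)
```

With a shortened interval, the learned time is reached only after the reversal trial has passed (perseverative errors); with a lengthened interval, it is reached before the reversal (anticipatory errors).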

Several interventions have been implemented to attempt to increase pigeons’ accuracy on the midsession reversal task (Rayburn-Reeves & Cook, 2016). When McMillan and Roberts (2012) made the S1–S2 discrimination a redundant visual-spatial discrimination, it improved accuracy by decreasing (but not eliminating) the number of anticipatory and perseverative errors. However, when pigeons were trained on a purely spatial midsession reversal task, similar to the task used with rats, with a 5-s intertrial interval, their accuracy was no better than it was with a visual discrimination (Rayburn-Reeves, Laude, & Zentall, 2013a).

Smith et al. (2017) suggested that as the pigeons approached the reversal and for several trials beyond, they may not have been able to remember what stimulus they had chosen on the preceding trial and, equally important, the outcome of that choice. Although one might think that would be easy to remember, Randall and Zentall (1997) found that the location pecked and the consequence of that peck were not remembered well when a delay as short as 4 s occurred after the consequence. Smith et al. tested this hypothesis by providing the pigeons, during the intertrial interval, with a reminder of what stimulus they had chosen and the result of that choice. If the pigeon had chosen S1, a panel houselight was illuminated during the intertrial interval. If the pigeon had chosen S2, a ceiling houselight was illuminated during the intertrial interval. If the correct choice response had been made, the feeder light remained on during the intertrial interval. The reminder cues did increase overall accuracy, but they did not eliminate either anticipatory or perseverative errors.

One procedure that resulted in a large improvement in midsession reversal accuracy consisted of decreasing the probability of reinforcement for correct choice of the S2 stimulus to 20% while maintaining the probability of reinforcement for correct choice of the S1 stimulus at 100% (Santos, Soares, Vasconcelos, & Machado, 2019). As expected, this procedure biased the pigeons to choose the S1 stimulus and virtually eliminated anticipatory errors. However, surprisingly, it did not result in a concomitant increase in perseverative errors. Thus, decreasing the probability of reinforcement for correct choice of the S2 stimulus actually improved overall accuracy on the midsession reversal task.

After replicating the Santos et al. (2019) effect, Zentall, Andrews, Case, and Peng (2019) proposed that by decreasing the probability of reinforcement for correct choice of S2, the pigeons were forced to use the feedback from choice of S1 as the major basis for detection of the reversal. That is, they proposed that by making the feedback from choice of S2 unreliable, they reduced the competition between S1 and S2 and thereby made the occurrence of the reversal more discriminable. Furthermore, Zentall et al. suggested that the problem with the original “symmetrical” task was that as the pigeons approached the reversal, the competition between S1 and S2 stimuli caused the pigeons to use the time from the start of the session as a cue to reverse.

Zentall et al. (2019) tested this hypothesis by manipulating the asymmetry in the consequence of choice of S1 and S2 in a different way. They increased the number of responses required to choose S2 (but not S1). The experimental group was required to make 10 pecks to choose S2 but only one peck to choose S1, whereas the control group was required to make a single peck to both S1 and S2. Zentall et al. (2019) found, once again, that the pigeons in the experimental group made virtually no anticipatory errors and no more perseverative errors than pigeons in the control group. To further test the hypothesis that increasing the response requirement for choice of S2 reduced the competition between S1 and S2 and caused the pigeons to attend primarily to the feedback from choice of S1, Zentall et al. tested the pigeons with a novel S2 stimulus. The S2 color was replaced by vertical black and white lines. During the first half of the test session, because both groups had presumably learned to respond to S1, both groups showed little disruption in choice accuracy. During the last half of the session, however, the control pigeons showed a very large disruption in choice accuracy, whereas the experimental pigeons showed only a small disruption in choice accuracy. The difference in accuracy between the two groups during the last half of the session presumably resulted because the control pigeons were trying to select the previous S2 stimulus that was no longer present, whereas the experimental pigeons were rejecting the still present S1 stimulus.

It should be noted that both Santos et al. (2019) and Zentall et al. (2019) also found that the increase in accuracy resulting from an asymmetry in the value of the two discriminative stimuli is unique to devaluing the S2 stimulus. When they devalued the S1 stimulus, pigeons showed significantly worse performance compared with the control group. In both studies, by biasing the pigeons to choose the S2 stimulus, the pigeons made many more anticipatory errors.

It appears that generating an asymmetry between the S1 and S2 stimuli that favors the S1 stimulus not only stops the pigeons from using the time from the start of the session to the reversal as a major cue to reverse but also encourages the pigeons to use the feedback from the choice of S1, without competition from S2, as the major cue to reverse the discrimination.

The purpose of the present experiment was to further test the hypothesis that, if correct choice of S1 was made more favorable than correct choice of S2, it would reduce the number of anticipatory errors while not increasing the number of perseverative errors, and by so doing would lead to more accurate performance of the midsession reversal task by pigeons. To accomplish this, we manipulated the magnitude of reinforcement between S1 and S2. For the experimental group, correct choice of S1 resulted in five pellets of reinforcement, whereas correct choice of S2 resulted in only one pellet of reinforcement. For the control group, correct choice of both S1 and S2 resulted in three pellets of reinforcement.

Methods

Subjects

Twelve unsexed pigeons, 8–12 years of age, originally purchased from the Palmetto Pigeon Plant (Sumter, SC), were used in this experiment. The pigeons had prior experience making simultaneous color discriminations. The pigeons were housed in individual wire cages measuring 28 cm × 38 cm × 30.5 cm on a 12-hour light/dark cycle. Over the course of the experiment, all pigeons were maintained at 85% of their free-feeding weight and were given free access to grit and water. They were cared for in accordance with University of Kentucky Animal Care Guidelines.

Apparatus

A Med Associates (St. Albans, VT) ENV–008 modular operant test cage, surrounded by a sound-attenuating cabinet, was used. The response panel in the chamber had a horizontal row of three response keys. Behind each key was a 12-stimulus inline projector (Industrial Electronics Engineering, Van Nuys, CA) that projected red, green, and white lights. Reinforcement was delivered by a pellet dispenser (Med Associates ENV-203-45) mounted behind the response panel. A 28 V, 0.1 A house light was centered above the response panel. A microcomputer in the adjacent room controlled the experiment.

Procedure

Subjects were randomly assigned to either the experimental (n = 6) or the control group (n = 6). For pigeons in both groups, a trial started with illumination of the right and left response keys, one red and one green. For the first 40 trials of each session, one color was correct (S1), and for the remaining 40 trials of each session, the other color was correct (S2). For pigeons in the control group, all correct choices (a single peck) were reinforced with three pellets. For pigeons in the experimental group, correct choice of S1 was reinforced with five pellets, whereas correct choice of S2 was reinforced with one pellet. Red and green side key presentations were randomized and counterbalanced over trials. Half of the pigeons in both groups received red as the S1 and green the S2, and the other half received the reverse. Reinforcement followed all correct responses. Incorrect responses were not reinforced. All choices terminated the current trial and initiated a 5-s intertrial interval. Sessions were conducted 6 days a week until the pigeons’ accuracy on the task had stabilized (50 sessions).
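The reinforcement contingencies described above can be summarized in a few lines of code. This is a sketch of our reading of the procedure, not the program that controlled the experiment; the dictionary and function names are hypothetical.

```python
# Pellets delivered for a correct choice, by group and stimulus
# (values taken from the procedure described above).
PELLETS = {
    "control": {"S1": 3, "S2": 3},
    "experimental": {"S1": 5, "S2": 1},
}


def correct_stimulus(trial, n_trials=80):
    """S1 is correct on the first half of the session, S2 on the second."""
    if not 1 <= trial <= n_trials:
        raise ValueError("trial out of range")
    return "S1" if trial <= n_trials // 2 else "S2"


def pellets_earned(group, trial, choice, n_trials=80):
    """Pellets delivered on one trial; incorrect choices earn nothing."""
    correct = correct_stimulus(trial, n_trials)
    return PELLETS[group][correct] if choice == correct else 0


def max_pellets_per_session(group, n_trials=80):
    """Total pellets earned by a chooser that is correct on every trial."""
    return sum(
        pellets_earned(group, t, correct_stimulus(t, n_trials), n_trials)
        for t in range(1, n_trials + 1)
    )


for group in PELLETS:
    print(group, max_pellets_per_session(group))
```

One consequence of these values is that a perfectly accurate pigeon in either group could earn the same total, 240 pellets per session (80 × 3 for the control group; 40 × 5 + 40 × 1 for the experimental group), so the groups differ in how reinforcement is distributed across S1 and S2 rather than in how much is available.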

Results

The choice of S1 for each group, plotted as a function of trial, pooled over the last 10 sessions of training, appears in Fig. 1. Santos et al. (2019) and Zentall et al. (2019) found that an asymmetry in outcome (100% reinforcement for correct choice of S1 and 20% reinforcement for correct choice of S2) or in effort resulted in a decrease in anticipatory errors but no increase in perseverative errors. For this reason, we analyzed separately trials from the first half and second half of each session from the last 10 sessions (41–50) pooled. Accuracy on Trial 41 was reverse coded, as the feedback from Trial 41 was not received until after the choice was made. A 2 × 2 mixed factor analysis of variance (ANOVA) was conducted to analyze proportion of correct choices over the last 10 sessions by group (experimental or control) and by session half (first or second). There was a significant main effect of group, F(1, 10) = 17.11, p < .002, indicating that the experimental group (M = 0.92, SD = 0.036) had a significantly greater proportion of correct choices than the control group (M = 0.83, SD = 0.049). The main effect of first half versus second half was not significant (F < 1), indicating that there was not a significant difference in proportion of correct choices between the first half (M = 0.88, SD = 0.078) and second half (M = 0.87, SD = 0.043) of the block of sessions. However, the Group × Session Half interaction was significant, F(1, 10) = 11.32, p = .007.

Fig. 1. Proportion of S1 choices made as a function of the 80 trials in the session, averaged over the last 10 sessions of training. Error bars = ± one standard error of the mean.

A separate test of anticipatory errors pooled over the last 10 sessions of training indicated that there were significantly more errors made by the control group (18.13%) than by the experimental group (5.85%), t(10) = 4.52, p = .0027. A similar analysis performed on perseverative errors pooled over the last 10 sessions of training also indicated that there were significantly more errors made by the control group (15.30%) than by the experimental group (9.91%), t(10) = 2.75, p = .02.

A more direct measure of the pigeons’ sensitivity to the reversal is their accuracy in the vicinity of the reversal. For this reason, another 2 × 2 factorial ANOVA was conducted to analyze proportion of correct choices for the five trials immediately preceding the reversal (Trials 37–41 pooled) and those following the reversal (Trials 42–46 pooled) over the last 10 training sessions for each group. The analysis indicated that there was a significant main effect of group, F(1, 10) = 19.19, p = .001. The experimental group (M = 0.72, SD = 0.04) had a significantly greater proportion of correct choices than the control group (M = 0.52, SD = 0.07). The main effect of trials preceding versus following the reversal was not significant, F(1, 10) = 3.56, p = .09; accuracy on Trials 37–41 (M = 0.62, SD = 0.12) was numerically, but not reliably, higher than accuracy on Trials 42–46 (M = 0.56, SD = 0.07). Furthermore, there was a significant Group × Trials interaction, F(1, 10) = 13.41, p = .004. As one can see in Fig. 1, the interaction suggests that the group difference was greater before the reversal than after the reversal. In fact, pigeons in the experimental condition had a significantly greater proportion of correct choices preceding the reversal (M = 0.72, SD = 0.04) than following the reversal (M = 0.55, SD = 0.083), t(5) = 4.19, p = .009, but pigeons in the control condition did not have a significantly greater proportion of correct choices preceding the reversal (M = 0.52, SD = 0.05) than following the reversal (M = 0.57, SD = 0.048), t(5) = 1.57, p = .18.

A separate test of anticipatory errors pooled over the last 10 sessions of training for the five trials immediately preceding the reversal (Trials 37–41) indicated that there were significantly more errors made by the control group (48%) than by the experimental group (28%), t(10) = 5.98, p = .0003. A similar analysis performed on perseverative errors pooled over the last 10 sessions of training for the five trials immediately following the reversal (Trials 42–46) indicated that the errors made by the control group (42.7%) were not significantly different from those made by the experimental group (44.7%; t < 1).
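To make the two error measures concrete, the following sketch scores a single session's choice sequence. It reflects our reading of the scoring rules, in particular that Trial 41 is grouped with the pre-reversal trials (with S1 counted as correct) because feedback signaling the reversal is not available until after that choice. The function name and the example choice sequence are hypothetical and are not data from the experiment.

```python
def error_rates(choices, reversal_trial=41, window=None):
    """Return (anticipatory, perseverative) error proportions for a session.

    `choices` lists "S1"/"S2" for Trials 1..n. Anticipatory errors are S2
    choices on Trials 1..reversal_trial; perseverative errors are S1 choices
    on later trials. If `window` is given, only that many trials on each
    side of the reversal are scored (e.g., 5 gives Trials 37-41 and 42-46).
    """
    pre = list(range(1, reversal_trial + 1))
    post = list(range(reversal_trial + 1, len(choices) + 1))
    if window is not None:
        pre = pre[-window:]
        post = post[:window]
    anticipatory = sum(choices[t - 1] == "S2" for t in pre) / len(pre)
    perseverative = sum(choices[t - 1] == "S1" for t in post) / len(post)
    return anticipatory, perseverative


# Hypothetical session: two early S2 choices on Trials 39-40 and
# S1 choices persisting through Trial 44.
session = ["S1"] * 38 + ["S2"] * 2 + ["S1"] * 4 + ["S2"] * 36
print(error_rates(session, window=5))  # (0.4, 0.6)
```

Averaging such proportions over sessions and pigeons within a group would give values comparable in form to the percentages reported above.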

Discussion

In the midsession reversal task, pigeons make errors both before and after the reversal (Rayburn-Reeves et al., 2011). Asymmetry in outcome or response to S1 and S2, whether created by a difference in the probability of obtaining a reinforcer (Santos et al., 2019; Zentall et al., 2019) or by a difference in the response required for reinforcement (Zentall et al., 2019), improves performance primarily by reducing anticipatory errors. In the present experiment, we arranged asymmetry in the midsession reversal task by using different reinforcer magnitudes for choice of S1 and S2. This manipulation significantly increased accuracy by decreasing the number of anticipatory errors without increasing perseverative errors.

Consistent with prior literature, it appears that generating some asymmetry between the two stimuli by reinforcing correct choices of S1 with a larger magnitude of reinforcement than correct choices of S2 significantly increases overall accuracy compared with a control group for which the magnitude of reinforcement for correct S1 and S2 choices is equal. Furthermore, the facilitation can be attributed to the decrease in the proportion of anticipatory errors, relative to the control group.

The improved accuracy resulting from the greater magnitude of reinforcement for correct S1 responses, as well as the results of Santos et al. (2019) and Zentall et al. (2019), bears some resemblance to a differential outcomes effect (Peterson, Wheeler, & Trapold, 1980). In the differential outcomes effect, when correct choice of one alternative in a conditional discrimination is reinforced with one magnitude of reinforcement and correct choice of the other alternative is reinforced with a different magnitude, the conditional discrimination is often acquired more rapidly than it is by controls (Carlson & Wielkiewicz, 1976).

In fact, Rayburn-Reeves et al. (2017) trained pigeons on a spatial midsession reversal task with differential outcomes in the form of differential feeder location (correct left responses were reinforced at a left feeder, correct right responses were reinforced at a right feeder). They found that the differential outcomes resulted in a significant decrease in anticipatory errors, but little change in perseverative errors. Although differential outcomes may play a role in the present results, they cannot be the only mechanism responsible for the improved accuracy by the experimental group in the present experiment and earlier research (Santos et al., 2019; Zentall et al., 2019). When the S1/S2 reinforcement probability asymmetry is such that the probability of reinforcement favors correct choice of S2 over S1, facilitation has not been found (Santos et al., 2019; Zentall et al., 2019). In fact, relative to controls, such an asymmetry results in a large increase in anticipatory errors, with no decrease in perseverative errors.

The benefit in accuracy for the experimental group that results from the asymmetry in the magnitude of reinforcement for correct choice of the S1 stimulus, relative to the S2 stimulus, appears as a reduction in anticipatory errors. Furthermore, this overall pattern of errors more closely approximates the optimal win-stay/lose-shift strategy characteristic of humans (Rayburn-Reeves et al., 2011) and rats trained on a spatial midsession reversal task (Rayburn-Reeves et al., 2013a, b). Zentall et al. (2019) suggested that pigeons make anticipatory and perseverative errors in the region around the reversal because the response strengths of S1 and S2 are in greatest competition in that region. That is, as the time from the start of the session increases, the response strength associated with correct choice of S2 grows and begins to compete with the response strength associated with correct choice of S1, resulting in anticipatory errors. If, however, the response strength associated with correct choice of S2 grows more slowly because of the smaller magnitude of reinforcement associated with S2, there should be less competition in response strength between S1 and S2. That could explain the large decrease in anticipatory errors. Similar predictions are made by learning-to-time theory (LeT) proposed by Machado (1997). But why is it that the increased value of correct S1 outcomes does not result in a greater number of perseverative errors?

Zentall et al. (2019) propose that the reduction in competition between S1 and S2 response strength may result in the pigeons focusing attention on the outcome of choice of the S1 stimulus. Attention to the outcome of choice of the S1 stimulus (without regard to the outcome of choice of the S2 stimulus) may make the results of early trials following the reversal more discriminable. What results may be akin to what one sees when comparing reversal learning following continuous versus partial reinforcement. Reversal learning following continuous reinforcement is typically faster than following partial reinforcement because at the time of the reversal, nonreinforcement is more easily detected against the background of continuous reinforcement, as compared with a background of partial reinforcement (Mackintosh, 1965).

The paradox of the finding that a decrease in the probability of reinforcement of correct choice of the S2 stimulus results in an increase in midsession reversal accuracy is at least partially resolved by the understanding that there may be too much information present in the symmetrical midsession reversal task. That is, the feedback from choice of either the S1 or S2 stimulus, at any point in the session, provides complete information about which half of the session the pigeon is in. By making the feedback provided by choice of the S2 stimulus less informative (nonreinforcement of choice of S2 often provides no information), the pigeons must rely on feedback from choice of the S1 stimulus, and that feedback provides all the information needed to decide when the reversal has occurred.

Interestingly, a decrease in the probability of reinforcement of correct choice of the S1 stimulus (with reinforcement of all correct choices of the S2 stimulus) does not result in a similar reduction in anticipatory errors (Santos et al., 2019; Zentall et al., 2019). The reason is that, during the first half of the session, choice of the “correct” stimulus, S1, provides little feedback that it is correct (20% reinforcement in the first half of the session, 0% reinforcement in the second half). The reversal is easier to detect by choosing S2 (0% reinforcement in the first half of the session, 100% reinforcement in the second half), but each choice of the S2 stimulus in the first half of the session results in an error. This finding of an absence of symmetry (decreasing the value of S1 relative to S2 rather than the reverse) has been well demonstrated by both Santos et al. (2019) and Zentall et al. (2019).
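The informational argument can be illustrated by comparing, for each stimulus, how much the probability of reinforcement changes at the reversal. The sketch below uses the probabilities given in the text (100% versus 20%); treating the absolute change in reinforcement probability as an index of how detectable the reversal is from that stimulus is our simplifying assumption, and the names are hypothetical.

```python
# Probability of reinforcement for choosing each stimulus in the
# (first half, second half) of the session under three schedules.
SCHEDULES = {
    "symmetrical": {"S1": (1.0, 0.0), "S2": (0.0, 1.0)},
    "S2 devalued": {"S1": (1.0, 0.0), "S2": (0.0, 0.2)},
    "S1 devalued": {"S1": (0.2, 0.0), "S2": (0.0, 1.0)},
}


def feedback_change(schedule):
    """Absolute change in reinforcement probability at the reversal."""
    return {
        stimulus: abs(second - first)
        for stimulus, (first, second) in schedule.items()
    }


for name, schedule in SCHEDULES.items():
    print(name, feedback_change(schedule))
```

When S2 is devalued, the stimulus whose feedback changes most at the reversal is S1, which can be sampled throughout the first half without error. When S1 is devalued, the more informative stimulus is S2, which can be sampled before the reversal only at the cost of an anticipatory error.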

The present results provide converging evidence that pigeons’ production of anticipatory and perseverative errors in the midsession reversal task results from competition between the response strengths of S1 and S2 in the region of the session around the reversal. Various procedures that reduce that competition, by giving greater value to S1, relative to S2, appear to improve accuracy on the midsession reversal task for pigeons.

Open practices statement

The data and materials from the present study are available from the second author.