Introduction

The Two-way Shuttle Box Avoidance Paradigm is a well-established laboratory method for studying learning, memory, and pharmacological and physiological interventions to brain circuitry (Moscarello & LeDoux, 2013; Vogel, 2008; Wadenberg, 2010). It was first proposed by Warner (1932) as a convenient paradigm for the study of association in rats, and has since become, along with the Skinner box, a standard model in the field of behavioral neurobiology (Beninger, 1989; LeDoux, Moscarello, Sears, & Campese, 2017). A shuttle box apparatus consists of two compartments with a metal grid floor, separated by a hurdle or a doorway. A conditioned stimulus (CS), represented by a visual or auditory signal in the compartment where the animal is present, is contingently followed by an aversive unconditioned stimulus (US) – a foot shock across the metal grid floor. Subjects can either escape or avoid the US by shuttling between the compartments in response to the CS – a reaction representing the formation of a potential CS-US and R-O association (Rescorla & Solomon, 1967). The major advantage of this model is the locomotor response being measured – a shuttling reaction. Unlike the manipulatory responses in a Skinner Box (lever pressing), it does not demand extensive training or shaping and is present from the beginning, which makes it similar to other unconditioned responses (Bolles, 1970). The shuttling response, evoked as a part of an escape reaction, is easy to make contingent upon conditioned stimuli with training. Given this feature, avoidance learning can be easily automated, and the impact of animal handling can be minimized (Capaldi & Capaldi, 1972). The use of the electric shock as an unconditioned stimulus in shuttle box learning provides negative reinforcement relatively independent of the animal's state and deprivation level, unlike appetitive tasks.
Thus, the simple motor response together with the convenient methodology of automated experiment and the great prognostic value for revealing brain circuitry for negative reinforcement, fear memory, and learning may explain the scientific utility of this model (LeDoux et al., 2017).

However, the mechanisms underlying the active avoidance response appear to be rather ambiguous. The question of environmental variables (e.g., CS, US, apparatus properties) affecting the learning efficiency in active avoidance still attracts attention (Bignami, Alleva, Amorico, De Acetis, & Giardini, 1985; Fernandez-Teruel et al., 1991; Savonenko, Brush, & Zielinski, 1999; Zielinski, 1993). The lack of a safe compartment and the demand to establish bi-directional responses are two sources of uncertainty that may contribute to generating a conflict that undermines the analysis of behavior and learning in the shuttle box. Animals face a two-way escape problem: while avoiding any given compartment, they escape to the compartment where they were shocked in the previous trial (Savonenko et al., 1999). The presence of bi-directional avoidance conflict on initial trials limits the use of aversive shuttling for the study of instrumental learning. The principles of active avoidance, once thought of as general learning rules (Beninger, 1989; Rescorla & Solomon, 1967; Warner, 1932), are more likely attributed to the functioning of specific fear circuitry (Fernandez-Teruel et al., 1991; LeDoux et al., 2017; Moscarello & LeDoux, 2013). For example, instrumental response acquisition in the shuttle box is highly dependent on amygdala functioning, mediated by anxiety level (Fernandez-Teruel et al., 1991) and potentially interfered with by Pavlovian fear reactions to the CS (LeDoux et al., 2017; Moscarello & LeDoux, 2013). At the same time, the source of reinforcement for the acquisition of the shuttling response and even the instrumentality of avoidance itself remain a matter of controversy (LeDoux et al., 2017).
Punishment and negative reinforcement are both present in the course of aversive learning (Jean-Richard-Dit-Bressel, Killcross, & McNally, 2018), and affect the response in opposite directions, generating an approach-avoidance conflict: the former suppresses locomotor activity, while the latter potentiates it. Thereby, the use of electric shock in the course of instrumental learning imposes additional restrictions on the conditioned behavior. This leads to a conceptual problem, namely that findings from experiments using the shuttle box paradigm (e.g., characteristic learning curves, effects of physiological and pharmacological interventions) cannot be separated from the specificity of fear motivation without additional controls.

Since experimental studies usually require the complex analysis of instrumental learning, there is a need for adequate comparison models to verify or contrast the data on aversive instrumental conditioning. Appetitive conditioning is usually used in these cases as it shares all the learning variables, but incorporates different reinforcement valence (Beninger, 1989; Campese, Gonzaga, Moscarello, & LeDoux, 2015; Ilango, Wetzel, Scheich, & Ohl, 2010). A comparison of appetitive and aversive conditioning can be useful for the study of both the influence of reinforcement valence on learning and universal behavioral processes independent of motivation. However, typical forms of appetitive instrumental conditioning used in comparison with the shuttle box involve Skinner box type of tasks (Beninger, 1989), which differ from the shuttle box in several dimensions. In the traditional Skinner box, the animal is free to behave and interact with the manipulanda available, so the two tasks measure different operants (lever press vs. shuttling response). Other forms of appetitive learning include place-preference tasks (Everitt, Morris, O'Brien, & Robbins, 1991), nose-poke tasks (Bhandari, Daya, & Mishra, 2016), and mazes (Asem & Holland, 2015), but all share the same problem that complicates their direct comparison with the avoidance task. For instance, locomotor tasks like mazes measure locomotion and do not involve conditioning of a specific precise response to a stimulus, and implement both place and response conditioning (Asem & Holland, 2015). For manipulatory tasks, like the 5-Choice Serial-Reaction Time Task (5-CSRTT), difficult magazine training and shaping procedures affect the response (Bhandari et al., 2016). It seems that a more adequate comparison to the shuttle box could be provided by a similar conditioning task in the same apparatus, but relying on a different motivational system.
Thus, there is a need for a learning paradigm that trains the same behavior using different reinforcers to rule out specific processes, characteristic of the two types of motivation (approaching appetitive and avoiding aversive stimuli). The shuttle box task seems to be a good candidate, and recent studies, utilizing both appetitive and aversive reinforcers in the shuttle box, show promising results (Campese et al., 2015; Ilango et al., 2010). The shuttle box has been used for a place-preference task with both brain stimulation and food reinforcement (Everitt et al., 1991; Ilango et al., 2010), and the effectiveness of the brain stimulation reward was compared against an aversive reinforcer. A similar comparison but with a natural food reinforcement was utilized in a study with Pavlovian to Instrumental Transfer (PIT) in rats (Campese et al., 2015). However, in none of this work was an instrumental task superimposed on the particular behavioral constraints of the shuttle box. To explore the utility of this approach, the present paper provides a comparison of the behavior elicited in the shuttle box using either an appetitive or an aversive reinforcer. The difference between our investigation and earlier studies is that we propose an appetitive conditioning task (Berezhnoy & Inozemtsev, 2014) utilizing both Pavlovian and instrumental contingencies, while mirroring the technical parameters of the classical shuttle box active avoidance paradigm (Capaldi & Capaldi, 1972). Natural food reward in this task was delivered in a similar contingent way to the footshock that provides the comparison with instrumental conditioning. The primary goal of this study was to describe the new appetitive task in the shuttle box.

Materials and methods

Subjects

The experiments were conducted with 80-day-old male Wistar rats (weight 280–300 g, n=17 for appetitive, n=13 for aversive model), in accordance with the National Institutes of Health international rules for the care and use of laboratory animals (http://www.nap.edu/books/0309083893/html/R1.html). The animals were maintained in a controlled environmental animal facility with a 12-h light/dark cycle and free access to water. All testing was conducted at the end of the light phase. The behavioral training was held every other day, at the same time; every training session was preceded by 24-h food restriction and after each experiment animals were fed ad libitum. Animals were maintained at 90% of their free-feeding weight. Prior to training, animals were handled 15 min daily for 5 days.

Apparatus

The experimental chamber was a metal box 50 cm long, 25 cm wide, 30 cm high with a grid floor. It was divided by a metal partition with a square doorway (5 cm high, 5 cm wide) separating two equal-sized compartments (25 cm long, 25 cm wide, 30 cm high). The front side of the box was made out of transparent plexiglas to enable the observation of the animals' behavior. Each compartment was equipped with an LED element for the light stimulus presentation (400–700 nm, 2 lm, 5 × 2 cm), mounted on the lateral wall of the box 7 cm above the grid floor. Computer-controlled food dispensers (stirring disks with 20 cells for single portions of food – sunflower seeds, 50 mg) were mounted on the front wall of each compartment 2 cm above the grid floor in the appetitive task. In the active avoidance task, the grid floors in both compartments were connected to the computer-controlled stabilized current source (0.3 mA). Rat locomotion was monitored by the weight sensor, integrated in the construction of the swinging grid floor, and light-beam sensors integrated in the food dispensers. All the training procedures were controlled by custom designed software in LabVIEW (National Instruments) and performed in a room with dimmed lights and reduced background noise (20 dB).

Procedure for the active avoidance task

No prior habituation to the experimental box was provided and all the components of the task were presented at once. On each conditioning day, the animal was placed in the right-hand side compartment of the chamber with the partition doorway open. This daily procedure lasted for 10–17 min and consisted of 20 trials as presented in the left panel of Fig. 1: 10-s presentation of the light-conditioned stimulus (CS) alone, followed by a 10-s presentation of CS paired with electric shock (0.3 mA) from the grid floor (unconditioned stimulus, US) and fixed 30-s intertrial interval (ITI). The CS was switched on in the compartment containing the animal, and switched off either following the animal's transition to the other compartment, or after 20 s together with electric shock termination. The animal's transition to the other compartment during the CS period prevented the shock and started the next trial after the ITI. During the ITI the doorway was also open and the animal could shuttle between the compartments, which was scored as intertrial responses (ITRs), but no stimuli were administered. The next trial was started in the compartment where the rat was present at the end of the ITI.
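The stated session length of 10–17 min follows from the trial timing. The sketch below works through that arithmetic; the function name is hypothetical, and treating a trial without a transition as a full 20-s CS is a simplifying assumption rather than a description of the authors' LabVIEW software.

```python
# Timing constants taken from the procedure described above.
TRIALS_PER_SESSION = 20
CS_MAX_S = 20.0   # 10 s of CS alone + 10 s of CS paired with shock
ITI_S = 30.0      # fixed intertrial interval


def session_duration_s(latencies_s):
    """Total session time (s) given one transition latency per trial.

    A latency of None means the animal never crossed, so the CS is
    assumed to run for its full 20 s (simplifying assumption).
    """
    total = 0.0
    for latency in latencies_s:
        cs_time = CS_MAX_S if latency is None else min(latency, CS_MAX_S)
        total += cs_time + ITI_S
    return total


# Lower bound: every transition is immediate; upper bound: no transitions.
fastest = session_duration_s([0.0] * TRIALS_PER_SESSION)
slowest = session_duration_s([None] * TRIALS_PER_SESSION)
print(fastest / 60, slowest / 60)  # bounds in minutes
```

The bounds come out at 600 s and 1000 s, that is, 10 min and roughly 16.7 min, which is consistent with the 10–17 min range given in the text.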

Fig. 1

Schematic representation of the two models: left – active avoidance, right – appetitive task. A simplified overall view and notated stimulus presentation scheme is shown. From the reactions shown in the pictures (transition), only the last ones are reinforced (denoted as avoidance or food). In the appetitive task, the loaded feeder (green) was always in the opposite compartment from the animal and switched depending on the animal position; the same rule is used in active avoidance for the safe compartment.

Procedure for appetitive task

Learning during appetitively motivated tasks usually requires animals to be familiarized with the feeder. This procedure demands additional training and time. In this study, we tried to minimize the impact of previous experience on behavior acquisition and so the animal was introduced to the chamber and all the stimuli at once, without prior habituation to the experimental box, shaping behavior, or magazine training. The only change on the first day was that both feeders contained one portion of food from the beginning to attract the animal. On each conditioning day, the animal was placed in the right-side compartment of the chamber with the partition doorway opened. The basic daily procedure lasted 10–17 min (depending upon how fast the animal transitioned to the other compartment) and consisted of 20 trials (see the right panel of Fig. 1): 20-s presentation of the conditioned stimulus (CS) followed in every trial by the fixed 30-s ITI. The conditioned light stimulus was switched on in the compartment where the animal was at that moment, and switched off either following the animal's transition to the other compartment, or after 20 s. Animal transition during CS was reinforced with a single portion of food (sunflower seed, 50 mg) and started the next trial after ITI. The animal could never get the food reinforcement from the feeder in the same compartment without making the transition. During the ITI, the door was also opened and the animal could move freely between compartments; transitions and nose pokes to the feeders were scored as ITRs, but not reinforced.

The following behavioral parameters were automatically measured: number and latency of transitions between compartments and number of nose pokes to the feeders. The transitions were classified based on their relation to the CS: those made during the CS were scored as conditioned responses and those during ITI as ITRs. Conditioned responses during the active avoidance task were additionally classified on the basis of latency from CS onset as avoidance responses (transition before the shock onset, latency <10 s) and escape responses (transition during the shock, latency >10 s). The number of different emotional reactions (grooming, freezing, jumping, and defecation) was measured manually by two independent observers in both tasks (Cohen's kappa = 0.9).
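The latency-based classification of conditioned responses in the active avoidance task can be sketched as follows. This is an illustrative reconstruction, not the authors' scoring code: the function names are hypothetical, and assigning a latency of exactly 10 s to the escape class is an assumption, since the text specifies only <10 s and >10 s.

```python
from collections import Counter

SHOCK_ONSET_S = 10.0  # shock starts 10 s after CS onset
CS_MAX_S = 20.0       # CS and shock both end 20 s after CS onset


def classify_cs_response(latency_s):
    """Label one trial by its transition latency from CS onset (s).

    None means no transition during the CS. Treating exactly 10 s as
    an escape is an assumption; the text gives only <10 s and >10 s.
    """
    if latency_s is None or latency_s >= CS_MAX_S:
        return "no response"
    if latency_s < SHOCK_ONSET_S:
        return "avoidance"
    return "escape"


def summarize_session(latencies_s):
    """Count avoidance, escape, and no-response trials in one session."""
    return Counter(classify_cs_response(x) for x in latencies_s)
```

For example, `summarize_session([3.2, 12.0, None])` yields one trial in each of the three classes.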

Statistical analyses

Behavioral data were analyzed using factorial analysis of variance (ANOVA) considering the daily behavioral session and model (aversive/appetitive) as grouping factors in order to analyze the higher-order interactive effects of multiple categorical independent variables (factors) and to test for significant differences between the models. Additional analysis was undertaken using post hoc comparisons (Tukey’s HSD test) in cases where significant effects were found. The Kolmogorov-Smirnov test was used to compare the distributions. Differences were considered statistically significant at p < 0.05. All the data are presented as mean ± SEM values.
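As a small illustration of the mean ± SEM summary used throughout the Results, the sketch below computes both values with the Python standard library. It assumes the SEM is the sample standard deviation (n − 1 denominator) divided by the square root of n; the paper does not state which convention was used, and the function name and input values are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM) for a list of per-animal measurements.

    SEM is computed here as sample SD / sqrt(n); this convention is
    an assumption, since the text does not specify it.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to estimate SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical latencies (s) for three animals, for illustration only.
m, s = mean_sem([2.0, 4.0, 6.0])
print(f"{m:.1f} ± {s:.2f} s")
```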

Results

Despite the lack of prior magazine training in the appetitive task, all 17 animals made multiple nose pokes to the feeders in the first session. Only six animals (35%) took food from both feeders in the first session, but all animals did by the second session. All animals made at least one low-latency (<10 s) response to the CS during the first 2 days. Thus, the first 2 days may be considered analogous to the procedure of familiarization with the chamber. All animals exceeded the learning criterion of “75% low-latency responses to the CS” by the fifth day (80–100th trial) and nine animals (53%) reached this criterion by the fourth day (60–80th trial). Another learning criterion of three conditioned reactions in a row was reached by all animals by the sixth day (100–120th trial), and seven animals (41%) already reached this criterion by the fourth day. All animals successfully mastered the task and no animals were excluded from the analysis. Correct reaction included transition to the opposite compartment of the chamber and nose poke to the feeder, and so contained more actions than were required in the active avoidance task. Even so, appetitive learning in the novel task was more effective than active avoidance learning, taken for comparison (see Table 1). Only six out of 13 (46%) animals in the active avoidance task met the 75% learning criterion even by the tenth day. The criterion of three conditioned reactions in a row was reached in all animals by the seventh day (120–140th trial) but nine animals (69%) already reached this criterion by the fourth day.

Table 1 Comparison of the two shuttle box tasks using the two main learning criteria: 15/20 (75%) low-latency conditioned responses (CRs) in the daily session and three CRs in a row in the daily session. The day and the percent of animals reaching the criteria are shown
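The two learning criteria can be expressed as simple checks over the 20 latencies of a daily session. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the <10 s low-latency cutoff applies to both criteria, which the text does not state explicitly for the three-in-a-row criterion.

```python
LOW_LATENCY_S = 10.0  # cutoff for a low-latency conditioned response


def is_low_latency_cr(latency_s):
    """True if the trial had a transition within 10 s of CS onset."""
    return latency_s is not None and latency_s < LOW_LATENCY_S


def meets_percent_criterion(latencies_s, required=15):
    """Criterion 1: at least 15 of 20 (75%) low-latency CRs in a session."""
    return sum(is_low_latency_cr(x) for x in latencies_s) >= required


def meets_run_criterion(latencies_s, run_length=3):
    """Criterion 2: three CRs in a row within a session.

    Applying the same <10 s cutoff here is an assumption.
    """
    run = 0
    for x in latencies_s:
        run = run + 1 if is_low_latency_cr(x) else 0
        if run >= run_length:
            return True
    return False
```

A session with latencies `[3.0] * 15 + [None] * 5` would satisfy both criteria, whereas `[3.0, 3.0, None, 3.0, 3.0, 12.0]` would satisfy neither.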

The differences in acquisition of the two responses are shown in Fig. 2 and indicate that the animals mastered the appetitive task faster and responded to the CS with lower latency than in the aversive task. Learning curves were analyzed using ANOVA with repeated measures incorporating the model (appetitive/aversive) as a categorical predictor. Analysis revealed both the effect of the model (F(1,28)=24.79, p<0.01) and the interactive effect of the learning day and model (F(9,252)=10.13, p<0.01) on conditioned responses. Animals showed an increase in the number of responses after learning in both paradigms, but the learning curve in the case of the appetitive task was steeper, revealing faster habit formation. Post hoc analysis showed significant differences between the level of conditioned responses starting from day 6: 16.7 ± 0.9 versus 8.9 ± 1.1 (p<0.01) in active avoidance. Comparison of reaction time latencies showed significant changes over time (F(9,252)=67.35, p<0.01) and differences between the models (F(9,252)=9.26, p<0.01), starting from day 5: 5.9 ± 1.3 versus 10.4 ± 0.6 s (p<0.01) in active avoidance. By the tenth day reaction time latencies in both models had stabilized, but they were still much lower and more consistent in the appetitive task than in active avoidance: 2.5 ± 0.3 versus 7.0 ± 0.9 s (p<0.01).

Fig. 2

Mean learning curves, representing the main parameters of the learning process. Upper picture – the appetitive task, lower picture – the active avoidance in the shuttle box. Performance is measured by the conditioned reaction latency (CR latency); rate of conditioned reactions (CRs) and intertrial reactions (ITRs), represented on the additional axis. All data are presented as mean ± SEM

Another major difference between the models was the overall activity level; the number and dynamics of intertrial reactions are presented in Fig. 2. Whereas in active avoidance the number of intertrial reactions hardly reached ten per day during the entire training period, it notably increased in the appetitive task starting from the fourth day (fourth vs. third day, p<0.05) and reached the maximum of 61.7 ± 6.6 on the seventh day. ANOVA revealed both the effect of the model (F(1,28)=216.6, p<0.01) and the interaction effect of the learning day and model (F(9,252)=38.4, p<0.01). It should be noted that in the appetitive-driven task animals were very active: all transitions between compartments were directed towards feeders and accompanied by nose pokes. Despite intertrial transitions between compartments not being reinforced, they persisted up to the tenth day. In contrast, animals in the active avoidance task were mostly immobile, except during the conditioned stimulus, and by the tenth day spent 440.7 ± 39.8 s (approx. 55% of total time) freezing (vs. 19.5 ± 1.7 s in the appetitive-driven task, p<0.01).

To further illustrate this difference between the two models, we analyzed the number and timing of intertrial and conditioned reactions in individual animals, as reported in Fig. 3. The goal was to show the learning process in more detail and analyze the reasons for the smaller conditioned reaction latency and the excessive number of intertrial reactions in the appetitive-driven task. Concerning the conditioned reaction latencies, comparison of distributions showed that most conditioned reactions in both models had a low latency of 0–5 s (see the right panel of Fig. 3, lower histograms). However, the overall conditioned reaction time distributions differed (K-S test, p<0.01), which may underlie the contrast in mean reaction times. This contrast could be explained by the presence of occasional escape reactions (10–15 s) up to the end of active avoidance learning, rather than by a difference in reaction time to the CS itself.

Fig. 3

Sample graphs for two individual rats, representing latencies for each conditioned (diamonds) and intertrial (triangles) reaction for all trials during learning. Upper picture – appetitive task, lower picture – active avoidance in the shuttle box. Time interval for each trial is divided to the conditioned stimulus period (0–20 s) and the intertrial period (20–50 s). Solid line on the lower graph (active avoidance) delineates the avoidance period (0–10 s) and escape period (10–20 s). Additional time distribution histogram for the different reactions is shown for each graph in the right panel. Latency of intertrial reactions in this figure (vertical axis) is measured from the end of the trial

The other question is whether intertrial reactions during the appetitive task differed from those in the active avoidance only in number or also qualitatively. Analysis showed that intertrial reaction time distributions did not differ between models (K-S test, p=0.3), despite the difference in numbers. In both models, the majority of reactions occurred during the second half of the intertrial period and had a latency of 15–30 s (see the right panel of Fig. 3, upper histograms). During the active avoidance task, intertrial reactions were present from the beginning but ceased over time. During the appetitive-driven task, the initial emergence of intertrial reactions coincided with heightened behavioral activity towards feeders, mostly on the second day (i.e., after 20–40 trials), and then their frequency only increased. Multiple intertrial reactions (up to three) could occur in a single intertrial period (see Fig. 3) and animals showed signs of their extinction after prolonged training for 15–20 days. By the end of the learning period, animals showed stable low-latency conditioned reactions against a background of intertrial reactions with different timing, which raises the question of stimulus control. Thus, the number of reactions on the last day was normalized to the CS and ITI time, respectively. The level of baseline responding (during ITI) was 0.04 transitions per second. Analysis of the intertrial reaction distributions in the right upper panel of Fig. 3 shows that intertrial reactions were not uniformly distributed in the ITI: there were no ITRs in the first 10 s and their number increased towards the end of the ITI and the beginning of the next CS. Taking into account the small CS time (due to the low latency of response), the responding rate during CS was 0.33 transitions per second – much higher than the baseline.
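The normalization described above amounts to dividing the number of transitions by the time during which they could occur. The sketch below illustrates that calculation under stated assumptions: the function name is hypothetical, and the counts and the 60-s total CS time are illustrative values chosen only to reproduce the reported rates, not the study's raw data.

```python
def response_rate(n_transitions, exposure_s):
    """Transitions per second over a given exposure time (s)."""
    if exposure_s <= 0:
        raise ValueError("exposure time must be positive")
    return n_transitions / exposure_s


# Total ITI time in a session follows from the protocol: 20 trials x 30 s.
ITI_TOTAL_S = 20 * 30.0

# Illustrative counts only (hypothetical, not the study's raw data).
iti_rate = response_rate(24, ITI_TOTAL_S)  # baseline responding in the ITI
cs_rate = response_rate(20, 60.0)          # 20 CRs over 60 s of CS-on time

print(round(iti_rate, 2), round(cs_rate, 2), round(cs_rate / iti_rate, 1))
```

With the reported rates of 0.33 and 0.04 transitions per second, responding during the CS is roughly eight times the ITI baseline, which is the basis for the stimulus-control argument.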

Thus, the conditioned reaction to the light was completely established by the ninth experimental day and consistently performed with low latency. Animals demonstrated rather low variability in the formation and performance of this reaction. At the same time, a large variation was expressed in the number of intertrial responses: the dispersion in values ranged from 43 to 72. The number of ITRs increased during the conditioning process and the peak value (45 ± 3) coincided with the formation of the sustained conditioned response, as can be seen in Fig. 1. The process of excess reactions elimination showed a large variability between animals (from 0 to 86% decrease) and was accompanied by an increase in substitutive grooming activity and defecation in some animals.

Discussion

The two-way shuttle box apparatus, along with the Skinner box, remains the standard for studying instrumental learning in rodents. The main advantage of the shuttle box is the operant being measured (locomotor shuttling response), which is different from the manipulatory response (usually, a lever press) in Skinner-type tasks. In contrast to the lever press, shuttling does not demand shaping, which makes learning easily automated. The other side is that the shuttling response, usually combined with negative reinforcement, cannot be compared directly to lever-press learning, as too many variables differ in these tasks. In the work of Konorski (1967), who was the first to cogently divide the behavior into locomotor and manipulatory, these two types of responses and their inhibition were proposed to be dependent on different brain systems: premotor-prefrontal cortex for manipulatory responses and the caudate-prefrontal system for locomotion. Thus, it remains a matter of conjecture whether instrumental learning in such different tasks follows the same rules.

Nevertheless, the main problem with active avoidance is the inability to differentiate experimentally the processes specific to punishment and learning in general. In research on this model, the negative effect of fear circuitry (Pavlovian fear conditioning) on instrumental response learning is emphasized (Campese et al., 2015; LeDoux et al., 2017). Procedurally, the use of electrical current as an US makes the conditioning procedure more standardized, but complicates the interpretation of the results. Using the shuttle box avoidance, we cannot clearly differentiate whether the learning curve represents the general instrumental learning process or those processes specific to negative reinforcement. It complicates the distinction between the structures, implicated in the instrumental learning and fear circuitry, and makes the interpretation of pharmacological interventions to this model difficult. Unlike the versatile Skinner-type task, which is used either with electric shock or with food as a primary reinforcer, the shuttle box task is usually paired with negative reinforcement, which limits its use.

In an effort to broaden the use of the shuttle box paradigm and to get a comparable appetitive conditioning task while using the same CS and shuttling response, we created a computer-controlled shuttle box chamber with automatic LED panels and feeders on both sides (Berezhnoy & Inozemtsev, 2014). The food reinforcement was made contingent upon shuttling response to the light stimulus, providing both Pavlovian and instrumental contingency in the task. Campese et al. (2015) proposed a similar method using an UnSignaled APproach (USAP), in which the delivery of reinforcement was contingent upon shuttling response alone (instrumental contingency). In this work, we proposed the signaled approach (SAP) task for the comparison with shuttle box avoidance and described the temporal parameters and main features of the learning process. The constant automated acquisition protocol (20 s CS – 30 s ITI, 20 trials per day) without prior magazine training or familiarization with the chamber was used in the study. The use of this simple protocol alone allowed animals to acquire the target behavior: transition to the other compartment with a nose poke to the feeder in response to the light CS, followed by waiting in the doorway for the next CS during ITI. The final reaction was quite similar to the shuttling response to the CS in active avoidance. As most of the animals mastered this task within 5–6 days (100–120 trials, 10 min/day) from the beginning of the protocol, it may be considered easier than active avoidance, which required from 7 to 10 days (140–200 trials). On the first day, some animals explored the new environment and paid no attention to CS. Considering this, the conditioning process could have been even faster if we had used prior familiarization procedures. 
But even with the acquisition times given here, the proposed appetitive task enabled them to reach the learning criterion much faster than in classical Skinner box (Beninger, 1989) or other operant conditioning tasks (Bhandari et al., 2016). The final performance of the acquired habit was more stable in the appetitive task compared with the avoidance task, which was represented by 100% performance level on the tenth day (see Fig. 2). Easier acquisition of the shuttling response in the appetitive task, aside from its practical meaning of model validation, supports the notion of particular qualities of negative reinforcement, complicating instrumental learning in the avoidance paradigm (LeDoux et al., 2017; Moscarello & LeDoux, 2013). The main complication of avoidance learning, as seen in this study, was the low level of locomotor activity in animals after receiving electric shock in the very first trials. This is clearly illustrated by the low level of intertrial responses and their cessation on consequent learning days, in contrast to the appetitive model (see Fig. 3). These phenomena can be attributed to concurrent responses, like freezing, elicited by electric shock and CS (Moscarello & LeDoux, 2013), or conflict, generated by the absence of a safe place in the chamber (Vogel, 2008). It is important how these initial concurrent responses can shape the whole learning process and alter the perception of CS. In the situation of low locomotor activity, the proportion of reinforced responses is relatively low, which slows down the process of instrumental learning. Modern research, focusing on competing CS-evoked responses, shows the potentiation of instrumental learning during the active avoidance paradigm, following lesions of the central amygdala, responsible for the Pavlovian freezing response (Moscarello & LeDoux, 2013). 
These data form the basis of the modern two-process theory (LeDoux et al., 2017), which emphasizes the negative effect of the initially formed Pavlovian CS-US (S-O) association on the process of instrumental learning in active avoidance. But the side-by-side comparison raises the question as to whether the Pavlovian-instrumental interaction follows the same temporal regularities during appetitive learning.

In contrast to aversive situations, the conditioning of the appetitive reaction, shown in this study, clearly follows the classical incentive learning laws, as shown for the Skinner-box type of task (Beninger, 1989). The reaction of approach and nose poke to the feeder is present from the first contact of the deprived animal with a feeder in this environment and only strengthens after multiple reinforcements. This approach reaction was originally treated as a “natural” instrumental response (Konorski, 1967), but in modern research is more often considered an entirely Pavlovian response (Harris, Andrew, & Kwok, 2013). The initial activity of the animal in the environment rises and concentrates on the feeders, which is shown by an increase in the number of both the CRs and ITRs by the 30th trial (see Fig. 3). These data could reflect the early establishment of R-O association in this task by the simple law of effect (Gilroy, Everett, & Delamater, 2014). Alternatively, a shuttling-feeder response may follow the regularities for a Pavlovian-conditioned approach (Fitzpatrick & Morrow, 2016), when environmental stimuli (in this case, feeders) acquire an incentive motivational value that, in turn, influences instrumental response by affecting the overall motivational substrate that supports responding (Rescorla & Solomon, 1967; Robinson & Berridge, 1993). In line with both these processes, possibly potentiating each other, the animal’s locomotor activity gradually rises, until it reaches stabilization. On a background of high locomotor activity the animal detects the CS-US contingency and forms the S-O association, which underlies the stable low-latency response around the 80–100th trial. Despite the fact that the response is under stimulus control, the extinction of unreinforced ITRs was found only after the seventh to eighth day and was highly variable among subjects.
This process may rely on inhibition of locomotor activity, initiated by all the preceding learning processes, and turns out to be the most difficult part of the task. The use of an additional time-out punishment procedure (Richardson & Baron, 2008) at this stage may promote the extinction process in this task. However, the difficulty of ITR extinction could be a sign of insufficient CS discrimination (S-R-O formation) and stimulus control of behavior (Hergenhahn & Gottlieb, 1968; Zielinski, 1993). It could also reflect the formation of a habitual strategy, on which the animal relies instead of the stimuli. A possible explanation for this could be the formation of R-O association at the initial stages of learning, which overshadows the following S-O association. If the simple R-O association was formed initially, despite the S-R-O contingency, the animal experienced a prolonged period of discontinuous reinforcement (probably comparable to a VR schedule), which could make the response resistant to extinction (Mowrer & Jones, 1945).

On the other hand, an observed increase in the number of ITRs, coinciding with the CS-response establishment, could be affected by the temporal structure of the task. As we used the static 20-s CS – 30-s ITI protocol in both tasks, sharing some similarities with the fixed-interval (FI) protocol, ITRs here could be anticipatory in their nature and reflect the temporal dynamics of Pavlovian influence on the instrumental response (Rescorla & Solomon, 1967). Intertrial shuttling could be compared with the increase in lever-pressing in operant FI appetitive tasks, which is treated as a measure of premature responding and impulsivity (Berger & Sagvolden, 1998) or the prevailing of temporal relations over external cues (Hergenhahn & Gottlieb, 1968). In this regard, it is interesting to note that the majority of ITRs in this study were performed closer to the end of the intertrial period and thus can be scored as terminal behavior, similar to a consummatory response (Castilla & Pellon, 2013; Honig & Staddon, 1977). Whether or not such adjunctive behaviors are under control of reinforcement and underlain by R-O associations is still under discussion (Killeen & Pellon, 2013). The same temporal regularity was seen in this study for the avoidance paradigm, despite the smaller number of ITRs. It would be interesting to know whether the same logic of premature responding and impulsivity is attributable to the aversive situation, as usually ITRs in active avoidance are explained either via the dynamic of fear state (Zielinski, 1993) or as separate instrumental reactions, reinforced by fear reduction (Mowrer & Keehn, 1958).

In conclusion, the present study shows that the shuttle-box appetitive task lacks the negative effect of fear circuitry and requires fewer trials compared to the classical active-avoidance task. This task could be fruitfully used in studies of Pavlovian-instrumental interactions, impulsivity (ITRs, premature responding), and the influence of reinforcement valence on behavior, and could serve as a good comparison for the shuttle-box active avoidance task, as it shares the main properties of this model (measured motor response, spatial configuration, temporal distribution of the stimuli). These two similar tasks with different reinforcements, studied in comparison, may broaden the use of the shuttle-box learning paradigm and promote our understanding of the neural structures involved in this type of instrumental conditioning.