Introduction

Visual perception plays an important role in animals’ successful adaptation to their environments. Although birds and humans have a long independent evolutionary history of more than 300 million years, comparative studies have revealed that they share functional properties such as object perception and categorization, implying that there may be some basic principles that work across different taxa (Soto & Wasserman, 2012). However, convergent evidence from other studies suggests that pigeons and humans may differ in object recognition (Friedman et al., 2005), picture perception (Goto, Lea, Wills, & Milton, 2011), and the perception of structure in random-dot displays known as Glass patterns (Kelly et al., 2001). These findings are often accounted for by the relative importance of the whole and its component parts in the initial formation of stimulus control, reflecting differences in visual processing between humans and pigeons (Cerella, 1986; Cook, 2001). Because primates and birds show marked differences in brain structure (e.g., the six-layered neocortex in mammals and the dorsal ventricular ridge in birds), despite the anatomical similarities in their visual pathways (Husband & Shimizu, 2001), the difference between humans and pigeons is often thought to represent a general difference between primates and birds (Lea et al., 2006; Shimizu et al., 2010).

As claimed by Gestalt psychologists, in humans, the perception of the whole sometimes has properties that are not present among its component parts. Empirical evidence for Gestalt phenomena comes from experiments in which discrimination of parts becomes easier when they are embedded within a well-structured context. Humans detect a diagonal line among multiple line distractors of different inclinations more quickly and more accurately when redundant contextual information in the form of an L-shape is added to each diagonal (Fig. 1). Despite the fact that the context itself provides no information with respect to target detection, a number of studies have revealed that humans perform better in discrimination or target localization when certain contexts are present (Pomerantz, 2003; Pomerantz & Pristach, 1989; Pomerantz & Portillo, 2011; Pomerantz et al., 1977). Such facilitation, known as the configural superiority effect, may occur because the human percept of wholes differs from the mere summation of their component parts. Novel Gestalts emerge when stimuli are presented within a congruent context.

Fig. 1

Stimulus displays in Experiment 1. Each display consisted of one target and three identical distractors. Targets were located at the top left in these examples. The L-shaped context facilitated target localization for humans and chimpanzees (Goto et al., 2012) despite the redundancy between the context and the discrimination of the positive and negative slopes. In contrast, the U-shaped context disrupted target localization for humans and chimpanzees

Donis and Heinemann (1993) examined whether such redundant contexts facilitate discrimination learning in pigeons as in humans. They trained pigeons to discriminate line orientations presented alone or embedded in an L-shaped context, as shown in Fig. 1, using a successive discrimination procedure. Seven out of eight pigeons learned the discrimination faster when the line was presented alone rather than in an L-shaped context, indicating that the same context that facilitated discrimination in humans has the opposite effect in pigeons. They further tested pigeons with varieties of stimulus sets and concluded that redundant contexts always disrupted discrimination (Donis et al., 2005). Another study, which used a same-or-different discrimination procedure, also confirmed the differential effect of L-shaped context on line-orientation discrimination in humans and pigeons (Kelly & Cook, 2003). These results suggest that pigeons, unlike humans, lack the perception of emergent configurations.

We previously compared chimpanzees with humans to examine whether non-human primates perceive emergent configurations in the same manner as humans (Goto et al., 2012). In a series of experiments, both chimpanzees and humans searched for an odd target among homogeneous distractors when each stimulus was presented in one of three ways: (1) alone, (2) with identical context that resulted in emergent configuration in humans (congruent context), or (3) with identical context that did not result in emergent configuration in humans (incongruent context) (see Figs. 1 and 4). Overall, the two species showed consistent responses: target localization became faster and more accurate when the patterns were presented in a congruent context than when they were presented alone. Similarly, neither species showed facilitated target localization when patterns were presented in an incongruent context. These results suggest that chimpanzees perceive emergent configurations in the same manner as humans.

In the present study, we examined the perception of emergent configurations in pigeons and crows using the same procedures that we used in our previous chimpanzee study (Goto et al., 2012). A previous study using similar stimuli revealed that the same redundant contexts have different effects on pattern discrimination in pigeons and humans (Kelly & Cook, 2003), but we did not know whether the pigeon results could be generalized to other birds. For example, pigeons feed in flocks on the ground on seeds and grains, whereas crows eat a wide variety of food, including small live animals. Differences in feeding habits might lead birds to attend to different aspects of visual information.

In order to make a plausible cross-species comparison, we tested both pigeons and crows using experimental stimuli and procedures as closely matched as possible to those of our chimpanzee study (Goto et al., 2012). In Experiment 1, we examined the effect of redundant context on line orientation discrimination in a four-alternative forced-choice task. Two types of redundant contexts were used. For humans and chimpanzees, the discrimination of shapes was facilitated when presented with congruent contexts from which the salient configurations emerged, whereas the same discrimination was disrupted when presented with incongruent contexts in which the configurations became less salient. In Experiment 2, we further examined the effect of redundant context by introducing four additional stimulus sets. Among the four stimulus sets used in Experiment 2, two sets were two-dimensional (2D) figures, and the others could be perceived as three-dimensional (3D) figures within the congruent contexts (Enns & Rensink, 1991; Weisstein & Harris, 1974). Although we originally thought that pictorial depth cues would have similar emergent properties to 2D figures, both humans and chimpanzees failed to show the configural superiority effects with these 3D stimulus sets. Nonetheless, we included the 3D stimulus sets for the sake of comparison.

Experiment 1

Experiment 1 examined the effect of two types of redundant context in pigeons and crows using the same task and stimuli as in our previous chimpanzee study (Goto et al., 2012). In our previous study, discrimination of positive and negative slopes was easier in both humans and chimpanzees when the slopes were presented in an L-shaped context than when they were presented alone (Fig. 1). However, the addition of a U-shaped context to the slopes disrupted the discrimination of slopes in both species. The angular disparity between target and distractor slopes was 90°, and targets were presented at five different inclinations. Although the angular disparity between targets and distractors remained constant across the different presentations, humans and chimpanzees both made more errors and took longer to locate the targets as the stimulus inclination approached 45° from horizontal in the no-context presentations. The decreased visibility of obliquely oriented patterns compared to horizontally or vertically oriented ones is known as the “oblique effect” (Appelle, 1972), and this effect has been reported in pigeons (Donis, 1999). We thus examined whether the oblique effect would be preserved when L-shaped contexts were added to the slopes.

Methods

Subjects

Pigeons

The subjects were three experimentally naïve homing pigeons (Columba livia) obtained from the Japanese Association of Racing Pigeons. They were individually housed in stainless steel mesh cages (33 cm wide × 28 cm deep × 35 cm high) on a 13:11 h light/dark schedule. They were kept at or above 85% of their free-feeding weight, which was maintained by supplementary feeding and by food reinforcers in daily testing sessions. Water and grit were freely available in the living cages.

Crows

The subjects were three large-billed crows (Corvus macrorhynchos), which had been captured in the wild as nestlings and hand-reared in the laboratory. The Environmental Bureau of the Tokyo Metropolitan Government (Permission #74) permitted this capture. They were individually housed in stainless steel mesh cages (43 cm wide × 60 cm deep × 50 cm high) on a 13:11 h light/dark schedule. They were kept at or above 85% of their free-feeding weight, which was maintained by supplementary feeding and by food reinforcers in daily testing sessions. Water was freely available in the living cages. The Animal Care and Use Committee of Keio University (No. 08008) approved the experiment reported here.

Apparatus

Pigeons

Tests were conducted in a custom-built, operant conditioning chamber with internal dimensions of 43 cm long × 42 cm wide × 39 cm high. All stimuli were presented on a 15-in. color LCD monitor (FP51G, BenQ), visible through a 27.4 cm × 20.7 cm viewing window in the middle of the front panel. The bottom edge of the viewing window was 14 cm above the chamber floor. An infrared touchscreen (UniTouch SCN-CT-FLT14.0-SN3-J10, Touch Panel Systems) detected pecks to the monitor. A 28-V house light was located in the ceiling and illuminated during trials. A 5 cm × 5 cm aperture was positioned 6 cm below the viewing window and 3 cm above the chamber floor. A pellet dispenser (ENV-203, MED-Associates) positioned outside the chamber delivered a 45-mg pellet (5TUZ, TestDiet) into the aperture through a vinyl tube. A 28-V feeder light situated in the aperture signaled food availability. A computer (Dimension 1100/B110, Dell), situated near the chamber, controlled the experimental events and operated the house light, feeder light, and dispenser via an interface (K8055, Velleman). Throughout the experimental sessions, 75-dB white noise was played as a masking noise.

Crows

Tests were conducted in a custom-built, operant conditioning chamber with internal dimensions of 46 cm long × 60 cm wide × 60 cm high. The monitor, touchscreen, computer, interface, house light, and feeder light were the same models as those used for the pigeon chamber. All stimuli were presented on a monitor, visible through a 27.4 cm × 20.4 cm viewing window in the middle of the front panel. The bottom edge of the viewing window was 18 cm above the chamber floor. A house light was located in the ceiling and illuminated during trials. A 7 cm × 7 cm aperture was positioned 9 cm below the center of the bottom edge of the viewing window and 2 cm above the chamber floor. A pellet dispenser (ENV-203-1000, MED-Associates) positioned outside the chamber delivered dry dog pellets (approximately 0.45 g) into the aperture through a vinyl tube. A feeder light situated in the aperture signaled food availability and also served as the conditional reinforcer. Throughout the experimental sessions, 75-dB white noise was played as a masking noise.

Stimuli

The stimulus display consisted of one target and three identical distractors (Fig. 1). In the no-context displays, the target and distractors were the positively and negatively inclined line slopes. The positive slopes were inclined 45, 56.25, 67.50, 78.75, and 90° from horizontal. The inclination of the negative slopes differed 90° in a clockwise direction from those of the positive slopes. In the context displays, identical L-shaped or U-shaped contexts were added to both the positive and the negative slopes. The size of the stimuli was 3.0 × 3.0 cm, and the centers of the stimuli were separated by 9.0 cm.
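The inclinations described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that angles are measured counterclockwise from horizontal and that a line is treated as undirected (orientation modulo 180°); the function names are hypothetical and are not taken from the original experimental software.

```python
def positive_inclinations(start=45.0, stop=90.0, n=5):
    """Return n equally spaced inclinations (degrees from horizontal)."""
    step = (stop - start) / (n - 1)  # 11.25 degrees for the values in the text
    return [start + i * step for i in range(n)]


def clockwise_partner(angle_deg, rotation=90.0):
    """Rotate a line clockwise by `rotation` degrees.

    Assumption: counterclockwise angles are positive, so a clockwise
    rotation subtracts. Lines are undirected, so the result is reduced
    modulo 180 degrees.
    """
    return (angle_deg - rotation) % 180.0


positive = positive_inclinations()
negative = [clockwise_partner(a) for a in positive]

for pos, neg in zip(positive, negative):
    print(f"positive slope {pos:6.2f} deg  <->  negative slope {neg:6.2f} deg")
```

Under these assumptions, the 45° positive slope is paired with a line at 135° (equivalently, 45° below horizontal), and the vertical 90° line is paired with a horizontal line, so the angular disparity within each pair is always 90°.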

Procedures

Initial training

The pigeons were first given magazine training, and then were allowed to peck at a white circle (3.0 cm diameter) on the monitor, which was subsequently used as a self-start signal to begin each trial. After the acquisition of start-key pecking, the pigeons were trained to peck at symbols in the Webdings font (3.0 cm × 3.0 cm) (see Supplementary Materials), which appeared at various locations on the monitor. Pigeons were then trained to search for a single odd target among three homogeneous distractors (i.e., the four-alternative forced-choice task, see the following paragraph) using Webdings font symbols. Target and distractor stimuli were predetermined, and no target stimuli were used as distractors, or vice versa. The crows had experienced numerous cognitive tasks, including ones in another operant conditioning chamber. The chamber used in past studies had three circular translucent keys, and the crows had been trained to peck at them. Thus, initial training was not required in the present study.

Four-alternative forced-choice task

Birds performed a four-alternative forced-choice (4AFC) task. Each trial started with a peck to the start signal in the center of the monitor. This was replaced by one of the stimulus displays described above. Three consecutive pecks to the surface of the monitor within 3.0 cm from the center of each stimulus were considered to be a choice response and resulted in the stimulus display being removed from the monitor. The time elapsed from the appearance of the stimulus display to the last of the three consecutive pecks was recorded as the response time. Pecks outside of these areas were ignored. When birds correctly selected the target, they were rewarded with food, but when they chose a distractor they were given a 5-s timeout. The probability of primary reinforcement (food) was 100% for pigeons and 70% for crows. Correct responses not followed by food were reinforced by a secondary reinforcer (3-s feeder light illumination). Because the food pellets for crows were relatively large, we used a partial reinforcement schedule for crows in an attempt to increase the number of trials that could be completed in daily sessions. When birds failed to make a choice within 25 s from the appearance of the stimulus display, the trial was terminated and considered as an incorrect choice. When birds made incorrect choices, the same trial was repeated with the chosen distractors removed. This correction procedure was used to prevent birds from developing response bias to a specific stimulus or location. The consequences of the choices in these correction trials were the same as those in the regular trials. Correction trials were excluded from the subsequent analysis. Trials were separated by a 5-s intertrial interval (ITI). For pigeons, each session consisted of 120 trials: 5 × 3 × 4 × 2 (stimulus inclinations × context conditions × target locations × cycles). For crows, each session consisted of 60 trials: 5 × 3 × 4 × 1 (stimulus inclinations × context conditions × target locations × cycles). Testing was normally conducted once per day, 6 days per week. Each bird was tested for 20 sessions after attaining an overall accuracy of more than 80% with the positive-slope target. Then, the birds were trained with the negative-slope target until they attained an overall accuracy of 80%. Tests were continued for a further 20 sessions. In total, we conducted 40 test sessions.
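The choice-response rule in this procedure (three consecutive pecks within 3.0 cm of a stimulus center, with a 25-s limit) can be illustrated with a short sketch. This is not the original control software. It assumes that peck coordinates are in centimeters, that the four stimuli form a 2 × 2 array with centers 9.0 cm apart, that pecks outside all response areas do not reset the count, and that a peck to a different stimulus restarts the count. All names are hypothetical.

```python
import math

RADIUS_CM = 3.0       # response area around each stimulus center (from the text)
TIME_LIMIT_S = 25.0   # trial ends as incorrect after this time (from the text)
PECKS_REQUIRED = 3    # consecutive pecks that define a choice (from the text)

# Assumed 2 x 2 layout with centers separated by 9.0 cm.
CENTERS = {
    "top_left": (-4.5, 4.5),
    "top_right": (4.5, 4.5),
    "bottom_left": (-4.5, -4.5),
    "bottom_right": (4.5, -4.5),
}


def detect_choice(pecks, centers=CENTERS):
    """Return (chosen stimulus, response time) or (None, None) if no choice.

    `pecks` is a time-ordered iterable of (t_seconds, x_cm, y_cm), with t
    measured from the appearance of the stimulus display.
    """
    current, run = None, 0
    for t, x, y in pecks:
        if t > TIME_LIMIT_S:
            break  # no choice within the time limit
        hit = next(
            (name for name, (cx, cy) in centers.items()
             if math.hypot(x - cx, y - cy) <= RADIUS_CM),
            None,
        )
        if hit is None:
            continue  # pecks outside the response areas are ignored
        if hit == current:
            run += 1
        else:
            current, run = hit, 1  # assumption: switching stimuli restarts the count
        if run == PECKS_REQUIRED:
            return hit, t  # response time is the time of the third peck
    return None, None


example = [(0.40, -4.4, 4.6), (0.55, 0.0, 0.0), (0.70, -4.5, 4.5), (0.90, -4.0, 4.0)]
choice, response_time = detect_choice(example)
print(choice, response_time)
```

In the example, the second peck lands between the stimuli and is ignored, so the third on-target peck at 0.90 s completes the choice of the top-left stimulus.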

Results

Acquisition

Figure 2 shows the proportion of correct responses during the first 30 sessions as a function of two-session blocks. A linear mixed model with context and session block as fixed factors and subject as a random factor was fitted to examine the significance of the trends for each species separately. All statistical tests of the data in this paper were evaluated using an alpha level of 0.05. Both pigeons and crows readily learned the target localization, showing a significant effect of session block (pigeons: F[14, 88] = 47.942, p < .001; crows: F[14, 88] = 19.957, p < .001). Overall, context type affected the learning rate, showing a significant main effect of context (pigeons: F[2, 88] = 187.600, p < .001; crows: F[2, 88] = 33.040, p < .001), but the interaction of the two was significant only in pigeons (F[28, 88] = 4.878, p < .001). Post hoc analysis, using the Bonferroni correction, confirmed that both pigeons and crows learned target localization more slowly under the incongruent-context condition than under the no-context and congruent conditions (ps < .001). The two species also learned target localization more slowly under the congruent-context condition than under the no-context condition, but the difference was significant only in pigeons (p < .001).
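The blocking of sessions used for the acquisition curves can be sketched as follows. This is an illustrative example only: the per-session counts are made-up placeholder values, not data from the study, and the function name is hypothetical. It assumes that trials are pooled within each block of two consecutive sessions before the proportion correct is computed.

```python
def blocked_accuracy(n_correct, n_trials, block_size=2):
    """Pool consecutive sessions into blocks; return proportion correct per block."""
    if len(n_correct) != len(n_trials):
        raise ValueError("n_correct and n_trials must have the same length")
    proportions = []
    for start in range(0, len(n_correct), block_size):
        correct = sum(n_correct[start:start + block_size])
        total = sum(n_trials[start:start + block_size])
        proportions.append(correct / total)
    return proportions


# Placeholder counts for one bird in one context condition (NOT real data):
# 30 sessions with 40 trials per condition per session.
sessions_correct = [10 + i for i in range(30)]
sessions_trials = [40] * 30

accuracy = blocked_accuracy(sessions_correct, sessions_trials)
print(len(accuracy))   # number of two-session blocks
print(accuracy[0])     # proportion correct in the first block
```

Thirty sessions yield 15 two-session blocks, which is consistent with the 14 numerator degrees of freedom reported for the session-block effect.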

Fig. 2

Proportion correct during the first 30 sessions under each context condition in Experiment 1 for pigeons (a) and crows (b). Thick lines show means of three birds and thin lines show individuals

Test

After attaining the performance criteria, birds were tested for 20 sessions with positively inclined targets and another 20 sessions with negatively inclined targets. No apparent difference was found between the two target types and the data were merged together in subsequent analysis. Figure 3 shows mean error rate and harmonic mean of the correct response times under each condition for both pigeons and crows. Individual data are presented in the Supplementary Materials. The target type (positive and negative slopes) was included as a factor in the subsequent analysis. However, because the effect of target type was not our primary interest, the data from the two conditions were merged in preparing the figure. To perform our statistical analysis, the data were averaged across sessions by target type, context condition, and stimulus inclination for each bird. The significance of the trends was assessed by a linear mixed model, with target type, context, and stimulus inclination as fixed factors and subject as a random factor.
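For readers unfamiliar with the harmonic mean as a summary of response times, a brief sketch follows. The trial values are hypothetical and serve only to illustrate the computation; the sketch assumes, as stated in the text, that only correct trials contribute. It uses `statistics.harmonic_mean` from the Python standard library.

```python
from statistics import harmonic_mean, mean


def correct_rt_harmonic_mean(trials):
    """Harmonic mean of response times (ms) over correct trials only.

    `trials` is an iterable of (response_time_ms, was_correct) pairs.
    """
    correct_rts = [rt for rt, was_correct in trials if was_correct]
    return harmonic_mean(correct_rts)


# Hypothetical trials (NOT data from the study): one error trial is excluded.
trials = [(800, True), (1000, True), (2400, False), (1600, True)]

hm = correct_rt_harmonic_mean(trials)
am = mean(rt for rt, was_correct in trials if was_correct)
print(round(hm, 1), round(am, 1))
```

Because the harmonic mean is the reciprocal of the mean of reciprocals, occasional long response times pull it upward less than they pull the arithmetic mean, which is one common reason for preferring it with skewed response-time data.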

Fig. 3

Percent error under each context condition across five stimulus inclinations in Experiment 1 for pigeons (a) and crows (b) and correct response times for pigeons (c) and crows (d). Data from each session are shown in a separate vertical column. The line indicates the median, the box shows the interquartile range (IQR), and the whiskers are 1.5× IQR

During testing, the overall error rate was 6.1% for pigeons and 5.5% for crows. Context type affected the error rate, resulting in a significant main effect of context (pigeons: F[2, 58] = 21.376, p < .001; crows: F[2, 58] = 31.703, p < .001). Post hoc analysis indicated that both pigeons and crows made more errors in the U-shaped context condition than in the other two context conditions (ps < .001). However, error rates were not significantly different between the no-context and L-shaped context conditions in either species. The linear mixed model also revealed that stimulus inclination had a significant effect (pigeons: F[4, 58] = 31.407, p < .001; crows: F[4, 58] = 9.109, p < .001). Concerning target types, both pigeons and crows made more errors with positively sloped targets than with negatively sloped targets (pigeons: positive slopes = 7.5%, negative slopes = 4.6%; crows: positive slopes = 7.7%, negative slopes = 5.2%), resulting in a significant main effect of target type (pigeons: F[1, 58] = 21.376, p < .001; crows: F[1, 58] = 5.542, p = .022). No interactions were significant in crows. In pigeons, the following interactions were significant: (1) target type and context (F[2, 58] = 22.361, p < .001); (2) target type and stimulus inclination (F[4, 58] = 7.730, p < .001); (3) context and stimulus inclination (F[8, 58] = 3.644, p = .002); and (4) the three-way interaction of target type, context, and stimulus inclination (F[8, 58] = 7.367, p < .001). Interactions with target type were caused by a higher percentage of errors with positive-slope targets than with negative-slope ones. The interaction of context and stimulus inclination was caused by higher error rates at inclinations of 90° and 45° than at other orientations.

Response time, in general, correlated positively with error rate, as can be seen in Fig. 3. Both pigeons and crows located the target fastest in the no-context condition and slowest in the U-shaped context condition. The same linear mixed model that was applied to the error rate data was performed to examine the significance of the trends. The results revealed a significant main effect of context (pigeons: F[2, 58] = 34.583, p < .001; crows: F[2, 58] = 63.483, p < .001). Post hoc analysis revealed that pigeons located the target significantly faster in the no-context condition than in the L-shaped context condition (p < .001), but the same trend in crows was not significant (p = .061). Both pigeons and crows, however, located the target faster in the no-context condition than in the U-shaped context condition (ps < .001).

There were other differences in the response-time data between the two species. Pigeons located negative slopes faster than positive slopes (positive slopes = 1,088 ms, negative slopes = 994 ms), resulting in a significant main effect of target type (F[1, 58] = 24.121, p < .001). Crows located the target slowest when the stimulus inclination was 45° from horizontal, resulting in a significant main effect of inclination (F[2, 58] = 11.280, p < .001). Crows also showed a significant target type and context interaction (F[2, 58] = 11.280, p < .001). No other main effects or interactions were significant.

Discussion

Overall, the effect of context was consistent between pigeons and crows. The addition of both L-shaped and U-shaped contexts disrupted acquisition of target localization, increased error rates, and slowed down target localization. The disruption was greater with the U-shaped context than with the L-shaped context. In the U-shaped context, targets and distractors shared more parts than in the L-shaped context, implying that birds added up parts to recognize wholes. These disruptive effects of redundant contexts were consistent with the configural inferiority effect previously reported in pigeons (Donis & Heinemann, 1993; Donis et al., 2005; Kelly & Cook, 2003).

The results of the present study were in stark contrast with our findings for humans and chimpanzees (Goto et al., 2012). First and foremost, both humans and chimpanzees made fewer errors and located targets faster in the L-shaped-context condition than in the no-context condition, showing the configural superiority effect as opposed to the configural inferiority effect in birds. These results suggest that humans and chimpanzees perceive emergent configurations that birds may not.

Second, the effect of stimulus inclination differed between birds and primates. In our previous study (Goto et al., 2012), we found that humans and chimpanzees showed higher error rates and slower target localization as the stimulus inclination approached diagonal in the no-context condition, confirming the oblique effect. The presence of the L-shaped context enabled primates to locate the target almost instantly regardless of the stimulus inclination. This context and stimulus inclination interaction was not observed in either pigeons or crows.

Experiment 2

Experiment 2 examined whether birds continue to show the configural inferiority effect with other geometrical figures for which humans and chimpanzees showed the configural superiority effect (Goto et al., 2012). We tested four stimulus sets: two sets were two-dimensional (2D) figures, while two could be perceived as three-dimensional (3D) figures within the congruent contexts (Enns & Rensink, 1991; Weisstein & Harris, 1974). Although both humans and chimpanzees showed the configural superiority effects only with the 2D stimulus sets, we used the 3D stimulus sets in the present study in order to be consistent with our previous study.

Methods

Stimuli

Stimulus sets consisted of four patterns with three conditions (Fig. 4). Two stimulus sets ( < / > ; ( / ) ) could be perceived as 2D figures, whereas the other two sets ( Y / inverted Y ; L / reversed L) could be perceived as 3D figures such as a cube or a cone when presented within a congruent context. The orientation of the figures was fixed so that, for example, “<” or “(” always pointed leftward, as can be seen in Fig. 4. In both humans and chimpanzees, perceptual discriminability was enhanced when congruent contexts were added to the 2D sets, whereas it was disrupted when any type of context was added to the 3D sets. Stimuli were presented at 3.0 cm × 3.0 cm in size and 6.0 cm apart from each other.

Fig. 4

Stimulus sets used in Experiment 2. Each display consisted of one target and three identical distractors. Each stimulus set consisted of three context conditions. The two stimulus sets on the left, indicated with “<” and “(”, could be perceived as 2D figures, whereas the two stimulus sets on the right, indicated with “Y” and “L”, could be perceived as 3D figures within the congruent context. In humans and chimpanzees, the discriminability of target and distractors with 2D sets is enhanced when presented within a congruent context, but not with 3D sets (Goto et al., 2012)

Procedure

The same 4AFC task used in Experiment 1 was used here. Each pigeon session consisted of 96 trials (3 × 4 × 4 × 2 [context conditions × stimulus sets × target positions × cycles]), and each crow session consisted of 48 trials (3 × 4 × 4 × 1 [context conditions × stimulus sets × target positions × cycle]). Because target detection for the 3D stimulus sets was difficult, especially for crows, the same training criteria as those in Experiment 1 could not be met. We therefore started testing when birds attained an accuracy of 75% for pigeons and 70% for crows. Testing was then conducted for 20 sessions. Target and distractor stimuli were then interchanged, and testing was continued for a further 20 sessions after birds reached the same accuracy criteria as in the earlier sessions. In total, we conducted 40 test sessions.
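The session composition described here is a full factorial crossing repeated over cycles, which can be sketched as below. This is a minimal illustration rather than the original software: the factor-level labels are hypothetical, the four target positions are assumed to form a 2 × 2 array, and the random ordering of trials within a session is an assumption, since the trial order is not specified in the text.

```python
import itertools
import random


def build_session(factors, cycles, seed=None):
    """Cross all factor levels, repeat for `cycles`, and shuffle the trials.

    `factors` maps a factor name to its list of levels. Shuffling is an
    assumption of this sketch; the text does not state the trial order.
    """
    names = list(factors)
    cells = [dict(zip(names, combo))
             for combo in itertools.product(*factors.values())]
    trials = [dict(cell) for _ in range(cycles) for cell in cells]
    random.Random(seed).shuffle(trials)
    return trials


# Hypothetical labels for the Experiment 2 factors.
EXP2_FACTORS = {
    "context": ["none", "congruent", "incongruent"],
    "stimulus_set": ["<", "(", "Y", "L"],
    "target_position": ["top_left", "top_right", "bottom_left", "bottom_right"],
}

pigeon_session = build_session(EXP2_FACTORS, cycles=2, seed=0)
crow_session = build_session(EXP2_FACTORS, cycles=1, seed=0)
print(len(pigeon_session), len(crow_session))
```

With three context conditions, four stimulus sets, and four target positions, two cycles give the 96 trials of a pigeon session and one cycle gives the 48 trials of a crow session. Replacing the stimulus-set factor with the five stimulus inclinations gives the 120- and 60-trial sessions of Experiment 1.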

Results

Acquisition

Figure 5 shows the proportion of correct responses during the first 30 sessions as a function of two-session blocks. A linear mixed model with stimulus set, context, and session block as fixed factors and subject as a random factor was fitted to examine the significance of the trends for each species separately. Both pigeons and crows readily learned target localization with 2D sets, but they showed poorer learning with 3D sets. These results were confirmed by a significant main effect of session block (pigeons: F[14, 358] = 68.468, p < .001; crows: F[14, 355] = 13.993, p < .001). The main effect of stimulus set was significant (pigeons: F[3, 358] = 378.909, p < .001; crows: F[3, 355] = 229.336, p < .001). The stimulus set and session block interaction was also significant (pigeons: F[42, 358] = 4.584, p < .001; crows: F[42, 355] = 1.762, p = .003). Context also affected the rate of learning, resulting in a significant main effect of context (pigeons: F[2, 358] = 145.563, p < .001; crows: F[2, 355] = 11.311, p < .001). In particular, incongruent contexts hampered the learning of 3D stimulus sets, as revealed by a significant stimulus set and context interaction (pigeons: F[6, 358] = 39.962, p < .001; crows: F[6, 355] = 10.813, p < .001). No other interaction was significant in crows. In pigeons, there was a significant context and session block interaction (F[28, 358] = 4.379, p < .001), and a significant stimulus set, context, and session block interaction (F[84, 358] = 1.338, p = .037).

Fig. 5

Proportion correct during the first 30 sessions under each context condition in Experiment 2 for pigeons (a) and crows (b). Thick lines show means of three birds and thin lines show individuals

Test

After attaining the performance criteria, birds were tested for 20 sessions. Then, the target and distractor stimuli were interchanged. Birds were tested for a further 20 sessions after attaining the performance criteria again. The reversal of the target and distractor stimuli was included as a factor (hereafter: target type) in the subsequent analysis. However, because the effect of target type was not our primary interest, the data from the two target types were merged in preparing the figure showing the test performance.

In order to perform statistical analyses, the data were averaged across sessions by target types, stimulus sets, and context for each bird. The significance of the trends was assessed by a linear mixed model, with target type, stimulus set, and context as fixed factors and subject as a random factor.

Figure 6a, b shows mean error rate under each condition for both pigeons and crows. Individual data are presented in the Supplementary Materials. During testing, the overall error rate was 12.3% for pigeons and 22.7% for crows. In both pigeons and crows, error rates were much higher with the 3D sets than with the 2D sets, resulting in a significant main effect of stimulus set (pigeons: F[3, 46] = 40.923, p < .001; crows: F[3, 46] = 170.415, p < .001). Types of context also affected the error rate, resulting in a significant effect of context (pigeons: F[2, 46] = 8.376, p = .001; crows: F[2, 46] = 5.563, p = .007). Post hoc analysis further revealed that pigeons made significantly more errors in both congruent and incongruent context conditions compared with the no-context condition (ps < .001). In contrast, crows made significantly more errors in the incongruent context condition than in the no-context condition (p = .005), but no difference in error rate was found between the no-context condition and the congruent context condition. Other aspects of the results were not consistent between pigeons and crows. The linear mixed model revealed a significant stimulus set and context interaction for crows (F[6, 46] = 4.467, p < .001), but this interaction was not significant for pigeons. Concerning the target type, pigeons made significantly more errors before than after the reversal of target and distractor stimuli (before = 16.6%, after = 10.1%), resulting in a significant main effect of target type (F[1, 46] = 9.604, p = .003), but this difference was not significant for crows (before = 24.4%, after = 23.3%). Both pigeons and crows showed a significant target type and stimulus set interaction (pigeons: F[3, 46] = 5.395, p = .003; crows: F[3, 46] = 2.893, p = .045). No other interaction was significant.

Fig. 6

Percent error under each context condition in four stimulus sets in Experiment 2 for pigeons (a) and crows (b) and correct response times for pigeons (c) and crows (d). Data from each session are shown in a separate vertical column. The line indicates the median, the box shows the interquartile range (IQR), and the whiskers are 1.5× IQR

Figure 6c, d shows the harmonic mean of the correct response times in each condition separately for pigeons and crows. Response time, in general, correlated positively with error rate. Both pigeons and crows located the target faster with 2D sets than with 3D sets, resulting in a significant main effect of stimulus set (pigeons: F[3, 46] = 121.946, p < .001; crows: F[3, 46] = 6.630, p = .001). Otherwise, there were no significant differences between conditions in the crows’ response-time data. Pigeons, on the other hand, located targets fastest under the no-context condition, resulting in a significant main effect of context (F[2, 46] = 6.240, p = .004). Post hoc analysis further revealed that pigeons located the target faster in the no-context condition than in the congruent context condition (p = .003). Concerning target type, pigeons located the target faster after the reversal of target and distractor stimuli (before = 1,125 ms, after = 1,073 ms), but the opposite trend was observed in crows (before = 1,174 ms, after = 1,489 ms), resulting in a significant main effect of target type (pigeons: F[1, 46] = 4.503, p = .039; crows: F[1, 46] = 92.623, p < .001). In pigeons, there were also significant interactions of target type and stimulus set (F[3, 46] = 21.946, p < .001), and of target type, stimulus set, and context (F[6, 46] = 2.537, p = .033). No other interaction was significant.

Discussion

In general, this experiment further confirmed the findings of Experiment 1 and showed that both congruent and incongruent contexts disrupted target localization, at least in pigeons. In crows, these effects of context were not statistically significant and no facilitation effect of redundant context on target localization was observed. Importantly, the absence of facilitation effects of context with the 2D stimulus sets again differed from the findings for humans and chimpanzees (Goto et al., 2012). With the 2D stimulus sets, congruent contexts facilitated target localization in humans and chimpanzees, while incongruent contexts disrupted it in these two species.

General discussion

The examination of the effects of redundant contexts on pattern discrimination is an important means of revealing the mechanisms of form perception. The present study, together with our previous study on chimpanzees and humans (Goto et al., 2012), confirmed the findings of Kelly and Cook (2003), namely, that the effect of redundant context on pattern discrimination is different in pigeons and humans. Importantly, we have built on their findings to show that redundant contextual information is processed in fundamentally different ways in hominids (humans and chimpanzees) and birds (pigeons and crows). In humans and chimpanzees, adding a redundant context sometimes facilitated the discrimination of patterns such as line orientation, curves, and brackets (Goto et al., 2012). In contrast, the present study revealed that the same contexts did not facilitate pattern discrimination in either pigeons or crows. Rather, they disrupted the discrimination in pigeons.

Despite the lack of the configural superiority effect, birds still discriminated the 2D figures more easily than the 3D figures. We speculate that openness and closure might serve as effective cues for discrimination in birds. In the 2D figures, targets could be discriminated from distractors using those cues when redundant contexts were added. In contrast, with the 3D figures, the addition of redundant contexts generates only closures.

Even though we used experimental stimuli and procedures that were as closely matched as possible to those of our chimpanzee study (Goto et al., 2012), the difference in observation distance between the two studies resulted in different relative stimulus sizes in terms of visual angle. The smaller the stimulus size is, the easier it might be to attend to configural properties (Goto et al., 2004). We do not, however, think that birds failed to see the emergent configurations because of the relatively large stimulus size. Kelly and Cook (2003) manipulated the stimulus size and found that the contextual effects remained unchanged across sizes. Furthermore, Donis and Heinemann (1993), who originally reported the disruptive effects of redundant context in pigeons, presented stimuli that were about nine times smaller than those used in the present study, implying that the contextual effects were relatively size-independent.

The discrepancies in visual perception between pigeons and humans are often discussed in the framework of the relative dominance of global and local information (Cerella, 1986; Cook, 2001; Lea et al., 2006). One of the popular ways to examine the relative importance of global and local information is the discrimination of hierarchically constructed figures in two kinds of configurations (e.g., S or H presented in both small and large sizes). These hierarchical stimuli contain essentially the same information at both the global (i.e., large figures) and local (i.e., small figures) levels. Whereas humans showed a greater ability to discriminate global over local figures (Navon, 1977), pigeons showed local advantages when discriminating similar stimuli (Cavoto & Cook, 2001). However, many non-human primates showed local advantages similar to those observed in pigeons (baboons: Deruelle & Fagot, 1998, Fagot & Deruelle, 1997; capuchin monkeys: Spinozzi et al., 2006; chimpanzees: Fagot & Tomonaga, 1999; rhesus monkeys: Hopkins & Washburn, 2002), indicating that the local advantages in discriminating hierarchically constructed stimuli arise because non-human animals are in general poor at perceptually grouping disconnected parts. Indeed, Fagot and Tomonaga (1999) demonstrated that chimpanzees came to show a global advantage when adjacent local figures were connected by line segments. In contrast, in the perceptual process that we focused on in the present study and in our previous study on chimpanzees (Goto et al., 2012), configuration is perceived as an emergent property. Tasks involving such emergent properties highlight the differences between primates and birds better than those using hierarchical stimuli.

Discrepancies between pigeons and humans were also found in regard to global structures that emerged from randomized dot arrays (Kelly et al., 2001; Qadri & Cook, 2016). Humans detect circular or radial Glass patterns through random noise more easily than translational spiral patterns, whereas both pigeons and starlings showed no such difference in detectability among differently structured patterns. The apparent lack of amodal completion by pigeons under various circumstances further supports the idea that pigeons, unlike humans, rely more on local cues than on global cues (Fujita & Ushitani, 2005; Sekuler et al., 1996; Ushitani & Fujita, 2005). Pigeons, however, are capable of detecting some other structures among random noise (Cook et al., 2005), or motion emerging from point-light biological motion displays (Dittrich et al., 1998), although they do not seem to learn the correspondence between the point-light displays and fully detailed movies (Qadri et al., 2014; Yamamoto et al., 2015).

Given that the results in general were similar between pigeons and crows, the differences in perception of emergent configurations between hominids and birds are likely due to fundamental differences in neural mechanisms between hominids and birds. One such difference is in the retinotectal feedback system. Many vertebrates possess centrifugal innervations from the brain to the retina, but this feedback system is absent in humans (Uchiyama, 1989). The isthmo-optic nucleus in the avian caudal midbrain receives retinotopically organized outputs from the optic tectum, and then the projections are sent back to retinal neurons, preserving their topographic organization. Lesions in the isthmo-optic nucleus cause deficits in the detection of visual stimuli and visual search in shaded areas (Rogers & Miles, 1972), but not in bright areas or in pattern discriminations (Hodos & Karten, 1974). When neurons of the isthmo-optic nucleus are electrically stimulated, an enhancement of ganglion cell response occurs in a specific region of the retina (Miles, 1972). Ohno and Uchiyama (2009) suggested that the centrifugal system in birds is involved in guiding spatial attention, that is, a specific area in the visual field is brought to attention or “spotlighted” by the activity of the isthmo-optic nucleus. In rhesus macaques, similar or equivalent mechanisms of spatial attention have been suggested, but the regions associated with spatial attention are in general found in the collothalamic visual system, such as the superior colliculus (Boehnke & Munoz, 2008; Kustov & Robinson, 1996), pulvinar (Robinson, 1993), and posterior parietal cortex (Posner & Petersen, 1990). Thus, some of the “guidance” of visual attention takes place at the retinal level in birds, resulting in the lack of a Gestalt experience, whereas it is carried out in higher brain structures in primates, resulting in a Gestalt experience.

Second, the roles of the two visual pathways may be different between birds and primates. In both primates and birds, information from the eyes is conveyed to higher brain regions through two distinct visual routes: the lemnothalamic and collothalamic pathways. The former is the primary route for visual processing in primates, whereas the latter is the primary route for birds. Compared to the relatively small superior colliculus in the primate brain, the avian tectum is a massive structure, containing over 15 layers of neurons that receive input from the retinal ganglion cells (Shimizu et al., 2008). The tectum sends visual information to the nucleus rotundus, the thalamic center of the avian collothalamic pathway, which is comparable to the pulvinar in mammals. The projection from the tectum to the nucleus rotundus is not topographically organized in birds and the neurons have extremely large receptive fields (i.e., over 100° in visual angle; Revzin, 1970), responding to any objects in the receptive field and thus guiding attention. In birds, the telencephalic target of the projections from the nucleus rotundus is a large cluster of cells called the entopallium, which in turn sends projections to several motor and “association” areas in the telencephalon, including the basal ganglia and polysensory regions (Husband & Shimizu, 1999). Because the entopallium receives topographically organized projections from different subdivisions of the nucleus rotundus, different parts of the entopallium are also involved in the analysis of different types of information. Entopallial lesions cause elevated discrimination thresholds for luminous intensity (Hodos et al., 1984; Hodos et al., 1988) and pattern discrimination (Bessette & Hodos, 1989; Hodos & Karten, 1970). However, there are also results showing that such lesions cause no or mild deficits in discrimination of color brightness (Bessette & Hodos, 1989; Hodos & Karten, 1970) and patterns (Watanabe, 1991). These results suggest the entopallium plays a pivotal role in visual processing in birds.

Third, and finally, the differences in perception between primates and birds may be explained by differences in their pallium structures. The primate visual cortex (pallium) has six layers of neurons organized in a column, whereas the avian entopallium (pallium) consists of clusters of cells. The primate columnar structure may be an efficient system for lateral inhibition and rich cortico-cortical connection in the cortex, from which Gestalt experience may arise. Indeed, single-unit physiology in primates implicates area V4 in configural processing (Wilson & Wilkinson, 1998). In contrast, the cluster organization of the avian visual system may yield advantages in local processing but disadvantages in processing emergent Gestalt properties. Recent neural connection studies, however, have found a column-like structure with intrinsic connections in both the auditory (Butler et al., 2011) and visual (Shimizu et al., 2010) telencephalon in birds. The mammalian pallium is clearly separated into gray matter and white matter and the mammalian column structures are composed of thin laminated gray matter. Although the avian pallium does not have a clear spatial separation of gray matter and white matter, the column-like structure extends into the telencephalon. These structural differences in the pallium, as well as the differences in the retino-tectal pathways in the telencephalon, likely account for the differences in visual processing between primates and birds.

Although vision is the primary sensory system for many vertebrates, comparative studies on the phylogenetic differences among species are relatively scarce. The present study highlights the phylogenetic differences between birds and primates with regard to the perception of emergent Gestalt properties. Exploring the phylogenetic origin of Gestalt perception and its neural substrates remains an important theoretical and empirical theme for comparative studies in visual perception.

Author Note

This work was funded by a Grant-in-Aid for Scientific Research (Grant 22700271) from the Ministry of Education, Culture, Sports, Science and Technology of Japan to KG and the Global COE Program (D029) of Keio University. The authors thank Ei-Ichi Izawa for his support in the care and maintenance of the crows in the laboratory and Toru Shimizu for his thoughtful comments on the manuscript.

Open Practices Statement

Data and stimuli are available at Open Science Framework https://osf.io/7zxaj/. None of the experiments was preregistered.