Introduction

Ensemble perception refers to the visual system’s ability to statistically summarize various lower-level and higher-level features in a rapid and accurate manner (for a review, see Whitney & Yamanashi Leib, 2018). Previous research has shown that viewers can efficiently report the average of various low-level features such as orientation (Dakin, 2001; Parkes, Lund, Angelucci, Solomon, & Morgan, 2001), brightness (Bauer, 2009), size (Ariely, 2001; Yildirim, Öğreden, & Boduroglu, 2018), color (Maule, Witzel, & Franklin, 2014), position (Alvarez & Oliva, 2008; Boduroglu & Yildirim, 2020), as well as higher-level features like facial identity (de Fockert & Wolfenstein, 2009) and facial emotion (Haberman & Whitney, 2007). This averaging ability is supported by independent, multi-level feature-specific statistical summary mechanisms (e.g., Haberman, Brady, & Alvarez, 2015; Yörük & Boduroglu, 2020) and is believed to serve a number of functions. First, it is argued that ensemble perception complements the foveal high-resolution representation, allowing viewers to experience the world in a holistic and detail-rich manner (Cohen, Dennett, & Kanwisher, 2016). Second, ensemble representations have been shown to contribute to the formation of individual object representations (Brady & Alvarez, 2011; Brady & Tenenbaum, 2013; Oriet & Brand, 2013; Mutluturk & Boduroglu, 2014; Yildirim et al., 2018).
The visual system can also extract the variance of lower-level features such as color (Maule & Franklin, 2020; Ward, Bear, & Scholl, 2016), luminance (Tong, Ji, Chen, & Fu, 2015), orientation (Morgan, Chubb, & Solomon, 2008; Norman, Heywood, & Kentridge, 2015; Jeong & Chong, 2020), size (Solomon, Morgan, & Chubb, 2011; Semizer & Boduroglu, under review, this issue; Tokita, Ueda, & Ishiguchi, 2016), as well as of higher-level features such as facial expression (Haberman, Lee, & Whitney, 2015), gender, and race (Phillips, Slepian, & Hughes, 2018); viewers can also represent the range of size, orientation, and brightness of a set (Ariely, 2001; Khayat & Hochstein, 2018). The efficient representations of central tendency and variability are believed to work in a complementary fashion to serve a third, less investigated function of ensemble perception. Specifically, mean and variance information are likely to contribute to the detection and representation of outliers (Alvarez, 2011; Haberman & Whitney, 2012). The goal of this research is to directly investigate the contribution of ensemble perception to outlier representation precision. Specifically, we test whether outlier precision changes as a function of the outlier’s deviation from the broader ensemble.

The visual attention literature has provided considerable evidence on the visual system’s efficiency in detecting outliers. Building on the seminal work by Treisman and Gelade (1980), numerous studies have replicated the visual pop-out phenomenon, in which an item with a unique feature is rapidly detected regardless of the number of distractors (Wolfe, 1994; but also see Duncan & Humphreys, 1989); such visual pop-out effects have been reported even with samples as young as 3-month-old infants (Adler & Orprecio, 2006; Colombo, Ryther, Frick, & Gifford, 1995). Rosenholtz (1999) argued that the visual system encodes any significant deviation from the local distribution of features (motion, color, or orientation), which in turn helps tag some items as salient and unusual. Specifically, she reported that when the target was further away from the mean of the non-identical distractors, it was detected faster. Rosenholtz and colleagues further demonstrated that the successful identification of a target depended on the utilization of a statistical summary of the information presented in peripheral vision (Rosenholtz, Huang, Raj, Balas, & Ilie, 2012). Since both set summaries and set range information can be concurrently and implicitly extracted (e.g., Dakin & Watt, 1997; Khayat & Hochstein, 2018; Solomon, 2010), it is no surprise that these statistical summary mechanisms can contribute to the detection of outliers. For instance, Hochstein, Pavlovskaya, Bonneh, and Soroker (2018) reported that outlier detection depended on the distance from the set edge, with outlier detection efficiency increasing with increased distance from the set edge. They noted that for items to be tagged as outliers, their feature value not only needed to be outside the range of the remaining items of a given array but also needed to differ by a certain amount from the range edge of these items.
More recently, Cant and Xu (2020) argued that outlier processing may be obligatory, based on findings that even when instructed to ignore outliers during tasks involving early perceptual processing, viewers were not able to discount them (for effects of outlier discounting when summarizing facial expressions, see Haberman & Whitney, 2010).

Even though there has been an increase in the number of studies investigating the link between ensemble perception and outlier processing, these studies have typically focused on outlier detection efficiency and have not addressed issues regarding outlier precision. Therefore, in this study, we specifically investigated whether ensemble perception contributed to the precision of the outlier item’s representation. Since items that are more distinct are more likely to be tagged as outliers during early phases of ensemble perception (e.g., Cant & Xu, 2020; Hochstein et al., 2018), it is possible that these items are excluded from the broader ensemble and independently represented. In other words, ensemble mechanisms and outlier processing may operate in parallel. This possibility is in line with previous work from our lab in which we demonstrated that there was no cost to outlier precision when participants had to concurrently extract mean and outlier size. In that study, we also showed that the precision of the same-size item was significantly higher when the item had outlier status as opposed to when it was a member of a set; outlier status was a function of the featural distance to the size distribution of the remaining items (Yildirim & Boduroglu, 2015).

To test the impact of outlier distinctiveness on representational precision, across two experiments we asked participants to report the orientation of a spatial outlier presented in a display consisting of eight lines (see Fig. 1). In half of the trials, the orientation of the spatial outlier was similar to that of the remaining set (less distinct outlier), and in the other half, the spatial outlier’s orientation distinctly varied from the items in the remaining set (more distinct outlier). Critically, to ensure that participants were engaging ensemble perception mechanisms, we asked participants to determine outlier orientation while they were also required to attend to the entire display; in both experiments, on a random subset of the trials, participants had to report the mean orientation of the non-outlier lines. In Experiment 1, these trials constituted only 20% of the trials, acting as catch trials; in Experiment 2, outlier and group trials were intermixed but evenly split across conditions. For both outlier and local mean orientation trials, participants were asked to rotate the orientation of a response line to match the probed target. Our primary prediction was that participants would have less error in reporting the orientation of the more distinct than the less distinct outlier. We also explored how the presence of a more or less distinct spatial outlier impacted the representation of the remaining set of items and the interaction between outlier and ensemble representations, by considering potential sources of bias in responses.

Fig. 1
figure 1

The timeline of experimental trials and schematic representation of the two outlier conditions. The top and bottom displays represent more distinct and less distinct outlier conditions, respectively. As can be seen, the average orientation of the group and the spatial outlier orientation are more similar in the bottom display, but they are approximately perpendicular in the top display

Experiment 1

Method

Participants

Fifty undergraduate studentsFootnote 1 from Boğaziçi University participated in the study in exchange for course credit. Data from one participant were excluded because their accuracy was below chanceFootnote 2 and data from two other participants were excluded because they did not respond in more than 10% of the trials. Analyses were run on the data from the remaining 47 participants (19 female; Mage = 20.38 years, SDage = 1.19). All participants had normal or corrected-to-normal vision, had no auditory deficits, and were native Turkish speakers. The sample size was determined a priori through a power analysis in G*Power 3 (Faul et al., 2007, 2009), based on the smallest effect size in Yildirim and Boduroglu (2016): for a medium effect size (d = .58), an alpha of .05, and a two-tailed t-test, the required sample size was 41 participants. To account for potential participant loss, we collected data from 50 participants.

Materials and stimuli

In this experiment, participants studied a display that consisted of eight lines. In all displays, seven of the lines were placed close together to constitute a perceptual group, and the eighth line was positioned separately from the main set, hence making it a spatial outlier. In the majority of the trials (80%), participants reported the orientation of the spatial outlier; in the remaining trials, they reported the average orientation of the group (group trials). This latter set of trials was included to ensure that participants would not selectively attend only to the spatial outlier. Critically, in half of the spatial outlier trials (more distinct outlier), the outlier’s orientation differed from the endpoints of the group orientation range by 90° ± 3°; in the remaining trials (less distinct outlier), the orientation of the spatial outlier differed from the group orientation range by 15° ± 3°. The more and less distinct outlier trials were randomly intermixed with the group trials. Participants completed 240 spatial outlier and 60 group trials, evenly distributed across two blocks and across outlier type conditions.

Each trial (see Fig. 1) started with a gray blank screen that appeared for 1,000 ms, which was followed by a fixation point (+) that was shown for 500 ms. Then, the line display was presented for 250 ms, followed by a black-and-white noise mask that appeared for 1,000 ms to prevent stimulus persistence. Finally, the response screen remained on until a response was submitted. With the onset of the response screen, participants heard an auditory cue indicating whether they were supposed to report the orientation of the single line (“tek” in Turkish/single) or the mean orientation of the group (“grup” in Turkish/group); we used a post-cue to encourage participants to attend to both the group and the spatial outlier during encoding. On the response screen, there was a black line (100 pixels), the orientation of which was pseudo-randomly determined with the constraint that it would not have the same orientation as any of the lines, the local mean, or the global mean of the display. Participants adjusted the line’s orientation by holding the left mouse key and registered their responses with the right mouse key. The line stayed the same length during the orientation adjustment phase. If the participant clicked the right button without adjusting the orientation, no response was recorded and the next trial began.

All lines were of identical length (100 pixels, 2.74° visual angle) and were rotated from the center along the horizontal axis to create sets with unique orientations. In each display, the orientations of the lines were at least 3° different from one another, and none of the lines had the same orientation as the local mean (the mean orientation of the group lines) or the global mean (the mean orientation of all the lines in the display). The orientation of less distinct outliers deviated 15° ± 3° from the group lines’ range edge to ensure that all the lines were facing the same approximate direction but were sufficiently different that reporting the local or global mean would not yield an almost correct response. More distinct outliers deviated 90° ± 3° from the group lines’ range edge to ensure that the outlier was facing a different direction from the group lines. Thus, the less and more distinct outliers differed from the local mean by an average of 30° (ranging from 25° to 37°) and 105° (ranging from 98° to 111°), respectively. In any display, the possible orientations for the group lines were selected from ranges of 8–52° and 98–142°, with the constraint that there was a maximum of 30° difference among the group lines. The orientation of the outlier lines varied between 37–70° and 125–160°.
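Under the constraints above, the display orientations can be generated with simple rejection sampling. The following is our illustrative reconstruction, not the original stimulus-generation code; function and variable names are ours, the outlier is offset from the upper range edge only (the original may have used either edge), and the additional constraint that no line matches the local or global mean is omitted for brevity.

```python
import random

# Possible base ranges for the group lines, in degrees (assumption: taken
# directly from the text).
GROUP_RANGES = [(8, 52), (98, 142)]

def make_display(more_distinct, rng=random):
    """Return (group_orientations, outlier_orientation) in degrees."""
    lo, hi = rng.choice(GROUP_RANGES)
    while True:
        start = rng.uniform(lo, hi - 30)            # group spans at most 30 deg
        group = sorted(rng.uniform(start, start + 30) for _ in range(7))
        if all(b - a >= 3 for a, b in zip(group, group[1:])):
            break                                   # lines differ by >= 3 deg
    # Outlier deviates 90 +/- 3 deg (more distinct) or 15 +/- 3 deg
    # (less distinct) from the upper edge of the group's range; orientations
    # wrap at 180 deg.
    offset = rng.uniform(87, 93) if more_distinct else rng.uniform(12, 18)
    return group, (group[-1] + offset) % 180.0
```

Rejection sampling is inefficient here (most candidate sets violate the 3° spacing constraint), but it keeps the joint constraints trivially correct for an illustration.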

All the lines were individual .png files generated with Visual Studio 2017. These image files were used in E-Prime to form displays. In a display, the lines were positioned in a 760 × 608-pixel space at the center of the 1,024 × 768-pixel screen. The outlier was always located in the foveal area, within a 10° visual angle to ensure that performance was not affected by the precision loss in peripheral vision. The group lines were positioned at the corners of the display, with some of them in the foveal and the rest in the peripheral area. The group lines were always closer to each other, while the spatial outlier was located diagonally to the other lines to ensure the largest spatial separation (approximately 270–300 pixels). These restrictions led to four outlier-group line position combinations. These combinations were counterbalanced and were randomly presented throughout the experiment to reduce the predictability of outlier and group positions. Within each display, the group lines were placed on an invisible matrix with eight possible positions; because there were seven group lines in each display, one of the positions was randomly left blank in each trial. None of the lines were positioned at the center of the screen, with a minimum of approximately 6° radius circular blank area around the fixation point.

The experiment was programmed and run in E-Prime 2.0 (Psychology Software Tools, 2012) on 17-in. monitors with the screen resolution set to 1,024 × 768 pixels. Two audio files (.wav), the words “tek” (“single” in Turkish, 496 ms) and “grup” (“group” in Turkish, 498 ms), were recorded and edited as auditory cues for the tasks in Praat (Boersma & Weenink, 2020).

Procedure

Before the experiment, participants signed an informed consent form and filled out a demographics survey that included questions about age, sex, right-/left-handedness, department, native language, and vision and auditory problems. In each session, a maximum of three participants were tested in separate cubicles. Participants sat approximately 55–60 cm away from the computer screen. The experimenter read the instructions as the participants followed them on their screens. Prior to the start of the experiment, participants completed familiarization and training blocks for both types of trials. During the familiarization trials, participants were introduced to the conditions (four trials per task). Then they completed ten training trials (eight outlier task and two local mean task trials in random order) and received feedback. As feedback, we presented them with an image in which their response was superimposed on the actual target orientation. Then they completed 300 experimental trials. These trials were randomly generated for each participant. Following completion, participants completed a post-test form to provide strategy information and general feedback regarding the experiment. The whole session took approximately 35 min.

Results and discussion

For each response, E-Prime 2.0 recorded the coordinates of the point where the participant released the left click. Using these coordinates along with the central coordinates of the screen where the response probe was placed, we determined the polar coordinates of each response, which yielded the orientation of the adjusted response line, between 0° and 180°. For each response, we then calculated the angular deviation from the correct orientation by computing the absolute acute difference (AAD) from the target orientation, leading to a value between 0° and 90°.
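The conversion from click coordinates to angular error can be sketched as follows. This is a minimal illustration of the computation just described, not the actual analysis script; function names are ours, and we assume screen y-coordinates grow downward, as is standard.

```python
import math

def response_orientation(click_x, click_y, center_x, center_y):
    """Orientation of the line through the probe's center and the click
    point, in degrees within [0, 180); orientations wrap at 180 deg."""
    angle = math.degrees(math.atan2(center_y - click_y, click_x - center_x))
    return angle % 180.0

def absolute_acute_difference(response_deg, target_deg):
    """Smallest angular difference between two orientations (0-90 deg)."""
    d = abs(response_deg - target_deg) % 180.0
    return min(d, 180.0 - d)
```

For example, a response of 170° to a 10° target yields an AAD of 20°, not 160°, because orientations are equivalent modulo 180°.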

We excluded trials in which no adjustment was made to the response line and trials in which responses were registered faster than 150 ms, leading to the exclusion of approximately .03% of all responses. We also excluded data from two participants because they either did not respond in more than 10% of all the trials or did not respond in 25% of the trials in any one of the conditions, and from one participant because their average group trial responses were worse than chance.Footnote 3 For all remaining participants, the AAD distributions were positively skewed; therefore, for each participant, we calculated the median error per condition. Because the AAD scores were not bidirectional, they were not normally distributed, so we carried out our analyses on log-transformed scores. Given the uneven distribution of trials across outlier and perceptual group trials and given our a priori predictions about outlier trials only, in Experiment 1 we chose to analyze the outlier and group trials separately.

To test our main prediction that AAD would be larger for less distinct as opposed to more distinct outliers, we carried out a paired-samples t-test (two-tailed). As predicted, the AAD in the more distinct condition (M = .90, SD = .11) was significantly lower than in the less distinct condition (M = .98, SD = .13), t (46) = 4.43, p = .0001, 95% CI = [.05, .12], d = .65. This finding suggests that distinct outliers are represented with greater precision. To ensure that participants did not selectively attend to the spatial outlier and actually engaged in ensemble perception, in 20% of the trials we had asked participants to report the average orientation of the perceptual group consisting of the seven items, in other words, the local mean. For these trials, we computed the AAD between the local mean (i.e., the perceptual group’s orientation) and the generated response. When we compared the angular error in the group trials with less distinct and more distinct outliers, we found that the AAD in the more distinct condition (M = 1.15, SD = .17) was significantly lower than in the less distinct condition (M = 1.20, SD = .17), t (46) = 2.35, p = .02, 95% CI = [.01, .09], d = .34. In other words, participants not only reported the more distinct outliers with greater precision, they also reported the summary of the set displayed alongside these distinct outliers with greater precision.

Participants may have represented the average group orientation more accurately when the seven-line group was accompanied by a more distinct outlier than a less distinct outlier, because in the former, the group may have formed a more distinct gestalt, allowing it to be processed as a separate ensemble, excluding the outlier. On the other hand, when displays had less distinct outliers, this could have resulted in the processing of the whole display as one ensemble. In other words, the less distinct outlier displays could be thought of as a single eight-line ensemble with greater variability compared to those seven-line groups in the highly distinct outlier displays. When there is greater variability in displays, errors in summarizing tend to be larger and more biased towards available representations (Chong & Treisman, 2003; Im & Halberda, 2013; Semizer & Boduroglu, under review, this issue; Utochkin & Tiurina, 2014). If this is indeed the case, there may be a greater bias towards the global mean especially when the group mean is probed for displays with a less distinct outlier. This bias may be more apparent in the group trials as opposed to the outlier trials, especially given the uneven distribution of trials; group trials – in which they are actually supposed to report the local mean – may not have been prioritized as much as the outlier trials, and consequently be more prone to error. It is known that increased uncertainty in a given condition results in responses that tend to show a greater bias towards already available ensemble representations (Brady & Alvarez, 2011; Solomon, 2020).

To test this possibility, we computed a global mean bias score capturing the bias towards the global mean in group trials. First, for each trial in the group condition, we computed the AAD against the correct value (i.e., the local mean) and also against the global mean orientation. By subtracting the AAD against the global mean from the AAD against the local mean, we determined whether the response was closer to the local mean or the global mean; this value is referred to as the response difference from here on. However, since the orientation difference between the local mean (correct response) and the global mean differed across conditions (orientation difference in less distinct outliers: M = 3.78, SD = .54, range = 3–5, and more distinct outliers: M = 13.18, SD = .50, range = 12–14), rather than interpreting this response difference directly, we divided the response difference in each trial by the corresponding local–global mean orientation difference of the display. This resulted in bias scores within a [-1, 1] range. If the final global mean bias score was between [0, 1], this meant a bias towards the global mean. When the score equaled 1, the response was closest to the global mean value. Any other positive value indicated that the response was between the local and global mean, and closer to the global mean. Finally, when the global mean bias was negative, the response was closer to the probed local mean value (see Fig. A4 in the Online Supplementary Materials (OSM) for a schematic illustration of these possibilities). This global mean bias score does not have a true zero, yet has meaningful ordering and thus classifies as interval data. As long as the assumptions for parametric tests are met, such data can be analyzed using parametric tests.
Similar indices with ordered data but no true zero, such as Goodman and Kruskal’s gamma coefficient used in metamemory research, are typically analyzed using parametric tests (e.g., Schraw, 2009; Schwartz, Boduroglu, & Tekcan, 2016).
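The bias-score computation described above can be sketched as follows; this is our reconstruction of the normalization, with function names of our own choosing, where `acute` denotes the absolute acute difference between two orientations.

```python
def acute(a, b):
    """Absolute acute difference between two orientations, in degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def global_mean_bias(response, probed_value, global_mean):
    """Normalized global mean bias score in [-1, 1].

    +1 means the response sits at the global mean; -1 means it sits at the
    probed value (the local mean for group trials, the outlier orientation
    for outlier trials).
    """
    # Response difference: AAD to the probed value minus AAD to the global
    # mean, then scaled by the probed-to-global-mean distance of the display.
    response_difference = acute(response, probed_value) - acute(response, global_mean)
    return response_difference / acute(probed_value, global_mean)
```

For example, with a local mean of 30° and a global mean of 40°, a response of 40° scores +1, a response of 30° scores -1, and a response of 35° (equidistant from both) scores 0. By the reverse triangle inequality the score always lies in [-1, 1].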

On these global mean bias scores, we conducted a paired t-test (two-tailed); contrary to our expectations, global mean bias scores were similar for less distinct (M = .26, SD = .40) and more distinct outliers (M = .22, SD = .29), t (46) = .96, p = .34, 95% CI = [-.04, .12], d = .14. This suggests that there may be other factors besides global mean bias that contributed to the difference in errors for group trials across outlier conditions. We also checked in a post hoc manner whether there was any global mean bias in the outlier trials. To calculate the global mean bias scores for the outlier trials, the outlier orientation was used instead of the local mean. We found that for the less distinct outlier trials (M = -.39, SD = .25), responses were pulled towards the global mean more than in the more distinct outlier trials (M = -.77, SD = .07), t (46) = 10.74, p < .00001, 95% CI = [.31, .45], d = 1.57 (see Fig. 3 for all global mean bias score results).

One possible factor that may have contributed to the observed pattern of results could have been particular strategies employed by our participants. Specifically, we considered three possible strategies as boundary conditions. One of the most likely strategies is that participants reported only the outlier orientation regardless of what was probed, because there were more outlier trials than group trials (outlier strategy). Another possibility was a global-mean-based strategy. Given that the global mean is extracted in an effortless manner even when items are perceptually grouped in displays (Yildirim et al., 2018), we thought that participants may have relied on the average orientation of all the lines in the display (global mean strategy). Finally, if outliers were discounted during ensemble perception, then participants may have exclusively relied on the local group mean (local mean strategy), regardless of the probe. To test these possibilities, we ran simple simulations in which we generated sets of responses mimicking the possible responses from 1,000 participants, each using one of these strategies consistently. Responses were generated by adding ±0–10° of jitter to the target simulation value for each display, using the “randbetween” function and the data table tool in Excel. For the Outlier Strategy simulation, jitter was added to the outlier’s orientation for each trial; for the Global Mean Strategy simulation, it was added to the global mean of the display; and for the Local Mean Strategy simulation, it was added to the local mean of the display. The amount of jitter was determined based on the average error in the current data and the average angular error observed in other experiments from our lab using a similar orientation-averaging task (Yörük & Boduroglu, 2020). The 10° jitter was chosen to reflect the lower boundary of errors, that is, the performance of high-accuracy participants.Footnote 4 As can be seen in Fig. 2, none of the strategies yielded a pattern of results completely matching the observed data (Experiment 1); participants’ responses suggested that they were appropriately responding based on task demands, but they were more erroneous than what would have been expected based on an optimal strategy. That is, performance on the outlier trials was more erroneous than what would have been observed if participants had just relied on an outlier-based strategy, and performance on the group trials was worse than what would have been expected if participants had relied on a local-mean-based strategy. We compared the results of Experiment 1 to the simulated data by conducting three separate 2 (Data: Experiment 1, Simulation) × 2 (Task: Outlier, Catch) × 2 (Outlier Type: Less Distinct, Distinct) mixed ANOVAs, randomly selecting 47 participants from each simulation. In all three mixed ANOVAs, there was a main effect of the between-subjects factor, data, all Fs > 76.2, all ps < .0001 (the complete set of results is presented in Tables A2–A4 in the OSM). More critically, the comparison of the results of Experiment 1 and the data from the outlier simulation revealed two interesting points. First, in the actual data, there was a cost of reporting the outlier orientation while concurrently processing the broader ensemble: errors in the less distinct (M = .98, SD = .13) and the more distinct (M = .90, SD = .11) outlier conditions were higher than those in the Outlier Strategy simulation (less distinct outlier: M = .72, SD = .04; more distinct outlier: M = .73, SD = .04), d = .22, t (178.5) = 11.1, pBonferroni < .0001. Second, had people adopted an outlier-based strategy, angular errors would have been similar across the two conditions, d = -.01, t (184) = -.50, pBonferroni = 1. However, in our data, errors were significantly smaller in the more distinct outlier condition. Thus, we were able to rule out a strategy exclusively based on outlier characteristics.
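The strategy simulations described above can be sketched as follows. This is a minimal Python analogue of the Excel "randbetween" procedure (the original used Excel's data table tool); the function and field names are ours.

```python
import random

def acute(a, b):
    """Absolute acute difference between two orientations, in degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def simulate_participant(trials, strategy, rng=random):
    """Errors for a simulated participant who always reports one value.

    trials: list of dicts with per-trial 'outlier', 'local_mean',
    'global_mean', and 'target' (the probed orientation) entries.
    strategy: which value the simulated participant reports
    ('outlier', 'local_mean', or 'global_mean'), plus 0-10 deg of jitter.
    """
    errors = []
    for trial in trials:
        response = (trial[strategy] + rng.uniform(-10, 10)) % 180.0
        errors.append(acute(response, trial["target"]))
    return errors
```

Under the outlier strategy, error on outlier trials can never exceed the 10° jitter bound and does not depend on outlier distinctiveness, which is why observed errors above that bound, and differing across distinctiveness conditions, rule this strategy out.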

Fig. 2
figure 2

Average log-transformed AAD (absolute acute difference) scores for Experiment 1, Experiment 2, and the simulations reflecting outlier-, global-mean-, and local-mean-based strategies in responding to Experiment 1 trials. Error bars represent the standard error of the mean. The AAD scores represent the smallest angular difference between the participant’s response and the correct response. For the outlier task, the correct response is the outlier’s orientation, and for the group task it is the local mean, excluding the outlier. The simulations were run with 1,000 participants responding with a constant strategy with ±0–10° angular deviance. For the outlier, global mean, and local mean simulations, responses were generated based on the outlier orientation, the global mean, or the local mean of the displays, respectively. See Online Supplementary Materials Table A1 for M and SD values

The simulations also ruled out strategies based exclusively on the global mean or the local mean. In the group trials, error was smaller in the simulations (Global Mean Strategy: M = .93, SD = .06; Local Mean Strategy: M = .72, SD = .08) than in the actual data (M = 1.18, SD = .17), suggesting that viewers were not able to prioritize a local-mean-based strategy and discount the outliers. As for the global mean strategy, the simulations revealed that such a strategy would have been inefficient: the trends observed in the simulations indicated larger errors whenever there was a distinct outlier, in both the group and the outlier conditions, F (1,92) = 1187.95, p < .0001, ηp2 = .93.

Our data and the comparison of our pattern of findings with the simulation results altogether suggest that characteristics of outliers impact both outlier and ensemble representation resolution. Well aware that our simulation results were rather simplistic in that they reflected viewers’ exclusive reliance on a single strategy, we wanted to further test the idea that distinct outliers are better represented than less distinct outliers, even when the task design did not favor the prioritization of a subset of trials – in our case, outlier over group. Therefore, in Experiment 2 we investigated whether the effect of outlier distinctiveness persisted even when outlier and group trials were given equal importance by equating trial distribution.

Experiment 2

In Experiment 1, we demonstrated that distinct outliers were represented with higher precision than less distinct outliers. The outlier simulation results ruled out the possibility that participants were only attending to outliers at the expense of attending to the broader ensemble, but it is possible that they may have been relying more on this strategy on a subset of the trials. The particular trial distribution in Experiment 1 (80% outlier trials) might have encouraged participants to adopt such a strategy. In turn, this might have partly contributed to the precision advantage for distinct outliers.

In Experiment 2, we reduced the number of outlier trials and increased the number of group trials to equate their overall number. Our goal was to determine whether the distinct outlier advantage would hold when the trial distribution did not specifically favor outlier processing.

Method

Participants

Forty-seven undergraduate students (28 female; Mage = 21.47 years, SDage = 1.44) from Boğaziçi University participated in the study in exchange for course credit. We excluded two participants as they did not respond in more than 10% of the trials. Data analyses were carried out on the data from the remaining 45 students (27 female; Mage = 21.44 years, SDage = 1.50). The sample size was determined in the same way as in Experiment 1.

Materials and stimuli

All aspects of the experiment were identical to Experiment 1 except for the distribution of outlier and group trials. Participants completed 240 experimental trials, evenly split between the outlier and group trials and randomly intermixed across two blocks, with the only constraint being that no more than two consecutive trials came from the same condition. Trial order was randomly determined for each participant. We also reduced the mask duration to 320 ms, to be more in line with the literature (e.g., Enns & Di Lollo, 2000).

Results and discussion

The dependent variable was calculated and the data were cleaned as in Experiment 1; the absolute acute difference (AAD) for each condition is presented in Fig. 2. In this experiment, since the distribution of group and outlier trials was equal, we chose to directly compare the error on these trials using a within-subjects ANOVA. To determine the impact of outlier distinctiveness on outlier and local mean precision, we conducted a 2 (Task: Outlier, Group Mean) × 2 (Outlier Type: less distinct, more distinct) within-subjects ANOVA on AAD. AAD was smaller in the outlier (M = .97, SD = .10) than in the group (M = 1.14, SD = .13) task, F (1, 44) = 71.31, p < .0001, ηp2 = .62. Across both tasks, when there was a more distinct outlier in the display (M = 1.02, SD = .11), the AAD was significantly smaller than when there was a less distinct outlier (M = 1.09, SD = .10), F (1, 44) = 30.14, p < .0001, ηp2 = .41. More critically, there was a significant Task × Outlier Type interaction, F (1, 44) = 8.74, p = .005, ηp2 = .20. This interaction was due to outlier distinctiveness playing a role in outlier trials but not in group trials. Specifically, in the outlier trials, error was higher in displays with less distinct outliers (M = 1.02, SD = .13) than with more distinct outliers (M = .91, SD = .13), t (85.4) = 5.79, pBonferroni < .0001. Errors did not vary in the group trials as a function of outlier distinctiveness: less distinct (M = 1.15, SD = .14) and more distinct (M = 1.13, SD = .13), t (85.4) = 1.27, pBonferroni = 1.

As in Experiment 1, we calculated global mean bias scores for both outlier and group trials (see Fig. 3). Even when the distribution of outlier and group trials was equated, errors in the outlier trials were significantly lower than in the group trials. Given that bias becomes more evident in trials with larger errors, we expected responses in the outlier trials to show less bias towards the global mean than responses in the group trials; we also expected this difference between trial types to be reduced when displays contained less distinct outliers. Scores between 0 and 1 indicated a bias towards the global mean; bias scores approaching -1 indicated that responses were close to probed values, i.e., the correct response. The 2 (Task: Outlier, Group Mean) × 2 (Outlier Type: less distinct, more distinct) repeated-measures ANOVA on global mean bias scores revealed a main effect of task, with outlier trials (M = -.53, SD = .17) showing a weaker bias towards the global mean than group trials (M = .29, SD = .32), F(1, 44) = 276.9, p < .0001, ηp2 = .86. Critically, when there was a less distinct outlier (M = .03, SD = .27) in the display, responses were pulled towards the global mean more than when displays had more distinct outliers (M = -.27, SD = .21), F(1, 44) = 151.1, p < .0001, ηp2 = .77. These main effects were qualified by a significant Task × Outlier Type interaction, F(1, 44) = 48.4, p < .0001, ηp2 = .52. The interaction was due to the difference between the less distinct and more distinct outlier conditions being smaller in the group trials (d = .16), t(86.1) = 4.79, pBonferroni < .0001, than in the outlier trials (d = .45), t(86.1) = 13.86, pBonferroni < .0001. For responses to the more distinct outliers, the mean global mean bias score approached -1 (M = -.75, SD = .10), meaning that responses in most of these trials were closer to the probed outlier's orientation than to the global mean (see Fig. 3).
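One way a bias score could behave as described, equaling -1 when the response matches the probed value and +1 when it matches the global mean, is sketched below. This is a hypothetical reconstruction, not the article's exact formula, which is defined in Experiment 1.

```python
def global_mean_bias(response, probed, global_mean, period=180.0):
    """Hypothetical bias score: -1 when the response equals the probed
    value, +1 when it equals the global mean, intermediate values when
    the response falls between the two. The period-based acute distance
    and this normalization are illustrative assumptions, not the
    article's definition.
    """
    def acute(a, b):
        d = abs(a - b) % period
        return min(d, period - d)

    # Response error relative to the probed value, minus error relative
    # to the global mean, normalized by the probed-to-mean distance.
    return (acute(response, probed) - acute(response, global_mean)) / acute(probed, global_mean)
```

Under this formulation, a response exactly midway between the probed value and the global mean scores 0, matching the interpretation that scores between 0 and 1 reflect a pull towards the global mean.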
Thus, Experiment 2 demonstrated that the presence of a distinctive outlier in the display impacts the processing of both the outlier and the ensemble: outliers were represented more precisely, and responses were less biased towards the global mean.

Fig. 3

Global mean bias scores in Experiments 1 and 2. Error bars represent SEM

Overall, these results partially replicated the findings of Experiment 1 (see Footnote 5). First, in both experiments distinct outliers were represented with greater precision than less distinct outliers, and in both experiments we found greater bias towards the global mean when the outlier was less distinct. For the group trials, the pattern across the two experiments was less consistent. While in Experiment 1 error was higher when the display had a less distinct outlier, in Experiment 2 we found similar levels of error across conditions. Likewise, while there was no evidence of global mean bias varying as a function of outlier distinctiveness in the group trials of Experiment 1, in Experiment 2 we observed a significant difference in global mean bias across the two outlier conditions: less distinct (M = .37, SD = .32) versus more distinct (M = .21, SD = .31), t(86.1) = 4.79, pBonferroni = .00004.

General discussion

The goal of this study was to determine whether outlier distinctiveness impacted outlier processing during ensemble perception of displays. Concurrent processing of mean and variance (and/or range) information is likely to facilitate the rapid detection of outliers (Cant & Xu, 2020; Hochstein et al., 2018). Our findings demonstrated that ensemble perception may contribute not only to the detection but also to the increased precision of outlier representations. We reported that spatial outliers that also deviated in orientation from the remaining set of items were represented with greater precision than spatial outliers that were similar in orientation to the other items. Also, when outliers were more distinct, the reported orientation was closer to the probed target value and there was no obvious bias towards the global mean of the display. For less distinct spatial outliers, on the other hand, the reported orientations were pulled more towards the global mean. This likely reflects the fact that the global mean is involuntarily extracted from displays (e.g., Brady & Alvarez, 2011; Mutluturk & Boduroglu, 2014) and utilized when item representations are noisy; the available summaries, albeit task-irrelevant, bias the responses. Our findings on the impact of outlier distinctiveness on item precision suggest that distinct items may be processed separately from the remaining items.

In both experiments, precision was lower for group trials than outlier trials. Critically, though, while the presence of a distinct outlier increased group precision in Experiment 1, in Experiment 2 there was no difference between group trials based on outlier type. By increasing the number of group trials in Experiment 2, we might have encouraged participants to give equal priority to both group and outlier trials. This might have rendered a local-mean-based strategy more viable. When one considers the local-mean-based simulation results (see Fig. 2), one notices that error is at similar levels for the two outlier conditions in the group trials, the pattern observed in Experiment 2. Thus, the specific outlier condition might not have mattered for these group trials. In contrast, in Experiment 1 the group trials were infrequent, making it unlikely that participants adopted a local-mean-based strategy to respond. Instead, in Experiment 1, participants might have relied on a readily available summary representation for the group trials. Research has shown that participants obligatorily extract the global mean even when it is task-irrelevant and when displays contain spatially segregated perceptual groups (e.g., Yildirim et al., 2018; for similar findings, also see Brady & Alvarez, 2011, and Solomon, 2020). The global mean bias analyses revealed a very similar pattern across the experiments, except that global mean bias increased for the group trials with less distinctive outliers in Experiment 2. In both experiments, there was greater global mean bias in the group responses than in the outlier responses. While in Experiment 1 the bias in the group trials did not vary as a function of outlier distinctiveness, in Experiment 2 responses to displays with less distinctive outliers were disproportionately pulled towards the global mean.
When one considers the high precision and minimal global mean bias observed in the more distinct outlier trials, it becomes apparent that participants process displays containing a more distinct outlier as consisting of a group and a separate, individual outlier. The reason global mean bias might be larger for the less distinctive group trials in Experiment 2 might be that the equal distribution of outlier and group trials in this experiment made a local-mean-based strategy more viable than in Experiment 1; adopting this kind of strategy would eliminate the difference between group trials with different distinctiveness levels (as can be seen in the simulation results in Fig. 2). However, since participants could not anticipate which task would come next, a local-mean-based strategy would be ineffective for the outlier trials, which constituted half of the trials. Therefore, participants might have been flexibly shifting strategies in Experiment 2, and when the strategy was not appropriate for a particular trial, they might have been biased by a readily available global mean. In other words, in Experiment 2, participants might have concurrently extracted a combination of the outlier orientation, the local mean, and/or the global mean; when probed representations were not available, especially in the group trials, participants might have chosen to report an alternative summary.

While our results highlight the precision advantage observed for more distinct over less distinct spatial outliers, we believe our findings also provide indirect evidence for the computational capacity and constraints of ensemble perception mechanisms. In our study, all displays had a spatial outlier; identifying this item as a spatial outlier required participants to integrate both spatial summary and positional variance information. It is known that viewers can efficiently extract the spatial summary of a display, either in the form of a centroid or the center-of-mass (Alvarez & Oliva, 2008; Boduroglu & Shah, 2014; Boduroglu & Yildirim, 2020; Rodriguez-Cintron, Wright, Chubb, & Sperling, 2019). Despite evidence that the visual system can capture the variance of multiple items across various domains (e.g., color: Maule & Franklin, 2019; orientation: Morgan et al., 2008; luminance: Tong et al., 2015; size: Tokita et al., 2016), to our knowledge there is no direct evidence that viewers code positional variance. However, if such information were not implicitly available to participants, the visual system would not have been able to tag less distinct items as spatial outliers. One might argue that the evidence that these items are treated as outliers by the visual system is weak. It is true that we do not have a baseline condition comparing the representational precision of an ordinary set member to that of a potential outlier, which would confirm the outliers' special status. Nevertheless, we know that the error in reporting the orientation of either type of outlier was less than the error in reporting the group's average orientation (the main effect of task). Thus, it is possible that positional variance is coded alongside centroid information and is functional in determining outlier status.
The ability to concurrently extract two statistical summaries (mean and variance) of spatial position adds to a growing body of findings that argue for the independent extraction of summary statistics (e.g., Utochkin & Vostrikov, 2017). It is possible that the precision advantage observed for the outliers was partly due to their spatial outlier status, suggesting that participants may have had simultaneous access to four summaries (mean and variance across the spatial and orientation domains). Even though it has been argued that the computational capacity of ensemble perception is limited by the coding of several ensembles as opposed to several summaries (e.g., Attarha & Moore, 2015; Poltoratski & Xu, 2013; Utochkin & Vostrikov, 2017; Boduroglu & Yildirim, 2020), we believe our data provide a more nuanced perspective. Our participants may have been able to efficiently extract at least two different summaries from two different dimensions. Future research needs to more directly test the capacity limits of mean and variance estimation across different dimensions.

While our study demonstrated a precision advantage for distinct outliers, future studies are necessary to further determine the mechanisms contributing to precise outlier representations. One possibility that may merit further research is whether selective and/or covert attention contributes to the distinct outlier precision advantage. Once items are tagged as outliers via ensemble mechanisms, selective attention mechanisms may guide the allocation of resources to them, increasing their precision (e.g., Bays & Husain, 2008). Another possibility may be linked to the temporal aspects of item encoding. Because ensemble perception mechanisms help detect outliers early on, these tagged outliers may benefit from being encoded for longer durations, which in turn may lead to the formation of consolidated representations (e.g., Vogel, Woodman, & Luck, 2006). A final possibility may be linked to how ensemble perception summarizes both mean and variance information. It is possible that these two summary statistics together scaffold the outlier representation, by determining the range of featural values and narrowing representational error. Future research is necessary to further explore these options.

In sum, we demonstrated that ensemble perception mechanisms contribute to the detection and precision of outlier representations. As the saliency of the outliers increases, so does their precision. Future research needs to determine whether these precise perceptual representations of outliers persist in later stages of visual information processing.