Introduction

Focusing attention on certain objects and ignoring others is central to our attentional abilities (Wolfe & Horowitz, 2017). But how pliant is this ability? Humans show a general tendency to attend to where others are looking (Zuberbühler, 2008), enhancing the acquisition of key elements in the environment (Capozzi et al., 2016) and facilitating social coordination (van Vugt, 2014). Despite evolutionary advantages, attending to where others are looking could also have unexpected influences on one's goals. At a museum, for example, one might be more inclined to dwell on an artwork that others are looking at as well, or perhaps more precariously, at a social gathering shift their attention away from their partner because several others are looking at someone else.

Despite these intuitions, social attention research has focused on an individual’s sensitivity to others’ gaze direction without taking into account the individual’s own current focus of attention (Kingstone et al., 2017). In the gaze-cuing paradigm, for example, participants are asked to maintain fixation on a central computerized face that can look to the left or right (Frischen et al., 2007). Response time (RT) to a target is shorter when it appears at the gazed-at location versus a non-gazed-at location, suggesting that a participant's attention is spontaneously shifted to the gazed-at location (see also Birmingham & Kingstone, 2009). Additionally, recent work has demonstrated that this effect is modulated by the number of gazing faces. For instance, Capozzi et al. (2018) have shown that the more faces look at a certain object, the more likely participants are to attend to that object (see also Sun et al., 2017).

In light of this work, it follows that one’s ability to focus attention on a given location may be modulated by the presence and number of people who look toward or away from it. To address this issue, we used a group gaze-cuing paradigm (Capozzi et al., 2018) with four stimulus faces and adapted it to include a nonpredictive directional cue before stimulus faces looked toward and/or away from the cued location. Previous work has shown that spatially uninformative cues as simple as letters or words (e.g., “left” or “right”) are sufficient to bias attention, leading to shorter RTs to targets appearing in the direction that is congruent with the cue (i.e., cue-congruent targets) relative to those appearing in the opposite direction (i.e., cue-incongruent targets; e.g., Hommel et al., 2001).

Thus, we expected to find a congruency effect at baseline. That is, when two faces are looking in one direction and two faces are looking in the opposite direction, RTs to a target at the cue-congruent location should be shorter than RTs to a target at the cue-incongruent location (Capozzi et al., 2014). Crucially, if attentional focus is pliable to others’ gaze, we expected this congruency effect to increase linearly as the group majority looks in the cue-congruent direction and to decline linearly as the group majority looks in the cue-incongruent direction (e.g., Capozzi et al., 2018; Sun et al., 2017).

Our hypotheses were confirmed. We found a cue-congruency effect that was enhanced when all the faces looked at the cue-congruent location, and that declined when most or all of the faces looked away from the cue-congruent location. Collectively, these data reveal that people's focus of attention is modulated by where others in a group are looking.

Methods

Participants

We aimed to test 35–40 participants based on an a priori power analysis (dz = .5, α = .05, β = .20; Capozzi et al., 2018; Faul et al., 2007). Thirty-eight students (27 females, 11 males, M = 20.39 years, SD = 2.14 years) from the University of British Columbia (UBC) participated. All participants’ vision was normal or corrected to normal. Participants received course credits for participation. The study was approved by the UBC ethics board.
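As a rough check on the target sample size, the power calculation can be approximated as follows. This is a minimal sketch that uses the normal approximation to a two-tailed paired t-test, not the exact noncentral-t computation that a dedicated power tool performs, so it gives a slightly lower bound. The function name `approx_paired_n` is illustrative and not part of the original analysis.

```python
from math import ceil
from statistics import NormalDist


def approx_paired_n(dz: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a two-tailed paired t-test.

    Uses the normal approximation n = ((z_{1-alpha/2} + z_{power}) / dz)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(((z_alpha + z_power) / dz) ** 2)


# beta = .20 corresponds to power = .80
n_required = approx_paired_n(dz=0.5, alpha=0.05, power=0.80)
```

Under these assumptions the approximation returns 32 participants. An exact t-based calculation is expected to require slightly more, which is compatible with the 35–40 target stated above.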

Apparatus and stimuli

Participants were seated at a distance of 70 cm in front of a computer screen (resolution: 1,024 × 768; NEC AccuSync LCD200VX 20.1-in., 75 Hz) and a keyboard was placed in front of them within easy reach. As shown in Fig. 1, stimuli consisted of four male faces (varying in size from 3.8° to 4.4° in width and 6.2° to 6.4° in height), three placeholder objects (2.4° × 2.6°), and a black circle (0.5° × 0.5°) used as response target. The experiment was run using Experiment Builder software (SR Research).

Fig. 1

Example stimuli and trial sequence. (a) The participant and four faces fixate the central object. (b) A directional cue (L or R) is shown at the central object. The participant is required to press the left or right key on the keyboard. (c) A variable number of faces (0–4) look at the object at the cue-congruent location (in this example three) and the other faces at the opposite location. (d) After either 300 ms or 900 ms, a black circle appears either at the cue-congruent location (as illustrated here) or the other (cue-incongruent) location. The participant is required to press the space bar as fast as possible when seeing the black circle

Procedure and design

In each trial, participants were presented with four faces gazing towards the central object on the computer screen, which they were required to fixate (Fig. 1a). Then either an “L” or “R” cue was randomly presented on the object (Fig. 1b); these cues were not predictive of the target location. To enhance the probability of this cue orienting attention to the left or right we asked participants to respond to the L and R cues by pressing a left (“N”) or right (“M”) key on the keyboard (colored in blue and yellow, respectively) with their preferred hand (Lu & Proctor, 1995; Van der Lubbe & Abrahamse, 2011); the trial would not proceed until the correct key was pressed and a feedback tone was presented upon an erroneous response. Then, a variable number of faces (0–4) shifted their gaze (Fig. 1c) to the location congruent with the cue (we will call this the “cue-congruent location”), while the other faces (0–4) gazed to the mirror opposite location (the “cue-incongruent location”). Gaze shifts were manipulated with head movement only (i.e., a head turn to the left or right). After either 300 ms or 900 ms, a target stimulus (a black dot) was presented (Fig. 1d) either on the object at the cue-congruent location (congruent trials) or the cue-incongruent location (incongruent trials). Participants were instructed to ignore the gaze shifts and respond as fast as possible to the target by pressing the space bar. On a small number of catch trials (10%), no target was displayed, and no response was required.

We used a within-subject design in which we manipulated three factors. The first factor was the Proportion of Faces that gazed toward (T) or away (A) from the cue-congruent location (five levels: All-T, Most-T, Neutral, Most-A, All-A). In the All-T trials, all four faces gazed toward the object in the cue-congruent location. In the Most-T trials, three faces gazed toward the object in the cue-congruent location while the one remaining face gazed away from it. All-T and Most-T trials served to test whether group gaze towards the cue-congruent location increases participants’ attention towards this object. In the Neutral trials, two faces gazed toward the object in the cue-congruent location and two faces gazed away from it. These trials served as a baseline to ensure that a congruency effect is present when an equal proportion of faces are gazing toward and away from the cue-congruent location. In the Most-A trials, three faces gazed away from the cue-congruent location, while the one remaining face gazed toward it. In the All-A trials, all four faces gazed away from the cue-congruent location. Most-A and All-A trials served to test whether group gaze away from the cue-congruent location decreases participants’ focus of attention on the cue-congruent location. The position of faces and their gaze shifts were pseudo-randomized such that each face in each position would turn left or right an equal number of times.

The factor Target Congruency (two levels: congruent, incongruent) manipulated whether the target appeared on the object at the cue-congruent location or the opposite cue-incongruent location.

Finally, the factor Stimulus-Onset Asynchrony (SOA; two levels: 300 ms or 900 ms) manipulated whether the target appeared 300 ms or 900 ms after the group gaze shifts. All trial types were presented in random order and were equiprobable, with 20 repetitions for each trial type and 40 catch trials randomized across Proportion of Faces, resulting in a total of 440 trials. The experiment lasted about 30 min.
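The trial count follows from fully crossing the three factors: 5 (Proportion of Faces) × 2 (Target Congruency) × 2 (SOA) = 20 trial types, times 20 repetitions, plus 40 catch trials. The sketch below rebuilds such a trial list to make the arithmetic explicit. It is an illustration only, since the experiment itself was run in Experiment Builder. All names are hypothetical, and the even split of catch trials across the five Proportion of Faces levels is an assumption.

```python
from itertools import product
import random

# Number of faces (out of four) gazing toward the cue-congruent location.
FACES_TOWARD = {"All-T": 4, "Most-T": 3, "Neutral": 2, "Most-A": 1, "All-A": 0}
TARGET_CONGRUENCY = ("congruent", "incongruent")
SOA_MS = (300, 900)
REPETITIONS = 20  # per trial type, as stated in the text
N_CATCH = 40      # catch trials, as stated in the text


def build_trial_list(seed=0):
    """Return a shuffled list of trial dictionaries for one session."""
    rng = random.Random(seed)
    trials = [
        {"proportion": p, "faces_toward": FACES_TOWARD[p],
         "congruency": c, "soa_ms": soa, "catch": False}
        for p, c, soa in product(FACES_TOWARD, TARGET_CONGRUENCY, SOA_MS)
        for _ in range(REPETITIONS)
    ]
    # Assumption: catch trials are split evenly over the five proportion levels.
    per_level = N_CATCH // len(FACES_TOWARD)
    for p in FACES_TOWARD:
        trials += [
            {"proportion": p, "faces_toward": FACES_TOWARD[p],
             "congruency": None, "soa_ms": None, "catch": True}
            for _ in range(per_level)
        ]
    rng.shuffle(trials)
    return trials


trials = build_trial_list()
```

The resulting list contains 400 target trials and 40 catch trials, matching the total of 440 reported above.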

Results

To exclude the possibility that participants pressed the space bar regardless of target presence, we assessed if participants withheld responding on catch trials and responded on target trials, and found this to be the case (catch trial accuracy: M = 0.93, SD = 0.11; target trial accuracy: M = 1.00, SD = 0). Furthermore, to match our preprocessing procedure to our earlier study (Capozzi et al., 2018), we excluded RTs less than 200 ms and greater than 1,200 ms resulting in the removal of 3% of the data.
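The RT exclusion rule can be sketched as a simple filter. This is a minimal illustration assuming RTs are stored in milliseconds and that the bounds themselves (200 ms and 1,200 ms) are retained. The function name and the example values are hypothetical and are not data from the study.

```python
def filter_rts(rts_ms, lower=200, upper=1200):
    """Keep RTs in [lower, upper] ms and report the proportion removed."""
    kept = [rt for rt in rts_ms if lower <= rt <= upper]
    removed = 1 - len(kept) / len(rts_ms) if rts_ms else 0.0
    return kept, removed


# Hypothetical RTs (ms) used only to show the behaviour of the filter.
example_rts = [150, 200, 345, 1200, 1350]
kept, proportion_removed = filter_rts(example_rts)
```

In this toy example two of the five values fall outside the window and are dropped. Applied to the actual data set, the same rule removed 3% of the data.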

To provide a general descriptive overview of the filtered RT data, Table 1 displays RTs for all combinations of Proportion of Faces, Target Congruency, and SOA. There was a tendency for RTs to be shorter on congruent trials than on incongruent trials, with an apparent modulation by the number of the faces looking toward or away from the direction of the cue. Moreover, RTs became shorter as the SOA lengthened from 300 ms to 900 ms, reflecting a standard foreperiod effect (Niemi & Näätänen, 1981).

Table 1 Mean response times (RTs) in ms as a function of the proportion of faces looking toward or away from the cue-congruent location for congruent and incongruent trials, respectively. Values in brackets are standard error of the mean

SOA = stimulus-onset asynchrony

To test whether these observations were statistically reliable, we fitted a linear mixed model (Bates et al., 2015; McElreath, 2020) including all the above factors with RT as dependent variable. We modeled all main and interaction effects between factors as fixed effects. As random effects, we included by-participant intercepts and slopes for all factors. We assessed the significance of these effects using Satterthwaite's method (Luke, 2017) and type III sums of squares.

This analysis returned a significant main effect of Target Congruency, F(1,571) = 5.39, p = .020, and a trend towards significance for SOA, F(1,70) = 3.23, p = .077. Importantly, there was a significant interaction between Proportion of Faces and Target Congruency, F(1,678) = 32.60, p < .001. No other effects were significant (ps > .149). The total variance explained by the model was 83% (Johnson, 2014; Nakagawa & Schielzeth, 2013).

To follow up on the significant Proportion of Faces × Target Congruency interaction, we computed the magnitude of the congruency effect at each level of Proportion of Faces as the mean RT for targets at the cue-incongruent location minus the mean RT for targets at the cue-congruent location, averaged across SOAs, as shown in Fig. 2. We ran a linear mixed model using the congruency effect as dependent variable, the factor Proportion of Faces as categorical predictor, and the Neutral trials as reference group for the Proportion of Faces factor.
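The congruency effect described here can be written as a small function. This is a sketch under the assumption that each participant's mean RT has already been computed per Target Congruency × SOA cell at a given level of Proportion of Faces. The function name and the numeric values are hypothetical placeholders, not values from Table 1.

```python
from statistics import mean

SOAS_MS = (300, 900)


def congruency_effect(cell_means):
    """Incongruent minus congruent mean RT (ms), averaged across SOAs.

    cell_means maps (congruency, soa_ms) to a mean RT in ms for one
    participant at one level of Proportion of Faces.
    """
    incongruent = mean(cell_means[("incongruent", soa)] for soa in SOAS_MS)
    congruent = mean(cell_means[("congruent", soa)] for soa in SOAS_MS)
    return incongruent - congruent


# Hypothetical cell means (ms) for one participant in one condition.
example_cells = {
    ("congruent", 300): 400, ("congruent", 900): 380,
    ("incongruent", 300): 415, ("incongruent", 900): 389,
}
effect = congruency_effect(example_cells)
```

A positive value indicates faster responses at the cue-congruent location. In the toy example the effect is 12 ms.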

Fig. 2

Difference between incongruent and congruent trials as a function of faces looking towards or away from the cue-congruent location. Error bars are standard error of the mean

We found that the intercepts differed significantly from zero (M = 11.59, t(185) = 2.60, p = .010), indicating that there was a congruency effect at baseline. In other words, responses to targets appearing in the cue-congruent direction were significantly faster than responses appearing in the cue-incongruent direction when the gaze directions of the stimulus faces were balanced (i.e., two faces looking towards the cue-congruent direction and two faces looking in the cue-incongruent direction). This result shows that the directional cue was effective in orienting participants’ attention when group gaze was not biased toward (T) or away (A) from the cue-congruent location.

When three faces looked at the cue-congruent location and one face looked away from it (i.e., Most-T trials), the congruency effect did not differ from baseline (Mdifference = -0.02, t(185) = 0.00, p = .999). We also calculated a Bayes factor for this comparison and found that the null hypothesis is 5.72 times (BF = 0.17) more likely than the alternative hypothesis (Leppink et al., 2017). Importantly, however, we found that when all faces looked toward the cue-congruent location (i.e., All-T trials), the congruency effect significantly increased relative to baseline (Mdifference = 15.33, t(185) = 2.43, p = .016), indicating that all faces gazing toward the cue-congruent location increased participants’ focus of attention at that location.
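The two Bayes factor values reported for the Most-T comparison are reciprocals of each other: a Bayes factor in favour of the alternative (BF10) below 1 corresponds to evidence for the null of 1 / BF10. The short check below assumes that the reported BF = 0.17 is BF10 rounded to two decimals, which is an interpretation and is not stated explicitly in the text.

```python
bf10_rounded = 0.17            # reported Bayes factor (assumed to be BF10)
bf01_reported = 5.72           # reported evidence ratio in favour of the null

bf01_from_rounded = 1 / bf10_rounded         # about 5.88
bf10_from_reported = 1 / bf01_reported       # about 0.175

# The reported pair is consistent once rounding to two decimals is considered.
consistent = round(bf10_from_reported, 2) == bf10_rounded
```

The small gap between 5.72 and 1 / 0.17 is therefore plausibly a rounding effect.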

Our analyses also showed that the congruency effect decreased relative to baseline both when three faces looked away from the cue-congruent location and one face looked toward it (i.e., Most-A trials; Mdifference = -15.74, t(185) = -2.50, p = .013) and when all faces looked away from the cue-congruent location (All-A trials; Mdifference = -17.23, t(185) = -2.73, p = .007). These results indicate that most and all faces gazing away from the cue-congruent location decreased participants’ focus of attention at that location.
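Because the model uses the Neutral trials as the reference level, the estimated congruency effect in each condition is the baseline intercept plus that condition's difference from baseline. The sketch below carries out this arithmetic with the estimates reported in the text. The variable names are illustrative, and the sums are derived here and are not reported by the authors.

```python
BASELINE_MS = 11.59  # intercept: congruency effect on Neutral trials

# Reported differences from baseline (ms), by Proportion of Faces level.
DIFF_FROM_BASELINE_MS = {
    "All-T": 15.33,
    "Most-T": -0.02,
    "Most-A": -15.74,
    "All-A": -17.23,
}

# Estimated congruency effect per condition under treatment coding.
estimated_effect_ms = {"Neutral": BASELINE_MS}
for level, diff in DIFF_FROM_BASELINE_MS.items():
    estimated_effect_ms[level] = round(BASELINE_MS + diff, 2)
```

These sums give roughly 26.92 ms for All-T, 11.57 ms for Most-T, -4.15 ms for Most-A, and -5.64 ms for All-A. On these estimates the congruency effect is slightly reversed in sign when most or all faces look away.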

As an additional follow-up analysis, we tested whether the magnitude of the congruency effect modulation differed between faces looking Away versus Towards the cue and between the All- versus Most-faces conditions. We first subtracted the baseline congruency effect from the other conditions (All-T, Most-T, All-A, Most-A). Using the absolute value of these differences as dependent variable, we ran a linear mixed model with the factors Direction (Towards, Away) and Faces (All, Most). We found no significant main effects (Direction: F(1,38) = 0.23, p = .635; Faces: F(1,45) = 0.058, p = .812) and no significant interaction effect (F(1,74) = 2.97, p = .089), indicating that the magnitudes of the differences from the baseline congruency effect did not differ reliably across All- versus Most-faces conditions, or across Away versus Towards conditions.

Together these findings demonstrate that participants’ focus of attention is both increased by the proportion of faces gazing toward the cue-congruent target (i.e., All-T) and decreased by the proportion of faces looking away from it (i.e., Most-A and All-A).

Discussion

Consistent with past research, participants were generally faster at detecting targets that appeared in the nonpredictive letter cue-congruent location relative to targets appearing at the cue-incongruent location (e.g., Hommel et al., 2001; Van der Lubbe & Abrahamse, 2011). Importantly, this congruency effect was significant at baseline, that is when half of the group faces gazed toward the cue-congruent location and the other half gazed towards the cue-incongruent location, consistent with the notion that two divergent lines of sight are often disregarded as they do not represent a sufficiently clear or coherent social cue (Capozzi & Ristic, 2020; see also Kingstone et al., 2019). Obtaining this result was foundational to the present study, as without attention being committed to the cue-congruent location at baseline, we would not be able to examine the change in this prior attentional bias due to the group gaze. Thus, to maximize the probability that participants' attention was shifted to the cue-congruent location – without explicitly directing them to do so – we used a nonpredictive directional cue (the letters L or R) that demanded an action (a left or right keypress). One line for future research will be to examine the effect of group gaze on attention when it is previously engaged either by a symbolic cue or an action alone.

Critically, when the proportion of faces looking toward or away from the cued object was unequal, the congruency effect was altered. Specifically, our data show that when all faces look toward the participants’ cued object, the congruency effect increases relative to baseline. That is, whereas previous studies have shown that increasing proportions of gazing faces increase observers’ attentional shifts to the gazed-at location (e.g., Capozzi et al., 2018, but see also Gallup et al., 2012), the current study shows that increasing proportions of gazing faces enhance one’s current attention to objects. In other words, these data suggest that one might be more inclined to commit attention to a location in the environment simply because a majority of other people are looking toward it. It is interesting to speculate that this form of social tuning may have the advantage of helping to establish a shared focus of attention and thus facilitate social interactions (Shteynberg, 2015, 2018). Of course, it can also carry social costs, as one might be relatively slow to respond to important changes in the environment because others’ gaze does not facilitate attentional disengagement from one’s current focus of attention (Simons, 2000). In this respect, it is interesting to note that when most but not all faces gazed towards a participant's cued object, we did not observe a change relative to baseline. This suggests that others’ gaze increases or strengthens one’s attention on a given object only when a significant proportion of people (in our case all of the people) look at the same object. This result may indicate a mechanism to minimize the modulatory effects of others’ gaze and maintain behavioral flexibility (Capozzi & Ristic, 2018). Future research might address this issue by systematically varying group size and increasing the proportion of cue-congruent gazes (e.g., Capozzi et al., 2018).

Interestingly, however, when both most and all the faces gazed away from the cued object, the congruency effect decreased progressively, suggesting that it becomes increasingly difficult to commit attention to a particular object when the proportion of people looking elsewhere increases. This novel result suggests that the gaze of the group may be particularly well suited to decrease or interfere with one’s current attentional focus. Previous studies and theoretical approaches have tended to focus mostly on the advantages of utilizing others’ gaze for efficient monitoring of the environment (e.g., Zuberbühler, 2008). While undoubtedly correct, this approach has overlooked the fact that, as our data show, the gaze of others can also be detrimental to environment processing. This finding corroborates the notion that humans might prioritize social information even when it is inconsistent with one’s current goals, and echoes previous studies in social influence that have shown, for example, that individuals may alter their responses in perceptual judgement tasks if they diverge from the prevailing view of the group (Cialdini & Goldstein, 2004). Our data add to this literature by showing that one’s current attentional focus is susceptible to the interference of others’ gaze, thus suggesting that the weight of social information might penetrate even a relatively primary function such as maintaining attention (see also Zanesco et al., 2019). In this respect, however, it is important to note that we manipulated participants’ focus of attention in a relatively implicit way, that is, without explicitly instructing participants to maintain attention on the cue-congruent object. It is possible that more explicit instructions would strengthen participants’ attentional commitment, making it more resistant to others’ gaze. Thus, an interesting question for future research would be to test to what extent a stronger commitment to maintain attention would counter the distracting interference of others’ gaze (Geng et al., 2019). Furthermore, and along similar lines, previous research has shown that various social factors – such as stimulus race, gender, and perceived similarity – play a role in social attention behaviors such as gaze following (e.g., Dalmaso et al., 2020). Thus, future research would also benefit from a systematic analysis of the interactions between individual goals and the diversity within the social environment in modulating the present findings.

In sum, our data show that the presence and number of people who look toward or away from one’s attended object can both increase and decrease one’s ability to maintain attention on that object. Thus, this study shows that others’ gaze can be hard to resist, which can be both beneficial and costly to one's responses to events in the environment.