The environment is crowded with an abundance of visual information. However, our processing capacity to handle this information is severely limited. Hence, when we search for specific items or information (e.g., one’s daughter within a group of children), we use features specific to the item of interest (e.g., a blue ribbon in her hair) to deploy our attention. That is, the search is guided by the representation of an item’s feature, and theories of visual attention refer to this internal representation in memory as the attentional or target template (Bundesen, 1990; Duncan & Humphreys, 1989). Accounts of attention propose that holding these templates in memory can be used to increase the sensory gain of neurons that encode task-relevant features (Bundesen et al., 2005; Chelazzi et al., 1993; Desimone & Duncan, 1995).

One widely held view about attentional selection, the feature-similarity view, suggests that attention is biased to the visual items whose features match the attentional template. For example, Folk et al. (1992) found that when participants looked for a red target, color singleton cues that matched the target feature guided attention. By contrast, a singleton cue with a color that was dissimilar to the target color (e.g., green) did not guide attention. This top-down contingent capture takes place as we tune our attention to the target features, enhancing the response gain of visual features similar to the target item (Folk & Remington, 1998; Martinez-Trujillo & Treue, 2004). Thus, the feature-similarity view proposes that attention is tuned to the exact feature values of the target and that attention is biased to items having those feature values.

However, it has recently been proposed that attention is not always biased to visual stimuli that have the same features as the target. Rather, the relation between target features and features of nontargets plays an important role in guiding attention (Becker, 2010; Becker et al., 2013; Becker et al., 2020). Specifically, in a study adopting a spatial cuing paradigm (Becker et al., 2013), participants looked for a gold target among orange nontargets, which yielded the target–nontarget relation “yellower.” Prior to the target display, a singleton cue of a unique color appeared with three other cues of a different color. The relation between the singleton cue and other cues could either match or not match the target–nontargets relationship. The results showed that a singleton cue with the target color did not capture attention when the relation between the singleton cue and other cues did not match the target–nontarget relationship. By contrast, a cue with the nontarget color captured attention when its relation matched the target’s relative color. These results suggest that physical feature values do not solely determine capture by a singleton cue. Instead, these results indicate that attentional selection depends on the relative match between the target and cue, and they support a relational account of attentional guidance (Becker, 2010; Becker et al., 2010).

According to the relational account, the pattern of results was caused by tuning attention to the feature direction that efficiently distinguished the target from the nontargets (Becker et al., 2010). This suggestion is in line with evidence proposing malleability of target representations. For example, in a study by Navalpakkam and Itti (2007), participants searched for a 55° target line among 50° distractor lines during visual search trials, and then the remembered target value was measured on separate probe trials. On probe trials, participants selected the target line from five tilted lines (30°, 50°, 55°, 60° and 80°). The results showed that participants reported 60° more frequently as the target than the actual target value (55°), indicating a shifted representation and tuning of attention to features that could optimally discriminate a target from nontargets (Geng et al., 2017; Kerzel, 2020; Navalpakkam & Itti, 2007; Scolari et al., 2012; Scolari & Serences, 2009; Won et al., 2020; Yu & Geng, 2019).

That is, facing multiple visual inputs, our visual system evaluates the relationship between a target and nontargets. Then, the contextual information that stipulates how the target differs from surrounding items guides attention (Becker, 2010; Becker et al., 2010, 2013; Schönhammer et al., 2020) and affects the representation of the target (Geng et al., 2017; Kerzel, 2020; Navalpakkam & Itti, 2007; Scolari et al., 2012; Scolari & Serences, 2009; Won et al., 2020; Yu & Geng, 2019). Notably, the previous studies concerning this issue used a fixed target feature throughout the experiment to facilitate the effect of contextual information regarding the target–nontarget relationship (Schönhammer et al., 2020). In such cases, it is plausible to presume that the target template is maintained in long-term memory. While some researchers argued that attentional templates are known to be maintained in working memory (Bundesen, 1990; Bundesen et al., 2005; Chelazzi et al., 1993; Desimone & Duncan, 1995), recent studies showed that when the target remained constant over successive trials, the attentional template held in working memory was transferred to long-term memory (Carlisle et al., 2011; Woodman et al., 2007). Specifically, an electrophysiological study by Carlisle et al. (2011) showed that the contralateral delay activity (CDA) of event-related potentials that indexes target representation maintenance in visual working memory disappeared when participants looked for a constant target (no significant CDA after five repetitions of a target). Given that attentional templates could be represented in either working or long-term memory, different types of attentional templates might exert different effects on attentional selection and/or representation shifts (Goldstein & Beck, 2018).

The present study, in contrast to previous studies, investigated whether attentional deployment toward the target’s relative feature and the shift of target representation would also occur when the target feature changes trial by trial (i.e., when the attentional template is maintained in working memory). One possibility is that the long-term representation of a target is needed for contextual information to have a significant effect. Alternatively, dynamic adjustment of context depending on the target–nontarget relationship could be established. If so, attentional capture by target–nontarget relation would also be observed even when the target changes on each individual trial.

To test these alternatives, we used a slightly modified spatial cuing paradigm in which features of targets alternated across trials (Experiment 1) or in which target color varied between blocks (Experiment 2). Then, we examined whether distinct types of attentional templates have different effects on attentional selection and representational shift. In both experiments, the target was presented among three nontarget items of a different color. Prior to the search display, a singleton cue of a unique color was presented with three other cues of a different color. The relation between the singleton cue and other cues could either match or not match the target–nontarget relationship. Attentional capture by the relative target feature was measured by the cue validity effect, with faster or more accurate responses occurring when the singleton cue appeared at the target location (valid trials) than when the singleton cue appeared at the nontarget location (invalid trials).
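The cue validity effect described above can be sketched as a simple difference of mean RTs. This is only an illustrative sketch: the function name and the RT values are hypothetical and are not taken from the study's data or analysis code.

```python
from statistics import mean

def cueing_effect(valid_rts, invalid_rts):
    """Return mean invalid RT minus mean valid RT (ms).

    Positive values indicate faster responses at the cued location,
    i.e., attentional capture by the singleton cue.
    """
    return mean(invalid_rts) - mean(valid_rts)

# Made-up RTs (ms) for illustration only
valid = [510, 495, 530]
invalid = [560, 545, 575]
effect = cueing_effect(valid, invalid)
```

A negative value from the same computation would correspond to a cueing cost (slower responses on valid than on invalid trials).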

Furthermore, we measured target representation in isolation of the target–nontarget relationship (baseline representation block) before the introduction of the visual search task. This baseline representation block provided a measurement of participants’ remembered target feature that was unaffected by the target–nontargets relationship. By comparing the target representations before and after the introduction of the target–nontarget relationship, we examined whether a shift of target representation would occur.

Methods

Experiments 1 and 2

Participants

Seventeen adults (seven males, aged 19 to 28 years) participated in Experiment 1 and 19 adults (10 males, aged 21 to 28 years) participated in Experiment 2 for monetary compensation or course credit. In a pilot study, we found cueing effects under matched-relation, but not under mismatched-relation conditions. The partial eta squared of the interaction between relation and cue validity was .16. When aiming for a power of .90 with a Type I error rate of 5%, the necessary sample size is 17. Participants were excluded if their search error rate was larger than 20%. Two participants in Experiment 2 were excluded for this reason. All participants provided informed consent and had normal color vision and normal or corrected-to-normal visual acuity. All experimental procedures were approved by the Chungnam National University Institutional Review Board.

Apparatus and stimuli

The experiment was programmed and run using PsychoPy (Peirce, 2007). The stimuli were presented on a 21-in. LCD monitor with a spatial resolution of 1,980 × 1,080 pixels and a refresh rate of 60 Hz. Viewing distance was set to about 60 cm. The color of all stimuli was selected from a color wheel defined in RGB color space. The colors were originally defined in HSV color space with the same saturation (1) and value (1). HSV colors with individual hues (0~360) were converted to the corresponding RGB colors. There was a memory, a cue, a target, and a response display for the visual search task and a color wheel display for the color task. The baseline representation block comprised a memory and a color wheel display. The visual stimuli in the cue, target, and response displays were presented on an imaginary circle with a radius of 4.0°. The color wheel was 0.75° thick and the inner edge was 4.25° from fixation. To avoid motor biases and response repetition, the color wheel was randomly rotated at each presentation.
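The hue-to-RGB conversion described above can be illustrated with Python's standard-library colorsys module. This is a minimal sketch under the stated settings (saturation = 1, value = 1); the helper name hue_to_rgb is hypothetical, and the authors' actual PsychoPy code may have performed the conversion differently.

```python
import colorsys

def hue_to_rgb(hue_deg):
    """Convert a hue angle in degrees to an (r, g, b) tuple in 0..1.

    Saturation and value are fixed at 1, as stated in the text.
    """
    h = (hue_deg % 360) / 360.0  # colorsys expects hue in 0..1
    return colorsys.hsv_to_rgb(h, 1.0, 1.0)

# Target hues named in the Design and procedure section
target_hues = [10, 55, 155, 285]
palette = {h: hue_to_rgb(h) for h in target_hues}
```

Stepping hue_deg from 0 to 359 in this way would yield the colors of a full wheel at 1° resolution.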

Design and procedure

Experiment 1

In the baseline representation block, a colored circle appeared in the memory display. The color of the circle (radius of 1.3°) was randomly selected without replacement from a pool of four colors [10° (red), 55° (orange), 155° (green), 285° (purple)]. To measure the representation of the target color on its own, avoiding the formation of a context or relationship between target and nontargets, only the target-colored circle appeared. By clicking the mouse on the color wheel, participants were able to report the target color. The baseline representation block contained four runs, each of which included 32 trials.

In the visual search block, the memory display consisted of a colored circle (radius of 1.3°) that was presented at the center of the screen (see Fig. 1). Similar to the memory display in the baseline representation block, the color of the circle changed trial by trial without replacement from a pool of four colors. The color appearing in the memory display was the target color; participants searched for the color among three nontarget colors in the target display. The three nontarget items in the target display were different by −15° from the target color. That is, the rotational direction (target–nontarget relationship) of the target from the nontarget items was positive. The response display contained four Ts rotated by 90° clockwise or counterclockwise in gray color (1° height).

Fig. 1

Example of the visual search and color tasks for Experiment 1. The cue display depicts a valid trial. The color task was run every eighth trial. Nontarget color and nonsingleton color values are exaggerated for visual clarity (see Methods for true values). (Color figure online)

The cue display comprised a singleton cue and three other cues with a distinct color. The cues were circular rings. The line thickness was 0.2°. The relation between the singleton cue and other cues could either match (same relation) or not match (different relation) the target–nontarget relationship. In the same relation condition, the color of the singleton cue relative to the other cues deviated in the same direction as the target from the nontargets. For example, if we set the target color to 55° and the nontargets were set to 40°, the singleton cue was at 65° and the other cues were at 55°. In the different relation condition, the deviation of the singleton cue relative to the other cues was opposite to the target relative to the nontargets; the singleton cue was at 55°, and the other cues were at 65°. The location of the singleton cue was spatially nonpredictive of the target location (25% of trials were valid).
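The color assignment in the two relation conditions can be sketched as follows. The sketch generalizes the single 55° example given above to any target hue, which is an assumption: the function name, the parameter names, and the 10° cue step applied to other target colors are illustrative and not confirmed by the text.

```python
def display_hues(target_hue, relation, nontarget_offset=-15, cue_step=10):
    """Return the hues (degrees) used on one trial.

    nontarget_offset: nontargets differ from the target by -15 degrees (from the text).
    cue_step: cue colors are the target hue and the target hue + 10 degrees
              (assumed from the 55/65 degree example).
    """
    target = target_hue % 360
    nontarget = (target_hue + nontarget_offset) % 360
    low, high = target, (target_hue + cue_step) % 360
    if relation == "same":
        # Singleton cue deviates from the other cues in the target's direction
        singleton, others = high, low
    elif relation == "different":
        # Singleton cue deviates in the opposite direction
        singleton, others = low, high
    else:
        raise ValueError("relation must be 'same' or 'different'")
    return {"target": target, "nontarget": nontarget,
            "singleton_cue": singleton, "other_cues": others}
```

Note that in the different relation condition the singleton cue carries the target hue itself, which is what allows the design to separate feature matching from relation matching.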

Taken together, the experiment consisted of a 2 × 2 design, with relation (same vs. different) and validity (valid vs. invalid) as within-subjects factors. Participants completed six experimental runs of trials, each of which included 128 search trials and 16 color task trials, for a total of 864 trials. Prior to the main experimental session, each participant completed 36 practice trials composed of both search and color tasks to become familiar with the task. Participants were asked to fixate on the central fixation dot throughout the entire experiment.

The experiment began with the baseline representation block. In the baseline representation block, each trial started with a memory display where a colored circle was presented for 300 ms. After a 300-ms interval with fixation, a color wheel was presented. Participants were asked to report the target color with a mouse click on the corresponding location of the color wheel. This baseline representation block was followed by the visual search block. In this block, there were two tasks: a visual search and a color judgment task. As shown in Fig. 1, each trial began with the presentation of the target color for 300 ms on a black background, followed by a 300-ms fixation display. Then, the cue display was shown for 80 ms, followed by a fixation display for 80 ms and a search display for 80 ms. After target offset, the response display appeared until a response was registered (Max 2 s). Participants were asked to locate the target-colored circle and report the orientation of the T which followed at the position of the target-colored circle by pressing “.” for left and “/” for right. They were instructed to respond as rapidly and accurately as possible, ignoring the cue display. Upon response, the response display was removed. The stimulus onset asynchrony between the cue and target was 160 ms, and the target was presented for 80 ms. Participants were asked to maintain fixation on the central fixation dot.

The color task was presented after every eighth search trial in which participants were to make a color judgment. For the color task, participants were required to report the target color on the color wheel by clicking on its location with a mouse. Participants were told that they should try to be as precise as possible. Following a mouse response, the fixation display was shown for 500 ms.

Experiment 2

The methods were mostly identical to those of Experiment 1, with the following exceptions. In Experiment 2, the target color varied between blocks both in the baseline representation and visual search blocks. The visual search block comprised four separate color blocks, each of which consisted of 256 visual search trials and concomitant 32 color judgement trials. The order of the color blocks was counterbalanced across participants. Prior to the start of the search trials, an example of the target color was presented. Then, participants searched for the color over successive visual search trials. After participants completed all 256 search trials, the color task was run.

Control experiment

To control for possible differences in stimulus saliency and rule out the effect of color directionality, we conducted a control experiment, in which the target color was negatively rotated from the color of nontargets. The full details of the control experiment are presented in the Supplemental Material.

Results

To analyze the reaction time (RT) data, only correct search trials were used. Furthermore, we considered color judgments in the color task that deviated from the target value by more than two standard deviations (SDs; larger than 35° in Experiment 1) to be guesses. This criterion was also applied to the baseline representation block. We excluded these trials from further analysis (<3.4% of all color judgments in Experiment 1, <2.7% in Experiment 2). As the target color did not influence the effect of the other independent variables on the dependent measure, we collapsed data across the target colors for analysis (see Table 1).
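The color-judgment scoring and the two-SD exclusion rule can be sketched as below. Several details are assumptions made for illustration: the function names are hypothetical, the error is taken as the signed shortest distance on the hue circle, and the SD is the sample SD of the signed errors. The response values are made up.

```python
from statistics import stdev

def signed_hue_error(reported_deg, target_deg):
    """Signed shortest angular distance (degrees) in the range [-180, 180).

    With nontargets at -15 degrees from the target, positive errors are
    shifts away from the nontarget color.
    """
    return (reported_deg - target_deg + 180) % 360 - 180

def drop_guesses(errors, n_sd=2.0):
    """Keep only errors whose magnitude is within n_sd sample SDs."""
    cutoff = n_sd * stdev(errors)
    return [e for e in errors if abs(e) <= cutoff]

# Made-up signed errors (degrees); 120 stands in for a likely guess
errors = [2, -3, 5, 8, -1, 4, 6, 120, 3, 7]
kept = drop_guesses(errors)
```

The mean of the retained signed errors would then serve as the representational shift for a given participant and block.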

Table 1 The interaction statistic for the ANOVA with target color

Experiment 1

Reaction times

A two-way analysis of variance (ANOVA), with relation and validity as within-subjects factors, revealed a significant main effect of validity, F(1, 16) = 8.36, p = .010, η² = .343, indicating faster RTs on valid trials than on invalid trials. Importantly, the interaction between validity and relation was also significant, F(1, 16) = 9.37, p = .007, η² = .369. Given the significant interaction, we split the data into same-relation and different-relation sets and used pairwise t tests to evaluate each data set. Pairwise t tests revealed that the same-relation singleton cues captured attention; significantly shorter RTs were observed on valid trials than on invalid trials, t(16) = 4.23, p < .001, Cohen’s d = 1.025. In contrast, under different relations, the singleton cues did not capture attention, p = .674 (see Fig. 2).
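The effect sizes in this paragraph can be related to the test statistics with two standard conversions. The sketch below uses the partial eta squared formula, which reproduces the two η² values reported above, and the paired-samples conversion d = t / √n, which gives approximately the reported Cohen's d. The function names are hypothetical, and whether the authors computed their effect sizes this way is an assumption.

```python
import math

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def cohens_d_from_paired_t(t_value, n):
    """Cohen's d for a paired (one-sample) t test with n participants."""
    return t_value / math.sqrt(n)

eta_validity = partial_eta_squared(8.36, 1, 16)     # main effect of validity
eta_interaction = partial_eta_squared(9.37, 1, 16)  # validity x relation
d_same_relation = cohens_d_from_paired_t(4.23, 17)  # same-relation cueing effect
```

Small discrepancies in the third decimal are expected because the inputs here are already rounded.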

Fig. 2

Results from Experiments 1 and 2. Cueing effect (invalid RT minus valid RT) as a function of cue relation. Error bars represent the 95% confidence intervals. Data from Experiments 1 and 2 are shown in blue and red, respectively. The relative frequency of errors in the color task, collapsed across target colors, is represented by the gray-blue (Experiment 1) and gray-pink (Experiment 2) bars, which refer to the axis on the right. (Color figure online)

Accuracy rates

The two-way ANOVA was also applied to the accuracy data. Search accuracy rates were high and similar across the trial types, p = .361, indicating that the present RT results were uncontaminated by a speed–accuracy trade-off (see Table 2).

Table 2 Mean search accuracy rates as a function of cue validity and cue context in Experiment 1 and Experiment 2

Representational shift

First, we examined the baseline representation block to investigate the shift of target representation with and without surrounding items. The target color was found to be remembered with a positive shift, mean ± SD = 1.006 ± 2.576, but the shift was not significantly different from zero, p = .126. Importantly, compared with baseline block data, a significant shift was observed with color task data, mean ± SD = 7.368 ± 2.080, t(16) = 8.66, p < .001, Cohen’s d = 2.017 (see Fig. 3). The results indicate that the representations of the target colors were significantly shifted away from the nontarget colors in response to the introduction of the nontarget set.

Fig. 3

Magnitude of representation shift (degrees) by experiment (Experiment 1 vs. Experiment 2). Error bars represent the 95% confidence intervals. Asterisks indicate significance levels for representation shifts (***p < .001; *p < .05)

Experiment 2

Reaction times

A repeated-measures two-way ANOVA, with relation and validity as factors, revealed a significant main effect of validity, F(1, 16) = 77.45, p < .001, η² = .830. The interaction of validity and relation was also significant, F(1, 16) = 49.5, p < .001, η² = .760. Given the significant interaction, we split the data into same-relation and different-relation sets, and pairwise t tests were used to evaluate each data set. Pairwise t tests revealed that the same-relation singleton cues captured attention; significantly shorter RTs were observed on valid trials than on invalid trials, t(16) = 8.49, p < .001, Cohen’s d = 2.059. In contrast, under different relations, cueing costs (inversion of cueing effect) were found, t(16) = 3.20, p = .006, Cohen’s d = .775 (see Fig. 2).

Accuracy rates

The two-way ANOVA was also applied to the accuracy data. Search accuracy rates were high and similar across the trial types, p = .440, ruling out a possibility of speed–accuracy trade-off (see Table 2).

Representational shift

Contrary to Experiment 1, the analysis of the baseline representation block revealed a significant positive shift of target representation even when the target appeared alone, mean ± SD = 1.261 ± 1.951, t(16) = 2.67, p = .017, Cohen’s d = .647. More importantly, the extent of the shift in the color task was greater than that of the baseline representation block, mean ± SD = 10.980 ± 4.931, t(16) = 7.69, p < .001, Cohen’s d = 1.866 (see Fig. 3). These results indicate that the positive shift of the target representation was further positively exaggerated when the context between target–nontargets was formed.

Comparing Experiment 1 and Experiment 2

The current data indicate that contextual information about the target–nontarget relationship affects attentional selection and representation shifts regardless of the status of the attentional template. To quantitatively test this indication, a mixed-factor analysis, with relation and validity as within-subjects factors and with experiment as a between-subjects factor, was applied to the RT data. This analysis revealed a significant two-way interaction between relation and validity, F(1, 32) = 49.11, p < .001, η² = .605, indicating that validity effects differed depending on the cue context. Importantly, the three-way interaction between factors was also significant, F(1, 32) = 6.20, p = .018, η² = .162, indicating that context had stronger effects on attentional selection with long-term memory attentional templates. To compare representation shifts, a mixed-factor analysis, with the introduction of visual search (baseline vs. color task) as a within-subjects factor and experiment as a between-subjects factor, was used to evaluate the color wheel data. The two-way interaction between the factors was also significant, F(1, 32) = 4.81, p = .036, η² = .130, suggesting that context had greater effects on the representational shift when participants searched for the fixed target.

Correlation between magnitudes of individual cueing effects and individual representational shift

The current results, where the magnitudes of the validity effect and representational shift in Experiment 2 were greater than those in Experiment 1, suggest that attentional selection and shifted representation are related. To substantiate this idea, we correlated validity effects and representational shift. Data from two experiments were collapsed to increase statistical power. The magnitude of the representational shift and validity effect were positively related (see Fig. 4), r(32) = .46, p = .006, suggesting that a larger shift of representation led to stronger attentional capture (Kerzel, 2020).
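The significance test for the correlation above rests on converting r to a t statistic with the reported degrees of freedom. The sketch below shows only that conversion; the function name is hypothetical, and the authors' statistical software is not specified in the text.

```python
import math

def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero.

    df is the number of participants minus 2.
    """
    return r * math.sqrt(df / (1.0 - r * r))

# Values reported in the text: r(32) = .46
t_stat = t_from_r(0.46, 32)
```

The resulting t value can then be compared against the t distribution with 32 degrees of freedom to obtain a two-tailed p value.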

Fig. 4

Correlation between the magnitudes of cueing effect and representational shift. Data were collapsed across Experiment 1 and Experiment 2

Use of memory versus search for a singleton target

Conclusions regarding the relationship between the type of attentional template and the attentional selection/representational shift could be limited by the fact that the target was a singleton stimulus. Therefore, participants might not have needed to retain the target’s feature in either working or long-term memory. To address this issue, we ran control experiments. These experiments were identical to Experiment 1 and Experiment 2, except that participants performed a feature search instead of a singleton search. That is, a search target appeared with a similar color distractor and two gray distractors. Hence, to perform the visual search, participants needed to use a working or long-term memory representation. These experiments confirmed that context has greater effects on representation shift and attentional selection when participants used long-term target templates than when working memory templates were used (for more details, see the Supplemental Material).

General discussion

In two experiments, we showed that singleton cues matching the target–nontarget relations captured attention (Becker, 2010; Becker et al., 2013; Becker et al., 2020; Kerzel, 2020; Schönhammer et al., 2020) and that the target representations were shifted away from the nontarget features (Geng et al., 2017; Kerzel, 2020; Navalpakkam & Itti, 2007; Scolari et al., 2012; Scolari & Serences, 2009; Won et al., 2020; Yu & Geng, 2019). The present findings are also in line with previous studies adopting an inattentional blindness task, suggesting that attention can be tuned to relation (Goldstein & Beck, 2016; Most et al., 2001; Most et al., 2005). The contribution of the present study to the extant literature is that attention is tuned to the relative target feature when the target template is maintained in working memory. Another important finding is that the attentional capture by the target–nontarget relation was significantly weaker with the attentional template in working memory than with the attentional template in long-term memory.

Traditionally, literature on attention has suggested that attentional control settings are widely tuned when participants search for a singleton target (Folk et al., 1992, Experiment 4). In contrast, when participants look for specific feature targets (i.e., feature search), their attention is narrowly tuned, operating with respect to the specific feature value (Folk & Remington, 1998, 2008; Lamy et al., 2004). In the present study, however, attention was found to be tuned narrowly with fixed targets, thereby showing a greater validity effect, despite the identical search requirements of the two experiments (i.e., singleton search). The current results support the findings of a previous study by Lamy and colleagues (Lamy et al., 2006). In that study, the authors compared a fixed target condition with a varied target condition. The results showed an RT benefit under the fixed target condition, indicating that even under singleton search conditions, attention was narrowly tuned. Moreover, in the current study, the target–nontargets relationship was stable throughout the experiment; the target was always +15° from the nontargets. Hence, it is possible that repeated exposure creates a categorical relation. In Experiment 2, participants had a larger number of trials in which to learn the categorical relationship between each target color and its nontargets, whereas the individual target colors changed from trial to trial in Experiment 1. This might have narrowed the tuning of participants’ attention to the relationship that could efficiently differentiate the target from nontargets (Wolfe et al., 1992; Wolfe & Gray, 2007).

Furthermore, not only the magnitude of attentional capture but also the magnitude of the representational shift of the target was greater with long-term memory representations than with the working memory representations. Both the optimal tuning and relational accounts suggest that target representation is shifted based on target-to-nontargets distinctiveness to efficiently discriminate the target from nontargets. Since the target was fixed in Experiment 2, participants were exposed to the same task configuration repeatedly. Considering that the internal representations of stimuli are strengthened via repeated exposure (Silverstein et al., 1998), repeated exposure should result in a more consolidated target–nontargets relationship. We surmise that the consolidated relationship/category exaggerates the target-to-nontargets dissimilarity, and that participants looked for colors deviating more from nontargets (Goldstein & Beck, 2016; Most et al., 2001). As a result, the target representation was more shifted with a fixed target than with a varied target.

Regarding the experimental controls for the target color repetition, one might argue that long-term memory might have been involved because there were just four different target colors. However, the target color was selected randomly without replacement and participants had to retain the specific color in working memory, in preparation for the color wheel task, which appeared occasionally. Moreover, in an exploratory analysis, we examined whether the magnitude of the representational shift changed with increasing time spent on the task in Experiment 1. We treated the experimental run as a factor and grouped each of the four consecutive color task trials into an epoch resulting in four epochs in each experimental run. A two-way ANOVA, with run and epoch as factors, failed to yield any significant main effects, ps > .118, or an interaction, p = .671. These results indicate that the shift was stable without variations throughout the experiment. If an attentional template in working memory had passed into long-term memory, the representational shift would have increased throughout the experiment. Given the above, we believe that an attentional template in working memory rather than long-term memory was involved in Experiment 1.

It is noteworthy that cues that did not match the relative target feature resulted in cuing costs, thereby resulting in slower search RTs on valid trials than on invalid trials (Becker et al., 2017; Harris et al., 2013; Schoeberl et al., 2018; Schönhammer et al., 2016). However, these same location costs were only observed with the long-term memory attentional template. Though several previous studies showed that the same location costs were found with heterogeneous search displays (i.e., feature search; Carmel & Lamy, 2014; Kerzel, 2019; Lamy & Egeth, 2003), the same location costs were found with homogeneous search displays in the present study. The present experimental conditions and results did not allow one to explicitly determine under which conditions the same location costs occur. However, given that the feature search required narrow attentional tuning and the narrowly tuned attention with a fixed target in the present study, enhanced attentional tuning might favor the same location costs.

It would also be interesting to know whether the type of attentional template affects attentional selection and representational shift of other features as well. Previous studies have shown that representation shifts take place for various features, including size (Hodsoll & Humphreys, 2001), orientation (Navalpakkam & Itti, 2007), and color (Geng et al., 2017; Kerzel, 2020; Yu & Geng, 2019). Several studies using luminance showed relational search patterns (Becker et al., 2017; see also Goldstein & Beck, 2016) even though representational shifts have not yet been reported. This pattern of results might be modulated by the type of attentional template, as shown in the present study. Future studies should examine whether the present findings are feature independent.

To conclude, the present study showed that contextual information that specifies how a search target differs from nontargets exerts a significant influence on attentional selection and representational shifts. Importantly, the magnitude of contextual effects differed depending on where the attentional template was maintained. The magnitude of capture effects elicited by matching singleton cues and the shifts of target representations were greater when the attentional template was held in long-term memory than when it was held in working memory. Our findings support previous suggestions that working and long-term memory attentional templates have different functional roles, which leads to different effects on attentional selection (Goldstein & Beck, 2018; Woodman et al., 2007). In addition, the present finding that the magnitudes of representational shifts and validity effects are positively related supports recent evidence for the effect of representational shift on attentional selectivity (Kerzel, 2020). In summary, the present findings suggest that the strength of the given contextual information is a determining factor in attentional tuning and shifts in target representation.