Our ability to temporarily maintain information in visual working memory appears to be limited to just a few representations (Cowan, 2001; Irwin & Andrews, 1996; Luck & Vogel, 2013; Vogel et al., 2001). What information is contained in those representations? One view expounded in theories of both attention and memory is that spatial location is the root feature that allows us to access our internal representations of the objects we see (Kondo & Saiki, 2012; Rajsic & Wilson, 2014; Treisman, 1996; Treisman & Zhang, 2006; Wheeler & Treisman, 2002). According to this idea, location is a special feature of an object that defines it and is stored with any other object feature that we might maintain in memory. The present study sought to test this theoretical proposal using a masking paradigm to determine whether spatial location is a special feature of an object that is inherently bound to all other object features.

The idea that location is a necessary part of any object representation in visual working memory was tested by Logie et al. (2011). They used an irrelevant-feature-change paradigm, assuming that if a given feature were stored, then changes along this irrelevant-feature dimension would interfere with subjects’ ability to detect changes of the other, task-relevant features. They found that the irrelevant-location changes disrupted change-detection accuracy at short retention intervals, but not at retention intervals longer than 1 second. Logie and colleagues interpreted these findings as due to an initial integration of location and other object features in subjects’ visual working memory representations. They proposed that the task-irrelevant feature of location was then actively inhibited during the memory-retention interval, resulting in the diminishing effect across time. A similar pattern was found for other types of task-irrelevant feature changes, suggesting that location was not special, but that all features were initially bound into integrated object representations in visual working memory, with the irrelevant features being discarded as time passed following encoding.

One limitation of the paradigm used by Logie et al. (2011) was that it did not study visual working memory storage under conditions in which the time available for encoding information into memory is limited. That is, location information might be filtered out of the working memory representations when the task-relevant information must be encoded as quickly as possible. Woodman and Vogel (2008) proposed that irrelevant features of objects are excluded from visual working memory storage under such conditions. They used masking to examine the rate of encoding into visual working memory. Specifically, the stimulus-onset asynchrony (SOA) between the memory arrays and the masks was varied across trials to estimate the speed with which the different features of the objects could be encoded into memory. As shown in Fig. 1, the rate of information accrual when color was task relevant differed from the rate when orientation or shape was to be remembered, or when the task required subjects to remember both color and shape or orientation. The findings suggested that participants limited encoding to just the task-relevant features of objects (i.e., color, shape, or orientation). It is possible that when a masking paradigm is used, we will see that location behaves like these other object features and is filtered out of the object representations in visual working memory when task irrelevant. However, given the theoretical views ascribing a special role to location, it is possible that location must be encoded along with any other object feature.

Fig. 1

Stimuli and results of Experiment 1 in the Woodman and Vogel (2008) study. a Examples of the sample, mask, and test arrays used, with an orientation change shown. b Results in terms of the objects’ worth of information, or K (error bars show the 95% within-subjects confidence intervals, as described by Loftus and Loftus (1988), in this and subsequent figures). Note. Adapted with permission of the Psychonomic Society

Two other details of the design require explanation. First, no-mask trials were randomly interleaved with the masking trials. This allowed me to determine whether performance at the longest sample-to-mask SOAs had reached asymptote. Second, participants were required to perform a concurrent verbal task (i.e., articulatory suppression) to discourage verbal recoding of the to-be-remembered information (Baddeley, 1986). This is important because evidence consistent with unbound representations in visual working memory could be due to the storage of some of the information in verbal working memory. The present experiments used a concurrent verbal task to rule out this alternative explanation.

Method

Experiments 1 and 2 used different groups of 40 volunteers, between 18 and 35 years of age, who received course credit in exchange for their participation. All reported normal or corrected-to-normal visual acuity and normal color vision, and all consented to procedures approved by the Vanderbilt University Institutional Review Board.

The stimuli were viewed on a gray background (40.6 cd/m2) at approximately 57 cm. All sample stimulus arrays consisted of three colored squares that subtended approximately 2.0° × 2.0° of visual angle. Each square was centered 7.2° from the center of the monitor. In Experiment 1, each memory stimulus was randomly placed at one of 12 possible locations similar to a clock face. In Experiment 2, each memory stimulus was randomly placed at one of four possible locations (i.e., 12, 3, 6, and 9 o’clock). In both experiments, the color of each square was selected at random from a set of seven highly discriminable colors (chromaticity coordinates of the CIE 1931 color space; red, x = .627, y = .327; blue, x = .142, y = .065; violet, x = .279, y = .139; green, x = .280, y = .589; yellow, x = .397, y = .500; black, 0.04 cd/m2; and white, 92.6 cd/m2).
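The visual angles above can be translated into approximate on-screen sizes. The following is a minimal sketch, assuming a flat screen viewed head-on at exactly 57 cm; the function names are hypothetical and are not taken from the study's materials.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # approximate viewing distance reported in the text


def extent_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical size of a stimulus that subtends angle_deg, centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """On-screen distance from fixation to a point angle_deg away from it."""
    return distance_cm * math.tan(math.radians(angle_deg))


# Side length of a 2.0 degree square and offset of a square centered at 7.2 degrees.
square_side_cm = extent_cm(2.0)
square_offset_cm = eccentricity_cm(7.2)
```

At this viewing distance, one degree of visual angle corresponds to roughly one centimeter on the screen, so the squares were about 2 cm wide and centered about 7.2 cm from fixation.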

The mask array was composed of checkerboard-like masks. Twelve were presented in Experiment 1 and four were presented in Experiment 2, centered on each of the possible stimulus locations. Each mask was generated by randomly selecting (with replacement) one of the colors for each cell (0.6° × 0.6°) of a 4 × 4 matrix.
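The mask construction can be illustrated with a short sketch. It assumes that each cell's color is drawn independently from the seven stimulus colors; the color labels, function names, and fixed seed are hypothetical choices made for illustration.

```python
import random

# Labels for the seven stimulus colors named in the text.
COLORS = ["red", "blue", "violet", "green", "yellow", "black", "white"]
MASK_SIZE = 4  # each mask is a 4 x 4 matrix of colored cells


def make_mask(rng: random.Random) -> list:
    """Build one checkerboard-like mask by drawing a color independently for each cell."""
    return [[rng.choice(COLORS) for _ in range(MASK_SIZE)] for _ in range(MASK_SIZE)]


def make_mask_array(n_locations: int, rng: random.Random) -> list:
    """Build one mask per possible stimulus location."""
    return [make_mask(rng) for _ in range(n_locations)]


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
masks_exp1 = make_mask_array(12, rng)  # Experiment 1: 12 possible locations
masks_exp2 = make_mask_array(4, rng)   # Experiment 2: 4 possible locations
```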

The letters and digits that made up the articulatory suppression stimuli were strings of four characters (each 0.5° × 0.7°) presented in white (92.6 cd/m2). In each block of trials, the subjects repeated “ABCD,” “WXYZ,” “1234,” or “6789” at a rate of three to four characters per second. The order was randomized across subjects. Each block of trials began with the presentation of the string that subjects were to repeat during each trial of that block. These verbal responses were recorded and monitored by the experimenter to verify compliance.

On each trial, a central black fixation point (0.04 cd/m2, 0.05° × 0.05°) was presented 500 ms before the sample array. The sample array was presented for 23 ms and was followed 35, 105, 140, or 176 ms later by the onset of the mask array (i.e., the four SOAs). The mask array was shown for 500 ms. The time the screen was blank between the mask array offset and test array onset was varied such that the retention interval was always 1,500 ms. One-fifth of all trials were no-mask trials, which measured performance in the absence of masks. The test array was extinguished either by the observer’s response or after 5,000 ms had elapsed. The no-mask trials and those with the four different SOAs were all randomly interleaved.
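The interleaving of masked and no-mask trials can be sketched as a shuffled trial list for one single-feature condition. The sketch takes the 180 experimental trials and the 50% change rate from the design, and it assumes (this is not stated in the text) that the masked trials were split equally across the four SOAs and that change and no-change trials were balanced within each mask condition. All names are hypothetical.

```python
import random

SOAS_MS = [35, 105, 140, 176]
MASK_CONDITIONS = SOAS_MS + [None]  # None marks a no-mask trial
N_TRIALS = 180                      # experimental trials per condition


def build_schedule(rng: random.Random) -> list:
    """Return a randomly interleaved list of trials for one single-feature condition."""
    # Assumption: equal trial counts per mask condition and per change/no-change cell.
    per_cell = N_TRIALS // (len(MASK_CONDITIONS) * 2)
    trials = [
        {"soa_ms": soa, "change": change}
        for soa in MASK_CONDITIONS
        for change in (True, False)
        for _ in range(per_cell)
    ]
    rng.shuffle(trials)
    return trials


schedule = build_schedule(random.Random(1))
n_no_mask = sum(1 for trial in schedule if trial["soa_ms"] is None)
n_change = sum(1 for trial in schedule if trial["change"])
```

Under these assumptions, each mask condition receives 36 trials, so the no-mask trials make up one-fifth of the block.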

In the color condition, the color of one of the squares in the test array was replaced with a color not in the sample array on 50% of trials. In the location condition, the location of one of the squares changed on 50% of trials to a previously unoccupied location. In the conjunction condition, the color of one of the items changed on 25% of trials, and the location of one item changed on 25% of trials. In all conditions, the stimuli were identical in the sample and test arrays on 50% of trials, randomly interleaved with the change trials. In each condition, subjects performed one 16-trial block of practice before 180 experimental trials. Subjects were allowed to rest in between conditions. Condition order was randomized for each participant. The subjects pressed the z key to indicate change and x key to indicate no change. Accuracy was stressed, and responses were unspeeded. Change-detection performance was measured using K (Pashler, 1988), derived from hit rate and false-alarm rate as described elsewhere (Cowan, 2001; Woodman & Vogel, 2005), for estimating the number of items represented in memory. The same patterns of results were observed using percentage correct.
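For readers unfamiliar with K, the sketch below shows the two standard capacity formulas associated with the sources cited above. The text does not state which variant was applied, so both are shown; the hit and false-alarm rates are hypothetical values chosen only for illustration, and the set size of three follows the three-item sample arrays.

```python
SET_SIZE = 3  # three colored squares per sample array


def cowan_k(hit_rate: float, false_alarm_rate: float, set_size: int = SET_SIZE) -> float:
    """Cowan (2001) formula: K = N * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)


def pashler_k(hit_rate: float, false_alarm_rate: float, set_size: int = SET_SIZE) -> float:
    """Pashler (1988) formula: K = N * (H - FA) / (1 - FA)."""
    if false_alarm_rate >= 1.0:
        raise ValueError("false-alarm rate must be below 1 for Pashler's K")
    return set_size * (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


# Hypothetical rates, not data from the study.
example_hit, example_fa = 0.80, 0.10
k_cowan = cowan_k(example_hit, example_fa)      # about 2.1 items
k_pashler = pashler_k(example_hit, example_fa)  # about 2.33 items
```

Both formulas correct the hit rate for guessing; they differ only in how the false-alarm rate enters the correction, so they agree when the false-alarm rate is zero.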

Experiment 1

The findings of Experiment 1 are shown in Fig. 2b. The consolidation function for color exhibited superior change-detection performance compared with the functions when the location or conjunction of color and location were task relevant. This shows that with the same amount of time between the memory arrays and the mask arrays, more color information could be stored than information about the spatial location, or both of these features in the conjunction condition. This was evidenced by the output of an analysis of variance (ANOVA) with the factors of condition (color, location, or conjunction) and SOA (35, 105, 140, 176 ms, or no mask) returning significant main effects of condition, F(2, 78) = 4.58, p < .05, and SOA, F(4, 156) = 248.48, p < .001, but no interaction of these factors (p > .50). This lack of an interaction is consistent with the similarity of the mean slopes of the consolidation functions across conditions (color: 4.74%/ms, location: 4.62%/ms, conjunction: 4.52%/ms, presented in % correct change per ms of additional exposure, pairwise t-test ps > .40).
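The slope of a consolidation function can be estimated in several ways, and the text does not specify the procedure used. One simple option is sketched here: an ordinary least-squares fit of accuracy against SOA over the four masked conditions (no-mask trials have no SOA and are excluded). The accuracy values are hypothetical and are not data from the study.

```python
SOAS_MS = [35, 105, 140, 176]


def ols_slope(x: list, y: list) -> float:
    """Least-squares slope of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    return sxy / sxx


# Hypothetical percentage-correct values for one subject in one condition.
hypothetical_accuracy = [62.0, 74.0, 79.0, 85.0]
slope_pct_per_ms = ols_slope(SOAS_MS, hypothetical_accuracy)
```

Computing one such slope per subject and condition would yield the values that enter the pairwise comparisons of slopes across conditions.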

Fig. 2

Stimuli and results of Experiment 1. a Examples of the sample, mask, and test arrays used in Experiment 1, with a location change shown. b Results from Experiment 1 in terms of the objects’ worth of information, or K. c The findings in terms of types of changes detected in the conjunction and single-feature conditions

Figure 2c shows the percentage of changes detected on the masking trials for the two features, color and location, in the single-feature conditions and the conjunction condition. Entered into an ANOVA with the factors of number of features (single versus conjunction), feature (color versus location), and SOA (35, 105, 140, or 176 ms), this pattern resulted in a significant effect of SOA, F(3, 117) = 24.00, p < .001, due to performance increasing across SOA, and a Feature × SOA interaction, F(3, 117) = 3.88, p < .05, due to the generally steeper slopes across SOA for color than for location.

The findings of Experiment 1 suggest that when color is the only feature of an object that subjects need to remember, this information can be encoded into visual working memory more efficiently than when location needs to be remembered or when both color and location need to be stored. This suggests that location is not a special feature that cannot be filtered out during the process of encoding information into visual working memory. Instead, it appears that location behaves just like any other feature, such as the shape or orientation of an object, and can be inhibited from entry into the working memory representation of an object. The pattern of results from the conjunction condition replicated those of the previous Woodman and Vogel (2008) study in that the conjunction of features could only be encoded as quickly as the slowest-to-encode feature (compare Figs. 1b and 2b). The present findings show that this pattern is also true if the two task-relevant features are color and location.

The findings from Experiment 1 and those of Woodman and Vogel (2008) might lead one to conclude that color is consistently encoded more quickly than any other object feature. However, in Experiment 1, the number of possible locations that an object could occupy (i.e., 12 locations) was larger than the number of possible colors that the objects could take on (i.e., seven colors). It is possible that having more degrees of freedom along a feature dimension requires a more precise encoding of the individual feature value for a given object. This greater precision would be necessary to support a comparison process that would need to distinguish between subtler changes in the test array (Awh et al., 2007; Hyun et al., 2009). In Experiment 2, I examined whether location might be encoded more quickly than color, or the conjunction of these features, when the spatial locations were limited to four possibilities.

Experiment 2

The method of Experiment 2 was identical to that of Experiment 1, except that only four possible stimulus locations were used, as shown in Fig. 3a. The findings of Experiment 2 are shown in Fig. 3b. The consolidation functions show that when a small number of stimulus locations were possible, encoding of the spatial information enjoyed a y-intercept effect that brought it closer to ceiling at the shortest SOA, but that the slopes of the consolidation functions were similar across conditions, as we found in Experiment 1 (color: 5.77%/ms, location: 4.60%/ms, conjunction: 5.49%/ms, in percentage correct increase per ms, pairwise ps > .20). Again, the conjunction condition was indistinguishable from that of the slower-to-encode feature (in this case, color). This resulted in the output of an ANOVA with the factors of condition (color, location, or conjunction) and SOA (35, 105, 140, 176 ms, or no mask) yielding significant main effects of condition, F(2, 78) = 43.41, p < .001, and SOA, F(4, 156) = 186.60, p < .001, in addition to a significant interaction of these terms, F(8, 312) = 5.69, p < .001. Thus, while reducing the degrees of freedom of the spatial locations of the items boosted overall change-detection accuracy, the selectivity of the encoding to task relevance was replicated in Experiment 2.

Fig. 3

Stimuli and results of Experiment 2. a Examples of the sample, mask, and test arrays used, with a location change shown. b Results from Experiment 2 in terms of the objects’ worth of information, or K. c The findings in terms of types of changes detected in the conjunction and single-feature conditions

The next analysis examined the difference in the shape of the consolidation functions between Experiments 1 and 2. It appears that the design of Experiment 2, in which a smaller number of locations were possible, resulted in the consolidation functions in all conditions reaching asymptote earlier than in Experiment 1. Although it is likely that the change in the shape of these functions was due to the differences in the stimuli used between experiments, it is also possible that some of this was due to the use of different groups of subjects, with those in Experiment 2 simply having a faster consolidation rate than the subjects in Experiment 1. An ANOVA with the factors of experiment (Experiment 1 versus 2), condition (color, location, or conjunction), and SOA (35, 105, 140, 176 ms, or no mask) yielded significant main effects of experiment, F(1, 78) = 39.29, p < .001, condition, F(2, 156) = 23.41, p < .001, and SOA, F(4, 312) = 422.94, p < .001. In addition, there were significant interactions of Experiment × Condition, F(2, 156) = 26.27, p < .001, due to a larger effect of condition in Experiment 2; Experiment × SOA, F(4, 312) = 27.74, p < .001, due to the consolidation functions reaching asymptote at shorter SOAs in Experiment 2 than in Experiment 1; Condition × SOA, F(8, 624) = 3.27, p < .01, due to the flip in location and color conditions between Experiments 1 and 2; and Experiment × Condition × SOA, F(8, 624) = 2.96, p < .05, due to the location function reaching asymptote more quickly in Experiment 2 compared with the similar functions in Experiment 1, in which color was the fastest feature to be encoded.
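The degrees of freedom reported for this cross-experiment ANOVA follow from the design: two groups of 40 subjects, three conditions, and five SOA levels. The sketch below reproduces them, assuming a standard mixed-design ANOVA with no sphericity correction applied to the degrees of freedom; the function names are hypothetical.

```python
N_PER_EXPERIMENT = 40
N_EXPERIMENTS = 2
LEVELS = {"condition": 3, "soa": 5}

# Error df for the between-subjects factor: total subjects minus number of groups.
BETWEEN_ERROR_DF = N_PER_EXPERIMENT * N_EXPERIMENTS - N_EXPERIMENTS


def experiment_df() -> tuple:
    """(numerator, denominator) df for the between-subjects main effect of experiment."""
    return (N_EXPERIMENTS - 1, BETWEEN_ERROR_DF)


def within_df(factors: list, crossed_with_experiment: bool = False) -> tuple:
    """(numerator, denominator) df for a within-subjects effect or its interaction with experiment."""
    base = 1
    for factor in factors:
        base *= LEVELS[factor] - 1
    numerator = base * (N_EXPERIMENTS - 1) if crossed_with_experiment else base
    return (numerator, base * BETWEEN_ERROR_DF)


df_experiment = experiment_df()                                   # (1, 78)
df_condition = within_df(["condition"])                           # (2, 156)
df_soa = within_df(["soa"])                                       # (4, 312)
df_condition_soa = within_df(["condition", "soa"])                # (8, 624)
df_three_way = within_df(["condition", "soa"], crossed_with_experiment=True)  # (8, 624)
```

Because there are only two experiments, crossing a within-subjects effect with the experiment factor leaves its degrees of freedom unchanged, which is why the Condition × SOA and Experiment × Condition × SOA terms share the same values.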

The efficiency of a cognitive process is typically estimated by the slope relating performance to the amount of information available from the environment (Wolfe, 1998). The calculations of slope in Experiments 1 and 2 suggest remarkably similar encoding efficiency across conditions, with the slopes not differing significantly between conditions. However, note that the estimates of slope in the present study are constrained by ceiling and floor levels of performance, with these y-intercept effects likely being determined by factors such as perception and decision-making, not just the efficiency of encoding into working memory. The present approach was to make the colors and locations highly discriminable, but it is likely that processes other than working memory consolidation contributed to the findings. For the present hypotheses, the conclusions drawn from slope values are the same—that location is handled like color during masked encoding. However, it is important to remember that other cognitive mechanisms contribute to the y-intercept effects observed in the present results.

The final analyses examined whether the rate of feature encoding differed between the single feature and conjunction conditions by examining the percentage of changes detected in the different conditions. As shown in Fig. 3c, the consolidation rates were similar for both the color and location features in the single feature conditions and conjunction condition in terms of number of changes detected, indicating that the differences between conditions were driven by errors on same trials in the conjunction condition relative to the feature conditions. The ANOVA with the factors of number of features (single versus conjunction), feature (color versus location), and SOA (35, 105, 140, or 176 ms) yielded a significant main effect of feature, F(1, 39) = 62.11, p < .001, and SOA, F(3, 117) = 49.21, p < .001, but no main effect of number of features or higher order interactions.

The findings of Experiment 2 provide evidence for three conclusions. First, the effect of feature-selective consolidation into visual working memory when color and location are the possible task-relevant features is a general phenomenon that replicates. Second, when people need to remember both features of the objects, the efficiency of encoding tracks the rate of consolidation for the slower-to-encode feature. This was the case in Experiment 1, when color was the faster-to-encode feature, and was again observed in Experiment 2, when location was the faster-to-encode feature due to a smaller number of possible feature values. Third, our findings appear to falsify the view that location is a critical aspect of objects to which all other object features are bound.

General Discussion

This study tested the possibility that the consolidation of location into visual working memory is prioritized because it is the feature that fundamentally distinguishes different objects and allows all other features to be combined as integrated object representations (Kahneman, Treisman, & Gibbs, 1992; Logie et al., 2011; Treisman & Gelade, 1980; Treisman & Sato, 1990). According to this account, location information should be encoded into visual working memory along with every other feature. In Experiment 1, the rate of consolidation was different when subjects were remembering the color of the objects versus when they were remembering the color and location of each object. This suggests that the previously observed selectivity of consolidation for task-relevant features (Woodman & Vogel, 2008) also holds for location. In Experiment 2, location change detection benefited from decreasing the number of possible locations from 12 to four, becoming the most efficiently consolidated feature. However, task relevance still drove the consolidation functions, with the conjunction condition tracking the slower-to-encode feature. Thus, I conclude that under time pressure, spatial location appears to be encoded into visual working memory only when that information is task relevant.

Logie et al. (2011) argued that location is maintained in visual working memory regardless of task demands. This conclusion was based on their finding that randomization of location between memory and test arrays disrupted performance for detecting changes in other features. Logie and colleagues proposed that for features other than location, visual working memory processing is task-oriented, such that visual working memory treats location differently than other features. Our findings suggest that this is not the case. Our experimental manipulations influenced the consolidation of location and other features in the same way.

The results from a study by Jiang, Olson, and Chun (2000) are important for reconciling the present findings with those of Logie et al. (2011). Using a change-detection paradigm, Jiang and colleagues found that randomizing location when it was not task relevant disrupted participants’ ability to detect changes. However, an additional experiment showed that when the target was made more salient in the test array by dimming the distractors, the disruptive effect of randomizing task-irrelevant location disappeared. In addition, Woodman, Vogel, and Luck (2012) showed that changing the locations of objects between memory and test arrays on every trial did not impair memory for color. Thus, the discrepancies between these studies and the Logie et al. study might be based on the attention-capturing effects of randomizing locations in the test array on a subset of trials during the experiment. In the experiments that scrambled the locations of objects on every trial, people can apparently ignore these changes without cost. It will be interesting in the future to see whether this ability to ignore irrelevant location changes is something that subjects rapidly learn during trials of the experiment, or whether this is entirely under top-down control and can be engaged immediately following instruction.

The present study raises a theoretically important question about generality. The experiments presented here used objects defined only by the presence of a color at a given location. However, one proposal is that spatial location becomes important when multiple object features need to be bound together (e.g., Wheeler & Treisman, 2002). The present findings leave open the possibility that spatial location is impossible to filter out of the representations of more complex objects, an idea that is consistent with a body of existing evidence (Pertzov & Husain, 2014). This will be an important topic for future investigation.