Experts display remarkable perceptual advantages that enable them to rapidly focus on relevant aspects of domain-specific stimuli (for a review, see Reingold & Sheridan, 2011). Expert chess players can rapidly move their eyes to the optimal move during a chess game (Charness, Reingold, Pomplun, & Stampe, 2001) and expert radiologists can rapidly move their eyes to abnormalities during medical image perception tasks (Kundel, Nodine, Krupinski, & Mello-Thoms, 2008). To investigate how expertise shapes attention guidance and eye-movement control, we monitored the eye movements of experts and non-musicians in the domain of music reading during a novel music-related visual search task. We will begin by reviewing prior work on the perceptual skills of experts and we will then introduce the present paradigm and our predictions.

Chunking and template theories (Chase & Simon, 1973a, 1973b; Gobet & Simon, 1996b, 2000) provide a theoretical perspective for understanding the perceptual advantages of experts. These theories were initially developed in the chess domain, in response to findings that chess experts had an extraordinary ability to memorize briefly presented game positions (as shown by pioneering work by de Groot, 1946). This memory advantage largely disappeared if the experts were instead shown random configurations of pieces (Chase & Simon, 1973a, 1973b; Gobet & Simon, 1996a). To accommodate this pattern of results, chunking theory (Chase & Simon, 1973a, 1973b) assumes that experts acquire memory structures over the course of many hours of practice. These memory structures are composed of “chunks” of domain-specific visual information and are complemented by additional larger memory structures called “templates” (Gobet & Simon, 1996b, 2000). Chunks and templates allow experts to efficiently process domain-specific stimuli in terms of broader patterns instead of individual features.

There is now substantial evidence that chunking and template theories can accommodate a variety of empirical findings in the visual expertise literature (for reviews, see Bilalić, 2017; Reingold & Sheridan, 2011; Sala & Gobet, 2017; Sheridan & Reingold, 2017). However, more work is needed to understand the precise mechanisms by which chunks and templates support the superior visual search performance of experts. Many complex mechanisms could be involved, as indicated by Eckstein’s (2011) interviews with experts from three domains (i.e., fishing, radiology, and satellite imagery), which led him to conclude that “expertise for all three tasks is based on a complex set of knowledge about targets, backgrounds, and context” (p. 22). To further explore these complex mechanisms, our goal was to introduce a new visual search paradigm for investigating the nature of experts’ and non-musicians’ mental representations of the targets they are searching for (i.e., search templates) within their visual working memory (VWM). Our paradigm builds on a large literature showing that more precise search templates facilitate visual search performance (for reviews, see Eckstein, 2011; Wolfe & Horowitz, 2017). For example, people show decrements in attentional guidance and decision making in visual search tasks when the search template quality is degraded by adding inaccurate or extraneous features (Hout & Goldinger, 2015), and people are better at focusing on relevant information in real-world scenes when the search template is cued with a specific picture rather than a verbal cue (Malcolm & Henderson, 2010). Extending this prior literature, rather than manipulating how the search template is cued, our paradigm instead examines the effect of expertise on the processing of visually complex search templates.

Critically, whereas prior visual search paradigms have typically presented the search template prior to the search array, the current paradigm used a novel approach of displaying both the search template and the search array simultaneously. Specifically, we monitored the eye movements of music experts and true non-musicians while they searched for a specific section of music (i.e., the search template) within a simultaneously presented music score (i.e., the search array). Eye tracking allowed us to test if the experts and non-musicians processed the search template in a qualitatively different manner throughout their search. Based on chunking and template theory, we predicted that experts would acquire a more precise representation of the search template in VWM than non-musicians. We therefore expected experts would have higher accuracy relative to non-musicians, should not need to move their eyes back to the search templates as often as non-musicians, and would be better at rapidly focusing on relevant information than non-musicians.

In addition to examining relevancy and expertise effects, we also examined the effect of the visual complexity of the music scores (see Fig. 1). As illustrated by the stimulus examples in Fig. 1, music scores are ideal for studying interactions between expertise and complexity because they vary in visual complexity to a much greater extent than stimuli from the chess domain, which has historically been the dominant domain for testing chunking and template theory. The music scores in our study contained complex and fine-grained visual details that convey a variety of aspects of music performance, including duration, pitch, volume, and numerous expressive elements of the music. Furthermore, similar to chessboards, music scores are well suited for testing the predictions of chunking and template theory because they are composed of individual features (e.g., music notes) that could potentially belong to one (or more) “chunks” (e.g., chords, arpeggios, etc.).

Fig. 1

An example of the three regions from the dwell-based analyses for a simple trial (Panel a) and a complex trial (Panel b). The search template was presented above the search array, which contained the target and distractor bars. The boxes around the regions of interest are shown here for illustrative purposes only and were not presented during the experiment. There were three possible target regions per image, and we counterbalanced which of the three regions served as a target (see text for details)

Visual complexity effects (i.e., greater processing difficulty for complex than simple music scores) have been mixed in the music reading literature, with prior studies showing “significant” (e.g., Goolsby, 1994; Kinsler & Carpenter, 1995; Penttinen, Huovinen, & Ylitalo, 2015; Wurtz, Mueri, & Wiesendanger, 2009), “null” (e.g., Waters & Underwood, 1998), and “reverse” (e.g., Polanka, 1995) visual complexity effects. To clarify the boundary conditions of visual complexity effects, the present study tested for interactive effects of visual complexity and expertise in the context of a task that did not require music performance, which allowed us to eliminate potential confounds due to variations in motor processing and the speed of performance (i.e., tempo).

Also, as an additional methodological advantage, our paradigm was accessible to a true non-musician group. As discussed by Donovan and Litchfield (2013), a gap in the expertise literature is that very little work has contrasted experts with naïve observers (but see, e.g., Donovan & Litchfield, 2013; Waters, Underwood, & Findlay, 1997), potentially because many tasks in the expertise literature require at least some amount of domain-specific knowledge. By including the non-musician group in the present study as a baseline, our main goal was to test chunking and template theory’s assumption that domain-specific experience facilitates the perceptual grouping of stimuli into larger patterns (i.e., chunks) during a challenging visual search task. In the experiment reported below, we tested our hypotheses using a large and equal sample of non-musicians (n = 30) and expert musicians (n = 30).

Method

Participants

The expert musicians (N = 30; 12 females, mean age = 24.63 years; see Footnote 1) were recruited from the University at Albany, SUNY, campus and the surrounding community. Expert musicians had completed at least 10 years of music training (including music reading, music theory, and music performance of at least one instrument). This approach to operationalizing expertise is consistent with existing definitions in the music expertise literature (e.g., Burman & Booth, 2009; Drai-Zerbib, Baccino, & Bigand, 2012; Halpern & Bower, 1982; Waters & Underwood, 1998; Wong & Gauthier, 2010). The non-musicians (N = 30; 23 females, mean age = 18.43 years) were undergraduate students who self-reported that they had little to no experience with music training and that they could not read music. Participants were compensated with either course credit or $10 in cash. All participants had self-reported normal or corrected-to-normal vision.

Materials and design

Our materials consisted of 108 three-line excerpts from lesser-known Baroque and Classical era piano scores. Fifty-four of these images were selected to be “visually simple” and 54 were “visually complex”. We defined complexity as the amount of ink on the page, and we collected data from five independent raters to verify that we had implemented a strong manipulation of complexity according to this definition. We manipulated visual complexity within-subjects, and we employed a 2 (Complexity: Visually Complex or Visually Simple) × 2 (Expertise: Experts or Non-musicians) design.

As shown in Fig. 1, on each trial in the experiment, a target bar (i.e., the search template) was presented above the music score (i.e., the search array), and this target bar was identical to one of the bars within the score. Each music score had between 6 and 24 bars that could have served as potential target bars. To select the target bars for our study, we randomly selected three bars from each music score (one bar was selected from each of the three lines in the score) to serve as possible target bars. Across participants, we counterbalanced which of the three possible target bars was designated the search target, and the remaining two bars were designated as distractors. Thus, across people, the same bars served as both targets and distractors, such that each bar served as a control for itself. Each participant saw a given music score only once, and there was an equal chance of the target being located in each of the three lines.
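The counterbalancing described above can be sketched in a few lines. The text states only that the designated target was counterbalanced across participants; the modular rotation below, along with the names `target_line` and `assign_roles` and the integer participant and score identifiers, is a hypothetical illustration rather than the assignment scheme actually used.

```python
def target_line(participant_id: int, score_id: int, n_lines: int = 3) -> int:
    """Return the 0-based line whose candidate bar is the target.

    Assumption: a simple rotation over participants and scores. Any
    scheme that balances target position would serve the same purpose.
    """
    return (participant_id + score_id) % n_lines


def assign_roles(participant_id: int, score_id: int, n_lines: int = 3) -> list[str]:
    """Label each of the three candidate bars as target or distractor."""
    target = target_line(participant_id, score_id, n_lines)
    return ["target" if line == target else "distractor"
            for line in range(n_lines)]


# Across any three consecutive participants, each candidate bar of a
# given score serves once as the target and twice as a distractor.
roles_for_score_5 = [assign_roles(p, 5) for p in range(3)]
```

Under this rotation, a participant who completes all 108 scores sees the target on each of the three lines equally often, and every candidate bar acts as its own control across participants.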

Apparatus

Participants’ eye movements were monitored using the SR Research EyeLink 1000 Plus system with high spatial resolution and a sampling rate of 1000 Hz. A head and chin rest stabilized the head. Although viewing was binocular, only the right eye was monitored. The gaze-position calibration error was less than 0.5° for all participants. Visual stimuli were displayed on a 24-inch Asus CG248QE computer screen with a resolution of 1920 × 1080 pixels. The screen was 70 cm away from the participant. The music notation in the images was presented in black on a white background. While there were slight variations in the sizes of the music scores and search templates (see Footnote 2), the distance between the center of the music score and the center of the search template remained fixed. A gamepad was used to collect responses.
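For readers who wish to relate the screen geometry above to degrees of visual angle, the following sketch converts between pixels and degrees. It assumes square pixels, that the 24-inch figure refers to the viewable diagonal, and a line of sight centered on the screen; the function name `pixels_per_degree` is illustrative and does not come from the eye-tracking software.

```python
import math


def pixels_per_degree(diagonal_in: float, res_x: int, res_y: int,
                      distance_cm: float) -> float:
    """Approximate number of pixels spanned by 1 degree at screen center."""
    diagonal_px = math.hypot(res_x, res_y)            # diagonal in pixels
    px_per_cm = diagonal_px / (diagonal_in * 2.54)    # assumes square pixels
    cm_per_degree = 2 * distance_cm * math.tan(math.radians(0.5))
    return px_per_cm * cm_per_degree


# Values taken from the apparatus description.
ppd = pixels_per_degree(diagonal_in=24, res_x=1920, res_y=1080, distance_cm=70)
calibration_limit_px = 0.5 * ppd  # the 0.5 degree criterion, in pixels
```

Under these assumptions one degree corresponds to roughly 44 pixels, so the 0.5° calibration criterion is on the order of 22 pixels.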

Procedure

At the beginning of each trial, participants looked at a fixation cross, which was centered at the location where the search template subsequently appeared, and they pressed a button to initiate the trial. The search template was displayed above the search array throughout the trial, which permitted visual comparisons between the search template and the search array. Participants were instructed to locate the target as quickly and accurately as possible. When the participants located the target bar, they indicated their response by looking at the target and then pressing a button on the gamepad. Once the button was pressed, their fixation location was recorded, and the trial ended. To prevent button press errors, the trial did not progress (and responses were not recorded) unless participants were looking at the search array at the time that they pressed the button. This sequence was repeated for five practice trials, which were followed by 108 experimental trials.

Results

We will begin by reporting accuracy and reaction times (RTs), followed by our eye-tracking measures that were designed to assess how experts and non-musicians allocated their attention to relevant and irrelevant information.

We analyzed our data using 2 × 2 ANOVAs, with Complexity (simple, complex) as a within-subjects variable and Expertise (expert, non-musician) as a between-subjects variable. Also, in our dwell-based analyses for the search array, we included Relevancy (target, distractor) as an additional within-subjects variable. We excluded inaccurate trials (when the target was not accurately located) from all of the RT and eye movement analyses.

Accuracy and RT

Table 1 contains the means and standard errors for the accuracy and RT measures. Confirming our hypothesis, accuracy was higher for experts than non-musicians (F(1,58) = 4.53, p < 0.05, \( \eta_p^2 \) = .07), although the two groups did not differ in reaction times (F(1,58) = 0.07, p = 0.80, \( \eta_p^2 \) = .001). Also, compared to simple trials, complex trials yielded longer RTs (F(1,58) = 47.58, p < 0.01, \( \eta_p^2 \) = .45) and more accurate responses (F(1,58) = 4.57, p < 0.05, \( \eta_p^2 \) = .07), suggesting a speed–accuracy tradeoff for the complexity manipulation. There were no interactions for either the RT or accuracy measures (all ps > 0.8, all Fs < 1).
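As a quick consistency check on the effect sizes reported above, partial eta squared can be recovered from an F ratio and its degrees of freedom via the standard identity \( \eta_p^2 = F \cdot df_1 / (F \cdot df_1 + df_2) \). The helper below is a minimal sketch of that identity; its name is illustrative, and it is not necessarily how the reported values were computed.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(1, 58) values reported in the accuracy and RT analyses.
accuracy_expertise = partial_eta_squared(4.53, 1, 58)
rt_complexity = partial_eta_squared(47.58, 1, 58)
```

Rounded to two decimals, these two values agree with the reported effect sizes (.07 and .45).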

Table 1. Means for accuracy, reaction time (in ms), saccade amplitude (in ° of visual angle), and fixation duration (in ms) as a function of expertise and complexity

Fixation duration, fixation count, and saccade amplitude

Table 1 contains the means and standard errors for average fixation duration (in ms), average saccade amplitude (in degrees of visual angle), and average fixation count (i.e., the average number of fixations in a trial). Most importantly, as evidence that the experts and non-musicians adopted qualitatively different search strategies, the experts showed overall shorter saccade amplitudes (F(1,58) = 20.16, p < 0.01, \( \eta_p^2 \) = .26) and longer fixation durations (F(1,58) = 25.02, p < 0.01, \( \eta_p^2 \) = .30), but no significant difference in fixation count (F(1,58) = 1.53, p = 0.22, \( \eta_p^2 \) = 0.03), relative to non-musicians. Building on these findings, the dwell-based analyses reported below were designed to further explore possible group differences in search strategies.

Also, fixation durations were slightly shorter in the complex condition than the simple condition (F(1,58) = 7.38, p < 0.01, \( \eta_p^2 \) = .11). Therefore, as shown in Table 1, the longer RTs in the complex condition (as reported above) reflected an increase in the number of fixations (F(1,58) = 70.28, p < 0.001, \( \eta_p^2 \) = 0.55) instead of an increase in fixation durations. Complexity did not impact saccade amplitudes (F(1,58) = 1.96, p = 0.17, \( \eta_p^2 \) = .03), nor did it interact with expertise for saccade amplitude, fixation count, or fixation duration (all ps > 0.5, all Fs < 2.5).

Dwell-based analyses

To test our hypothesis that experts and non-musicians would adopt qualitatively different search strategies, we used dwell-based analyses to investigate the processing of three different regions (see Fig. 1): the search template region, the target region, and the distractor region. For each region, we analyzed three dwell-based eye-movement measures: (1) First-dwell duration (i.e., the duration of the first dwell on the region; a dwell was defined as one or more consecutive fixations on the region, prior to the eyes moving to a different region of the display), (2) Total-dwell duration (i.e., the sum of all of the dwell durations in the region), and (3) Number of dwells (i.e., the number of dwells in the region). Figure 2 displays the means and standard errors for the dwell-based analyses, which were conducted separately for the search template region (panels a, b, c), and for the target and distractor regions that were located within the search array (panels d, e, f). Because the target bar had an equal chance of being in one of the three lines of the music score, the same bar served as either a target or a distractor depending on its context. As illustrated in Fig. 1, there were two possible distractors (i.e., the other two bars not used as the target) on each trial. However, we only analyzed the distractor region that was fixated first on a given trial. Given that the participants terminated the trial after they found the target, they did not always fixate on both of the distractor regions. Therefore, more data were available for the first-fixated distractor region (see Footnote 3).
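The three dwell-based measures defined above can be computed from a time-ordered fixation sequence, as in the following minimal sketch. The input format (a list of `(region_label, duration_ms)` pairs), the region labels, and the function name `dwell_measures` are assumptions made for illustration; fixations outside every region of interest would need their own label so that they interrupt a dwell.

```python
from itertools import groupby


def dwell_measures(fixations, region):
    """Summarize dwells on `region` for one trial.

    `fixations` is a time-ordered list of (region_label, duration_ms).
    A dwell is a run of consecutive fixations on the same region.
    """
    dwells = [
        sum(duration for _, duration in run)
        for label, run in groupby(fixations, key=lambda fix: fix[0])
        if label == region
    ]
    return {
        "first_dwell_ms": dwells[0] if dwells else None,
        "total_dwell_ms": sum(dwells),
        "dwell_count": len(dwells),
    }


# Hypothetical trial: template, one distractor, template again, target.
example_trial = [
    ("template", 250), ("template", 180),
    ("distractor", 200),
    ("template", 150),
    ("target", 300), ("target", 220),
]
template_summary = dwell_measures(example_trial, "template")
# Two dwells on the template: 430 ms, then 150 ms (580 ms in total).
```

In this made-up trial, the return to the search template after inspecting a distractor counts as a second dwell, which is the behavior the dwell-count measure is meant to capture.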

Fig. 2

Dwell-based analyses of the search template region (panels a–c) and the target versus distractor regions (panels d–f). We analyzed first-dwell duration (ms) (a and d), total dwell duration (ms) (b and e), and dwell count (c and f). The error bars represent the standard error of the mean

Search template analyses

In support of our hypothesis, the dwell-based analyses for the search template revealed qualitatively different search strategies for experts versus non-musicians. Specifically, as evidence that experts did not need to return to the search template as frequently as non-musicians, the experts had fewer dwells on the search template (F(1,58) = 4.46, p < 0.05, \( \eta_p^2 \) = .07). Also, as evidence that experts were better at focusing on relevant information, the experts had longer first-dwells and total-dwells in the search template region than the non-musicians (all ps < 0.01, all Fs > 30).

As further evidence that the two groups adopted different strategies, experts had larger complexity effects (i.e., greater processing difficulty for complex than simple trials) compared to non-musicians, as indicated by significant complexity × expertise interactions for all three measures (all ps < 0.05, all Fs > 4.5), as well as significant main effects of complexity for first dwell and total dwell (all ps < 0.01, all Fs > 15). For the dwell count measure, the non-musicians showed a reversal in the pattern of complexity effects, and there was no main effect of complexity (F(1,58) = 0.01, p = 0.93, \( \eta_p^2 \) = 0.00). Follow-up t tests revealed that experts had significantly fewer dwells in the simple condition relative to the complex condition (t(29) = 5.32, p < 0.001, d = 0.98), while non-musicians showed the opposite pattern (t(29) = 4.33, p < 0.01, d = 0.79). Overall, this pattern of complexity results, which was not predicted a priori, suggests that the experts processed the search templates in depth, whereas the non-musicians engaged in more superficial processing of the search template.
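The follow-up t tests above compare the simple and complex conditions within each group of 30 participants. One common way to express such a paired comparison as a standardized effect size is \( d_z = |t| / \sqrt{n} \). The text does not state which variant of d was used, so the sketch below is an assumption offered only as a rough plausibility check, and the function name is illustrative.

```python
import math


def cohens_dz_from_t(t_value: float, n_pairs: int) -> float:
    """Paired-samples effect size d_z from a t statistic and sample size."""
    return abs(t_value) / math.sqrt(n_pairs)


# t(29) values reported for the dwell-count follow-up tests (n = 30 each).
experts_dz = cohens_dz_from_t(5.32, 30)
non_musicians_dz = cohens_dz_from_t(4.33, 30)
```

Under this assumption, both values fall within about 0.01 of the reported effect sizes (0.98 and 0.79); a small residual discrepancy could reflect rounding of the t statistics or a different d formula.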

Target vs. distractor analyses

Our analyses of the target and distractor regions revealed that experts showed larger relevancy effects (i.e., longer dwell times on targets than distractors) compared to non-musicians, and this relevancy × expertise interaction was significant for the first-dwell measure (F(1,58) = 15.20, p < 0.01, \( \eta_p^2 \) = 0.21), but not for the total dwell and dwell count measures (all ps > 0.1, all Fs < 2.5). There was a main effect of relevancy for all three measures (all ps < 0.01, all Fs > 200), with dwell times being longer on targets than distractors.

Also, not surprisingly, the complex condition elicited longer dwells and higher numbers of dwells than the simple condition. For two of the measures (i.e., total dwell and number of dwells) the complexity effect was larger for targets than distractors (all ps < 0.01, all Fs > 40), potentially because the targets elicited more in-depth processing than the distractors. None of the remaining effects were significant (all ps > 0.05, all Fs < 3.5).

General discussion

According to chunking and template theories, the perceptual skill of experts reflects their ability to process domain-specific stimuli in terms of larger patterns called “chunks” and “templates” (Chase & Simon, 1973a, 1973b; Gobet & Simon, 1996b, 2000). Given that chunking reduces working memory load (Thalmann, Souza, & Oberauer, 2019), we predicted that experts should be better than non-musicians at encoding and storing precise search templates in visual working memory (VWM) during challenging and complex visual search tasks. In the present study, we introduced a paradigm in which the search template and search array were presented simultaneously, which allowed the experts and non-musicians to make visual comparisons across the two regions. Using this paradigm, we monitored eye movements to show that experts and non-musicians processed the search template in a qualitatively different manner. Compared to the non-musicians, the experts had longer dwell durations for the search template region, which suggests that they were processing the search template in more depth than the non-musicians. Also, the experts had reduced numbers of dwells on the search template and stronger relevancy effects. In contrast, the non-musicians—who were presumably less able to use “chunks” and “templates” to help them to remember the search template—instead adopted a sub-optimal and less accurate strategy of making frequent visual comparisons back to the search template throughout the trial. In summary, our study extends the prior literature by helping to clarify the specific mechanisms by which chunks/templates could support expert visual search performance. Taken together, our results indicate that the experts acquired precise visual search templates, which they then used to efficiently guide their subsequent search.

The present study also helps to clarify why prior music reading studies (for reviews, see Madell & Hébert, 2008; Puurtinen, 2018) have shown a wide range of patterns of visual complexity effects on eye movements, including “null” and “reverse” complexity effects. We demonstrated robust visual complexity effects for expert musicians in the context of a task that eliminated confounds due to performance-related task demands (such as differences in tempo). Furthermore, we demonstrated that experts show stronger visual complexity effects on dwell durations than non-musicians, which suggests that the experts were engaging in more in-depth processing of the music scores than the non-musicians. The non-musicians’ “reverse” visual complexity effect for the dwell count measure suggests that the non-musicians found it particularly challenging to differentiate between targets and distractors in the simple condition, potentially because simple music scores are less visually distinctive than complex scores, and visual similarity is known to impact the difficulty of visual search tasks (for reviews, see Eckstein, 2011; Wolfe & Horowitz, 2017). Building on these findings, future work could further explore how both task demands and expertise act as boundary conditions that jointly determine the impact of visual complexity.

More generally, our study also contributes to the broader question of the extent to which the mechanisms that support visual expertise are domain general or domain specific. As concluded in Brams et al.’s (2019) recent review, the visual expertise field currently needs integrative theories that make links across domains. Towards this goal, the present paradigm facilitates comparisons with other domains by emphasizing the visual component of music expertise. Our results are consistent with prior work showing that music reading experts are better than non-musicians at processing domain-related visual patterns (for reviews, see Madell & Hébert, 2008; Puurtinen, 2018), as shown by their larger eye-hand spans (e.g., Truitt, Clifton, Pollatsek, & Rayner, 1997) and their ability to process configurations of music notes automatically (Wong & Gauthier, 2010). Also, beyond the music reading domain, we replicate prior findings that experts in many domains, including chess and medicine, are better than novices at focusing their attention on relevant information (e.g., Bilalić, Langner, Erb, & Grodd, 2010; Bilalić, Turella, Campitelli, Erb, & Grodd, 2012; Donovan & Litchfield, 2013; Kundel et al., 2008; Sheridan & Reingold, 2014). Furthermore, the musicians’ longer fixation durations relative to non-musicians indicate that they were potentially processing larger configurations of music notes in a given fixation, which is in line with prior findings that chess experts have a larger visual span (Reingold, Charness, Pomplun, & Stampe, 2001).

Building on the present findings, future work could apply the present paradigm’s novel approach of simultaneously presenting the search template and the search array to other domains of visual expertise. Also, given that music reading expertise is multimodal (e.g., Drai-Zerbib, Baccino, & Bigand, 2012), future work could investigate the extent to which the music experts in our study were utilizing auditory and motor processing, in addition to visual processing, to encode the visual search templates. To fully capture the perceptual and cognitive processing that supports music reading expertise, it is possible that future instantiations of chunking and template theory will need to be extended to accommodate multisensory representations of domain-specific perceptual patterns. Thus, music reading is a complex and ecologically valid domain that provides considerable scope for investigating the extraordinary perceptual skill of experts.

Open Practices Statement

This experiment was not formally preregistered. Interested researchers can receive access to the data and materials by contacting the authors.