The current study investigated the holistic processing model of expertise by comparing expert and nonexpert populations in visual search tasks. According to this model, expert perception involves “Gestalt” processing insofar as it consists of the co-occurrence of two distinct perceptual acts that yield two complementary views of an image (Palmer, 1990). The expert’s first perceptual act is a “global-focal” search that uses parafoveal and peripheral visual information to construct a holistic representation of the image in question (Kundel, Nodine, Conant, & Weinstein, 2007; Nodine & Mello-Thoms, 2000). This holistic representation serves as the ground from which relevant figures organized within it “pop out.” The second perceptual act is therefore a “focal feature analysis” of the holistically construed image, in which relevant targets are selected for fixation and action (Kundel et al., 2007). These two perceptual acts operate as parallel processing streams, so that experts see both the whole of their search field and their target on a near-simultaneous basis (Drew, Evans, Võ, Jacobson, & Wolfe, 2013; Nodine & Kundel, 1987; Nodine & Mello-Thoms, 2000).

Historically, radiology has been the most studied domain of expertise with respect to holistic visual processing (Sheridan & Reingold, 2017). In early studies, Kundel and Nodine (1975) found that radiologists could detect perturbations in radiographs within a split second and with remarkably high accuracy. Since then, studies of radiologists’ visual expertise have shown that experts exhibit larger saccadic amplitudes and shorter times to first fixation on a target than novices (Brams et al., 2019; Gegenfurtner, Lehtinen, & Säljö, 2011). The holistic model suggests that expert radiologists naturally construct holistic images during visual search, from which perturbations and targets are selected for focal analysis with greater speed and accuracy than novices achieve (Kundel et al., 2007).

Additionally, visual experts who rely on holistic processing are able to use peripheral and parafoveal visual information in search tasks. Extrafoveal information has been shown to aid experts’ visual search across medical domains: the same experts perform worse when peripheral information is occluded (Sheridan & Reingold, 2017). To control the availability of extrafoveal information, studies impose gaze-contingent viewing (GCV) conditions on participants. In GCV, the entire image is occluded except for a small window centered on the current point of gaze. Because GCV removes peripheral and parafoveal information, it effectively prevents an expert from processing an image holistically, causing them to make more saccades of lower amplitude than under normal viewing (Carmody, Nodine, & Kundel, 1980; Viviani & Swensson, 1982).

The current study sought to determine the extent to which the holistic visual processing model applies to a previously unstudied population of visual experts: architects. Although there is no agreed-upon definition of “visual expertise,” a working characterization can be gleaned from the wide array of empirical work on the topic. In addition to radiologists, empirical research has examined fingerprint examiners, orthodontists, birdwatchers/ornithologists, TSA agents, and visual artists, among many others. These domains share the following features: (a) performance and skill specific to a particular visual domain (in contrast to a domain-general ability); (b) performance above a threshold set by the standards and parameters of the respective domain; and (c) performance that depends primarily on visual perception. In the case of architects, expertise is likewise visual, expressed through façade, interior, and landscape design in architectural software suites.

Because architects are an unstudied population of visual experts, this study developed a new task to measure their visual expertise. We used eye tracking and GCV conditions to investigate whether architects use holistic visual processing in target search tasks. We hypothesized that architects, like other visual experts, would process images holistically when performing a focused target search unhindered by GCV. The existing evidence on visual expertise suggests that they should. However, to preview our results, we found no evidence for such processing within this group. This implies that there are limits to the holistic processing account of visual expertise, and that factors such as search target type and domain-specific strategies need to be considered more carefully in future research. Architects therefore provide an interesting test case of the robustness of the holistic processing account of visual expertise.

Method

Participants

Data were collected from 55 participants with normal or corrected-to-normal vision, divided into two groups (expert and naïve). The sample size was based on power calculations performed with G*Power (Version 3.1.9.2; Faul, Erdfelder, Lang, & Buchner, 2007). The expert group consisted of professional architects (n = 27; seven females, 20 males) who held a master’s degree in architecture, a license to practice architecture, or both. The architects’ average age was 42 years (SD = 11.63), and they had an average of 19 years of experience (SD = 12.12). Five of the 27 architects were excluded from the analyses because of poor eye-tracking calibration or program failure, leaving 22. The naïve group (n = 28; 20 females, eight males) consisted entirely of undergraduate students, whose average age was 19 years (SD = 1.93). Four of the 28 undergraduates were excluded because of poor eye-tracking calibration, leaving 24. None of the participants had any experience in radiology, and only the architects had experience in architecture. The undergraduates received course credit for participation, and the architects were compensated $30 for their time.

Materials

Data were collected using an EyeLink Plus eye tracker (SR Research, Ontario, Canada), which sampled gaze position (x–y coordinates) at 1,000 Hz, and two laptops: one running EyeLink Version 5.15 and the other running the experiment script written with the Psychophysics Toolbox (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007; Pelli, 1997) for MATLAB (Version R2017a). A headrest minimized participants’ head movements. All images were the same size, 28.3 cm × 28.3 cm, subtending a visual angle of 24.6° both horizontally and vertically. In the gaze-contingent viewing condition, the unoccluded portion of the image was centered on the screen coordinate that the eye tracker estimated to be the center of the observer’s gaze. This unoccluded region faded gradually to gray following a bivariate Gaussian whose half width at half maximum was 1.8°. Any portion of the image farther than 1.8° from the fixated pixel was 50% or more gray; any portion farther than 4.9° from fixation was 100% gray.
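The display geometry and gaze-contingent window described above can be sketched in Python. This is a minimal illustration, not the study’s MATLAB/Psychophysics Toolbox code. It assumes that the Gaussian sets the proportion of image (versus gray) shown at each eccentricity and that the mask is hard-clipped to fully gray at 4.9°. The viewing distance is not reported; the sketch only back-calculates it from the stated image size and visual angle.

```python
import math

IMAGE_SIZE_CM = 28.3    # image width and height (Materials)
IMAGE_ANGLE_DEG = 24.6  # visual angle subtended by the image (Materials)
HWHM_DEG = 1.8          # half width at half maximum of the Gaussian window
CUTOFF_DEG = 4.9        # assumed hard cutoff: fully gray beyond this eccentricity


def viewing_distance_cm(size_cm, angle_deg):
    """Eye-to-screen distance implied by an object's size and visual angle."""
    return (size_cm / 2) / math.tan(math.radians(angle_deg) / 2)


def gray_fraction(ecc_deg, hwhm_deg=HWHM_DEG, cutoff_deg=CUTOFF_DEG):
    """Proportion of gray blended into a pixel ecc_deg away from gaze.

    The image is weighted by a circular Gaussian centred on the gaze point;
    sigma is derived from the half width at half maximum so that the image
    weight is exactly 0.5 at hwhm_deg.
    """
    if ecc_deg >= cutoff_deg:
        return 1.0
    sigma = hwhm_deg / math.sqrt(2 * math.log(2))
    image_weight = math.exp(-ecc_deg ** 2 / (2 * sigma ** 2))
    return 1.0 - image_weight


if __name__ == "__main__":
    d = viewing_distance_cm(IMAGE_SIZE_CM, IMAGE_ANGLE_DEG)
    print(f"implied viewing distance: {d:.1f} cm")
    for ecc in (0.0, 1.8, 3.0, 4.8, 4.9):
        print(f"{ecc:.1f} deg -> {gray_fraction(ecc):.1%} gray")
```

Under these assumptions the implied viewing distance is about 64.9 cm, and a pixel 1.8° from gaze is exactly 50% gray, as described. Near 4.9° the Gaussian alone already leaves less than 1% of the image visible, so the assumed hard clip changes the display only slightly.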

Chest radiographs were taken from the Japanese Society of Radiological Technology database (Shiraishi et al., 2000). This database provides high-resolution chest radiographs together with the diagnosis (malignant or benign), position, and degree of subtlety of each nodule. Twenty-two radiographs of varying subtlety were selected for the study, 11 of which contained a malignant lung nodule.

The perspective images were created in the 3D modeling software 3DS Max. We created 22 perspective images, each containing between 40 and 150 geometric shapes whose lines were mutually parallel in three-dimensional space, producing a one-point perspective in which the parallel lines of every shape converged on a single vanishing point. In half of the images, a single shape was shifted in space so that it was out of perspective relative to all other objects in the image (see Fig. 1a). Before this study, we conducted a pilot study with a separate group of architecture students, who did not take part in the experiment, to ensure that the task was neither too difficult nor too easy. The one/zero-point perspective task was chosen after discussions with architects, who suggested that the ability to detect objects that are not correctly oriented is a basic skill needed to interpret blueprints properly.

Fig. 1

Examples of the images viewed by the participants in normal and gaze-contingent viewing. (a) Perspective image in the normal viewing condition, with the target circled. (b) Radiograph image in the normal viewing condition, with the target circled. (c) Perspective image in the gaze-contingent viewing condition. (d) Radiograph image in the gaze-contingent viewing condition.

Procedure

Participants were greeted and asked to read and sign an informed consent form. Each participant was then calibrated on the eye tracker; calibration error remained within 0.5° of visual angle for every participant. Participants were told they would view a series of radiographs and perspective images (see Fig. 1), presented in two viewing conditions. In the normal viewing condition, the entire image was visible (see Fig. 1a–b). In the gaze-contingent viewing (GCV) condition, most of the image was occluded except for a small window that followed the participant’s gaze. Participants thus controlled the viewing window with their current fixation point: as they moved their eyes, the window moved with them, revealing previously occluded regions (see Fig. 1c–d).

The experiment consisted of 48 trials in total. The first four were practice trials that familiarized participants with the task: two radiographs and two perspective images. Participants were shown the correct answer on these practice trials only. The remaining 44 images were divided into four blocks of 11 images: 11 radiographs in normal viewing, 11 radiographs in GCV, 11 perspective images in normal viewing, and 11 perspective images in GCV. Block order was counterbalanced across participants, and image order was randomized within blocks. On each trial, participants were to click on the target. In the radiographs, the target was a cancerous nodule (circled in Fig. 1b); in the perspective images, it was an out-of-perspective box (circled in Fig. 1a). Participants were told that half of the images contained a target and half did not. If they concluded that an image contained no target, they were to click a button labeled “Nothing wrong.” After each trial, participants rated their confidence in their decision on a scale from 1 (least confident) to 6 (most confident).
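The text does not state how block order was counterbalanced. One common scheme for four conditions is a balanced Latin square, in which each block appears once in each serial position and each block immediately follows every other block exactly once across four consecutive participants. The Python sketch below assumes that scheme, plus a per-participant random seed for shuffling images within blocks; the block labels and the `session_plan` helper are hypothetical.

```python
import random

# Hypothetical labels for the four experimental blocks.
BLOCKS = ["radiograph/normal", "radiograph/GCV",
          "perspective/normal", "perspective/GCV"]


def balanced_latin_square(n):
    """Return n block orders (rows) forming a balanced Latin square (n even)."""
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    # Each later row shifts every condition index by one (mod n).
    return [[(c + r) % n for c in first] for r in range(n)]


def session_plan(participant_index, images_by_block, seed=None):
    """Block order for one participant, with images shuffled within blocks."""
    rows = balanced_latin_square(len(BLOCKS))
    order = rows[participant_index % len(rows)]
    rng = random.Random(seed)
    plan = []
    for b in order:
        images = list(images_by_block[BLOCKS[b]])
        rng.shuffle(images)
        plan.append((BLOCKS[b], images))
    return plan
```

For example, with `images_by_block = {name: [f"{name}-{i}" for i in range(11)] for name in BLOCKS}`, calling `session_plan(0, images_by_block, seed=1)` returns the four blocks in the first row’s order, each holding its 11 images in shuffled order.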

Results

Below we present behavioral, eye-tracking, and qualitative results. Our primary interest was whether the GCV manipulation affected the expert group differently from the controls. To this end, we computed a series of mixed-design analyses of variance (ANOVAs) with group (architects vs. naïve controls) as a between-subjects factor and viewing condition (GCV vs. normal) as a within-subjects factor. Descriptive statistics for the analyses are presented in Tables 1 and 2.
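Because viewing condition has only two levels, the group × viewing interaction in this design reduces to a simple comparison: compute each participant’s GCV cost (GCV score minus normal score) and run a one-way between-groups ANOVA on those costs (equivalently, a pooled-variance t test, since F = t²). The Python sketch below illustrates that equivalence; it is not the authors’ analysis script, and it does not compute the main effects or generalized eta-squared. With 22 architects and 24 controls, the error degrees of freedom are 22 + 24 − 2 = 44, matching the F(1, 44) values reported in this section.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def interaction_f(normal_a, gcv_a, normal_b, gcv_b):
    """Group x viewing interaction F for a 2 (group) x 2 (viewing) mixed design.

    Each argument holds one score per participant (e.g., mean accuracy),
    with normal_x[i] and gcv_x[i] belonging to the same participant.
    Returns (F, (df_effect, df_error)).
    """
    # Per-participant cost of gaze-contingent viewing.
    cost_a = [g - n for n, g in zip(normal_a, gcv_a)]
    cost_b = [g - n for n, g in zip(normal_b, gcv_b)]
    mean_a, mean_b = _mean(cost_a), _mean(cost_b)
    grand = _mean(cost_a + cost_b)
    # One-way ANOVA on the costs: between-group vs. within-group variability.
    ss_between = (len(cost_a) * (mean_a - grand) ** 2
                  + len(cost_b) * (mean_b - grand) ** 2)
    ss_within = (sum((c - mean_a) ** 2 for c in cost_a)
                 + sum((c - mean_b) ** 2 for c in cost_b))
    df_error = len(cost_a) + len(cost_b) - 2
    return ss_between / (ss_within / df_error), (1, df_error)
```

A nonsignificant interaction F therefore means that the groups’ average GCV costs did not reliably differ, which is the pattern the holistic account predicts should fail for experts.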

Table 1 Perspective images. Means and standard deviations in relation to observer group, view type, and target presence when viewing the perspective images
Table 2 Radiograph images. Means and standard deviations in relation to observer group, view type, and target presence when viewing the radiograph images

Behavioral results

When participants viewed perspective images, GCV reduced the accuracy of both architects and naïve participants on target-present trials only, F(1, 44) = 6.07, p = .02, generalized eta-squared ηG² = 0.06; target-absent trials, F(1, 44) = 0.04, p = .85, ηG² < 0.01.¹ Architects were more accurate than the naïve group on the perspective task only on target-present trials, F(1, 44) = 4.29, p = .04, ηG² = 0.05; target-absent trials, F(1, 44) = 2.93, p = .09, ηG² = 0.03. There was no interaction between viewing condition and group, target present, F(1, 44) = 0.59, p = .44, ηG² = 0.01; target absent, F(1, 44) = 0.03, p = .86, ηG² < 0.01 (see Fig. 2). Sensitivity (d') provides additional evidence that architects were more sensitive to the targets in the perspective images, F(1, 44) = 16.85, p < .05, ηG² = 0.16, but this effect did not interact with viewing condition, F(1, 44) = 1.17, p = .29, ηG² = 0.01.
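Sensitivity (d') is the difference between the z-transformed hit rate and false-alarm rate. A minimal Python sketch follows. How trials were scored is an assumption here: a hit is a click on the target in a target-present image, and a false alarm is any target click (rather than “Nothing wrong”) in a target-absent image. The log-linear correction, which keeps perfect scores from producing infinite z values, is also an assumption; the text does not say whether or how extreme rates were corrected.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    Applies the log-linear correction (add 0.5 to each count and 1 to each
    total) so that rates of exactly 0 or 1 still give finite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A participant who responds identically regardless of target presence has d' = 0; larger values mean better discrimination of target-present from target-absent images, independent of any overall bias toward clicking or toward “Nothing wrong.”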

Fig. 2

Average accuracy and d' of both groups for radiograph and perspective images in both viewing conditions.

For response time, GCV slowed both architects and the naïve group, target present, F(1, 44) = 48.28, p < .01, ηG² = 0.24; target absent, F(1, 44) = 17.52, p < .01, ηG² = 0.07. Contrary to our predictions, architects were slower than the naïve group on target-absent trials and did not differ significantly from them on target-present trials, target present, F(1, 44) = 3.74, p = .06, ηG² = 0.06; target absent, F(1, 44) = 14.06, p < .01, ηG² = 0.20. There was no interaction between group and viewing condition, target present, F(1, 44) = 0.55, p = .46, ηG² < 0.01; target absent, F(1, 44) = 0.08, p = .78, ηG² < 0.01.

In the radiology task, sensitivity (d') was reduced under GCV for both groups, but architects showed no evidence of greater sensitivity to targets than the naïve group, F(1, 44) = 0.11, p = .74, ηG² < 0.01. Nor was there evidence of an interaction between viewing condition and group in this task, target present, F(1, 44) = 0.12, p = .73, ηG² < 0.01; target absent, F(1, 44) = 0.02, p = .88, ηG² < 0.01.

As in the perspective task, GCV slowed responses to radiograph images for both groups, target present, F(1, 44) = 87.04, p < .01, ηG² = 0.35; target absent, F(1, 44) = 101.03, p < .01, ηG² = 0.32. Architects were significantly slower than the naïve group when searching the radiograph images, target present, F(1, 44) = 13.15, p < .01, ηG² = 0.18; target absent, F(1, 44) = 12.51, p < .01, ηG² = 0.18. Finally, view type interacted with group on target-present trials only, target present, F(1, 44) = 9.07, p < .01, ηG² = 0.05; target absent, F(1, 44) = 2.60, p = .11, ηG² = 0.01.

Eye-tracking analyses

Saccadic amplitude is the mean length, in degrees of visual angle, of all saccades within a given time window. When participants viewed perspective images, GCV reduced saccadic amplitude for both architects and the naïve group, target present, F(1, 44) = 139.18, p < .01, ηG² = 0.44; target absent, F(1, 44) = 132.22, p < .01, ηG² = 0.44. Contrary to our predictions, architects made smaller saccades than the naïve group, target present, F(1, 44) = 29.69, p < .01, ηG² = 0.34; target absent, F(1, 44) = 37.74, p < .01, ηG² = 0.39. There was an interaction between view type and group, target present, F(1, 44) = 10.90, p < .01, ηG² = 0.06; target absent, F(1, 44) = 10.01, p < .01, ηG² = 0.06 (see Fig. 3).
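Eye-tracking software typically reports saccade amplitude directly, but the computation can be illustrated from saccade start and end points on the screen. The Python sketch below measures the true angle between the two lines of sight rather than converting pixels with a fixed pixels-per-degree factor. It assumes positions in centimeters relative to the point on the screen straight ahead of the eye, and a viewing distance of about 64.9 cm, which is back-calculated from the 28.3-cm image subtending 24.6° and is not stated in the Method.

```python
import math


def angular_distance_deg(p1_cm, p2_cm, distance_cm):
    """Angle in degrees between two on-screen points as seen from the eye.

    Points are (x, y) in cm relative to the screen point directly in front
    of the eye; the eye sits distance_cm in front of the screen.
    """
    v1 = (p1_cm[0], p1_cm[1], distance_cm)
    v2 = (p2_cm[0], p2_cm[1], distance_cm)
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def mean_saccadic_amplitude(saccades_cm, distance_cm=64.9):
    """Mean amplitude (deg) of (start, end) saccades; NaN if there are none."""
    amps = [angular_distance_deg(s, e, distance_cm) for s, e in saccades_cm]
    return sum(amps) / len(amps) if amps else float("nan")
```

As a sanity check, a saccade from one edge of the image to the other, `angular_distance_deg((-14.15, 0), (14.15, 0), 64.9)`, comes out at roughly 24.6°, the visual angle reported for the images.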

Fig. 3

Average saccadic amplitude of both groups for radiograph and perspective images in both viewing conditions.

Saccadic amplitude for radiograph images mirrored the perspective task data. GCV reduced saccadic amplitude for both architects and the naïve group, target present, F(1, 44) = 101.45, p < .01, ηG² = 0.47; target absent, F(1, 44) = 184.16, p < .01, ηG² = 0.58. Again, the naïve group made larger saccades than the architects, target present, F(1, 44) = 10.85, p < .01, ηG² = 0.13; target absent, F(1, 44) = 33.98, p < .01, ηG² = 0.34. There was an interaction between viewing condition and group on target-absent trials only, target present, F(1, 44) = 0.09, p = .76, ηG² < 0.01; target absent, F(1, 44) = 7.02, p < .05, ηG² = 0.05.

Time to first fixation is the time that elapsed between the beginning of a trial and the first moment the participant fixated the target. When participants viewed perspective images, GCV lengthened time to first fixation for both architects and the naïve group, F(1, 44) = 44.23, p < .01, ηG² = 0.28. There was no main effect of group on time to first fixation, F(1, 44) = 1.20, p = .28, ηG² = 0.02, and no interaction between group and viewing condition, F(1, 44) < 0.01, p = .98, ηG² < 0.01.

For radiograph images, GCV likewise lengthened time to first fixation for both architects and the naïve group, F(1, 44) = 26.39, p < .01, ηG² = 0.20. There was again no main effect of group, F(1, 44) = 1.35, p = .25, ηG² = 0.02, and no interaction between group and viewing condition, F(1, 44) = 1.38, p = .25, ηG² = 0.01.

Decision time is the time a participant takes to record a response after first fixating the target. For perspective images, GCV lengthened decision time for both architects and the naïve group, F(1, 44) = 22.03, p < .01, ηG² = 0.13. There was no main effect of group, F(1, 44) = 0.67, p = .42, ηG² = 0.01, and no interaction between group and viewing condition, F(1, 44) = 0.61, p = .44, ηG² < 0.01.
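Time to first fixation and decision time can both be derived from a single trial’s fixation record. The Python sketch below assumes a circular region of interest around the target (its radius is a hypothetical parameter; the text does not describe how target fixations were scored) and times in milliseconds from trial onset. Both measures are defined only for target-present trials on which the target was fixated.

```python
import math


def first_fixation_and_decision(fixations, target_xy, target_radius, response_ms):
    """Return (time_to_first_fixation_ms, decision_time_ms) for one trial.

    fixations: (onset_ms, x, y) tuples in temporal order, with onsets
    measured from trial start. A fixation counts as "on target" if it lies
    within target_radius of target_xy (a hypothetical circular region of
    interest). Returns (None, None) when the target was never fixated.
    """
    for onset_ms, x, y in fixations:
        if math.hypot(x - target_xy[0], y - target_xy[1]) <= target_radius:
            # Decision time runs from the first target fixation to the response.
            return onset_ms, response_ms - onset_ms
    return None, None
```

For instance, with fixations `[(180, 0.0, 0.0), (420, 5.0, 5.0), (690, 10.2, 9.9)]`, a target at `(10, 10)` with radius `1.0`, and a response at 1,500 ms, the function returns `(690, 810)`: the target was first fixated at 690 ms and the decision took a further 810 ms.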

Decision time for radiograph images followed a similar pattern. GCV lengthened decision time for both architects and the naïve group, F(1, 44) = 56.65, p < .01, ηG² = 0.28. In contrast to the perspective images, there was a main effect of group: architects took longer than the naïve group to make a decision, F(1, 44) = 9.12, p < .01, ηG² = 0.13. However, there was no interaction between viewing condition and group, F(1, 44) = 2.86, p = .10, ηG² = 0.02.

Scan path ratio measures how efficiently the eyes move to a target’s location. It is the ratio of the distance the eyes travel before reaching the target to the straight-line distance from the starting gaze position to the target (Castelhano & Henderson, 2007). In the perspective image task, there was no main effect of view type, F(1, 44) = 1.34, p = .25, ηG² = 0.02, and no main effect of group on scan path ratios, F(1, 44) < 0.01, p = .98, ηG² < 0.01. Finally, we found no interaction between group and view type, F(1, 44) = 0.01, p = .92, ηG² < 0.01.
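The scan path ratio defined above can be sketched in a few lines of Python. This assumes the fixation sequence runs from the first fixation of the trial up to and including the first fixation on the target; a value of 1.0 indicates a perfectly direct path, and larger values indicate less efficient search.

```python
import math


def scan_path_ratio(fixations_xy, target_xy):
    """Gaze path length before reaching the target divided by the
    straight-line distance from the starting gaze position to the target.

    fixations_xy: (x, y) positions from trial start up to and including
    the first fixation on the target.
    """
    if len(fixations_xy) < 2:
        raise ValueError("need a starting fixation and at least one more")
    # Sum of distances between successive fixations.
    path = sum(math.dist(a, b) for a, b in zip(fixations_xy, fixations_xy[1:]))
    direct = math.dist(fixations_xy[0], target_xy)
    return path / direct
```

For a gaze path (0, 0) → (3, 0) → (3, 4) ending on a target at (3, 4), the eyes travel 7 units against a direct distance of 5, giving a ratio of 1.4.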

The scan path ratios on the radiograph images reflected a similar pattern. There was no main effect of view type, F(1, 44) = 0.12, p = .73, ηG² < 0.01, or group, F(1, 44) = 0.08, p = .78, ηG² < 0.01. We also found no interaction between view type and group type, F(1, 44) = 1.08, p = .30, ηG² = 0.01.

Qualitative reports

In a formal postexperiment interview, when asked, “Did you employ any strategies in searching for targets in the perspective images?” seven of the 22 architects used language strongly suggestive of holistic processing, such as “I first scanned, then searched,” “I looked at the overall image before the target popped out,” and “I let my eyes go fuzzy so the target would show itself.” None of the naïve participants indicated any use of holistic processing when asked the same question. A chi-squared test indicated that this pattern of responses differed significantly between the two groups, χ²(1, N = 46) = 4.74, p = .03. An exploratory analysis compared the eye-tracking metrics of this subset of architects with those of the other architects we recruited. Contrary to this subgroup’s qualitative reports, their eye-tracking metrics did not differ significantly from those of the other architects. Thus, although these architects used language indicative of the phenomenology typically associated with holistic processing, we found no eye-movement evidence that such processing occurred.

Discussion

Given the strong evidence from radiology and other fields of visual expertise, we anticipated that visual experts across domains would use holistic processing in target search tasks (Brams et al., 2019; Gegenfurtner et al., 2011; Sheridan & Reingold, 2017). Prior work from our group and others has suggested that the size of the attentional window, or functional visual field (FVF), varies with the task. Focused tasks, and tasks in which less is known about the target, tend to produce a smaller FVF, whereas more familiar tasks are associated with an expanded FVF (Belopolsky, Zwaan, Theeuwes, & Kramer, 2007; Drew, Boettcher, & Wolfe, 2017). One possible explanation for the superior visual performance associated with expertise is that experts have an expanded FVF resulting from their trained understanding of target search within their domain. This conceptualization predicts that forcing experts to use a relatively small FVF through a GCV manipulation should disproportionately impair their target search. However, architects did not show the classic markers of holistic processing more strongly than the naïve controls: they made shorter saccades, were no faster to first fixate the target, and were slower to respond on several measures. And while there is some evidence that visual experts such as TSA agents and orthodontists make slower decisions than novices (Biggs, Cain, Clark, Darling, & Mitroff, 2013; Jackson, Clark, & Mitroff, 2013), architects further distinguished themselves as a unique class of visual expert by exhibiting none of these markers of holistic visual processing.
Although architects did not exhibit eye-movement patterns consistent with holistic processing when performing the perspective task, the combination of their qualitative reports and their significantly higher accuracy relative to the naïve controls suggests that the task did, in fact, measure architects’ expertise.

It is notable that our expert population was much more experienced in its field than the populations in many studies of visual expertise in the current literature. Studies of expertise are often criticized for employing intermediates rather than true experts (Montero, 2016). Experts are commonly defined as people who have dedicated at least 10 years to improving their abilities within a particular domain (Bereiter & Scardamalia, 1993; Ericsson, 2018; Montero, 2016; Yarrow, Brown, & Krakauer, 2009). Our study was unique not only in investigating a novel expert population but also in recruiting experts with an average of 19 years of experience in their field. In short, it is unlikely that our data were affected by a lack of expertise in our expert group.

Additionally, although our experts’ many years of experience produced a large age gap between the expert and naïve groups, it is also unlikely that age affected our data. There is good evidence that, unsurprisingly, visual ability declines with age (Andersen, 2012; Owsley, 2011). However, this decline is documented in populations substantially older (average age: 60) than our architects (average age: 42; McKendrick, Weymouth, & Battista, 2013). Furthermore, prior work on the developmental trajectory of holistic processing suggests that the tendency to process holistically increases with age (Konar, Bennett, & Sekuler, 2010, 2013). This predicts more holistic processing in our older expert group, which is inconsistent with our results: the architects showed no more evidence of holistic processing than the younger controls. Hence, any limitations in our data concerning holistic processing are likely due to factors other than age or degree of expertise.

Overall, our data suggest that the holistic visual processing model is incomplete as an explanation of expertise across domains. We see two possible explanations for this unexpected finding: holistic processing may be either task specific or domain specific.

Because our perspective image task captured architects’ expertise without eliciting holistic processing behaviors, the task itself may have limited holistic visual processing. Even if architects tried to use holistic strategies (as their qualitative reports suggest), the perceptual affordances of the perspective task may have prevented them from doing so. If so, holistic processing may be task specific. Most tasks that have measured holistic processing in experts to date are radiograph searches in radiology (Brams et al., 2019; Sheridan & Reingold, 2017). Interpreted this way, our data indicate that holistic processing may be an artifact of affordances present in radiographs but absent from perspective images. In other words, whereas radiographs afford holistic processing strategies, our perspective task may not.

An alternative interpretation, given that the perspective task did capture architects’ expertise in terms of accuracy, is that architects simply do not use holistic visual processing as their radiologist counterparts do. To our knowledge, this study was the first to measure the visual expertise of architects, and although we hypothesized that all visual experts would use the classic holistic processing strategies, it remained an open question whether those strategies are domain specific. On this interpretation, our data suggest that the holistic visual processing model is limited insofar as it is domain specific. In other words, it may be that radiologists and architects both process holistically, but in distinct ways. This may be due to the different types of search that radiologists and architects perform day to day (holistic vs. otherwise), or to the kinds of images they are trained to read (radiographs vs. blueprints).

Conclusions

The purpose of this study was to evaluate the holistic visual processing model of expertise in a new domain: architecture. To do so, we developed and validated a perspective-based task to measure expertise in this domain. On target-present trials of the perspective task, architects performed significantly better than the naïve group, indicating that the task captured architects’ expertise. By contrast, on the radiograph task, both architects and the naïve group performed as expected: poorly, and with no evidence of viewing-condition interactions that would indicate holistic processing. This suggests that our architects are not better at visual search across domains, but are specifically skilled at searching for targets in perspective images.

Despite architects’ greater accuracy on the newly developed perspective search task, the unexpected failure to find unequivocal evidence of holistic processing in this new group of visual experts raises the possibility that the behaviors underlying visual expertise vary across domains and tasks. We therefore conclude that architects were either prevented from using holistic processing strategies by the design of this experiment’s task, or do not engage in the same holistic processing strategies as other visual experts. In either case, these conclusions point to an incompleteness in the holistic visual processing model as a measure and explanation of visual expertise. To better determine the limits and theoretical scope of the model, further research should include more nonmedical domains and nonradiograph target searches. We believe that expanding the scope of research on holistic processing is a fruitful way to gather new, illuminating data about the nature of visual expertise more generally, which might allow us to generate and test a domain-general model of visual expertise. This study is a first step in that direction.