Introduction

Face processing has been considered to be a special form of object recognition, because of its features of holistic processing (Farah et al., 1998), not commonly found for other objects. One feature of holistic processing is that faces are often processed at the level of the individual (e.g., “Paulo”), unlike common objects which are typically identified at the category level (e.g., “cat”; Rosch et al., 1976). This individual level of processing involves the discrimination of different faces that share the same set of features (eyes, nose, and mouth) and a common general configuration (eyes above nose, nose above mouth).

Holistic processing, or the obligatory tendency to process all parts as a perceptual unit rather than in isolation, is believed to be important for face recognition (Diamond & Carey, 1986; Gauthier & Tarr, 2002; McKone et al., 2012; Richler & Gauthier, 2014). Evidence for holistic processing, however, can also be found for stimuli other than faces. Holistic processing has been observed in experts in a range of objects and stimuli, such as X-rays (Bilalic et al., 2014), chess boards (Bilalic et al., 2011), fingerprints (Busey & Vanderkolk, 2005), and cars (Gauthier et al., 2003). Detailed processing of objects at the subordinate level seems to underlie this holistic processing in experts (Wong et al., 2009a).

A commonly used paradigm to study holistic processing is the composite task (for a discussion, see, e.g., Fitousi, 2015; Richler & Gauthier, 2014; Rossion, 2013), in which participants perform a same-different task involving decisions about two subsequently or simultaneously presented stimuli on the basis of one half of each stimulus, while ignoring the other half. The composite effect refers to the fact that the processing of the task-relevant part of a face (e.g., top half) is influenced by an irrelevant half (e.g., bottom half). The composite task provides a stringent test of holistic processing: any interference in performance from the irrelevant on the relevant part indicates automatic and compulsory holistic processing of all parts of the stimulus.

Two types of designs can be used for the composite task: the “partial” design and the “complete” design (Gauthier & Bukach, 2007). In the partial design, the irrelevant face parts (e.g., bottoms) are always different, while the cued parts (upper halves) may be the same or different. Thus, in same trials, relevant top halves are the same, while irrelevant bottom ones are different. In different trials, both upper and lower halves are different. In contrast, in the complete design, both the target and irrelevant part of a test face can either be the same as or different from the study face. Recent work has questioned results based on the partial design for several reasons. For example, the partial design was found to be susceptible to response bias unrelated to holistic processing (Cheung et al., 2008; Richler, Cheung, & Gauthier, 2011a; Richler, Mack, et al., 2011b), including response biases driven by participant strategies (Richler, Cheung, & Gauthier, 2011a). Because of such concerns, the complete design is preferred.

In the complete version of the composite task (for a recent meta-analysis and review, see Richler & Gauthier, 2014), the response (same vs. different) and congruency between the halves of the two stimuli (congruent vs. incongruent) are orthogonally manipulated, resulting in four different conditions. In same–congruent trials, the critical and the irrelevant half of the two stimuli are the same; in the same–incongruent trials, the critical halves of the two stimuli are the same, but the irrelevant halves are different; in different–congruent trials, the halves (both the critical and the irrelevant) of the two stimuli are different; and in different–incongruent trials the critical halves are different, but the irrelevant halves are identical. The congruency effect (i.e., better performance in the congruent than the incongruent condition) is an indication of the influence of the irrelevant part on the response to the critical part. To establish whether this congruency effect suggests holistic processing, stimuli are presented with the top and bottom half aligned, or misaligned. If the congruency effect is due to holistic processing, only the aligned stimuli should show a congruency effect. In other words, in the composite task, holistic processing is expressed by a significant interaction between alignment and congruency.
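The four conditions of the complete design follow from crossing two binary facts about a trial: whether the critical halves match and whether the irrelevant halves match. The short sketch below illustrates this mapping; the function name `classify_trial` and its arguments are illustrative assumptions, not part of any published analysis code.

```python
from itertools import product


def classify_trial(critical_same: bool, irrelevant_same: bool) -> tuple[str, str]:
    """Return (response, congruency) for one trial of the complete design."""
    # The correct response depends only on the critical (attended) half.
    response = "same" if critical_same else "different"
    # A trial is congruent when the irrelevant half suggests the same answer
    # as the critical half (both match, or both differ).
    congruency = "congruent" if critical_same == irrelevant_same else "incongruent"
    return response, congruency


# Crossing the two binary factors yields the four conditions.
conditions = [classify_trial(c, i) for c, i in product([True, False], repeat=2)]
for response, congruency in conditions:
    print(f"{response}-{congruency}")
```

Note that a different-congruent trial is one in which both halves differ, whereas a different-incongruent trial keeps the irrelevant halves identical, as described above.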

Holistic processing in word recognition

Traditionally, face and word recognition have been seen as different research domains. Word recognition requires basic-level categorization (Wong & Gauthier, 2007), as words do not share the same number and order of elements with each other (Grainger, 2008). Detailed spatial relationships among letters are not informative of word identity either. Such observations have led Farah and colleagues (e.g., Farah, 1991, 1992; Farah et al., 1998; Tanaka & Farah, 1993) to portray face and word perception as the two opposite ends of a continuum of object recognition: holistic processing for faces versus part-based processing for words.

Such a division of part-based processing for words and whole-based processing for faces, however, may be an oversimplification. As explained earlier, face recognition poses a particular challenge for the human mind, given the fast and detailed analyses required to discriminate between highly similar faces. Holistic processing, the consideration of all parts of an object together, has been suggested to be a means to meet this challenge (Farah et al., 1998; Maurer, Le Grand, & Mondloch, 2002). Word recognition poses a challenge to the human mind comparable to that of face recognition. Readers have to rapidly identify words formed by arranging a fixed number of letters from a limited set with a high self-similarity (Kleinschmidt & Cohen, 2006; Wong et al., 2011). In addition, the dual route model of reading states that words can be processed either on a letter-by-letter basis (alphabetic route) or via a direct route towards the word level (orthographic route; Coltheart et al., 2001). The latter route is preferably applied for frequently used words and may involve holistic processing. Furthermore, configural information manifested as font types has been suggested to be useful for letter and word recognition (Sanocki, 1987, 1988; Wong & Gauthier, 2007).

One of the findings suggesting holistic processing for words is the “word superiority effect” (Reicher, 1969; Wheeler, 1970). This effect refers to the finding that letters are recognized better in the context of a word than in isolation and suggests that whole word representations exist and can affect recognition at the letter or feature level (McClelland & Rumelhart, 1981; Rumelhart & McClelland, 1982).

Recently, Wong and colleagues have started using the complete composite task to study word processing. Using this task, they found evidence of holistic processing during the process of acquiring expertise with a writing system. Wong et al. (2011) observed holistic processing in English words and found stronger evidence for holistic processing of words for native English readers than non-native readers. Moreover, stronger evidence for holistic processing was found for words than for non-words in native readers (see also Schmitt & Lachmann, 2020). These results suggest that holistic processing is a hallmark of expertise with a certain language and writing system (Wong et al., 2011). Similar findings have been obtained for Chinese, a non-alphabetic writing system (Chen et al., 2013; Wong et al., 2012) and Portuguese (Ventura et al., 2017).

Evidence for the processing of configural information was found by Wong et al. (2019). They showed that fluent readers were more sensitive to differences in configural information (jittering of part/letter positions) between two simultaneously presented words in the familiar upright orientation than in the unfamiliar inverted orientation. This effect mimics that for faces, where inversion of the stimuli leads to worse discrimination performance, especially when configural rather than featural differences are involved (Rakover, 2013). These findings suggest that holistic processing for both words and faces is related to the acquisition of expertise, and they run counter to the idea that holistic and part-based processing are two extremes of object recognition (Farah, 1991, 1992; Farah et al., 1998; Tanaka & Farah, 1993).

Priming of holistic face processing

It has been argued that holistic processing of faces reflects an attentional strategy. According to this hypothesis, holistic processing is the outcome of an attentional strategy that has become automatized with experience (Richler et al., 2012; Richler, Wong, & Gauthier, 2011c). The extensive experience in utilizing all parts of a face leads to an automatic tendency to subsequently process any face holistically (Richler et al., 2012; Richler, Wong, & Gauthier, 2011c; Wong & Gauthier, 2010).

Several other findings suggest that holistic face processing can be modulated experimentally. First, negative mood induction, believed to promote local processing, was found to decrease holistic processing of faces (Curby et al., 2012). Second, holistic processing of faces could be modulated by a non-visual spatial semantic task that manipulated the construal level, i.e., whether actions are construed in an abstract or concrete manner (Trope & Liberman, 2010). Third, when exposed to questions about “why” events occur (high-level construal manipulation), participants showed higher holistic processing of faces compared to questions about “how” (Wyer et al., 2015). In addition, learned attention to diagnostic parts has been shown to improve holistic processing of faces (Chua et al., 2014). Likewise, face-like holistic effects appear to require that both the task-relevant and task-irrelevant parts have a history of being attended and that the parts be perceptually grouped, allowing this attentional effect to apply to the entire object (Chua et al., 2015).

The importance of the participant's mindset for holistic processing was recently demonstrated by findings showing that the composite face effect is sensitive to contextual manipulations that induce bottom-up attentional biases. These biases subsequently penetrate holistic face processing. For example, in Gao et al. (2011), participants were instructed to solve two independent tasks in sequence on each trial. First, they had to match two simultaneously presented Navon letters (Kinchla, 1974; Navon, 1977), that is, compound hierarchical figures with both a local and a global structure (larger letters composed of smaller ones). In counterbalanced blocks, the instruction required attention to either the global level (large letters relevant) or the local level (smaller letters relevant), while ignoring the irrelevant level. On each trial, after responding to the Navon letter task, participants were required to match the upper halves of two sequentially presented composite faces. Results suggest that attention to either the global or local level in the Navon task led to transfer effects across tasks and items, i.e., prior attention to the global features of Navon stimuli led to larger holistic processing of faces. These findings suggest the involvement of cognitive penetrability in holistic face processing. This means that the holistic processing of faces is sensitive to knowledge, beliefs, expectations, or other cognitive states (Pylyshyn, 1999).

Does holistic processing of words and faces reflect similar mechanisms?

In the present study, we are assuming that holistic processing of both faces and words is the outcome of an attentional strategy that has become automatized with experience (Richler et al., 2012; Richler, Wong et al., 2011).

The fact that holistic processing has been shown for both faces and words does not mean that similar types of mechanisms underlie the effects for the two types of stimuli (Chen et al., 2013). Indeed, using event-related potentials (ERPs), Chen et al. (2013) showed that holistic processing of words may have an earlier neurophysiological correlate (P1) than that (N170) commonly found for holistic face processing (e.g., Jacques & Rossion, 2009). This suggests that holistic processing for the two types of stimuli may have different underlying mechanisms, with an earlier neural locus for words. However, there is also evidence for a higher-level locus of holistic word processing. Ventura et al. (2017), for example, showed that the word composite effect is independent of surface features of words, with the same magnitude for words in normal fonts as compared with handwritten and alternating-cAsE fonts. The word composite effect might therefore stem from access to the representations at multiple levels, which has not been shown for faces before.

Besides a different timing, the nature of holistic processing may also differ for words and faces. For example, Ventura et al. (2019) used a paradigm with artificial objects similar to Richler, Bukach, and Gauthier (2009a), who showed that contextually induced congruency effects can occur within a single trial between objects of different categories. Ventura et al. (2019) used a different type of artificial objects, Ziggerins (Wong et al., 2009b), and in a stricter test, compared aligned words (which are processed holistically) to aligned pseudowords (which are not processed holistically), and found no evidence that an aligned word induces a stronger congruency effect on artificial objects than aligned pseudowords. Ventura et al. (2019) thus found a dissociation between face and word holistic perception in terms of their contextual influences.

Another study showed a similarly lasting duration of the composite effect for words and faces. In a study of face recognition, Richler, Mack, et al. (2009b) parametrically varied the stimulus duration from 17 ms to 800 ms. The holistic effect, as indexed by the congruency effect, was observed for exposures as brief as 50 ms. From 50 ms onwards it was affected neither by the duration of the study face, nor by the duration of the test face (Richler, Mack, et al., 2009b). Similar independence of presentation duration was found for word stimuli by Chen et al. (2016), who found that variation in the exposure duration between 170 ms and 600 ms did not bring about significant changes in the holistic word effect. Considered more closely, however, these studies are not fully comparable. Holistic face processing arises from 50 ms onwards (Richler, Mack, et al., 2009b), but because Chen et al. evaluated only exposure durations between 170 and 600 ms, it is unclear whether holistic processing of words arises as early as for faces.

Holistic processing has been shown for written words, signaled by the word composite effect, similar to the face composite effect. However, and in addition to the evidence reviewed above, which points to differences between face and word holistic processing, word and face recognition also involve different neural mechanisms with an opposite hemispheric lateralization (VWFA, e.g., Dehaene & Cohen, 2011; and fusiform face area, e.g., Kanwisher et al., 1997). Holistic word processes occurring at lexical orthographic locus (Ventura et al., 2017, 2019) might be influenced by other linguistic variables, including phonology. It is then possible that faces and words can both involve holistic processing in their own separate face and word processing systems, but by using different mechanisms.

Taken together, while evidence suggests that there are differences between holistic processing of words and faces, it is as yet unclear exactly what these differences are. The present study aimed to determine the exact nature of the differences in holistic face and word processing by directly comparing the effect of attentional manipulations on holistic processing of both types of stimuli. To understand the method we used, recall that we are assuming that holistic processing is the outcome of an attentional strategy that has become automatized with experience (Richler et al., 2012; Richler, Wong et al., 2011). We here use these findings and evaluate the similarity of visual face and word holistic mechanisms using Navon stimulus priming together with the composite task, in a paradigm inspired by Gao et al. (2011). As in Gao et al., the primary task uses compound hierarchical figures in which local elements make up a global figure in a congruent or incongruent manner. The present study, however, uses non-letter shapes, instead of letters, to create the compound hierarchical figures, for example, circles consisting of smaller circles (congruent) or of smaller triangles, squares, or diamonds (incongruent), for two reasons. First, the use of Navon stimuli composed of letters could lead to linguistic priming effects for the word, but not the face stimuli. Second, using non-letter shapes will reduce the effects of literacy differences in participants. Several studies have shown that the global advantage effect is more robust against stimulus properties, presentation modus (Lachmann et al., 2014), and participants’ literacy skills (Schmitt et al., 2019) for non-letter shapes than for letters.

The idea for testing the similarity of holistic processing in faces and words is as follows. If participants, by instruction, first focus on the smaller, local elements in the preceding Navon task, this may trigger local processing, and thus they will also focus more strongly on the relevant half in a subsequent face or word composite task. When, in contrast, for the primary Navon task participants are instructed to focus on the global figure, they may process the subsequent composite face or word item holistically as well. If holistic processing of words and faces is similarly affected by priming of local and global processing of the Navon stimulus, one should observe similar effects on processing the composite task. Because conclusions about the results may depend on showing an absence of an effect, the standard approach of combining a power analysis with null hypothesis testing and computing effect sizes was complemented with a Bayes analysis.

We first performed a pilot study to configure the Navon task. Since the global advantage effect was shown to depend on a number of stimulus features (e.g., Hübner, 1997; Kimchi & Palmer, 1982; Kinchla & Wolfe, 1979; Rezvani et al., 2020), presentation mode (e.g., Kimchi, 1992; Lamb & Robertson, 1988), and individual factors (e.g. Förster & Higgins, 2005; Kimchi, 1992; Schmitt et al., 2019) it is important to first establish a robust global advantage effect for our Navon stimuli. We first describe this pilot experiment (Experiment 1) and then move on to describe the primed composite face and words task (Experiment 2).

Experiment 1 (pilot study)

Method

Participants

A total of 24 students from the University of Kaiserslautern, aged between 20 and 31 years, naive to the task at hand, right-handed and with normal or corrected-to-normal hearing and vision, took part in the pilot experiment for course credit or a compensation of €10. As defined by recruitment requirements and confirmed prior to the study onset, all participants were fluent in English as L2, the language used for the study instructions and communication. All participants were native speakers of German and, according to self-report, had not been diagnosed with any reading disorder. The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the Department of Psychology of the University of Lisbon. All participants provided written informed consent prior to taking part.

Stimuli

Compound, hierarchical figures (Kinchla, 1974), also known as Navon stimuli (see Bouvet et al., 2011; Gao et al., 2011; Navon, 1977), were presented in black against a white background (see Fig. 1). We used small circles, diamonds, squares, and triangles as local elements to form global shapes (large circles, diamonds, squares, triangles). The combination of possible stimuli in all of their local and global configurations resulted in 16 distinct Navon-figures.
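The count of 16 figures follows from crossing four global shapes with four local element shapes. As a minimal sketch (the dictionary keys and variable names are illustrative assumptions, not taken from the experiment scripts), the stimulus set can be enumerated as follows:

```python
from itertools import product

SHAPES = ["circle", "diamond", "square", "triangle"]

# Each figure pairs one global shape with one local element shape.
# A figure is congruent when the global and local shapes are identical.
navon_figures = [
    {"global": g, "local": l, "congruent": g == l}
    for g, l in product(SHAPES, repeat=2)
]

print(len(navon_figures))                           # 16 figures in total
print(sum(f["congruent"] for f in navon_figures))   # 4 of them are congruent
```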

Fig. 1

Compound hierarchical figures stimuli used in Experiment 1 and as the primary task in Experiment 2

Images were scaled down to the size of 548 × 548 pixels (5.2 × 5.2 cm), presented on a 1,280 × 1,024 pixels screen resolution and 85-Hz refresh rate, resulting in global shapes with a 4.96° of visual angle in width and height, composed of local elements with a visual angle of .42° in width and height (at a viewing distance of 60 cm).
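The reported visual angles can be checked with the standard formula, angle = 2 · arctan(size / (2 · distance)). The sketch below applies it to the values given above; the function names are illustrative and not part of the original materials.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_cm_for_angle(angle_deg: float, distance_cm: float) -> float:
    """Inverse computation: physical size (cm) that subtends a given angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Global shape: 5.2 cm viewed from 60 cm.
print(round(visual_angle_deg(5.2, 60), 2))    # 4.96
# Local element: the size implied by a visual angle of .42 degrees at 60 cm.
print(round(size_cm_for_angle(0.42, 60), 2))  # 0.44
```

Under these assumptions, the local elements measured roughly 0.44 cm on screen; this value is derived here and is not reported in the text.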

Apparatus

A laptop (HP ProBook 650 G1) with the Xubuntu 18.04 operating system and the OpenSesame stimulus presentation software (Mathôt et al., 2012), connected to a 20-in. CRT monitor (Hitachi CM813ET Plus) was used. The CRT ensured correct timing of the stimuli, allowing presentation to be aligned with the screen refresh. Responses were recorded using a mechanical USB keyboard, using the ‘Q’ and ‘P’ keys only. Data collection was conducted in an experimental room, isolated from external light and sounds and with a constant ambient illumination.

Procedure

In different experimental blocks participants were presented with pairs of Navon stimuli. Depending on the instruction, they were required to indicate on each trial whether the two stimuli were the same or different on the global level (global shapes) or on the local level (local elements), respectively. The two stimuli to be compared were presented simultaneously on horizontally opposite sides of the center (2.6 cm or 2.10° of visual angle apart), at the vertical middle of the screen. Each trial started with a red fixation cross, presented for 500 ms, followed by the two stimuli, presented until the participants responded with the ‘Q’ or ‘P’ key on the keyboard (Fig. 2). If participants did not respond within 2,000 ms, the trial was recorded as an error and the next trial was automatically started. Accuracy feedback (“Correct” in green color or “Incorrect” in red color), as well as the time-out feedback (“Time-out” in blue color), was displayed for 500 ms. To reduce the total testing time and limit the effects of fatigue and practice on the average data, different random but representative sets of stimuli were selected for each participant and testing block (around 90 to 100 trials per block, depending on the random sampling that matched the requirement of equal numbers of same and different responses and a minimum number of trials per stimulus condition). The instruction (matching the Navon stimulus pairs on a global or a local level) and the assignment of keys for same and different responses (‘Q’ or ‘P’) were counterbalanced within and between the participants, resulting in four experimental blocks, each preceded by eight practice trials, randomly selected from the total stimulus set.

Fig. 2

Stimulus sequence of Experiment 1 (pilot experiment). A red fixation cross was presented for 500 ms, followed by the two shapes, presented until participants responded with the ‘Q’ or ‘P’ key on the keyboard. Feedback (“Correct!”, “Incorrect,” or “Time-out,” after 2,000 ms) was then shown for 500 ms

Results and discussion

Figure 3 shows the average reaction times (RTs) of correct responses per condition after removing outliers, i.e., responses shorter than 150 ms or longer than the individual mean RT + 2 SD. A repeated-measures analysis of variance (ANOVA) revealed a significant main effect of Instruction level on RTs, with faster responses for matching compound hierarchical figures at the global instruction level (M = 581.43, SEM = 18.7) than at the local instruction level (M = 696.82, SEM = 26.2), F(1, 23) = 104.97, p < .001, partial η2 = .820, 90% CI [.675, .873]. The main effect of Response type was also significant, with faster RTs for same responses (M = 611.51, SEM = 19.9) than for different responses (M = 666.75, SEM = 24.6), F(1, 23) = 57.37, p < .001, partial η2 = .714, 90% CI [.506, .799]. No significant interaction between Instruction level and Response type was found, F < 1.
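The trimming rule and the reported effect sizes can be illustrated with a brief sketch. It assumes that the mean and SD are computed over one participant's correct RTs and that the sample SD is used; neither detail is specified in the text, and the example RTs are invented for illustration. Partial η2 is recovered from the F statistic with the standard identity F · df_effect / (F · df_effect + df_error).

```python
from statistics import mean, stdev


def trim_rts(rts, lower_ms=150.0, n_sd=2.0):
    """Drop RTs below lower_ms or above mean + n_sd * SD (assumed per participant)."""
    upper = mean(rts) + n_sd * stdev(rts)  # sample SD: an assumption
    return [rt for rt in rts if lower_ms <= rt <= upper]


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical RTs (ms) for one participant; not data from the study.
example_rts = [120, 480, 510, 530, 560, 590, 610, 640, 1900]
print(trim_rts(example_rts))  # the 120 ms and 1900 ms responses are removed

# Effect sizes implied by the reported F values with df = (1, 23).
print(f"{partial_eta_squared(104.97, 1, 23):.3f}")  # 0.820
print(f"{partial_eta_squared(57.37, 1, 23):.3f}")   # 0.714
```

Both computed values agree with the partial η2 values reported for Instruction level and Response type.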

Fig. 3

Response times for each Instruction level (global or local) and Response type (same or different response). Significant main effects of Instruction level and Response type are found in the absence of an interaction. Error bars show the standard error of the mean across participants

A repeated-measures ANOVA on response accuracy (measured as the proportion of correct responses after removing outliers in the response times, i.e. responses shorter than 150 ms and longer than the individual mean RT + 2 SD) using the means per participant per condition showed the same pattern of results and therefore suggests no speed-accuracy trade-off (for details, see the Appendices).

The pilot study confirmed that the stimuli and the design used evoke a robust global advantage effect, suggestive of holistic processing for our stimuli and task. The task could therefore be used for priming in the main experiment (Experiment 2), which tested the effects of priming on the subsequent composite task.

Experiment 2 (main experiment)

Method

Participants

A total of 112 participants accepted our invitation for the experiment. Participants were psychology students from the University of Évora, all native speakers and skilled readers of Portuguese, with normal or corrected-to-normal vision and hearing, who received course credit. None of them took part in Experiment 1. The study was conducted according to the guidelines of the Declaration of Helsinki and was approved by the Ethics Committee of the Department of Psychology of the University of Lisbon. All participants provided written informed consent.

On each trial, participants were instructed to perform two same-different tasks in sequence. The first task was the Navon matching task (see Experiment 1), whereas the second task was a composite stimulus task (faces or words). To avoid transfer of priming effects with one type of stimulus onto the other type of stimulus, the composite stimulus task was administered as a between-subjects variable. Fifty-two participants performed the face composite task ("face group"), whereas 42 participants performed the word composite task ("word group"). Group sizes differed because unequal numbers of participants showed up for the experiment. After removing participants with accuracy less than 30%, the sizes of the two groups were close to balanced (40 : 38).

For our first analysis, we used null hypothesis significance testing, which can demonstrate the presence of an effect, but not the absence of an effect. However, an a priori power analysis can indicate the number of participants needed to detect a certain effect size with a high probability. We based the expected effect size on the literature. The power analysis was performed for two effects of interest. The first one relates to the comparison within each group (faces or words) between the composite effect in the global versus the local priming conditions. For an estimate of the expected effect size, the result from Gao et al.'s (2011) study for the priming condition × alignment × congruency interaction was used, giving a partial η2 = .2. Assuming this effect size, a sample size of 13 and 16 per group would be required to achieve a power of .9 and .95, respectively, given α at .05 (calculations performed with G*Power, Version 3.0; Faul et al., 2009). The second effect of interest is the 2 × 2 interaction in an ANOVA with one between-subjects factor (Group) and one within-subjects factor (priming condition), with the composite effect as a dependent variable. No study is available to suggest the expected effect size for this interaction effect. Therefore, the expected effect size was based on earlier observations of the large effect size for the priming effect per se in Gao et al.'s (2011) study (partial η2 = .2), giving a medium effect size (partial η2 = .06) for the interaction with group. A sample size of 22 and 27 per group is needed to achieve a power of .9 and .95, respectively, given α at .05. To ensure sufficient power even when participants drop out or have to be removed from the analysis because of poor performance, we doubled the number of participants from the power analysis (e.g., De Gutis et al., 2013).
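G*Power expresses ANOVA effect sizes as Cohen's f rather than partial η2. The two are related by f = sqrt(η2 / (1 − η2)). The sketch below converts the two effect sizes assumed above; it does not reproduce the sample-size calculation itself, because the remaining G*Power settings are not reported in the text.

```python
import math


def cohens_f(partial_eta_sq: float) -> float:
    """Convert partial eta squared to Cohen's f."""
    return math.sqrt(partial_eta_sq / (1 - partial_eta_sq))


# Effect size taken from Gao et al. (2011) for the within-group comparison.
print(round(cohens_f(0.20), 3))  # 0.5
# Medium effect size assumed for the Group x priming condition interaction.
print(round(cohens_f(0.06), 3))  # 0.253
```

The second value is close to f = .25, the conventional benchmark for a medium effect.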

Stimuli

The stimuli for the priming (Navon) task were identical to those in Experiment 1 (see Fig. 1). For the face composite task, 100 grayscale front-view images from the MPI face database (Troje & Bulthoff, 1996) with neutral expressions were used, cropped to remove the hair and ears. A total of 358 different face composites were created for the eight trial types: same-congruent-aligned, same-incongruent-aligned, different-congruent-aligned, different-incongruent-aligned, same-congruent-misaligned, same-incongruent-misaligned, different-congruent-misaligned, and different-incongruent-misaligned. Aligned composites were then used in four aligned face stimulus blocks (132 trials each). Another four blocks with misaligned composite face stimuli (132 trials each as well) were then run (i.e., presentation of stimulus alignment was blocked). The order of all the blocks was counterbalanced across participants using a Latin square. Since half the blocks were preceded by a global and half by a local instruction for the primary Navon task, each composite face was used four times as a study trial in the experiment. Composite images were created by dividing face images along a horizontal line at the bridge of the nose (Fig. 4a, 6.49 × 7.76 cm and 9.75 × 7.76 cm).

Fig. 4

a Examples of stimuli – aligned (left) and misaligned (right) composite faces b Trial sequence for the composite faces-matching task

A similar setup was used for the word composite task (for the Word Group). For this task, sets of 132 four-letter Consonant-Vowel.Consonant-Vowel (CV.CV) Portuguese words were used. As in the Face Task, 358 composites were created. Instead of dividing the stimuli into a top and bottom part, words were divided into a left and a right half, between the second and third letter (8.66 × 2.28 cm and 8.66 × 3.91 cm), as illustrated in Fig. 5a. Participants were always asked to indicate whether the left halves were the same, ignoring the right halves. Each composite word was used four times as a study trial in the experiment.

Fig. 5

a Examples of stimuli – aligned (left) and misaligned (right) composite words b Procedure words – trial sequence for the composite word-matching task

Procedure

Each trial consisted of two same-different two-alternative-choice-reaction-tasks, the second presented after responding to the first one. The first task was a Navon-figure matching task, with a global or local instruction blocked and counterbalanced across participants, as described in Experiment 1. For the Face Group the secondary task was a composite face matching task. For the Word Group the secondary task was a composite word matching task. The secondary task started 400 ms (blank screen) after the response to the Navon task, either with aligned faces or words, or with misaligned faces or words. For both tasks, participants used the ‘1’ and ‘2’ keys of the keyboard.

To explain the task, and to ensure participants understood the instruction, four examples on paper were shown and discussed with feedback from the experimenter. Participants then performed 16 computerized practice trials for the different stimuli using the same procedure as in the experimental trials, followed by the experiment. No feedback was provided for participants’ answers during practice and experimental trials.

Face and word stimuli were presented in the center of a 17-in. CRT monitor. First, a blank screen was presented for 1,000 ms, followed by a fixation cross for 500 ms, followed by the study stimulus, a face or word stimulus, for 400 ms. Then a mask was presented for 800 ms, followed by the test stimulus of the same category, face or word. This stimulus remained on the screen until response, or for a maximum of 2.5 s. Stimulus presentation and data collection were controlled by E-Prime 2.0. Participants were asked to perform the tasks as fast and as accurately as possible.

Participants performing the face composite task were asked to indicate whether the top halves of the two sequentially presented face stimuli were the same or different, ignoring the bottom halves (see Fig. 4b for an example). A mask was presented between the two faces to prevent the use of motion cues to perform the task.

Participants performing the word composite task were asked to indicate whether the left halves of the two sequentially presented words were the same. As in the Face Task, a mask prevented the use of motion cues (Fig. 5b).

Results and discussion

RTs of correct responses and accuracy of 40 participants who conducted the face composite task and those of 38 participants who performed the word composite task were analyzed. Trials with RT < 150 ms or RT > mean RT + 2.5 SD were removed, leading to an exclusion of 3% of the trials for the Face Group and 4% for the Word Group.
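The exclusion rule above can be illustrated with a minimal Python sketch. It assumes that the mean and SD are computed over whatever set of RTs is passed in (for example, one participant's correct trials) and that the sample SD is used; neither detail is specified in the text, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, lower_ms=150.0, n_sd=2.5):
    """Keep RTs with lower_ms <= RT <= mean + n_sd * SD.

    Assumption: mean and SD are taken over the RTs passed in
    (the text does not say at which level they were computed).
    """
    upper = mean(rts) + n_sd * stdev(rts)
    return [rt for rt in rts if lower_ms <= rt <= upper]


# Made-up RTs (ms) with one anticipation (100) and one very slow trial (3000).
example_rts = [100, 500, 520, 480, 510, 490, 505, 495, 515, 485, 3000]
kept = trim_rts(example_rts)
print(len(kept), "of", len(example_rts), "trials kept")
```

Both cutoffs are parameters so that the same helper can be reused with other thresholds.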

First, responses to the priming task were analyzed to verify that the two groups (Face Task/Word Task) did not differ and that the data showed a global advantage effect, so that priming could be considered equivalent across groups. This was indeed the case. RT and accuracy on the priming (Navon) task did not differ between the two groups, and a global advantage effect was observed in both. In the Face Group, matching the compound hierarchical figures was both more accurate (.96 vs. .94), t(39) = 2.49, p = .05, partial η² = .62, and faster (669.6 ms vs. 770.3 ms), t(39) = 10.64, p < .001, partial η² = .69, at the global level than at the local level, while in the Word Group, evidence for a global advantage was restricted to RTs (673.15 ms vs. 765.8 ms), t(37) = 12.0, p < .001, partial η² = .78. Because both groups showed the expected global advantage, we proceeded to analyze the effects of this priming on the subsequent composite task.

The average data for the composite task are displayed in Fig. 6. To test the statistical significance of the observed pattern, we examined the interaction between Type of composite (Word Group/Face Group) and Type of priming (global or local; see Fig. 2) on a compound measure of the interaction between alignment and congruency on RTs. This compound measure of the composite effect was computed as: (aligned_incongruent - aligned_congruent) - (misaligned_incongruent - misaligned_congruent). Higher values thus indicate a larger congruency effect for aligned than for misaligned trials, the pattern of results expected for holistic processing. An ANOVA revealed a significant between-subject main effect of Type of composite, with a stronger composite effect in the Face Group (M = 18.39, SEM = 7.4) than in the Word Group (M = 3.95, SEM = 6.6), F(1, 76) = 5.5, p = .021, partial η² = .06. The main effect of Type of priming was also significant, with a stronger composite effect for global priming (M = 19.55, SEM = 3.5) than for local priming (M = 2.79, SEM = 4.4), F(1, 76) = 11.47, p < .001, partial η² = .11. No interaction of these factors was evident, F(1, 76) = .17, p = .68, partial η² = .002, 90% CI [0, .03].
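As an illustration of the compound RT measure defined above, the following Python sketch computes it from four condition means. The dictionary layout, function name, and example values are hypothetical and are not taken from the data.

```python
def composite_index_rt(mean_rt):
    """Compound RT measure of the composite effect, in ms.

    mean_rt maps (alignment, congruency) to a mean correct RT.
    Returns (aligned_incongruent - aligned_congruent)
          - (misaligned_incongruent - misaligned_congruent).
    """
    aligned = mean_rt[("aligned", "incongruent")] - mean_rt[("aligned", "congruent")]
    misaligned = (mean_rt[("misaligned", "incongruent")]
                  - mean_rt[("misaligned", "congruent")])
    return aligned - misaligned


# Made-up condition means (ms) for one participant.
example = {
    ("aligned", "congruent"): 660.0,
    ("aligned", "incongruent"): 700.0,
    ("misaligned", "congruent"): 670.0,
    ("misaligned", "incongruent"): 680.0,
}
print(composite_index_rt(example))  # (700 - 660) - (680 - 670) = 30.0
```

A positive value means that the congruency cost is larger for aligned than for misaligned stimuli, which is the signature of holistic processing described in the text.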

Fig. 6

Interaction/composite values (alignment × congruency) for reaction time (RT), shown separately for global and local priming and for faces and words. Both types of composite task show a significant effect of the type of priming, but the size of the difference between global and local priming did not differ significantly between faces and words. Error bars show the standard error of the mean across participants

A similar analysis was conducted on accuracy (Fig. 7). The compound measure of the composite effect on accuracy was computed as: (aligned_congruent - aligned_incongruent) - (misaligned_congruent - misaligned_incongruent). This formula takes into consideration that higher values (higher accuracy) are expected for congruent trials. Accuracy was expressed as A scores (Zhang & Mueller, 2005). This analysis showed significant main effects of Type of composite, with higher composite values in the Face Group (M = .11, SEM = .01) than in the Word Group (M = .01, SEM = .02), F(1, 76) = 74.27, p < .0001, partial η² = .16, and of Type of priming, with higher composite values for global priming (M = .07, SEM = .02) than for local priming (M = .05, SEM = .01), F(1, 76) = 3.98, p < .05, partial η² = .05. No interaction was evident, F(1, 76) = .74, p = .39, partial η² = .008, 90% CI [0, .07].
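For readers unfamiliar with the A statistic, the sketch below shows how it might be computed from a hit rate and a false-alarm rate. It assumes the piecewise formula as commonly attributed to Zhang and Mueller (2005) and should be checked against that paper; how hits and false alarms were defined in the present tasks is not stated here, so the inputs are purely illustrative. The guard for equal rates is an added convenience to avoid division by zero and returns chance level.

```python
def a_score(hit_rate, fa_rate):
    """Nonparametric sensitivity A (assumed Zhang & Mueller, 2005 formula).

    Defined here only for hit_rate >= fa_rate; 0.5 is chance, 1.0 is perfect.
    """
    h, f = hit_rate, fa_rate
    if h < f:
        raise ValueError("this sketch only covers hit_rate >= fa_rate")
    if h == f:
        return 0.5  # chance performance; also avoids 0/0 at (0, 0) and (1, 1)
    base = 0.75 + (h - f) / 4.0
    if f <= 0.5 <= h:
        return base - f * (1.0 - h)
    if h < 0.5:
        return base - f / (4.0 * h)
    return base - (1.0 - h) / (4.0 * (1.0 - f))


def composite_index_accuracy(a):
    """(aligned_congruent - aligned_incongruent)
     - (misaligned_congruent - misaligned_incongruent), on A scores."""
    return ((a[("aligned", "congruent")] - a[("aligned", "incongruent")])
            - (a[("misaligned", "congruent")] - a[("misaligned", "incongruent")]))


print(a_score(0.9, 0.1))
```

The accuracy index reverses the order of subtraction relative to the RT index so that, in both cases, larger values correspond to a larger composite effect.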

Fig. 7

Interaction/composite values (alignment × congruency) for accuracy, shown separately for global and local priming and for faces and words. Error bars show the standard error of the mean across participants

The analyses so far do not suggest a difference in the priming effect of the Navon task on the subsequent composite tasks, i.e., priming appears the same for words and faces. So far, however, we have used null-hypothesis significance testing, which can be used to reject a null hypothesis (there is no difference between words and faces), but not to demonstrate the null hypothesis. However, null findings also matter. Experiments have always produced null effects – indeed the possibility of obtaining a null effect is a primary motivation for conducting a study in the first place (Bialystok, 2020). The relevant factor is not necessarily the presence or absence of null effects, or even the frequency with which each outcome is obtained, but rather it is the ratio between positive and null results. In that sense, the interpretation of a null outcome in the context of a majority of null effects must be different from that for a null effect in a majority of significant differences (Bialystok, 2020).

To address the problem with null-hypothesis significance testing in the absence of an effect, we performed additional Bayesian analyses, which are better suited to comparing the support for the null and the alternative hypotheses. The analysis takes the prior probability into consideration and constructs a posterior probability using the observed data. The Bayesian framework thus allows one to quantify how much more likely the data are under the null hypothesis than under the alternative hypothesis. In these analyses, the Bayes factor BF10 indicates how likely the data are under the alternative hypothesis (e.g., that there is a group difference) compared with the null hypothesis (no group difference), and it is directly interpretable as an odds ratio. Reporting Bayesian statistics can help to answer whether there is good enough evidence for null differences.

We compared the null hypothesis and the alternative hypothesis for the interaction of Type of composite (Word Group/Face Group) and Type of priming (global/local). Bayes factors (BF10) were computed using JASP (version 0.11.1.0). The first step in model specification concerns the type and spread of the prior distribution. For the most common statistical models, including ANOVA, certain “default” prior distributions are available that can be used in cases where prior knowledge is absent or vague. These priors are default options in JASP and were used in the present analyses. The Bayes factor (BF10) was 0.03 for RTs, indicating strong evidence for the null hypothesis, and 0.054 for accuracy, again indicating strong evidence for the null hypothesis. These results strongly suggest no difference in priming for word and face composites.
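Because BF10 expresses support for the alternative relative to the null, its reciprocal, BF01 = 1/BF10, expresses how many times more likely the data are under the null hypothesis. The short Python sketch below applies this conversion to the two values reported above; the function name is illustrative only.

```python
def bf01_from_bf10(bf10):
    """Convert BF10 (alternative vs. null) to BF01 (null vs. alternative)."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    return 1.0 / bf10


for label, bf10 in (("RT", 0.03), ("accuracy", 0.054)):
    print(f"{label}: BF10 = {bf10}, BF01 = {bf01_from_bf10(bf10):.1f}")
```

On this reading, the data are roughly 33 times (RT) and 19 times (accuracy) more likely under the null hypothesis of no interaction than under the alternative.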

General discussion

The present study examined whether holistic processing of faces and of words shares common characteristics. To this end, we examined the influence of global or local priming, induced by a Navon matching task using compound hierarchical figures, on the composite face or word effect for aligned and misaligned stimuli (which measures the extent of holistic processing in stimuli). We assume that holistic processing is the outcome of an attentional strategy that has become automatized with experience (Richler et al., 2012; Richler, Wong et al., 2011). First of all, we replicated the effects obtained by Gao et al. (2011), who previously showed that global or local processing in a Navon task primes global or local processing, respectively, in a subsequently presented composite task for faces. The present results show that this effect not only takes place for faces but is also evident for composite words. Consequently, the present results suggest that both words and faces show holistic processing that is susceptible to attention manipulations. Because the effect of the instruction in the Navon task was of the same magnitude for faces and words (i.e., no interaction with type of stimulus was found), the results also suggest that holistic processing of the two different types of stimuli may rely on similar mechanisms.

A between-subjects design is better suited here than a within-subjects design because, in the latter, priming effects could transfer from one type of stimulus to the other, which may in turn render the priming effects for faces and words more similar even with counterbalancing. Considering the possibility of individual differences, psychometric studies showed that despite reliable variability between individuals in their ability to selectively attend to face parts on a composite test (VHPT-F), this variability appears to be unrelated to face recognition (Richler, Floyd & Gauthier, 2015). Further, Sunday, Richler and Gauthier (2017) found no evidence for reliable individual differences in a similar measure of the part-whole task. These results do not mean that holistic processing is not involved in face and expert object perception. Most of us have considerable experience with faces, and even the most limited amounts of experience with faces could be more than sufficient for high levels of holistic processing (Chua & Gauthier, 2020). One could nevertheless ask whether pre-existing differences between the groups might have washed out group effects for priming. This cannot be fully excluded but is unlikely, given the similar performance of the two groups on the Navon task.

The results by Gao et al. (2011) and the present results with both faces and words suggest that the composite effect does not differ between words and faces and/or expert objects but reflects a domain-general processing mechanism that can be used both for word and face processing. This suggests that robust holistic processing underlies word recognition, and thus orthographic reading. In usual reading contexts, words appear close to other words both spatially (in terms of location) and temporally (as many words are recognized within a short time window). The human brain therefore needs to ensure that letters belonging to the same word are grouped into the same perceptual unit rather than mixed with letters from neighboring words. The present findings suggest that even though the brain can perform this task, holistic processing is susceptible to manipulations of attention.

The present results appear to be at odds with past studies that found no evidence for attentional modulation of holistic processing of words. Ventura et al. (2019), who used artificial objects called Ziggerins (Wong, Palmeri, et al., 2009), found no evidence for holistic processing of Ziggerins. Words (supposedly treated more holistically) as inducers had similar effects on Ziggerins as pseudowords (supposedly treated less holistically). Ventura et al. (2019) thus found a dissociation between face and word holistic perception in terms of their contextual influences. In the present study, we show that both word and face composites are susceptible to attentional manipulations, but Ventura et al. (2019) showed that the word composite effect did not spill over to interleaved trials with novel objects as the face composite effect did. So how can we reconcile the differences between the present and past findings? Perhaps any attentional effect from the processing of words is less long-lasting than that from face processing, given (i) the larger number of words that have to be processed in a short time period during text reading, (ii) the more variable length of words, or (iii) the involvement of mechanisms at multiple levels (visual and lexical) for the word composite effect. The interval between the onset of the first and the second word composite in the interleaved task of Ventura et al. (2019) was 5,500 ms, as in the original study by Richler, Bukach, and Gauthier (2009), allowing for all three mechanisms mentioned. A shorter interval between interleaved trials might have given rise to the same kind of spill-over to interleaved trials.

Another conceivable interpretation is that, given that presentation times in the study by Ventura et al. (2019) were long (but in accordance with those used by Richler, Bukach, & Gauthier, 2009), the holistic word processes involved might depend strongly on a late lexical and orthographic stage, and the nature of these linguistic-dependent holistic processes may not be abstract enough to allow an influence on other, nonlinguistic categories. Linguistic regularities such as word frequency and transitional probabilities between sublexical units may lead to the construction of chunks at the whole-word level, according to statistical learning research (Orbán et al., 2008). Such an organization of the complex visual display of letters into representational objects may be crucial in satisfying the highly demanding visual task of word recognition and reading. Holistic word processing at this late lexical or orthographic stage, and the consideration of all parts of a word together, may thus have at its origin linguistic regularities and variables.

Although different neural substrates are involved in holistic processing of faces (FFA: Kanwisher et al., 1997) and words (VWFA: Dehaene & Cohen, 2011), our results suggest that similar mechanisms underlie face and word holistic processing. Early modulation by attentional processes seems possible for both faces (Gao et al., 2011, and the present study) and words (the present study), but late/lexical word holistic processing, which is strongly influenced by linguistic factors, does not seem permeable to attentional modulation. It will be important in future studies to use the same Navon priming/composite task while manipulating the time allowed for the first word/face presentation, from 50 ms to 800 ms, to define with precision the efficiency of word/face holistic processing and the time course of its attentional modulation.