Theories of holistic face processing

A wealth of literature has posited that human faces are processed holistically (e.g., Curby, Huang, & Moerel, 2019; Tanaka & Farah, 1993; Yin, 1969; Young, Hellawell, & Hay, 1987; Zhao, Bülthoff, & Bülthoff, 2016a; see Richler & Gauthier, 2014, for a review), and several hypotheses have been proposed to explain the origin of holistic face processing. One is the domain-specific hypothesis, which proposes that holistic face processing stems either from an innate template or internal representation or from prior experience of face discrimination in early infancy (Kanwisher, 2000; McKone, Kanwisher, & Duchaine, 2007; Robbins & McKone, 2007). The template may take the form of a 3D facial structure (McKone, 2008; Zhao, Bülthoff, & Bülthoff, 2016b) or a 2D T-shape of upright front-view faces (Morton & Johnson, 1991; Rosenthal, Levakov, & Avidan, 2018; Rossion, 2013; Rossion & Boremanse, 2008; Tsao & Livingstone, 2008). Another potential explanation is the expertise hypothesis, which proposes that holistic processing develops with expertise, resulting in automatized attention to the whole object (Diamond & Carey, 1986; Gauthier & Tarr, 1997; Richler, Wong, & Gauthier, 2011b). Besides these two influential theories, recent studies suggest that both object-based (i.e., via a bottom-up route) and experience-driven (i.e., via a top-down route) perceptual grouping contribute to holistic processing (Zhao et al., 2016a, 2016b; see also Curby, Entenman, & Fleming, 2016; Curby, Goldstein, & Blacker, 2013; Zhou, Cheng, Zhang, & Wong, 2012).

Although there has been a long debate on which of these hypotheses is best supported by the existing evidence, surprisingly little is known about the nature of holistic face representations (for reviews, see McKone & Yovel, 2009; Murphy & Cook, 2017; Piepers & Robbins, 2012; Rossion, 2013). For example, although it is well accepted that configural/spacing information regarding facial features is involved in holistic face representations (e.g., Hayward, Crookes, Chu, Favelle, & Rhodes, 2016; Rossion, 2008), what constitutes “configural” information is somewhat under-specified (Burton, Schweinberger, Jenkins, & Kaufmann, 2015; Murphy & Cook, 2017). Face recognition is, for example, surprisingly insensitive to vertical and horizontal stretching. On the one hand, observers can easily recognize faces stretched to twice their true height (Hole, George, Eaves, & Rasek, 2002) or different photos of a familiar person that contain varied configural information (Burton et al., 2015). On the other hand, when people were asked to resize distorted faces to their correct shape, they were inaccurate at this task for both familiar and unfamiliar faces (Sandford & Burton, 2014). Such findings suggest that the spacing information in a holistic face representation may not be an absolute metric but rather coded relatively, in reference to other facial information (Burton et al., 2015).

Research on holistic face processing also suggests that the holistic face representation may not be an absolute template. For example, Richler, Floyd, and Gauthier (2014), using the Vanderbilt Holistic Face Processing Test, demonstrated that people process faces holistically even when tested with new images of the same person. However, research that manipulated motion and viewpoint obtained mixed results (see the following sections for details).

Therefore, the present study aimed to systematically investigate whether the holistic face representation is constant, that is, whether holistic face processing is tolerant of within-person variation such as motion variation and viewpoint variation.

Motion and holistic processing

One may encounter another person’s face in static (e.g., a photo) or dynamic (e.g., nodding, smiling) form, and configural information and facial features may vary or remain relatively invariant while a face is moving (Burton et al., 2015; Piepers & Robbins, 2012). Dynamic face stimuli can help clarify the nature of holistic face representations (Christie & Bruce, 1998; Piepers & Robbins, 2012; Zhao & Bülthoff, 2017) in the following ways (Christie & Bruce, 1998). First, a moving sequence may provide a more extensive set of exemplars to be stored for an individual face. Second, moving sequences should provide a better three-dimensional (3D) representation of a face than a static image (Ullman, 1979). Third, people may extract invariant characteristics of faces from moving sequences (i.e., features that remain constant across changes in viewpoint and expression). According to the research on familiar faces (Burton, Jenkins, Hancock, & White, 2005; Jenkins & Burton, 2008), a face’s invariant characteristics may take the form of an average representation.

There are two types of dynamic faces (for reviews, see Bruce, 1994; Xiao, Quinn, Ge, & Lee, 2012, 2013), one containing only rigid motion (e.g., nodding), the other elastic motion (e.g., smiling). Only a few studies have investigated holistic processing for either rigidly moving faces (e.g., Xiao et al., 2012; Zhao & Bülthoff, 2017) or those containing elastic motion (e.g., Cook, Aichelburg, & Johnston, 2015; Favelle, Tobin, Piepers, Burke, & Robbins, 2015; Xiao et al., 2013), and their findings have led to inconsistent conclusions. Here, we focused only on rigidly moving faces.

All earlier studies on holistic processing of dynamic faces have adopted the composite face task, which has been widely used to measure the degree of holistic processing of dynamic and static faces (e.g., Cheung, Richler, Palmeri, & Gauthier, 2008; Young et al., 1987). There are currently two popular versions of this task (see Richler & Gauthier, 2014, for a review; see Richler et al., 2014, for another version: the Vanderbilt Holistic Face Processing Test). One version is based on the “partial” or “standard” design (Young et al., 1987; see Fig. 1a for illustration). Here, in a trial, two faces appear in a sequence. The bottom halves of the two faces are always different, while the top halves may be the same or not, and the two halves of the faces may be aligned or misaligned. The task for viewers is to determine whether the top halves of the two faces are identical. When the top and bottom halves are aligned, it is difficult for participants to base their judgement only on the top half of a face and not be influenced by the bottom half. When they are not aligned, the bottom’s effect on the top decreases. Using this design, the strength of holistic processing is often measured by the difference between the aligned and misaligned conditions.

Fig. 1

Stimuli, design, and trial procedures used in Experiments 1a and 1b. (a) Example of a complete composite design. The targets are the top parts of the faces. The frames indicate the stimuli that represent the partial composite design. (b) Examples of dynamic faces (top panel) and multi-static faces (bottom panel) at 30°, 60°, and 90° viewpoints. (c) Examples of a study-test consistent and study-test inconsistent stimulus pair

The second version of this task is based on the “complete” or “full composite” design (Farah, Wilson, Drain, & Tanaka, 1998; Richler et al., 2014; see Fig. 1a for illustration), where both the top and bottom halves of the two faces may or may not be the same. When both the top and bottom halves of the two faces are identical, or when both are different, a trial is called “congruent.” When only the top or only the bottom halves of the faces are identical, a trial is called “incongruent.” When the top and bottom halves of the faces are aligned, the congruent condition facilitates correct decisions, while the incongruent condition hinders them. When the top and bottom halves are misaligned, this congruency effect disappears. This paradigm thus measures the holistic processing of faces based on the interaction between congruency and alignment.
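The congruency labels described above can be made concrete with a small sketch. The function names below are hypothetical and are not taken from any of the cited studies; the sketch only encodes the definitions given in the text, including the fact that the partial design corresponds to the subset of trials in which the bottom halves differ.

```python
def classify_trial(top_same: bool, bottom_same: bool) -> str:
    """Label a composite trial by congruency in the complete design.

    A trial is congruent when the top and bottom halves agree
    (both same or both different) and incongruent otherwise.
    """
    return "congruent" if top_same == bottom_same else "incongruent"


def in_partial_design(bottom_same: bool) -> bool:
    """The partial design only contains trials whose bottom halves differ."""
    return not bottom_same


if __name__ == "__main__":
    for top_same in (True, False):
        for bottom_same in (True, False):
            print(
                f"top_same={top_same!s:5} bottom_same={bottom_same!s:5} "
                f"-> {classify_trial(top_same, bottom_same):11} "
                f"(partial design: {in_partial_design(bottom_same)})"
            )
```

Under these definitions, every "same" trial of the partial design (top halves same, bottom halves different) is incongruent, and every "different" trial is congruent, so the correct response and congruency cannot be separated within that subset.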

Xiao et al. (2012), using a partial composite task, had participants learn faces in rigid motion (coherent motion rotating from 90° left to 90° right) or multi-static faces (created with the same dynamic image frames but presented in a randomized order as shown in Fig. 1b (bottom panel) or in a sequence with large intervals between images to prevent apparent motion) and then tested them on static front-view composite faces. Their results show holistic processing for multi-static faces, while no such effect was observed for faces in rigid motion. These results suggest that rigid motion somehow disrupts a viewer’s ability to process subsequently seen test faces holistically, whereas presenting the same images without coherent motion does not seem to interfere with holistic processing. In contrast, Zhao and Bülthoff (2017), using a complete composite task, found that dynamic faces were processed as holistically as static faces, regardless of whether the study and test faces were presented in motion (coherent motion rotating from 30° left to 30° right) or static form.

We conjecture that the differences in viewpoints used in Xiao et al. (2012) and Zhao and Bülthoff (2017) (i.e., -90° to +90° vs. -30° to +30°), and the differences in the use of static faces (i.e., multi-static vs. 0° face) might have contributed to the discrepancy between the two studies. Another possibility is that the authors used two different versions of the composite face task, which have been argued to differ regarding response biases (for reviews, see Richler & Gauthier, 2014; Rossion, 2013). In the partial design, the “same” trials are always incongruent while the “different” trials are always congruent, which potentially leads to a response bias (Richler & Gauthier, 2014). Furthermore, because often only “same” trials are used for the analysis of holistic processing, it is not possible to separate a potential response bias from sensitivity (Rosenthal et al., 2018). The complete design solves the response bias problem that is inherent in the partial design, and its effect size is three times that of the partial design (Cheung et al., 2008; Richler & Gauthier, 2014; Richler, Mack, Palmeri, & Gauthier, 2011a).

Viewpoint and holistic processing

Viewpoint changes come from two types of face movements: picture-plane rotations (e.g., inverted faces in Mondloch & Maurer, 2008; Rosenthal et al., 2018; Rossion & Boremanse, 2008) and depth rotations (e.g., profile faces in McKone, 2008). Picture-plane rotations can distort the T-shaped template of a face. Depth rotations also distort the T-shaped construct of a face, but differently from picture-plane rotations, by shortening the distance between the eyes, which also leads to changes in configural information. Investigating the effects of picture-plane and depth-rotations on holistic processing will thus help us understand whether holistic face representation is tolerant of these types of within-person variations.

Previous studies using composite paradigms (e.g., Mondloch & Maurer, 2008; Rosenthal et al., 2018; Rossion & Boremanse, 2008) have shown that picture-plane rotations can affect the holistic processing of static faces, but the reported result patterns differ. Both Mondloch and Maurer (2008) and Rossion and Boremanse (2008) investigated the partial composite face effect at various rotation angles (0°, 30°, 60°, 90°, 120°, 150°, 180°). Mondloch and Maurer (2008) found a linear decrease in the face composite effect with rotation and suggested this to be a result of experience. It remains, however, unclear why holistic processing continued to decrease beyond 90° in their experiment. Rossion and Boremanse (2008) found an equally strong face composite illusion for faces presented at 0° and up to a 60° rotation, then a dramatic decrease at 90°, followed by a stable effect up to a 180° rotation. This pattern suggests that holistic processing only becomes evident when the stimulus orientation matches the viewer’s innate template for the upright position or their internal representation resulting from common visual experience. Rosenthal et al. (2018) adopted the complete composite task and observed a sharp decline in holistic processing when faces were rotated 30° away from the upright position. The differences between all other adjacent viewpoints (30°, 60°, 90°, 120°, 150°, and 180°) were not significant. This indicates that holistic processing of faces is highly orientation-specific and depends on upright face presentation. It also indicates that the influence of picture-plane rotations on the holistic processing of static faces is qualitative rather than quantitatively linear, which supports the internal representation hypothesis. It should be noted that the study and test faces in Rossion and Boremanse (2008) and Rosenthal et al. (2018) were presented at the same viewpoints and alignment, while those details are not clearly described in Mondloch and Maurer (2008).

As is evident from the above studies and other research that only compared 0° with 180° faces (e.g., Barton, Depak, & Malik, 2003; Rhodes, Brake, & Atkinson, 1993; Yovel & Kanwisher, 2004), inversion disrupts holistic processing of front faces (for reviews, see McKone et al., 2013; Rossion, 2008). Some studies, however, have found holistic processing for inverted front faces with a long presentation time of the test face (e.g., 800 ms in Richler et al., 2011a; 2,500 ms in Curby et al., 2013) or a larger sample size (e.g., at least 60 in Susilo, Rezlescu, & Duchaine, 2013). Holistic processing for inverted faces was also observed in an internal-external context congruency paradigm (e.g., Meinhardt, Meinhardt-Injac, & Persike, 2019) and in an aperture paradigm (e.g., Murphy, Gray, & Cook, 2020). These results support the notion that upright and inverted faces may rely on the same mechanisms but with reduced efficiency for inverted presentation (Sekuler, Gaspar, Gold, & Bennett, 2004; Willenbockel et al., 2010). That is, the holistic face representation may be tolerant of inversion (but see Cheng, McCarthy, Wang, Palmeri, & Little, 2018, for the other possibility that this mechanism might be serial or parallel processing of top and bottom components of composite faces without processing the entire face coactively).

As for dynamic faces, whether inverted faces are processed holistically also remains worth investigating. Zhao and Bülthoff (2017) observed different patterns of motion effects for upright and inverted faces. They found holistic processing for inverted dynamic faces, but not for inverted static faces. Such a motion facilitation effect, however, was not observed for upright faces. Recent findings on static faces suggest that gestalt perceptual grouping cues could facilitate holistic processing (Curby et al., 2013; Curby et al., 2016; Zhao et al., 2016a). The holistic processing of inverted faces with rigid motion can also be explained by the gestalt perceptual grouping cues account (Zhao & Bülthoff, 2017) because rigid motion encourages the grouping of facial parts together based on gestalt principles of common fate and synchrony (Alais, Blake, & Lee, 1998; Lee & Blake, 1999; Piepers & Robbins, 2012; Wagemans et al., 2012). Such a perceptual grouping approach, however, may not apply to upright faces. In this sense, upright and inverted dynamic faces may not share the same mechanism.

As for depth-rotated viewpoints, McKone (2008) found equal holistic processing for all views of depth-rotated static faces (front, three-quarters, and profile) with the same viewpoints used for study and test faces. The author suggested that this finding could be due to an internal 3D face representation or sufficient experience with profiles to produce full levels of holistic processing. However, the question of how depth-rotated viewpoints affect the holistic processing of dynamic faces remains open.

Study-test consistency and holistic processing

When a viewer sequentially matches two unfamiliar faces, they might form a representation of the study face first and then compare the test face to it. Whether holistic face processing is sensitive to the within-person variations between the study and test faces (i.e., study-test consistency) will help clarify whether the representation of a face is tolerant of within-person variations.

As mentioned in the previous sections, some research has reported holistic processing across within-person variations between study and test faces. For example, Richler et al. (2014) used different images of the same person as the study and test face in the Vanderbilt Holistic Face Processing Test and observed holistic processing. Zhao and Bülthoff (2017) observed persisting holistic processing regardless of whether stimuli contained rigid motion in the studying phase, testing phase, or both.

However, holistic face processing might nevertheless be affected by within-person variations between the study and test face. The “name-physical disparity” in same-different judgments (Proctor, 1981) refers to the finding that “same” judgments are faster when the stimuli are physically identical (e.g., A-A) than when they are identical only in name (e.g., A-a). A similar effect could possibly arise in the same-different judgments of a composite face task. Therefore, it is possible that Richler et al. (2014) would have found a higher degree of holistic processing if the authors had used the same images of the same person for study and test. It is also possible that the persistent holistic processing observed across within-person variations in study and test faces in Zhao and Bülthoff (2017) stemmed from the fact that the variations between study and test faces (0° vs. 30°) were not large enough. Favelle et al. (2015) proposed that the elastic motion consistency of their study and test faces contributed to the discrepant results between their study and that by Xiao et al. (2013). The differences between the result patterns of other studies (Mondloch & Maurer, 2008; Rosenthal et al., 2018; Rossion & Boremanse, 2008) on the influence of picture-plane rotation on holistic processing of static faces might also be due to differences in study-test face consistency. Study and test faces were presented identically in Rossion and Boremanse (2008) and Rosenthal et al. (2018), while Mondloch and Maurer (2008) do not specify these details.

Present study

The present study aimed to investigate whether holistic representations of faces are tolerant of within-person motion and viewpoint variations by manipulating study-test consistency and study-test face orientation in a complete composite paradigm.

To investigate the effect of viewpoint changes on holistic processing, we used three types of depth rotation viewpoints: 30°, 60°, and 90°. The 30° and 90° viewpoints are consistent with those used in Zhao and Bülthoff (2017) and Xiao et al. (2012), respectively. The faces were seen in coherent motion rotating from the left 30°/60°/90° to the right 30°/60°/90° in steps of 10° at constant speed. The presentation time of each static image was 70 ms, which brought the total presentation time to 490 ms, 910 ms, and 1,330 ms for the 30°, 60°, and 90° stimuli, respectively.
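The stated presentation times follow directly from the rotation range, the 10° step, and the 70 ms frame duration. The short sketch below reproduces that arithmetic; the function names and the convention of coding leftward angles as negative are illustrative assumptions, not part of the original materials.

```python
FRAME_MS = 70   # presentation time of each static image, as stated in the text
STEP_DEG = 10   # angular step between consecutive images


def rotation_frames(max_angle: int) -> list[int]:
    """Viewing angles from max_angle left (negative) to max_angle right."""
    return list(range(-max_angle, max_angle + 1, STEP_DEG))


def total_duration_ms(max_angle: int) -> int:
    """Total presentation time of one left-to-right rotation sequence."""
    return len(rotation_frames(max_angle)) * FRAME_MS


if __name__ == "__main__":
    for angle in (30, 60, 90):
        n_frames = len(rotation_frames(angle))
        print(f"{angle} deg: {n_frames} frames, {total_duration_ms(angle)} ms")
    # 30 deg: 7 frames, 490 ms
    # 60 deg: 13 frames, 910 ms
    # 90 deg: 19 frames, 1330 ms
```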

To investigate whether holistic face processing is tolerant of motion changes, we used three different types of faces as static face stimuli. One was identical to the 0-static faces used in Zhao and Bülthoff (2017), with presentation times set to 490 ms, 910 ms, and 1,330 ms, for consistency with the 30°, 60°, and 90° moving face presentation times and to rule out a confounding of presentation time and viewpoint. The second type of static faces we used was identical to the multi-static faces used by Xiao et al. (2012). Multi-static faces are less likely to lead to 3D representations than dynamic faces but rather to multi-image-based representations. Moreover, given that both dynamic depth rotation and static depth rotation convey similar viewpoint information, 30°, 60°, and 90° depth-rotated static faces were also included as view-static faces.

Regardless of the viewpoints and motion types of the study faces, the test faces were always frontal-static images in Experiment 1a, where the viewpoint changed between study and test face (inconsistent study-test: e.g., 30°–0°, 60°–0°, 90°–0°) while the motion type could be the same (e.g., study 30° static face, test 0° static face) or different (e.g., study multi-static and dynamic face, test 0° static face). In Experiment 1b, the test faces were identical to the study faces to maintain study-test consistency, with study-test faces identical in viewpoint and motion, while viewpoints changed between pairs (consistent study-test: e.g., 30°–30°, 60°–60°, 90°–90°). In Experiment 2, study-test consistency was manipulated as a within-subject variable. Moreover, to ensure we investigated “pure” holistic processing of faces varied in motion and viewpoint, we also measured the holistic processing of inverted faces in Experiment 2.

This work was approved by the Institutional Review Board of the Department of Psychology at Sun Yat-sen University. This study was conducted following the Code of Ethics of the World Medical Association (Declaration of Helsinki).

Experiment 1a: Inconsistent study-test

Method

Participants

The sample sizes in Experiments 1a, 1b, and 2 were based on an exploratory parameter estimation using the power analysis calculator at https://jakewestfall.shinyapps.io/pangea/ (Judd, Westfall, & Kenny, 2017; also used by Hertz, Blakemore, & Frith, 2020). The estimated sample size was 20 per face type with d = 0.45, an alpha of 0.05, and a power of 0.9 in a mixed design with 12 replicates (we used 12 face models per viewpoint × participant cell) and the default variances for error and viewpoint × participant. For counterbalancing, we recruited 24 participants per group. As the current study design consists of four groups of participants, we recruited 96 college students with normal or corrected-to-normal vision at Sun Yat-sen University and divided them randomly into four groups. All participants provided prior informed consent and received payment after the experiment. One student’s data were excluded because his total accuracy was more than two standard deviations below the group mean. The specific assignment for the remaining 95 participants (72 females, 23 males; age 18–27 years, M = 20.75, SD = 2.05) was as follows: 0-static face type (24), view-static face type (23), multi-static face type (24), and dynamic face type (24).

Stimuli

Twelve 3D face models were generated using the FaceGen Modeller, based on 12 Asian faces (six females and six males). Gray-scale faces on black background images (270 × 270 pixels) were rotated from the left (30°/60°/90°) to the right (30°/60°/90°) in steps of 10° to create 0-static faces, view-static faces, multi-static faces, and dynamic faces. The 0-static faces were front-view faces. The view-static faces were static faces rotated in depth (30°/60°/90°). The dynamic faces were created by sequentially displaying facial images from 30°/60°/90° left to 30°/60°/90° right. The multi-static faces were created by keeping the left 30°/60°/90° and right 30°/60°/90° images but randomly displaying the remaining images in between. See Fig. 1b for examples of dynamic faces and multi-static faces. All stimuli were presented on a black background at a resolution of 1,920 × 1,080 pixels on a 23-in. screen.
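The difference between the dynamic and multi-static image orders can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, leftward angles are coded as negative, and the text does not say whether the random order was redrawn on every trial or whether orders that happen to match the coherent sequence were excluded, so neither is handled here.

```python
import random


def dynamic_order(max_angle: int, step: int = 10) -> list[int]:
    """Coherent rotation: angles in order from max_angle left to max_angle right."""
    return list(range(-max_angle, max_angle + 1, step))


def multi_static_order(max_angle: int, step: int = 10, rng=None) -> list[int]:
    """Same images as the dynamic sequence, with the first and last image
    kept in place and the images in between presented in random order."""
    if rng is None:
        rng = random.Random()
    frames = dynamic_order(max_angle, step)
    middle = frames[1:-1]
    rng.shuffle(middle)  # in-place shuffle of the intermediate images only
    return [frames[0]] + middle + [frames[-1]]


if __name__ == "__main__":
    print(dynamic_order(30))
    print(multi_static_order(30, rng=random.Random(1)))
```

Because both functions draw on the same set of images, any difference between the two face types in this sketch lies only in presentation order, which matches the logic of the multi-static control condition.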

Design

A 4 (face type: 0-static, dynamic, multi-static, view-static) × 2 (alignment: aligned, misaligned) × 2 (congruency: congruent, incongruent) × 3 (viewpoint: 30°/490 ms, 60°/910 ms, and 90°/1,330 ms) mixed design was used, with face type treated as a between-subject factor and all others as within-subject factors.

Procedure

We used a complete-design composite task (Fig. 1a). In each trial (see Fig. 1c), participants saw a fixation cross (500 ms), a study face (490 ms, 910 ms, or 1,330 ms), and then a mask (500 ms), followed by a test face that was displayed until a response was made. Participants made same/different judgments, as accurately as possible, regarding the top parts of the two faces that were presented sequentially. Study faces were always intact faces and, depending on the participant group, 0-static, view-static, multi-static, or dynamic faces. Test faces were aligned or misaligned and were always 0-static faces. Each participant completed eight practice trials and 288 experimental trials (2 alignment × 2 congruency × 2 same/different × 3 time/viewpoint × 12 exemplars). The experiment also contained a 1-min break every 60 trials.
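The trial count can be checked by enumerating the fully crossed design. The sketch below is illustrative only: the field names are hypothetical, and it assumes that each of the 12 exemplars appears once in every cell, which the text implies but does not state explicitly. Trial order randomization is not modeled.

```python
from itertools import product

ALIGNMENT = ("aligned", "misaligned")
CONGRUENCY = ("congruent", "incongruent")
CORRECT_RESPONSE = ("same", "different")
VIEWPOINT_MS = {30: 490, 60: 910, 90: 1330}  # viewpoint (deg) -> study time (ms)
N_EXEMPLARS = 12


def build_trials() -> list[dict]:
    """Enumerate every cell of the crossed design, one trial per exemplar."""
    return [
        {
            "alignment": alignment,
            "congruency": congruency,
            "correct_response": response,
            "viewpoint_deg": viewpoint,
            "study_ms": VIEWPOINT_MS[viewpoint],
            "exemplar": exemplar,
        }
        for alignment, congruency, response, viewpoint, exemplar in product(
            ALIGNMENT, CONGRUENCY, CORRECT_RESPONSE, VIEWPOINT_MS,
            range(1, N_EXEMPLARS + 1),
        )
    ]


if __name__ == "__main__":
    trials = build_trials()
    print(len(trials))  # 288
```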

Results and discussion

Participants’ performance was measured using response sensitivity (d’), which was calculated from the hit rate (correct identification) and the false alarm rate (misidentification) in each condition (Stanislaw & Todorov, 1999): d’ = Z(hit rate) – Z(false alarm rate). Holistic face processing is indexed by an interaction between congruency and alignment (e.g., Zhao et al., 2016a). Discrimination performance should be better in congruent trials than in incongruent trials, and the congruency effect should be larger in the aligned condition than in the misaligned one. In the present study, we used the size of the interaction between alignment and congruency as the dependent variable: (aligned_congruent – aligned_incongruent) – (misaligned_congruent – misaligned_incongruent). Greenhouse-Geisser-corrected results are reported where the sphericity assumption was violated, and Bonferroni-corrected results are reported for post hoc analyses of the current and following experiments.
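A minimal sketch of the two quantities defined above may help. It assumes that a hit is a "same" response on a same trial and a false alarm is a "same" response on a different trial, which is a common convention but is not spelled out in the text. Hit or false alarm rates of exactly 0 or 1 have no finite z-score and would need a correction; the text does not say which correction, if any, was applied, so none is implemented here. Function names are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Response sensitivity: d' = Z(hit rate) - Z(false alarm rate).

    Rates must lie strictly between 0 and 1; extreme rates raise an error.
    """
    return _z(hit_rate) - _z(fa_rate)


def holistic_index(d: dict) -> float:
    """Alignment x congruency interaction on d'.

    `d` maps (alignment, congruency) pairs to d' values.
    """
    aligned = d[("aligned", "congruent")] - d[("aligned", "incongruent")]
    misaligned = d[("misaligned", "congruent")] - d[("misaligned", "incongruent")]
    return aligned - misaligned


if __name__ == "__main__":
    # Made-up d' values for illustration only, not data from the study.
    example = {
        ("aligned", "congruent"): 2.0,
        ("aligned", "incongruent"): 1.0,
        ("misaligned", "congruent"): 1.8,
        ("misaligned", "incongruent"): 1.6,
    }
    print(round(holistic_index(example), 3))  # 0.8
```

A positive index means the congruency effect is larger for aligned than for misaligned faces, which is the signature of holistic processing in the complete design.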

To examine whether presentation time was confounded with depth-rotated viewpoint, a one-way repeated-measures ANOVA was conducted to test the holistic processing of 0-static faces at the different presentation times (see Fig. 2). The ANOVA showed no significant main effect of time (F(2, 46) = 0.206, p = .814, ηp² = .009). We also conducted an equivalence test following the two one-sided tests (TOST) procedure (Lakens, Scheel, & Isager, 2018) to test the equivalence among the three times. We took -0.5/0.5 as the lower/upper equivalence bounds and 0.05 as the alpha level. The TOST results revealed that the differences between any two of the three times were all statistically equivalent to zero (t(23) ≥ 1.76, ps < .05). Both pieces of evidence indicate that viewpoint was not confounded by presentation time. It should be noted that the 95% confidence intervals (95% CIs) never included 0, indicating that 0-static faces were processed holistically regardless of presentation time. This result replicates previous research (e.g., McKone, 2008; Rosenthal et al., 2018; Rossion, 2008; Rossion & Boremanse, 2008) that reported holistic processing for 0-static faces.

Fig. 2

Holistic processing (HP) as a function of face type (dynamic, multi-static, view-static, and 0-static) and viewpoint (30°/490 ms, 60°/910 ms, and 90°/1,330 ms) in Experiment 1a. Here and elsewhere the blue shadow represents the 95% confidence interval (CI) for the mean. To show how alignment and congruency interact, response sensitivity (d’) as a function of face type, viewpoint, alignment, and congruency is also depicted. The mean effects in the column at the very right and in the bottom row represent the main effects of face type and viewpoint, respectively. Here and elsewhere error bars represent the 95% CI for the mean

To investigate how motion variance and viewpoint variance influence holistic face processing when study-test faces are inconsistent, a repeated-measures ANOVA was conducted with face type (dynamic, multi-static, view-static) as a between-subject factor and viewpoint (30°, 60°, 90°) as a within-subject factor (see Fig. 2). The results showed neither a main effect of face type (F(2, 68) = 1.991, p = .144, ηp² = .055) nor one of viewpoint (F(2, 136) = 0.241, p = .786, ηp² = .004), but revealed a significant interaction of the two factors (F(4, 136) = 2.706, p = .033, ηp² = .074). Simple effect analysis showed that face type did not have an effect at 30° and 60° but that at 90°, holistic processing was observed for multi-static faces (M = 0.856, 95% CI = [0.438, 1.274]) to a larger degree than for dynamic faces (M = 0.143, 95% CI = [-0.275, 0.561]) and view-static faces (M = -0.075, 95% CI = [-0.502, 0.352]). The larger degree of holistic processing for multi-static faces than for dynamic faces seems to be consistent with the findings of Xiao et al. (2012).

To compare the present results with those of Xiao et al. (2012), who adopted the alignment effect in the partial design as a dependent variable, a planned contrast was conducted for 90° dynamic faces and 90° multi-static faces. The results of this analysis showed no significant difference between 90° dynamic and 90° multi-static faces (t(91) = -1.134, p = .260), which is inconsistent with the results of Xiao et al. (2012).

It should be noted that the present experiment adopted a complete composite design to eliminate the response bias problem of the partial composite design. We assumed that the response bias problems of partial analyses would not be present when these trials are gathered in the context of a complete design. Our analysis (see the Online Supplementary Materials for statistical results on response biases) confirmed this. It can therefore be reasonably assumed that the alignment effect found in the partial conditions within our complete design is not identical to that found in a real partial design in Xiao et al. (2012).

Similarly, to compare the present results with those of Zhao and Bülthoff (2017), a planned contrast was conducted for 30° dynamic faces and the 0-static faces at 490 ms, with holistic processing as the dependent variable in the complete design. The results of this analysis showed that 30° dynamic faces were processed as holistically as 0-static faces at 490 ms (t(91) = 1.163, p = .248), which is consistent with the results of Zhao and Bülthoff (2017).

These findings indicate that the discrepancy between Xiao et al. (2012) and Zhao and Bülthoff (2017) is not due to the different presentation times adopted in their studies but may be a result of the different types of static faces, different viewpoints, or different composite tasks that the authors used. Further experiments are needed to explore this issue in more detail.

In the present experiment, participants studied view-static faces and were tested on 0-static faces. However, McKone (2008) asked participants to study view-static faces and tested them on identical view-static faces. In Experiment 1b, we therefore used test faces that were identical to the study faces to explore whether the results from inconsistent study-test faces can be generalized to consistent study-test faces.

Experiment 1b: Consistent study-test

Method

Participants

As in Experiment 1a, the power analysis indicated that 20 participants per group would be required to achieve a power level of 0.90. Additional participants were recruited for counterbalancing. Therefore, another 71 undergraduate students at Sun Yat-sen University took part in the experiment and were randomly divided into three groups. All participants provided prior informed consent and received payment after the experiment. Two students were excluded because their average accuracy was more than two standard deviations below the group mean. The specific assignment for the remaining 69 participants (56 females, 13 males; age 18–28 years, M = 20.70, SD = 1.98) was as follows: view-static face type (24), multi-static face type (22), and dynamic face type (23).

Stimuli, design, and procedure

The stimuli, design, and procedure were the same as in Experiment 1a, except that the test faces were identical to the study faces. We only included three types of faces in the present experiment: dynamic, multi-static, and view-static faces. It should be noted, however, that for the multi-static faces, images were randomly displayed during both study and test.

Results and discussion

A 3 (face type: dynamic, multi-static, view-static) × 3 (viewpoint: 30°, 60°, 90°) repeated-measures ANOVA (see Fig. 3) showed neither a main effect of face type (F(2, 66) = 0.726, p = .487, ηp² = .022), nor one of viewpoint (F(2, 132) = 0.453, p = .636, ηp² = .007), and no interaction of the two (F(4, 132) = 0.164, p = .956, ηp² = .005). Although the null interaction observed here is inconsistent with the interaction we found in Experiment 1a, the present experiment replicated Experiment 1a in that dynamic faces were processed as holistically as view-static faces. Consistent with the findings of McKone (2008) and Experiment 1a, the present experiment shows that depth-rotation viewpoints do not affect holistic processing of dynamic faces and view-static faces, indicating that holistic processing is tolerant of viewpoint variation regardless of whether the study and test faces are identical.

Fig. 3

Holistic processing (HP) as a function of face type (dynamic, multi-static, and view-static faces) and viewpoint (30°, 60°, and 90°) in Experiment 1b. To show how alignment and congruency interact, response sensitivity (d’) as a function of face type, viewpoint, alignment, and congruency is also depicted. The mean effects in the column at the very right and in the bottom row represent the main effects of face type and viewpoint, respectively

To investigate whether study-test consistency affects holistic processing, we combined the data of Experiments 1a and 1b and used study-test consistency as a between-subjects variable. A 2 (study-test consistency: inconsistent-Experiment 1a, consistent-Experiment 1b) × 3 (face type: dynamic, multi-static, view-static) × 3 (viewpoint: 30°, 60°, 90°) mixed-design ANOVA revealed a significant study-test consistency effect (F(1, 134) = 11.352, p = .001, ηp2 = .078), with a larger degree of holistic processing in the study-test consistent condition (M = 0.722, 95% CI = [0.568,0.876]) than in the study-test inconsistent condition (M = 0.354, 95% CI = [0.202,0.505]). No other main effects or interactions reached significance (ps ≥ .122, ηp2 ≤ .028).

This study-test consistency effect seems to indicate that holistic face processing is not tolerant of the motion and viewpoint variations we introduced in the current experiment, which contradicts our earlier conclusion. We conjecture that two different mechanisms might account for “physically matched” (e.g., consistent study-test (A30°–A30°, physically identical; A30°–B30°, physically different) in Experiment 1b) and “nominally matched” stimuli (e.g., inconsistent study-test (A30°–A0°, physically different but nominally the same; A30°–B0°, physically and nominally different) in Experiment 1a). When the study and test faces are physically identical, a template matching strategy may be adopted. When they are not physically identical, face representations in the brain need to be activated and compared.

In Experiment 2, we therefore further explored whether the null effect of face type and viewpoint on holistic processing and the study-test consistency effect can be replicated. Here, we included inverted faces and manipulated study-test consistency as a within-subject variable.

Experiment 2

Method

Participants

The estimated sample size was 15 per face type (d = 0.45, alpha = 0.05, power = 0.9) to detect the interaction of viewpoint and face type in a mixed design with eight replicates (we used eight face models per viewpoint × participant × consistency × orientation cell) and default variances for the error and viewpoint × participant terms. For counterbalancing, additional participants were recruited per group. Therefore, another 69 students at Sun Yat-sen University in China participated in the experiment and were randomly divided into three groups. The specific assignment for the 69 participants (51 females, 18 males; age 18–25 years, M = 20.27, SD = 1.55) was as follows: view-static face type (23), multi-static face type (24), and dynamic face type (22).

Stimuli, design, and procedure

The stimuli, design, and procedure were identical to Experiment 1a, except for the following three differences: (a) face stimuli were created based on only eight Asian faces (four females and four males); (b) a 2 (study-test consistency: consistent, inconsistent) × 2 (orientation: upright, inverted) × 3 (face type: dynamic, multi-static, view-static) × 3 (viewpoint: 30°, 60°, 90°) mixed design was used, with face type as a between-subject factor and the remaining variables as within-subject factors. Orientation and study-test consistency were designed as between-block factors. The order of blocks was counterbalanced across participants; (c) each participant went through eight practice trials and 768 experimental trials: 2 (consistency: consistent, inconsistent) × 2 (orientation: upright, inverted) × 3 (viewpoint: 30°, 60°, 90°) × 2 (alignment: aligned, misaligned) × 2 (congruency: congruent, incongruent) × 2 same/different × 8 exemplars. Participants took a 1-min break every 48 trials. It should be noted that the participants’ task was to make same/different judgments, as accurately as possible, regarding the top parts (eyes and forehead were included) of the two faces presented sequentially, no matter what their orientation was.
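The trial count follows directly from the factorial structure described above. The short sketch below multiplies out the factor levels and derives how many 48-trial segments separate the breaks; the variable names are ours.

```python
from math import prod

# Number of levels per within-participant factor, as listed in the text
design = {
    "consistency": 2,
    "orientation": 2,
    "viewpoint": 3,
    "alignment": 2,
    "congruency": 2,
    "same_different": 2,
    "exemplars": 8,
}

total_trials = prod(design.values())
trials_between_breaks = 48
n_segments = total_trials // trials_between_breaks

# Trials contributing to each consistency x orientation x viewpoint
# x alignment x congruency cell (same/different x exemplars)
trials_per_cell = design["same_different"] * design["exemplars"]

print(total_trials, n_segments, trials_per_cell)
```

The product reproduces the 768 experimental trials, which divide evenly into 16 segments of 48 trials, with 16 trials behind each sensitivity estimate per cell.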

Results and discussion

We performed a 2 (consistency: consistent, inconsistent) × 2 (orientation: inverted, upright) × 3 (face type: dynamic, multi-static, view-static) × 3 (viewpoint: 30°, 60°, 90°) mixed-design ANOVA with face type as the between-subjects factor. The results are summarized in Table 1.

Table 1 Statistical results of the 3 (Face type) × 2 (Consistency) × 2 (Orientation) × 3 (Viewpoint) ANOVA in Experiment 2

As expected, a significant study-test consistency effect was observed, with a larger degree of holistic processing in the consistent condition (M = 0.366, 95% CI = [0.258,0.474]) than in the inconsistent condition (M = 0.087, 95% CI = [-0.029,0.203]) (see Fig. 4).

Fig. 4

Holistic processing (HP) as a function of face type (dynamic faces, multi-static, and view-static), orientation (inverted, upright), and consistency (consistent, inconsistent) in Experiment 2. To show how alignment and congruency interact, response sensitivity (d’) as a function of face type, orientation, consistency, alignment, and congruency is also depicted

Furthermore, as expected, we found a significant main effect of orientation with a larger degree of holistic processing for upright faces (M = 0.327, 95% CI = [0.197,0.456]) than for inverted faces (M = 0.126, 95% CI = [0.026,0.226]). Importantly, the three-way interaction between face type, consistency, and orientation was significant, F(2, 66) = 6.986, p = .002, ηp2 = .175.

To disentangle the three-way interaction, we performed two separate 2 (consistency) × 3 (face type) mixed-design ANOVAs for upright and inverted faces. For upright faces, a significant main effect of consistency was observed (F(1, 66) = 15.543, p < .001, ηp2 = .191), with a larger degree of holistic processing in the study-test consistent condition (M = 0.488, 95% CI = [0.337,0.640]) than in the study-test inconsistent condition (M = 0.165, 95% CI = [0.010,0.320]). This result replicates the findings of Experiments 1a and 1b, indicating that the study-test consistency effect occurs independently of whether study-test consistency is treated as a between-subjects or a within-subjects factor. The effect was not modulated by face type (F(2, 66) = 2.698, p = .075, ηp2 = .076), and no main effect of face type was found (F(2, 66) = 0.246, p = .783, ηp2 = .007). These results are also in line with Experiments 1a and 1b and indicate that dynamic faces are processed as holistically as view-static faces regardless of whether the study and test faces are identical.

Interestingly, for inverted faces, we also found a significant main effect of consistency (F(1, 66) = 5.984, p = .017, ηp2 = .083), with a larger degree of holistic processing in the study-test consistent condition (M = 0.243, 95% CI = [0.119,0.367]) than in the study-test inconsistent condition (M = 0.008, 95% CI = [-0.143,0.160]). This effect was not modulated by face type (F(2, 66) = 2.508, p = .089, ηp2 = .071), and no main effect of face type was observed (F(2, 66) = 0.289, p = .750, ηp2 = .009).

It should be noted that the confidence interval for inverted faces in the study-test consistent condition did not include 0, indicating significant holistic processing, while that in the study-test inconsistent condition included 0, indicating no holistic processing. The holistic processing observed for inverted faces in the study-test consistent condition was mainly due to holistic processing for dynamic faces and multi-static faces (dynamic: M = 0.327, 95% CI = [0.108,0.547]; multi-static: M = 0.257, 95% CI = [0.046,0.467]). View-static face inversion, in contrast, eliminated all behavioral characteristics of holistic processing (view-static: M = 0.146, 95% CI = [-0.069,0.361]), consistent with previous research on inverted 0-static faces (McKone, 2008; Rosenthal et al., 2018; Rossion & Boremanse, 2008). These different response patterns for inverted faces in motion (dynamic and multi-static faces) and inverted static (view-static) faces are in line with the findings of Zhao and Bülthoff (2017). We discuss this further in the General discussion.
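The reasoning in this paragraph rests on whether each 95% confidence interval contains zero. A minimal sketch of that check, using only the intervals reported for inverted faces, is given below; the helper name and the dictionary labels are ours.

```python
def ci_excludes_zero(lower, upper):
    """True if the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0

# 95% CIs for holistic processing of inverted faces, as reported in the text
reported = {
    "consistent (all face types)": (0.119, 0.367),
    "inconsistent (all face types)": (-0.143, 0.160),
    "consistent, dynamic": (0.108, 0.547),
    "consistent, multi-static": (0.046, 0.467),
    "consistent, view-static": (-0.069, 0.361),
}

reliable = {label: ci_excludes_zero(lo, hi)
            for label, (lo, hi) in reported.items()}

for label, flag in reliable.items():
    print(f"{label}: {'excludes 0' if flag else 'includes 0'}")
```

Under this criterion, only the inconsistent condition and the consistent view-static condition have intervals that include zero.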

In Fig. 4, the interaction pattern for alignment and congruency seems to differ for upright and inverted faces. We therefore performed a 2 (alignment) × 2 (congruency) repeated-measures ANOVA on inverted faces for the study-test consistent condition. The analysis revealed a significant interaction between congruency and alignment (F(1, 66) = 15.270, p <.001, ηp2 = .188). A simple effect analysis showed a significant congruency effect for the aligned (p = .001) but not for the misaligned (p = .085) condition and a significant alignment effect for the congruent (p = .001) but not for the incongruent (p = .168) condition. These results indicate that the interaction pattern between alignment and congruency for inverted faces is similar to that for upright faces, although they seemingly look different.
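For readers less familiar with the complete composite design, the alignment × congruency interaction can be expressed as a difference of differences in sensitivity (d'). The sketch below assumes the conventional definitions: d' = z(hit rate) − z(false-alarm rate), and a holistic processing index equal to the congruency effect for aligned faces minus the congruency effect for misaligned faces. We take these to match the measures used here, but the exact computation is not restated in this section, and the input values are hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) - z(false-alarm rate).
    Rates of exactly 0 or 1 need a correction first (not handled here)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def holistic_index(aligned_congruent, aligned_incongruent,
                   misaligned_congruent, misaligned_incongruent):
    """Congruency effect for aligned faces minus that for misaligned faces."""
    aligned_effect = aligned_congruent - aligned_incongruent
    misaligned_effect = misaligned_congruent - misaligned_incongruent
    return aligned_effect - misaligned_effect

# Hypothetical d' values for one participant, for illustration only
hp = holistic_index(2.0, 1.0, 1.8, 1.6)
print(round(hp, 3))
```

A positive index means the congruency effect is larger for aligned than for misaligned faces, which is the signature pattern described in the text.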

The null main effect of viewpoint and lack of an interaction with consistency represent a replication of the findings of Experiment 1, indicating that holistic processing is tolerant of viewpoint variation regardless of whether the study and test faces are identical.

The interaction between viewpoint and face type in Experiment 1a was not observed in the corresponding inconsistent condition for upright faces in Experiment 2. This suggests that multi-static faces might lead to unstable results.

General discussion

The purpose of the present study was to investigate whether holistic face representations are tolerant of within-person motion and viewpoint variations by manipulating face type (dynamic, multi-static, view-static, and 0-static faces), viewpoint (30°, 60°, 90°), study-test consistency, and the orientation (upright, inverted) of study-test faces in a complete composite paradigm. First, we found a persistent study-test consistency effect with a larger degree of holistic processing for identical than for different study and test faces. Second, holistic processing was not sensitive to viewpoint changes via depth rotation or motion changes, regardless of study-test consistency. Third, inverted moving (dynamic and multi-static) faces in the study-test consistent condition were processed holistically.

Study-test consistency effect on holistic face processing

The study-test consistency effect is the main finding of the present study.

There are two possible explanations for this effect. First, as Favelle et al. (2015) proposed, when the stimulus format differs between study and test, less holistic processing may be a product of mismatching perceptual cues. To compensate for the mismatching, participants may adopt a diagnostic local processing strategy that results in a weaker composite effect. When the study and test faces are identical, then there are no differences in visual information and there should be no change in the viewer’s holistic facial processing strategy. This notion could explain how inconsistency impairs the holistic processing of upright faces.

Another possible explanation is object-based holistic processing (Zhao et al., 2016a, 2016b; see also Curby et al., 2016; Curby et al., 2013), since the study-test consistency effect was also observed for inverted faces in the current study. Specifically, holistic processing for inverted faces in the study-test consistent condition was significant, mainly due to holistic processing for dynamic faces and multi-static faces, while inverted faces in the study-test inconsistent condition did not show holistic processing. The consistent perceptual grouping cues provided by the consistent rigid motion of consistent study-test faces could help activate holistic processing for inverted faces. We think this notion could explain how consistency improves the holistic processing of inverted faces.

Why did we not observe a motion facilitation effect for upright faces? Upright moving faces showed holistic processing, similar to static faces, in the present study as well as in Zhao and Bülthoff (2017). One possibility is that the top-down (e.g., expertise-based holistic processing) and bottom-up (e.g., motion-based holistic processing) routes are independent and do not interfere with each other. This is evidenced by the finding that the processing of faces does not interfere with the processing of stimuli with salient Gestalt cues (Curby et al., 2019) but does interfere with the holistic processing of objects of expertise (Curby & Gauthier, 2014). However, Curby and Moerel (2019) found interference between the holistic processing of Gestalt line stimuli and faces when they were overlapped. Therefore, a more likely possibility is that the holistic processing of upright faces already reaches “the ceiling,” and that motion cannot further enhance it. As Zhao and Bülthoff (2017) have suggested: “When influential factors are available to activate holistic processing, adding additional cues does not necessarily enhance holistic processing.” Curby and Moerel (2019) also echoed this with their suggestion that “facilitation of holistic perception can only occur when the system is not overtaxed,” which is evidenced by car experts processing faces more holistically in the context of cars in a modified, rather than intact, configuration (Curby & Gauthier, 2014). This is consistent with the notion that inversion results in quantitative, not qualitative, changes in face processing (Cheng et al., 2018; Curby et al., 2013; Meinhardt et al., 2019; Murphy et al., 2020; Richler et al., 2011a; Sekuler et al., 2004; Susilo et al., 2013; Willenbockel et al., 2010). It is possible that a similar motion effect would be observed for upright faces and inverted faces when encoding duration was sufficient or the discrimination difficulty of inverted faces decreased as in Curby et al. (2013) or Murphy et al. (2020).

Is holistic face representation tolerant of motion and viewpoint variations?

Burton and his colleagues argued that configural processing (at least the type based on specific metric measurements) cannot account for familiar face recognition (Burton et al., 2015; Sandford & Burton, 2014). In line with this argument, the present study’s study-test consistency effect for unfamiliar upright faces suggests that holistic face processing of unfamiliar upright faces is based on specific metrics and is not tolerant of within-person variation (at least not of the viewpoint and motion variations applied in the present study).

However, the null effect of motion and viewpoint on holistic processing we observed, regardless of study-test consistency, suggests that holistic face processing is, in fact, tolerant of motion and viewpoint. That is, distorting the T-shape or changing the configural information of a face (by shortening the distance between the eyes) by depth-rotations of 30°, 60°, or 90° did not impair the holistic processing of view-static and dynamic faces in our experiments, whether the study-test format was consistent or not.

Our findings thus seem to lead to two contradictory conclusions. We conjecture that the two different phenomena may reflect two different mechanisms, one for physical matching (i.e., consistent study-test) and one for nominal matching (i.e., inconsistent study-test) of face stimuli.

When the study and test faces are physically identical, a physical template matching strategy may be adopted. When they are not physically identical, face representations in the brain need to be activated and compared. The holistic face representations evoked by different motion types or different viewpoints might be the same, since no effects of face type and viewpoint on holistic processing were observed. We found the degree of holistic processing, however, to be larger for physically matched than for nominally matched faces, a finding that resembles the “name-physical disparity” in same-different judgments, as termed by Proctor (1981). When the stimuli are physically identical (e.g., A–A), the match is made on the basis of a physical code rather than the name code that is required when the stimuli are only identical in name (e.g., A–a). Physical code matches are assumed to occur at an earlier level of processing and to be available more rapidly than name codes.

According to this notion, the study-test consistency effect does not reflect tolerance of within-person motion and viewpoint variations, but “name-physical disparity” in the same-different judgments, while the null effects of motion and viewpoint on holistic processing indeed indicate that holistic processing is tolerant of within-person motion and viewpoint variations.

The formation of the same representation induced by different motion types or different viewpoints might also be based on experience. Experience-only theories may account for our results if we assume a “threshold” for experience: The experience of the viewer with the profile faces is sufficient to lead to the same representation as the 30° face, and the additional experience (front face or 30° face) does not lead to an increase in holistic processing intensity (McKone, 2008).

Theories of holistic face processing

The current results suggest that holistic processing may have three components. The first two are the two components of the dual-route theory (Zhao et al., 2016a, 2016b; see also Curby et al., 2016; Curby et al., 2013; Zhou et al., 2012). One is experience-based holistic processing, which leads to a higher degree of holistic processing for upright than inverted faces and also results in the constancy of holistic processing for upright study-test inconsistent faces. The other component is object-based perceptual grouping cues.

Here we propose a third one: study-test physical template matching. This physical template is different from the internal face-like template that exists for faces only, according to the domain-specific hypothesis (Kanwisher, 2000; McKone et al., 2007; Robbins & McKone, 2007). Instead, this physical template is a bottom-up physical template for any type of visual stimulus. This kind of template matching is also different from bottom-up object-based perceptual grouping, in that it requires the matching of two stimuli, while perceptual grouping focuses on a single stimulus.

We suggest that both the physical template matching component and the perceptual grouping component contribute to the holistic processing observed for the complete composite face effect. Figure 2a shows that the “same” judgment for congruent pairs is in fact a physical matching task, while the “same” judgment for incongruent pairs is a dimension-based (here, top-based) nominal judgment. The congruency effect is therefore mainly due to the template matching component. Figure 2a also shows that perceptual grouping is intact for aligned faces, while it is disrupted for misaligned faces. This indicates that the alignment effect is mainly due to the object-based perceptual grouping component.

We assume that the study-test consistency effect for inverted faces also stems from these two components. The perceptual grouping cues (motion) route could explain why inverted moving faces showed holistic processing while inverted view-static faces did not, and the template matching route could explain why holistic processing of inverted moving faces was only observed for study-test consistent face pairs but not for study-test inconsistent face pairs.

Conclusion

In sum, across two experiments, the present study describes a new phenomenon – the study-test consistency effect – based on the finding that the degree of holistic processing is larger when study-test faces are identical than when they are not. Holistic processing was found not to be sensitive to viewpoint changes via depth rotation or motion changes. Holistic processing was observed for study-test consistent moving (dynamic and multi-static) inverted faces, but not for view-static inverted faces. In light of current theories, we propose a physical template matching component of holistic face processing.

The present study has, however, several limitations. First, we used computer-generated faces that lack realistic surface reflectance information, which is often seen as critical for face recognition (Russell & Sinha, 2007). These stimuli might therefore overemphasize the holistic processing of structural information, as no other (i.e., surface reflectance) cues are present in the stimuli. This might limit the ecological validity of the observed results to some extent. Second, we did not separate viewpoint consistency from motion consistency in the present study. Accordingly, more research is needed to examine whether the consistency effect can be generalized to real faces and other designs that are used to investigate study-test consistency. To further understand how unfamiliar upright faces varying in viewpoint, motion, emotion, and other aspects can be represented and recognized, research using different images of unfamiliar people is needed.