Introduction

Figure disembedding ability is the capacity to visually locate and detect local elements immersed within a global configural shape (Witkin, 1950; Witkin et al., 1971). It has been classically demonstrated that, when participants are required to search for a simple figure (local level) integrated in a larger one (global level), their task is more difficult if the lines of the simple figure belong perceptually to a different visual configuration within the complex figure, an effect originally termed “embeddedness” by Gottschaldt (1926, 1929).

Gottschaldt (1926, 1929) introduced the Embedded Figure Test as a suitable measure of the ability to disentangle a figure from the background. The test material is a series of meaningless geometrical patterns in which a simpler geometrical figure is embedded, and the task requirement is to pencil it in. Following Gottschaldt’s original Hidden Figure Test (GHFT), different versions of the Embedded Figures test have been devised, in which individuals are required to identify a target (simple) shape within complex designs. The best-known version is that developed by Witkin et al. (1971).

The ability to disembed figures undergoes developmental changes across childhood, with younger children being less able to detect embedded figures from the background, as indexed by both time and accuracy measures (Amador-Campos & Kirchner-Nebot, 1997; Cecchini & Pizzamiglio, 1975; Goodenough & Eagle, 1963; Witkin et al., 1967). Moreover, children’s performance seems to be affected by socioeconomic status and sex, with better scores associated with higher socioeconomic status and with male sex, although the results are not entirely consistent (Cakan, 2003; Cecchini & Pizzamiglio, 1975; Forns-Santacana et al., 1993; Karp et al., 1969; Witkin et al., 1967).

Persons with autism are thought to perform better than typical controls in disembedding figures. In a seminal study, Shah and Frith (1983) observed superior performance of children with autism on the Children’s Hidden Figure Test (Witkin et al., 1971) and explained this finding in terms of an enhanced ability to focus on the details within the whole, ignoring the interfering effect of the overall configuration (gestalt). Other studies reported that individuals with autism are as accurate as typical controls in disembedding figures but often show faster response times (Horlin et al., 2016). The inconsistencies in the literature are probably due to methodological reasons such as heterogeneity of task versions, stimuli presentation, and response recording (for a discussion see White & Saldaña, 2011).

To account for the advantage of individuals with autism in disembedding figures, the “Weak Central Coherence” (WCC) model proposed that the perceptual profile of these individuals is characterized by a weakness in global processing of information and a tendency towards local processing (Frith, 1989). Such a local processing style would not be intrinsically disadvantageous; rather, its effect would depend on the specific requirements of the task at hand, for instance being advantageous in tasks requiring a superior ability to process details, such as the Figure Disembedding and the Block Design tests (Happé & Frith, 2006). From another angle, the “Enhanced Perceptual Functioning” (EPF) model posited that individuals with autism would show an enhanced capacity in local processing rather than an impaired ability for global processing (Mottron & Burack, 2001; Mottron et al., 2006).

To address the complex differences between children with typical development and children with autism in figure disembedding ability in the clinical setting, we deemed it useful to develop a standardized version of the GHFT, with robust normative data available for both time and accuracy parameters. Hence, in the present research, we: (1) assessed developmental changes across childhood in GHFT performance; (2) provided normative data from a large sample of native Italian-speaking children; (3) tested a group of children with autism on the standardized version of the GHFT.

To these aims, two studies were conducted. In both, we adopted the GHFT version used by Capitani et al. (1988), in which participants have to complete a first series of stimuli within 15 min and a second series within 5 min. Because we assessed children, we did not impose the time limits used with adults, but rather recorded the time (s) needed to complete the whole series of patterns.

Study 1 assessed the changes in performance on the GHFT as a function of age and education in a sample of 7–11-year-old typically developing children. This age range was chosen since age consistently affects figure disembedding ability (Amador-Campos & Kirchner-Nebot, 1997; Bigelow, 1971; Cecchini & Pizzamiglio, 1975; Witkin et al., 1967), with a progressive maturation of global and local perceptual abilities, particularly during the early school period (Dukette & Stiles, 2001; Kimchi et al., 2005; Poirel et al., 2008). Centiles for the response time and accuracy scores of the GHFT were computed using the LMS method (Cole & Green, 1992), which yields normalized growth centile standards.

Study 2 evaluated the GHFT performance of children with autism with respect to the normative data gathered from participants in Study 1, and also compared performance of the autism group with that of closely age-matched typically developing controls.

Methods

In the GHFT, participants are presented with 34 complex geometrical figures in which a simple shape is hidden (Capitani et al., 1988; some examples are provided in Fig. 1). Test stimuli are arranged in four tables. In the first three tables, containing nine items each, participants need to search for the simple figure (on the left side) within the complex figure (on the right side) and trace it with a pencil. In the last table, instead, the simple figure is located at the center of the sheet; the participants have to identify it within seven different complex figures placed above and below the simple shape. Each correct choice is scored 1 (score range 0–34) to obtain the accuracy score. The time needed to solve each of the four tables is recorded, and the sum of the four times provides the total time score.
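The scoring rule just described can be sketched in a few lines of Python. This is only an illustration: the names `TableResult` and `score_ghft` are hypothetical and do not come from the test manual, and the sketch assumes that each item is simply marked correct or incorrect.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Three tables of nine items and one of seven, 34 items in total.
ITEMS_PER_TABLE = (9, 9, 9, 7)


@dataclass
class TableResult:
    correct: Sequence[bool]  # one flag per item in the table
    seconds: float           # time needed to complete the table


def score_ghft(tables: Sequence[TableResult]) -> Tuple[int, float]:
    """Return (accuracy score in 0-34, total time in seconds)."""
    if len(tables) != len(ITEMS_PER_TABLE):
        raise ValueError("expected four tables")
    for table, n_items in zip(tables, ITEMS_PER_TABLE):
        if len(table.correct) != n_items:
            raise ValueError("unexpected number of items in a table")
    accuracy = sum(sum(t.correct) for t in tables)   # one point per correct item
    total_time = sum(t.seconds for t in tables)      # sum of the four table times
    return accuracy, total_time
```

For instance, a made-up child with 9, 8, 0 and 7 correct items across the four tables would obtain an accuracy score of 24.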

Fig. 1 Example of the stimuli from Tables 1 and 2 of the GHFT

The study was conducted according to the standards of the Helsinki Declaration and the study protocol was approved by the local ethical committee of the Department of Psychology of the University of Campania Luigi Vanvitelli (code: N:34/03.11.2020). Written informed consent was obtained from the parents of each participant involved in the study.

Study 1

Participants

Typically developing children were recruited from elementary schools located in the Campania region, Southern Italy. To be included in the study, each participant had to meet the following inclusion criteria: (i) a normal score (≥ 15th percentile of the Italian normative data; Pruneti et al., 1996) on the Raven’s Colored Progressive Matrices test (RPM; Raven et al., 1998); (ii) age range from 7 to 11 years; (iii) lack of neurologic, neuropsychological, or neuropsychiatric disorders, as reported by either parents or teachers; and (iv) Italian as native language. We recruited a sample of 403 children (188 males).

Children’s socioeconomic status (SES) was measured using the Hollingshead Four Factor Index of social status (Hollingshead, 1975), which estimates SES based on a weighted average of education and occupational level of both parents of each child (Venuti & Senese, 2007).

Statistical Analysis

An a priori power analysis for one-way analysis of variance (ANOVA) was carried out with G*Power (Version 3.1.9.2), setting the following parameters: probability level (α), 0.05; statistical power (1 − β), 0.80; moderate effect size (Cohen’s f of 0.25) (Cohen, 1988).

The overall sample was divided into eight groups (Table 1); as in previous normative studies (e.g., Conson et al., 2019; Mozzanica et al., 2016), each group covered a 6-month range.
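As an illustration of this stratification, the sketch below assigns a child to one of the eight 6-month bands. It assumes that the band labels used in the Results (e.g., 7.0–7.5, 10.6–10.11) denote years and months; the function name `age_band` is hypothetical.

```python
def age_band(years: int, months: int) -> str:
    """Map an age to its 6-month band, e.g. (9, 3) -> '9.0–9.5'."""
    if not (7 <= years <= 10 and 0 <= months <= 11):
        raise ValueError("age outside the 7.0–10.11 normative range")
    start = 0 if months < 6 else 6          # first or second half of the year
    return f"{years}.{start}–{years}.{start + 5}"
```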

Table 1 Normative sample stratified by age

SES was split into two groups (i.e., high and low SES) using the K-means clustering procedure (non-hierarchical clustering) since no cut-off value is available in the literature.
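A two-cluster K-means split on a single variable can be sketched as follows. This is a minimal version of Lloyd's algorithm written for illustration, not the SPSS procedure used in the study; the function name `two_means_1d` and the deterministic initialisation at the minimum and maximum values are assumptions.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def two_means_1d(values: Sequence[float], max_iter: int = 100) -> Tuple[float, float, List[int]]:
    """K-means with k = 2 on one-dimensional data.

    Returns (low centre, high centre, labels), where label 0 = low and 1 = high.
    """
    low, high = min(values), max(values)   # assumed initialisation, not from the paper
    if low == high:
        raise ValueError("need at least two distinct values")
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value joins the nearer centre.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        # Update step: each centre moves to the mean of its members.
        new_low = mean(v for v, g in zip(values, labels) if g == 0)
        new_high = mean(v for v, g in zip(values, labels) if g == 1)
        if (new_low, new_high) == (low, high):
            break                          # centres stopped moving: converged
        low, high = new_low, new_high
    return low, high, labels
```

With the made-up scores 20, 22, 25, 28, 40, 45 and 50, the first four values fall in the low cluster (centre 23.75) and the last three in the high cluster (centre 45).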

Three one-way ANOVAs were carried out to evaluate the effects of age group, sex, and SES on time and accuracy scores of the GHFT. Tukey’s honestly significant differences (HSD) tests were used for post-hoc comparisons.

Centiles for time and accuracy scores of the GHFT were computed using the LMS method (Cole & Green, 1992), which allows for obtaining normalized growth centile standards. The method assumes that data can be normalized using a power function, which stretches one tail of the distribution while shrinking the other. The optimal power (i.e., Box–Cox power transformation) to obtain normality was calculated for each age group and the trend summarized by a smooth (L) curve. Trends in the mean (M) and coefficient of variation (S) were similarly smoothed.

The resulting L, M, and S curves contain the information to draw any centile by the following formula:

$$C = M\left(1 + L S Z\right)^{1/L}$$

where Z is the z-score corresponding to the required centile. The 3rd, 5th, 10th, 15th, 25th, 50th, 75th, 90th, and 97th centiles were chosen as age-specific reference values.
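The centile formula can be sketched directly in Python. The function name `lms_centile` is hypothetical, and the branch for L = 0 uses the limiting form M·exp(S·Z) from the general LMS formulation, which is not stated in the text above.

```python
from math import exp
from statistics import NormalDist


def lms_centile(L: float, M: float, S: float, centile: float) -> float:
    """Score at a given centile (0-100, exclusive) from LMS parameters.

    Implements C = M * (1 + L*S*Z) ** (1/L), with Z the standard normal
    quantile of the requested centile.
    """
    z = NormalDist().inv_cdf(centile / 100.0)
    if L == 0:
        return M * exp(S * z)              # limiting case, an addition to the text
    base = 1.0 + L * S * z
    if base <= 0:
        raise ValueError("centile not defined for these LMS parameters")
    return M * base ** (1.0 / L)
```

At the 50th centile Z is 0, so the function returns M, the median of the age group.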

All analyses were performed using IBM Statistical Package for Social Science (SPSS; Version 21; IBM Corp., Armonk, NY, USA), with p-value < 0.05 considered as statistically significant.

Results

The a priori power analysis revealed that at least 240 participants (i.e., 30 individuals for each age group) were needed to detect a moderate effect size.

K-means clustering identified two clusters with high (M = 44.73, SD = 9.07) and low (M = 24.29, SD = 7.97) SES score, containing 191 and 212 participants, respectively. Descriptive statistics for each age group are shown in Table 2.

Table 2 Descriptive statistics stratified by age range are shown as mean (standard deviation) or count (percentage), as appropriate

One-way ANOVAs showed a significant effect of age group on time (F = 2.85, p < 0.01, partial η² = 0.05) and accuracy (F = 12.03, p < 0.01, partial η² = 0.18) scores, but not of sex and SES (p > 0.05).

Tukey HSD post-hoc comparisons showed statistically significant differences in time scores between the 7.0–7.5 and 10.6–10.11 age groups, and a marginally significant difference between the 7.6–7.11 and 10.6–10.11 age groups. As for accuracy score, statistically significant differences were found between the 7.0–7.5 or the 7.6–7.11 and the 8.6–8.11, 9.0–9.5, 9.6–9.11, 10.0–10.5, or 10.6–10.11 age groups. Similarly, statistically significant differences were found between 8.0–8.5 or 8.6–8.11 and 10.0–10.5 age groups (Table 3).

Table 3 Tukey’s honestly significant differences (HSD) post-hoc test results for time (above the diagonal) and accuracy (below the diagonal) of Gottschaldt's Hidden Figure Test

Centile curves (Cole & Green, 1992) for time and accuracy scores are provided in Figs. 2 and 3, respectively.

Fig. 2 Estimates of GHFT time centiles. Green, blue, light blue, purple, yellow, grey, and black curves represent the 3rd, 5th, 10th, 25th, 50th, 75th, and 90th percentiles, respectively (Color figure online)

Fig. 3 Estimates of GHFT accuracy centiles. Green, blue, light blue, purple, yellow, grey, and black curves represent the 3rd, 5th, 10th, 25th, 50th, 75th, and 90th percentiles, respectively (Color figure online)

Time and accuracy centiles are reported in Table 4. When the percentile of interest is not tabulated, it can be computed with the formula reported above, using the LMS parameters associated with each age group (Table 4). For example, to compute the 95th percentile (corresponding to a z score of 1.64) of the accuracy score for the 9.0–9.5 age group, the formula becomes 24 × (1 + 1.35 × 0.27 × 1.64)^(1/1.35) = 33.94, which can be rounded to an integer score of 34.
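The same LMS parameters can also be used in the opposite direction, to convert a child's raw score into a z-score and a percentile. The sketch below inverts the centile formula; the function names are hypothetical, the values L = 1.35, M = 24 and S = 0.27 are those of the worked example above, and the L = 0 branch is the standard limiting form rather than something stated in the text.

```python
from math import log
from statistics import NormalDist


def lms_z(score: float, L: float, M: float, S: float) -> float:
    """z-score of a raw score, obtained by inverting C = M * (1 + L*S*Z) ** (1/L)."""
    if score <= 0:
        raise ValueError("score must be positive")
    if L == 0:
        return log(score / M) / S          # limiting case, an addition to the text
    return ((score / M) ** L - 1.0) / (L * S)


def lms_percentile(score: float, L: float, M: float, S: float) -> float:
    """Percentile (0-100) of a raw score under the normalized distribution."""
    return 100.0 * NormalDist().cdf(lms_z(score, L, M, S))
```

With these parameters, an accuracy score of 34 maps to a z-score of about 1.65, that is, roughly the 95th percentile, in agreement with the worked example.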

Table 4 Age-specific percentiles for time and accuracy of Gottschaldt's Hidden Figure Test

Study 2

Participants

Twenty-two individuals with autism (three females; mean age = 9.2, SD = 1.43; age range 7–11) took part in the study. To be included in the study, each participant had to meet the same inclusion criteria as in Study 1.

Diagnosis of autism was reached after a multidisciplinary assessment by a neuropsychiatrist and a clinical psychologist trained in the evaluation of individuals with neurobehavioural disorders according to DSM-5 criteria. Clinical diagnosis was validated by means of the Autism Diagnostic Interview-Revised (ADI-R; Rutter et al., 2003) and Module 3 of the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al., 2012).

We also recruited a sample of 22 typically developing children (three females; mean age = 9.2, SD = 1.44; age range 7–11) individually matched for age and sex with autistic children. Typically developing children were recruited from primary schools in Naples, in the Campania Region of Italy.

Since RPM scores correlate well with the Wechsler Full Scale intelligence quotient (IQ) (e.g., O’Leary et al., 1991), they were used to estimate IQ. The estimated IQ of children with typical development (mean = 101.2, SD = 11.6) did not significantly differ (t = −1.49, p = 0.143) from that of children with autism (mean = 106.3, SD = 11.2).
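The group comparison on estimated IQ can be approximately re-derived from the summary statistics alone. The sketch below assumes a pooled-variance (Student) t-test, which the text does not specify; `pooled_t` is a hypothetical helper name.

```python
from math import sqrt
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Student's t for two independent groups, computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))   # standard error of the difference
    return (mean1 - mean2) / se, df


# Typically developing controls versus children with autism (22 per group).
t, df = pooled_t(101.2, 11.6, 22, 106.3, 11.2, 22)
```

With the rounded means and standard deviations reported above, this gives a t of about −1.48 on 42 degrees of freedom, close to the reported value; the small gap presumably reflects rounding of the published summary statistics.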

Statistical Analysis

A priori power analyses for sample size calculation were conducted with G*Power 3.1 by setting the following parameters: probability level (α) of 0.05, statistical power (1 − β) of 0.80, and large effect size (Cohen’s d of 0.80 for t-test and odds ratio of 6.71 for binary multiple logistic regression analysis) (Cohen, 1988; Faul et al., 2009).

We first compared autistic children’s performance with respect to the normative data gathered in Study 1.

Then, independent t-tests were conducted on both response time and accuracy scores of the GHFT as a function of groups (autistic children vs typical controls).

Finally, a binary multiple logistic regression analysis was performed to identify the GHFT scores that were able to discriminate the children with autism from typically developing controls. To check the reliability of the results given the relatively small sample size, we computed 95% bias-corrected and accelerated confidence intervals [95% CI] (1000 bootstrap samples) for the logistic regression coefficients. The bias of an estimate can be ignored if it is lower than 0.25 times its standard error.
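The bias criterion can be illustrated with a plain non-parametric bootstrap. This sketch is not the bias-corrected and accelerated procedure run in SPSS, and it is applied to a generic statistic rather than to logistic regression coefficients; the function names and the fixed random seed are assumptions made for illustration.

```python
import random
from statistics import mean, stdev
from typing import Callable, Sequence, Tuple


def bootstrap_bias(data: Sequence[float],
                   statistic: Callable[[Sequence[float]], float],
                   n_boot: int = 1000,
                   seed: int = 0) -> Tuple[float, float]:
    """Return (bootstrap bias estimate, bootstrap standard error) of a statistic."""
    rng = random.Random(seed)
    observed = statistic(data)
    # Resample the data with replacement and recompute the statistic each time.
    replicates = [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    bias = mean(replicates) - observed
    se = stdev(replicates)
    return bias, se


def bias_is_negligible(bias: float, se: float, ratio: float = 0.25) -> bool:
    """Apply the rule quoted in the text: |bias| below 0.25 times the standard error."""
    return abs(bias) < ratio * se
```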

The analysis was performed using IBM Statistical Package for Social Science (SPSS; Version 21), with p value < 0.05 considered as statistically significant.

Results

With reference to the normative data obtained from Study 1, all children with autism had a time score at or above the 50th centile of the normative data, and six of them (27.3% of the sample) had a time score above the 90th centile. By contrast, their accuracy scores ranged across the whole percentile range (Fig. 4).

Fig. 4 Percentiles of age-corrected GHFT accuracy and time scores achieved by single individuals with autism in reference to normative data obtained from Study 1. Dimension of symbol is proportional to density of observations (the smallest symbols represent one individual, the largest four individuals). The dotted lines represent the 50th percentile

As regards the comparison between children with autism and age-matched controls, the a priori power analyses revealed that at least 42 participants (21 for each group) for the t-test and 34 participants for the binary multiple logistic regression analysis were needed to detect a large effect size, at a statistical power of 0.80 and an α level of 0.05.

Mean response time and accuracy scores of the GHFT are reported in Fig. 5 separately for each group. Results of independent t-tests showed that children with autism were significantly faster (t = 3.01, p = 0.004, Cohen’s d = 0.90) than typically developing controls, while no significant difference was found on the total accuracy score (t = −0.84, p = 0.40, Cohen’s d = 0.26).
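The reported effect sizes can be cross-checked against the t values. The sketch below uses the standard conversion d = t·√(1/n₁ + 1/n₂), which holds for a pooled-variance independent t-test; that this is how the authors obtained Cohen's d is an assumption, and `cohens_d_from_t` is a hypothetical name.

```python
from math import sqrt


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d implied by an independent-samples t statistic (pooled variance)."""
    return t * sqrt(1.0 / n1 + 1.0 / n2)


d_time = cohens_d_from_t(3.01, 22, 22)       # time score comparison
d_accuracy = cohens_d_from_t(-0.84, 22, 22)  # accuracy score comparison
```

With 22 children per group, the two t values correspond to effect sizes of about 0.91 and 0.25 in absolute value, in line with the reported 0.90 and 0.26 once rounding is taken into account.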

Fig. 5 Time and accuracy scores of the GHFT, separately for children with autism and typically developing participants. Boxes represent the 25th and 75th percentiles. The solid line inside the box represents the median of the group, while the empty square in the box represents the mean. Bars above and below the boxes represent the interquartile range. Each individual dot represents a subject

The results of the binary multiple logistic regression analysis showed that only the time score of the GHFT discriminated children with autism from typically developing controls (p = 0.01), with an overall classification accuracy of 68.2%. The bias estimates of the regression coefficients were lower than 0.25 times their standard errors, indicating no substantial bias and, thus, adequacy of the sample size (Table 5).

Table 5 Results of the binary multiple logistic regression analysis

Discussion

The main aims of the present investigation were, first, to assess developmental changes across childhood in GHFT performance and to provide normative data on Italian-speaking children (Study 1), and second, to assess a group of children with autism on the standardized version of the GHFT (Study 2).

Results of Study 1 showed that, although time scores decreased progressively across the age range, only the two most distant age groups (7.0–7.5 and 10.6–10.11) differed significantly. These results are in line with Witkin et al.’s (1967) data showing an overall progressive tendency for children to become faster at disembedding, with a levelling off of the trend from about 14 years onward. Accuracy was significantly lower in the 7.0–7.5 and 7.6–7.11 groups than in the age groups from 8.6–8.11 onward, and in the 8.0–8.5 and 8.6–8.11 groups than in the 10.0–10.5 group, thus suggesting a smooth increase of accuracy across childhood. These findings are consistent with previous data on elementary school children (Amador-Campos & Kirchner-Nebot, 1997), with a particular refinement of performance when comparing 7-year-old children with the older ones (Bigelow, 1971; Cecchini & Pizzamiglio, 1975).

Here we did not detect differences due to sex or socioeconomic status. Thus, the normative data only took into account the effect of age. Sex differences have been reported in the literature on figure disembedding, but with some inconsistencies partially accounted for by age. In age ranges comparable with the present one, significant sex differences did not emerge (Bigelow, 1971; Cecchini & Pizzamiglio, 1975; Corah, 1965), although a non-significant advantage of boys over girls was also reported (Amador-Campos & Kirchner-Nebot, 1997). Indeed, until age 10, boys and girls display similar figure disembedding abilities, while they tend to diverge during adolescence (Witkin et al., 1967). As for the possible effect of SES on performance, in a review of studies on children aged 4.5–10.5 years, Laicardi-Pizzamiglio and Pizzamiglio (1974) found that poorer figure disembedding was related to lower socioeconomic status, but this finding could be related to participants’ general cognitive abilities (Bigelow, 1971; Forns-Santacana et al., 1993). It is possible that, in studies on children with a narrow range of general cognitive functioning, the differences in figure disembedding related to socioeconomic status are weakened, consistent with recent findings (Zappullo et al., 2020).

Results of Study 2 showed that children with autism were significantly faster than typical controls, whereas the two groups did not differ in accuracy. This finding was also clear with reference to the normative data in Study 1: a substantial proportion of autistic children achieved a time score in the highest centiles of the normative distribution, but only some of them achieved an accuracy score at high centiles.

It has been suggested that methodological factors affecting task performance at a behavioral level, rather than a true difference between children with autism and typical development in figure disembedding at a cognitive level, could account for reported group differences on figure disembedding tests (White & Saldaña, 2011). Indeed, several approaches to the analysis of response time differences between children with autism and children with typical development are available in the literature, each producing somewhat different data. Some authors computed the average response times for correct responses only (e.g., de Jonge et al., 2006; Morgan et al., 2003), whilst others examined response times to all stimuli, either substituting the maximum time allowed on incorrect trials (e.g., Jolliffe & Baron-Cohen, 1997; Ropar & Mitchell, 2001) or including search times regardless of whether the response was correct (Edgin & Pennington, 2005). Here, we followed the last procedure and recorded the time (in s) needed to complete the whole series of 34 items, irrespective of the correctness of the responses. The present results were consistent with those of Edgin and Pennington (2005) and supported the strength of children with autism in disembedding.

White and Saldaña (2011) suggested that time measures computed irrespective of trial accuracy could be affected by the different strategies used on incorrect trials, as some participants might give up searching for the target more quickly than others. However, White and Saldaña’s (2011) observations mainly referred to Witkin’s task or to comparable versions, such as Coates’ (1972), in which, for each trial, the participant has to locate a simple target figure (e.g., a triangle) within a complex picture. Such task versions can favor a practice effect, since one or two target figures have to be repeatedly searched for within complex images (Gottschaldt, 1926; Ludwig & Lachnit, 2004; Witkin et al., 1967). Moreover, Witkin’s task presents a small number of items (12), thus often leading to a ceiling effect (de Jonge et al., 2006; Jolliffe & Baron-Cohen, 1997). Practice in disembedding could reflect the impact of experience in terms of both perceptual and procedural (search strategy) factors, but it can be reduced by varying the stimulus material, as in the GHFT (Ludwig & Lachnit, 2004).

In the GHFT, 27 pairs of different images are provided in Tables 1, 2 and 3, whereas in the last table only one target has to be identified within seven different complex images. Hence, this set-up markedly increased task complexity with respect to other available versions, such as Witkin’s or Coates’ tests, counteracting the impact of practice. Indeed, we did not find evidence of a ceiling effect in any of the tested populations. Importantly, increased task complexity and a reduced practice effect are thought to affect response time more than accuracy (Ludwig & Lachnit, 2004; Zoccolotti & Pizzamiglio, 1982), thus favoring the sensitivity of response times in discriminating between participants with autism and typically developing controls. Our results from the comparison of autistic children’s performance with both normative data and individually matched controls consistently showed that faster GHFT time scores were significantly related to the presence of autism, consistent with evidence demonstrating that individuals with autism are able to locate the embedded figures more quickly than controls (de Jonge et al., 2006; Jarrold et al., 2005; Jolliffe & Baron-Cohen, 1997; Morgan et al., 2003; Pellicano et al., 2005).

Both the WCC (Happé & Frith, 2006) and the EPF (Mottron et al., 2006) models predict superior disembedding abilities in individuals with autism, albeit hypothesizing different underpinning mechanisms. Comparing the explanatory power of these two models would require more complex paradigms than the one adopted in the present investigation (Horlin et al., 2016). Thus, our findings do not shed light on the reasons why children with autism are faster than typically developing controls in disembedding figures. A further limitation of the present study is that the results cannot readily be generalized to age ranges different from the one involved here. Indeed, as recalled above, critical developmental changes occur in global and local perception abilities during the elementary school period, but a progressive refinement of these abilities continues across adolescence (Mondloch et al., 2003). Hence, future studies should test the GHFT in typically developing adolescents and assess its validity as a measure of figure disembedding abilities in adolescents with autism.

In conclusion, the results of Study 1 on a large sample of typically developing children aged 7–11 demonstrated that GHFT accuracy and time scores differed across age groups, without sex and socioeconomic differences. Thus, we could provide normative data considering children’s age only. In Study 2, children with autism achieved time scores at or above the 50th centile with respect to normative values and significantly differed in time scores from a closely age-matched group of typically developing controls. Taken together, these findings indicate that the GHFT is a valuable tool for assessing developmental changes in children’s figure disembedding ability and tracing the functional cognitive profile of children with autism.