The early years of a child’s life set foundations for learning and development. Longitudinal studies show that certain early childhood experiences, interactions and environments can play a positive role in children’s life trajectories (Heckman & Karapakula, 2019). Early childhood education and care (ECEC), for instance, is shown to have a robust and long-term positive impact on children’s academic, vocational and life success (Sylva et al., 2004). However, variation in outcomes follows a gradient of quality across the ECEC and home learning environments that children experience (Sylva et al., 2004). These findings have created an imperative to identify the interactional, educational and structural elements that contribute to children’s learning and development, and later-life outcomes.

Conceptions of quality in ECEC are generally guided by theory (e.g. socioecological, attachment, learning), and evaluated in terms of children’s outcomes. While there is not one prevailing conception or model of quality in ECEC – nor, arguably, should there be given inter- and intra-national difference in communication, expectation and socialisation – there is nevertheless consensus across approaches that high-quality experiences, interactions and environments have a positive influence on children’s developmental progress and outcomes (OfSTED, 2022; Siraj-Blatchford & Sylva, 2004). Across these models, early childhood research that investigates ECEC quality – its characteristics, relative influences on child learning and development, and how to identify and foster these conditions – commonly separates quality into distinct yet related process and structural dimensions, rather than conceiving of quality as a unidimensional global construct (although some studies also conceive of global quality indices as meaningful, Dickinson, 2003; McGinty et al., 2012). Process quality relates to children’s interactions with adults and other children in their early learning environment, as well as learning experiences as shaped by, for example, curricula, resources and materials. By contrast, structural quality refers to organisational aspects of early learning environments, such as adult qualifications, ratio of children to adults, group size, funding and management of the early learning setting (Edwards, 2021). It is generally regarded that structural quality sets the conditions for process quality (Burchinal, 2018); that is, structural quality is necessary but not sufficient to achieve robust child-level impacts.

Substantial investment has aimed to characterise (Slot, 2018; Sylva et al., 2004) and measure ECEC quality (Ishimine & Tayler, 2014; Vermeer et al., 2016), and mobilise these insights towards boosting quality (Siraj et al., 2019), yielding a large corpus of research on quality dimensions believed to be influential to child development and learning. Yet, findings have been difficult to reconcile given inconsistency across settings, measures, methods and results (Brunsek et al., 2017; Perlman et al., 2016). There is a pressing need to reconcile this evidence base, considering that quality ECEC is an increasing area of priority for governments internationally, and access to funding can be linked to meeting prescribed quality standards or thresholds. Given that the ECEC field has had to act, these quality standards and assessment criteria can at times appear selective (rather than systematic) or exogenous to the available evidence (see, e.g., suggestions of important areas that quality measures fail to capture; Burchinal, 2018). There is also widespread use of measures of ECEC quality within and beyond these quality assurance programs; however, the measures in use are disparate in the quality characteristics they privilege and are inconsistent in the evidence that their combination of quality characteristics yields better child outcomes (e.g. Perlman et al., 2016; Zaslow et al., 2006). A systematic review of the available literature, while ambitious, is thus essential to reconcile the large corpus of available evidence on quality dimensions, to identify the strongest and most consistent evidence for relation with child outcomes, and to establish whether and how associations are influenced by study characteristics. To date, attempts to identify aspects of quality interactions that relate most highly to child outcomes have tended to focus on particular measures and/or contexts.

Measuring the Quality of ECEC Provision

International efforts to characterise, pinpoint and promote high-quality ECEC provision (ISSA, 2016; Slot, 2018) are buoyed by governments increasingly shifting their emphasis from enrolment in ECEC (with universal or near-universal participation in at least one year of ECEC now common in OECD nations; OECD, 2022) to ensuring that ECEC provision confers optimal developmental and educational benefits for children. This emphasis has heightened the need to devise and deploy measures that can articulate and capture a continuum of quality in ECEC provision, such that higher quality ratings are associated with better child outcomes.

Yet, the OECD (2022) notes that ECEC has greater variation in approaches to measurement of quality than other educational contexts. The USA, for instance, employs a Quality Rating and Improvement System (QRIS) that also includes licensing and inspection processes (National Center on Early Childhood Quality Assurance, 2018). The QRIS is mandatory in some US states and optional in others, and indicators of quality diverge across states. In England, the Early Years Foundation Stage and its associated quality assessment indicators are a required standard for ECEC providers that register with Ofsted. Wales and Scotland, however, utilise different systems (Department for Education, 2021). Australia is one of a few countries with a mandatory national ECEC system, which encompasses laws, regulation, a quality assessment and rating system and learning framework (ACECQA, 2020), although even in this case there are differences in implementation across states and territories.

Despite these disparate approaches, quality rating and improvement systems are implicated in a general increase in ECEC quality scores (and decrease in variability in quality ratings across the ECEC sector) in at least some nations (e.g. Melhuish & Gardiner, 2020). This highlights the viability and promise of this approach to capturing and fostering ECEC quality. However, it is questionable whether improvement in quality scores is the ideal marker of success (Burchinal, 2018). In each of these cases, operationalisation of the characteristics that exemplify quality differs, and is drawn from a differing evidence base and/or measures. Also problematic, the extent to which these national quality ratings align with child development is often shielded from public scrutiny. For example, although National Quality Standard (NQS) ratings for Australian ECEC services are publicly available (ACECQA, n.d.-a), data linking these with children’s outcomes are not. Where established and more-transparent quality rating scales are adopted, there are often questions of the degree to which these account for learning and developmental change in children. As such, it is unclear whether current efforts to quantify and improve quality are aptly targeted, and whether commensurate improvement in child outcomes has resulted from the identified increases in quality ratings for some segments of the ECEC sector.

Variation across these approaches demonstrates the need for tools to measure ECEC quality that show strong prediction of child growth and outcomes in ways that are appropriate to the context, culture and priorities. Several tools with these aims have been developed, measuring different aspects of structural and process quality in ECEC services. The most widely used scales include the Classroom Assessment Scoring System (CLASS; Pianta et al., 2007), the Early Childhood Environment Rating Scale 3 (and its predecessors ECERS-R and -E; Harms et al., 2015), the Infant/Toddler Environment Rating Scale (ITERS; Harms et al., 2003) and the Sustained Shared Thinking and Emotional Wellbeing scale (SSTEW; Siraj et al., 2015). These scales share the aim of identifying high-quality provision – and, in so doing, also suggest opportunities for high-quality provision to be promoted and sustained – yet vary widely in foci (e.g. the balance of structural and process quality dimensions they consider; curricular or interactional aspects of quality included), approach to observing quality (e.g. sustained observation over half a day or more; frequent but brief observation intervals; self-reported recollections) and their approaches for deriving quality indices therefrom (e.g. weighted methods that reconcile lower- with higher-order indicators; summation of indicators satisfied).

Despite their diversity, most of the quality measures have at least some evidence that their resultant indices predict children’s development or outcomes (Howard et al., 2020; Mashburn et al., 2008; Sylva et al., 2006), although their strength, breadth and consistency of prediction are highly variable. Yet, this still obscures insight into the specific characteristics – or combination of characteristics – that lead to stronger child outcomes. That is, these quality measures implicitly presume, by virtue of their scoring, that satisfaction of a greater number of their indicators – and especially higher-order quality indicators – is associated with an incremental increase in child progress. This may indeed be the case, but it is also plausible that the degree of association with children’s outcomes varies between quality indicators, and that unmeasured indicators of quality are additionally important (e.g. Burchinal, 2018).

To date, research and reviews on the predictive validity of quality indicators have focused on results for an individual context and/or measure, rather than a comprehensive review of measures, contexts and studies. More sophisticated and nuanced understandings of the particular quality indicators that most consistently and strongly account for children’s developmental progress are needed to better understand the nature of quality interactions, and ensure ‘quality improvement’ efforts are directed towards their most impactful and susceptible targets. Without this, we risk the situation whereby quality is defined, measured and promoted, and success in these efforts is defined by increased prevalence of these characteristics, rather than by measured improvement in aspects that matter to children’s wellbeing and outcomes.

Findings of Previous Reviews on ECEC Quality

Efforts to understand the link between ECEC quality and child outcomes are not new. Despite decades of investigation of ECEC quality using widely studied quality measures, such as CLASS and ECERS, reviews and meta-analyses continue to report inconsistent findings in their association with children’s developmental progress and outcomes (Brunsek et al., 2017; Burchinal, 2018; Keys et al., 2013; Perlman et al., 2016; Ulferts et al., 2019). For example, recent meta-analyses have found that various quality indices show, upon aggregation of findings, few or weak associations across various domains of child progress (Brunsek et al., 2017; Egert et al., 2018). Where there is more (but not complete) consensus, these conclusions tend to be rather broad; for instance, process quality tends to be more predictive of child outcomes than structural quality factors, and scales and subscales that privilege process quality, such as CLASS, tend to show more reliable prediction (Keys et al., 2013). Yet, these reviews have tended to focus on particular contexts and/or quality measures, precluding an explicit focus on the dimensions and conditions of quality that show more robust association with child progress and outcomes. Better understanding of how features of quality (e.g. adult–child interactions, content, instruction, educator decision-making) function as predictors of child learning and development is thus needed to build knowledge and inform quality improvement programs more effectively (Fenech, 2011; Gordon & Farran, 2022).

The Current Study

Evidence for dimensions of quality that have been proposed (and measured) in the literature is piecemeal – obscuring insight into which quality indicators show the most consistent and robust support for association with child development (and in what conditions/contexts). As such, current ECEC quality insights and efforts are drawn from studies and reviews focusing largely on the merits of individual quality measures. It is equally plausible, however, that most measures have elements with merit. Understanding how these elements may combine to derive conceptions and indicators of quality that optimally account for the experiences, interactions, environments and conditions for child growth – rather than which measure to select as currently constructed – requires a different approach. We seek to provide a consolidated and systematic review of studies concerning the relationship between discernible characteristics of process quality in ECEC (adult–child interactions and associated context and content) and child developmental outcomes. That is, we aim to identify characteristics of quality interactions that have the strongest evidence for impact on child learning and developmental outcomes. Specifically, this study was governed by three overarching research questions: (1) what is the scale and consistency of the evidence base on associations between various interaction quality characteristics in ECEC and child outcomes?; (2) on which study characteristics do inconsistent directions of association most widely diverge?; (3) do the patterns identified in the first two questions differ across interaction quality dimensions?

Methods

This systematic review was designed, conducted and reported in accordance with PRISMA guidelines for the reporting of systematic reviews (Liberati et al., 2009).

Eligibility Criteria

Of interest were studies assessing the association of early educator–child interaction quality, or any feature of these educator–child interactions, with children’s developmental outcomes. This included (1) ‘global’ or broad environmental quality ratings that assessed process quality, even if they also assessed structural features of the environment; (2) interaction-specific quality ratings that assessed one or more components of educator–child interactions; and (3) domain-specific quality ratings that assessed quality of instruction and stimulation in specific content areas. Additional eligibility criteria were as follows:

  1. Study design: cross-sectional observational, longitudinal, cohort or validation designs. Experimental studies were included if results reported on whole-cohort associations, not just intervention-control contrasts.

  2. Publication status: published in a peer-reviewed journal or by recognised government bodies.

  3. Reporting: original research, published in English, with a full methods section.

  4. Variables: measurement of one or more dimensions of educator–child interaction quality analysed as a predictor of one or more cognitive-academic or social-emotional competence outcomes.

  5. Populations: ECEC studies with non-clinical samples from the general population; mean age of sample 3–5 years, and did not include children younger than 2 years of age or older than 6 years of age.

Search Strategy

A systematic search of seven electronic databases was conducted in November 2022 and included all publications from the year 2000 to the search date. The starting year was chosen to align with a concerted uptick in international research efforts to define and evaluate the impact of ECEC quality (e.g. EPPE study in the UK; Sylva et al., 2004). The databases searched were A Plus Education, PubMed, Web of Science, PsycINFO, MEDLINE, ERIC and CINAHL.

Search strings were developed, piloted, revised and ultimately agreed upon by the research team. The strings addressed four inclusion criteria of interest: population/setting (i.e. interchangeable search terms related to early childhood); interaction (i.e. interchangeable search terms related to key characteristics of adult–child interaction or process quality, such as interaction, communication and relationship); quality (i.e. interchangeable search terms related broadly to ECEC quality or prevalent quality measures); developmental outcome (i.e. interchangeable search terms related to child development and outcomes, broadly conceived). This was intended to yield an overly comprehensive (and thus exhaustive) set of search results, which could be reduced in screening stages. The final search strings were as follows:

  (1) Population/setting: child [OR infant OR "early child" OR "early years" OR toddler OR preschool OR "child care" OR "child care cent" OR "child-care" OR "child-care cent" OR "child development cent" OR "early childhood education" OR "nursery school" OR "day care cent" OR "long day care" OR kindergarten] AND adult [OR educator OR teacher OR parent OR pedagogue OR mother OR father] AND

  (2) Interaction: play [OR interact OR convers OR language OR talk OR communicat OR cooperat OR "caregiver child relationship" OR "parent child relationship" OR "teacher student relationship"] AND

  (3) Quality: "child care" NEAR/3 quality [OR "early childhood education" NEAR/3 quality OR "classroom assessment scoring system" OR "CLASS" OR "early childhood environment rating system" OR "ECERS" OR "caregiver interaction scale" OR "CIS" OR "caregiver training" OR "classroom environment" OR "preschool evaluation" NEAR/3 quality OR "sustained shared thinking and emotional wellbeing scale" OR "SSTEW" OR "infant and toddler environment rating/scale" OR "ITERS"] AND

  (4) Developmental outcome: "child outcomes" [OR "social skills" OR "emotion regulation" OR "self-regulation" OR "sustained shared thinking" OR "academic achievement" OR "child development" OR "child language" OR litera OR "cognitive ability" OR "school readiness" OR vocab OR "academic performance"].

This strategy yielded 16,528 search results. An example search strategy can be found in Supplementary Online Resource 1.
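The structure of these strings (interchangeable terms joined with OR inside each criterion, and the criteria joined with AND) can be sketched in a few lines of Python. This is an illustration only: the helper names `or_group` and `build_query` are hypothetical, the term lists are a small subset of the strings above, and operator syntax (including proximity operators such as NEAR/3 and truncation) differs between databases, so the output is not a query that was actually run.

```python
def or_group(terms):
    """Join interchangeable terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(groups):
    """Combine the OR-groups with AND, so every criterion must be met."""
    return " AND ".join(or_group(g) for g in groups)


# Abbreviated term lists (a subset of the full strings, for illustration only)
groups = [
    ["child", "infant", "preschool"],          # population/setting: children
    ["adult", "educator", "teacher"],          # population/setting: adults
    ["play", "interact", "talk"],              # interaction
    ["ECERS", "SSTEW", "ITERS"],               # quality
    ["child outcomes", "school readiness"],    # developmental outcome
]

query = build_query(groups)
print(query)
```

Because each criterion is an OR-group, adding a synonym can only widen the result set, whereas adding a further AND-ed criterion can only narrow it; this is consistent with the stated intent of an over-inclusive search that is reduced at screening.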

Study Selection

Titles, keywords and abstracts of each study were screened against eligibility criteria by a single researcher. If a study appeared to meet eligibility criteria, or if relevance of the study was uncertain, full texts were obtained (n = 478; full text of 6 articles could not be retrieved even after contacting the authors, so were excluded). These full texts were each independently assessed for inclusion by two members of the research team, with a third team member moderating disagreement. Results were recorded in Covidence systematic review software (Veritas Health Innovation, 2022). Inter-rater agreement was 90.3%. This resulted in 69 eligible studies for inclusion. The main reasons for exclusion were as follows: the population was outside the target age range or age range was unclear; the study did not include a measure of interaction quality or an eligible child outcome measure; or the study included measures of interaction quality and child development, but did not examine association between them. Reference lists of eligible studies were manually searched for further relevant studies, using a snowball search strategy. This resulted in the identification of an additional 126 studies for full-text review, 21 of which were included in the review. Figure 1 gives an illustration of this process and the number of results at each step. Altogether, this process resulted in a total of 90 studies included in this review. The full list of included articles and main characteristics of the study can be found in Table 1.

Fig. 1 PRISMA flowchart of the systematic review of quality in adult–child interactions and developmental outcomes

Table 1 Study and sample characteristics of the 103 included studies examining associations of adult–child interactions with children’s developmental and educational outcomes

Data Extraction and Risk of Bias Assessment

Data extraction was performed by two researchers. Information extracted from each study included sample size; child mean age and range; gender distribution; country in which the research was conducted; description of sample population; measures to assess interaction quality; and developmental outcome and measure. Where quality interaction measures were reported at the domain/subscale level, only those domains which considered interactional quality were tabled. The direction of each association between interactional quality and outcome was also reported (positive, negative, null). Table 1 provides a summary of characteristics for the included studies.

The Newcastle–Ottawa scale (Wells et al., 2009), which was designed to appraise the risk of bias among non-experimental studies, was adopted (see Supplementary Online Resource 2 for risk of bias computation table). It includes eight items that measure aspects of methodological quality across three domains: sample selection, comparability and outcome. Studies scoring between 0 and 6 were coded as having high risk of bias and those scoring between 7 and 9 were coded as having low risk of bias. Correlation between the independent risk of bias assessments of two researchers was high, r = 0.90.
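The coding rule described above amounts to a single threshold on the total score. A minimal sketch follows, assuming integer totals on the 0 to 9 range stated in the text; the function name `risk_of_bias_category` is hypothetical and is not taken from the review's supplementary materials.

```python
def risk_of_bias_category(score: int) -> str:
    """Code a Newcastle-Ottawa total as 'high' (0-6) or 'low' (7-9) risk of bias."""
    if not 0 <= score <= 9:
        raise ValueError(f"score must be between 0 and 9, got {score}")
    return "low" if score >= 7 else "high"


# Illustrative scores only (not taken from any included study)
example_scores = [3, 6, 7, 9]
example_codes = [risk_of_bias_category(s) for s in example_scores]
print(list(zip(example_scores, example_codes)))
```

Note that a higher score indicates better methodological quality, so the 'low' risk label attaches to the upper band.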

Classification Methods

The breadth of characteristics of educator–child interactions captured within this review made it necessary to group similar interaction types for the purpose of meaningful analyses. Whereas there is a general consensus distinguishing structural quality (i.e. characteristics and conditions of the setting often governed by regulation, such as adult/child ratios) and process quality (i.e. aspects related to interactions, support and instruction), there is less agreement on the presence, number, nature and merit of delineations within process quality. Yet, there is some conceptual support and precedent for further delineation of interactions to support activity, conversation, relationships and play from interactions related to content instruction (e.g. language, science, mathematics) (Burchinal, Kainz, & Cai, 2011; Sylva et al., 2006). While there is undoubtedly overlap and association between these two process quality dimensions (e.g. conversations and questioning are essential to each), this nevertheless permitted (1) evaluation of the differential associations of global versus finer-grained process quality indices with child outcomes and (2) the ability to assign studies to mutually exclusive categories (e.g. specific subscales related to quality of interactions within the context of language and literacy instruction, for instance, were ascribed to quality of interactions related to curriculum content; see Supplementary Online Material 3). Other groupings are possible, with potential implications for findings. Yet, in the absence of established models of process quality dimensions, this classification frame offered an initial approach to evaluating whether there is empirical basis for these related, but perhaps meaningfully distinct, dimensions within global ECEC process quality. The groupings, which for comparison also included ECEC global quality and global process quality, are described below and are further detailed in Supplementary Online Resource 3.

ECEC Global Quality. Operationalised as the overall quality of the environment and experiences in ECEC settings. These indices included some aspects related to adult–child interaction, but also included others such as structural or environmental quality. As an example, examinations using global ECERS-R ratings (Harms et al., 1998) were grouped here, as it combines ratings of indicators such as space and furnishings with educator–child interactions.

ECEC Process Quality. Operationalised as the social, emotional, physical and/or instructional elements of interactions that take place between the adult and child within a learning experience or social interaction (Pianta et al., 2005), yet was not further separated by interaction characteristic. For example, examinations using the total CLASS-Pre-K (Pianta et al., 2007) were grouped here, as it considers the quality of educator–child interactions across several domains (e.g. instructional support, classroom organisation and emotional support). Other examples include the ECERS-R Interactions subscale (Harms et al., 1998) and Caregiver Interaction Scale (CIS; Arnett, 1989).

Quality Pedagogical Interactions. Operationalised as more specific interaction features between the educator and child throughout the ECEC setting, which are not intentionally and explicitly related to content instruction. These were wide ranging, and further subdivided into four categories: (i) building positive relationships; (ii) routines and structure; (iii) educator–child conversations and questioning; (iv) supporting play. For example, studies indexing quality using ORCE or the CLASS-Pre-K subscale, Emotional Support, were included here because of their focus on relationship building and emotional connections.

Quality Interactions Related to Curriculum Content. This grouping included those interactions that focused on the development of specific knowledge and skills. These were wide ranging and further subdivided into seven categories: (i) global instructional support; or educators’ interactions in the context of (ii) literacy development; (iii) language development; (iv) mathematics learning; and (v) science and environment learning. Also grouped here were (vi) diversity (educators’ support and promotion of diversity in ECEC settings) and (vii) educator use of materials/resources (to support teaching). For example, studies that indexed quality separately for the ECERS-E subscales on Literacy, Mathematics, Science and Environment, or Diversity were included here because of their predominant focus on quality of interactions and practices related to instruction in these content areas. The CLASS-Pre-K subscale, Instructional Support, was also categorised here as (i) global instructional support.

To further support analysis and interpretation of the findings, the child outcomes were also classified into two main groupings, following conventions in other meta-analyses investigating association with broad child outcomes (e.g. Robson, Allen, & Howard, 2020): cognitive-academic outcomes and social-emotional outcomes. The cognitive-academic outcomes were further categorised as cognitive outcomes (e.g. executive function assessed by the Head-Toes-Knees-Shoulders task; IQ assessed by Hannover–Wechsler Intelligence Scale); and academic, including specific and broad/global academic outcomes (e.g. vocabulary assessed by Peabody Picture Vocabulary Test; academic achievement using Woodcock-Johnson Tests of Achievement). Similarly, the social-emotional outcomes were further categorised as either prosociality (e.g. measured by MacArthur Health and Behavior Questionnaire) or socio-emotional competence (e.g. measured by Social Skills Improvement System rating scales). The higher-level category outcome groupings (i.e. cognitive-academic; social-emotional) were used for initial synthesis of results, whereas the more refined groupings were considered to further examine and explain those overall findings. Further information on the groupings and specific constructs associated with these, by study, is provided in Table 1.

Research Synthesis

Most often, studies investigated multiple associations between educator–child interactions and child outcomes. For example, a study using the CLASS Pre-K instrument (Pianta et al., 2007) might have examined the associations between the measure’s three domains (Instructional Support, Emotional Support and Classroom Organisation) and various outcome measures (e.g. receptive and expressive vocabulary, early reading and mathematics skills, social competence). Due to this tendency, associations (rather than studies) were considered as the unit of analysis.

Using the classification method described for interactions and outcomes, variables for each association were categorised. The numbers of positive, negative and null associations were then determined for each interaction/outcome category (e.g. ECEC process quality and cognitive-academic outcomes). This strategy is useful for characterising the consistency (or variability) of evidence amongst disparate measures and methods (where, by contrast, their combination of effect size would have unclear meaning and utility). For instance, the aggregation of effect sizes for meta-analysis would yield robust, aggregate estimates of associations between ECEC quality and child outcomes. Yet, in fields where diversity in methods and measures is high, as in the current review, meta-analysis often is not recommended (indeed, prior meta-analyses have typically been confined to research on particular quality measures). Moreover, if there is wide inconsistency in results (e.g. a balance of positive and negative associations), exploration of factors that could account for this variability may be preferable to approaches that seek their statistical synthesis. The current approach has been used to good effect to yield useful insights across highly heterogeneous literatures, showing, for example, that negative body image is associated with higher levels of neuroticism and lower levels of extraversion (Allen & Walter, 2016), and that body image is somewhat more negative amongst cancer survivors than amongst healthy controls (Lehmann, Hagedoorn, & Tuinman, 2015).
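The counting step described above can be illustrated with a short Python sketch. It assumes each association has already been classified into a quality category, an outcome category and a direction; the function names and the example records are hypothetical and are not drawn from the included studies.

```python
from collections import Counter, defaultdict

DIRECTIONS = ("positive", "negative", "null")


def tally_associations(records):
    """Count association directions within each (quality, outcome) category."""
    counts = defaultdict(Counter)
    for quality, outcome, direction in records:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction}")
        counts[(quality, outcome)][direction] += 1
    return counts


def direction_percentages(counter):
    """Express each direction as a percentage of all associations in a category."""
    total = sum(counter.values())
    return {d: round(100 * counter[d] / total, 1) for d in DIRECTIONS}


# Hypothetical records, invented purely for illustration
records = [
    ("ECEC process quality", "cognitive-academic", "positive"),
    ("ECEC process quality", "cognitive-academic", "null"),
    ("ECEC process quality", "cognitive-academic", "null"),
    ("ECEC process quality", "social-emotional", "positive"),
]

counts = tally_associations(records)
summary = {key: direction_percentages(c) for key, c in counts.items()}
print(summary)
```

Because the association, not the study, is the unit being counted, a study that reports many associations contributes many records; the sketch makes that weighting explicit.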

Variables such as study design, study country, sample size, study risk of bias rating, as well as the quality measure used and outcome assessed (e.g. academic, attention-control, prosociality, social-emotional competence) were used to examine points of difference between (and thus plausible explanation for) studies that reported positive, negative and null associations for each interaction/outcome combination. For example, the set of studies finding positive associations of ECEC global quality scores with cognitive-academic outcomes were contrasted with those finding null associations, according to quality measure adopted (e.g. CLASS vs ECERS), child outcome sub-group (e.g. cognitive vs academic), study design (i.e. cross-sectional vs longitudinal), risk of bias (i.e. high vs low) and sample size. Factors for which there was (and was not) a pronounced difference between these sets of studies were highlighted.

Results

Overview of Studies

The characteristics of included studies are presented in Table 1. Seventy-one of the 90 studies included in the analytic sample reported significance for at least one association investigated. Twenty-five studies reported cross-sectional data and 65 longitudinal data. The included studies comprised 108,693 children. Sample sizes ranged between 44 and 13,565, with a mean of 1207.70 (SD = 2180.10, median = 511). The gender distribution of the sample was relatively even (49.6% female), and the sample had a grand mean age of 48.83 months (SD = 7.43). These calculations are based on available data, as the gender and mean age were not available for all studies (see Table 1). Eight studies included children in both centre-based and non-parental home-based care (see Table 1). Their analyses were applied to this integrated sample. This analytic integration, and the small number of studies spread across all of our quality categories (except ECEC Global Quality), precluded our ability to isolate patterns of association specifically for home-based care. As a result, these settings are integrated in our analysis, although they represent only a small proportion of our associations (less than 10% of our associations include home-based care settings). While samples were recruited from a wide range of countries, the USA was the most highly studied nation (66.7%), followed by Germany (6.7%), China (4.4%) and Australia (3.3%). The Netherlands and Chile each had two studies (2.2%); and Denmark, Canada, Ecuador, England, Ethiopia, Indonesia, Norway, Peru, Portugal, Singapore, Turkey and United Arab Emirates each contributed one study (1.1%). Methodological quality of the included studies was mixed, with risk of bias ratings ranging from 3 to 9 (M = 6.92, SD = 1.43). In total, 63 studies (70.0%) were assessed as having low risk of bias (rating 7–9) and 27 (30.0%) as high risk (rating 0–6).

The following sections examine the patterns of associations with children’s outcomes for the various interaction dimensions considered in this study (e.g. ECEC process quality, quality pedagogical interactions). The unit of analysis in relation to child outcomes was at the level of association. Given that studies typically reported associations between multiple quality indices and multiple child outcomes, the 90 studies yielded 870 associations. Nevertheless, results initially report on our quality categories at the level of the study (i.e. number of studies, sample size, child outcomes considered, risk of bias, study country), to first characterise this literature. Subsequent reporting generally focuses on the level of associations (i.e. percentage of positive, negative and null associations), with the retention of study-level reporting where this was more appropriate (e.g. risk of bias ratings, sample size). Further detail on study characteristics and findings can be found in Supplementary Online Resource 4.

ECEC Global Quality

Seventeen studies investigated 70 cross-sectional or longitudinal associations of ECEC global quality with child outcomes. These studies operationalised global quality as the accumulation of diverse infrastructural, environmental and experiential indicators believed to be influential to child development, within formal ECEC settings. The 17 studies were diverse in sample size (M = 1590.00; ranging from 131 to 8950), child outcomes they considered (academic achievement, attention control, broad or global academic outcomes, prosociality, social-emotional competence) and risk of bias ratings (M = 6.88; ranging from 3 to 9). There was less variability in study country (58.8% USA, the rest from Germany, England, Indonesia, Chile and Australia) and quality measure (all studies used ECERS-R or -3, alone or as part of a composite score).

Within the studies, evidence for the association of global quality indices with child outcomes was limited. Specifically, 12 of the 70 associations (17.1%) were significant, and all significant associations were positive. This pattern was similar across both outcome categories: cognitive-academic, 7 significant from 50 associations (14.0%); social-emotional, 5 significant from 20 associations (25.0%).
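As a worked check of the arithmetic, the following minimal sketch recomputes the rates of significance from the counts reported in the preceding paragraph. The function and variable names are illustrative; only the counts are taken from the text.

```python
def percent(significant, total):
    """Rate of significance as a percentage, rounded to one decimal place."""
    return round(100 * significant / total, 1)


# (significant associations, total associations) for ECEC global quality,
# as reported in the text.
global_quality = {
    "cognitive-academic": (7, 50),
    "social-emotional": (5, 20),
}

# The outcome categories should sum to the overall tally (12 of 70).
overall_significant = sum(sig for sig, _ in global_quality.values())
overall_total = sum(total for _, total in global_quality.values())

rates = {name: percent(sig, total) for name, (sig, total) in global_quality.items()}
overall_rate = percent(overall_significant, overall_total)

print(rates)
print(overall_significant, overall_total, overall_rate)
```

The same check can be applied to any quality category by replacing the counts.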

For the 50 associations with cognitive-academic outcomes, there was little clear pattern that differentiated studies yielding positive versus null associations. Studies finding positive and null associations did not markedly differ in quality measure, child outcome (i.e. positive associations found for 13.2% of academic, 16.7% of attention control and 16.7% of broad or global academic outcomes), study design (i.e. 16.7% of cross-sectional studies reported significant positive association compared to 13.2% for longitudinal studies) or risk of bias ratings (Mpos = 6.29, range 5–8; Mnull = 6.98, range 3–9). Studies finding positive associations had a smaller (albeit still large) mean sample size (Mpos = 481.86, range 131–733) than studies reporting null associations (Mnull = 1829.02, range 131–8950).

For the 20 social-emotional outcomes evaluated, studies more often reported a significant association if they were conducted outside the USA (44.4% significant, compared to 9.1% for US studies), cross-sectional (33.3% significant, compared to 12.5% in longitudinal designs) and higher in their risk of bias (Mpos = 6.00, range 3–8; Mnull = 7.00, range 3–9). These studies did not clearly differ on quality measure, child outcome or sample size.

ECEC Global Process Quality

Thirty-one studies investigated 141 associations of ECEC global process quality with child outcomes. Global process quality was operationalised as the social, emotional, physical and instructional elements of interactions that take place between an adult and child(ren) within a learning experience or social interaction (Pianta et al., 2005). These studies were diverse in quality measure used (ECERS-R or -3 subscales, 38.7%; CLASS or CLASS PreK, 35.5%; ECERS-E, 16.1%; CIS, 9.7%; and SSTEW, 3.2%), sample size (M = 1929.65; ranging from 102 to 13,565), child outcomes (academic achievement, attention, broad cognition, social-emotional competence) and study quality (M = 7.19; range 3–9). There was less variability in study country (64.5% from USA, the rest from Germany, England, Ecuador, China, Australia and Africa).

Amongst these studies, 44 of the 141 associations (31.2%) were significant; all but two of these were positive. This pattern of results was similar across outcome categories: cognitive-academic, 34 significant from 107 associations (31.8%; 33 positive, 1 negative); social-emotional, 10 significant from 34 associations (29.4%; 9 positive, 1 negative).

For cognitive-academic outcomes, studies reporting significant positive associations had larger samples (Mpos = 2457.21; range: 366–13,565; Mnull = 1676.00; range: 138–8950), were more likely to be conducted outside the USA (43.6%, compared to 23.5% from USA) and were more likely to be longitudinal in design (33.3%, compared to 14.3% for cross-sectional). There were also differences by quality measure (in descending order, by percent of associations significant): ORCE (100.0%), composite indices of multiple (sub)scales (62.5%), CLASS Pre-K or K (44.8%), SSTEW (40.0%), ECERS-E (37.5%), subscale/factor scores from ECERS-R or 3 (14.6%) and CIS or MELE-A (0.0%).

For social-emotional outcomes, factors that differentiated studies finding positive from null associations were sample size (Mpos = 3051.44; range: 138–8950; Mnull = 1464.83; range: 138–8950) and study design (41.7% significant in cross-sectional, compared to 18.2% from longitudinal designs). Significant positive associations were also more prevalent in studies from the USA (37.5%) compared to those conducted outside the USA (16.7%). One study from the USA (N = 422) found a significant negative association of ECERS-E with children’s social skills, as measured by the SSIS.

Quality Pedagogical Interactions

Fifty-three studies investigating 338 associations between quality pedagogical interactions and child outcomes were diverse in sample size (M = 821.38, range 44–4029), child outcomes assessed (academic achievement, attention, global cognition, prosociality and social-emotional competence) and risk of bias ratings (M = 6.79, range 3–9). Studies were also diverse in the dimensions of quality pedagogical interactions considered: building positive relationships, 186 associations; routines and structure, 104 associations; conversations and questioning, 37 associations; and supporting play, 11 associations. There was less variability in the quality measures used (71.7% using domains of CLASS, CLASS-PreK or CLASS-T) and study context (69.8% from the USA, the rest from the Netherlands, China, Australia, Canada, Chile, Denmark, Germany, Peru, Portugal, Turkey, United Arab Emirates, Africa and Singapore).

Building Positive Relationships

Forty-seven studies investigated 186 associations between building positive relationships and child outcomes. Building positive relationships was operationalised as positive social-emotional practices in child–educator relationships (as indexed, e.g., by the CLASS emotional support subscale, or ORCE). Forty-two of the 186 associations (22.6%) were significant – 40 positive, 2 negative. This pattern was similar across outcome categories: cognitive-academic, 35 of 149 associations significant (23.5%; 34 positive); social-emotional, seven of 37 associations significant (18.9%, 6 positive).

For the 149 cognitive-academic outcomes, significant positive associations were more often reported for studies from the USA (27.3%, compared to 10.3% for non-US countries). There were also differences by quality measure, as follows (in descending order, by percent significant): CLASS score, generated from factor analysis (75.0%), ORCE (55.0%), study-specific or adapted codes (50.0%), relevant subscales of the COS-K (33.3%), CLASS, -PreK or -T subscale (14.9%), SSTEW or HOME subscales (0.0%). The study finding a negative association was not clearly distinct; it used a CLASS subscale and found a negative association with an attention control outcome (for which other studies found positive associations).

For the 37 social-emotional outcomes, studies more often reported a significant positive association if they were conducted in the USA (22.2% significant compared to 0.0% for non-US countries), investigated social-emotional outcomes (21.4% significant, compared to 0.0% for prosociality outcomes) and had larger sample sizes (Mpos = 1443.97, range 223–2983; Mnull = 808.00, range 120–2800). There were also differences according to the quality measure used (in descending order of proportion significant): relevant subscales of COS-K (50.0%), study-specific or adapted codes (50.0%), ORCE (33.3%), relevant subscales of the CLASS, -PreK, -T (14.3%) and relevant subscales of the AP, SSTEW or HOME (0.0%).

Routines and Structure

Thirty-two studies investigated 104 associations of routines and structure with child outcomes. Routines and structure were operationalised as positive organisation and management of child routines and structure throughout the day (e.g. CLASS organisational support subscale). Twenty-six (25.0%) associations were significant and all except one were positive. This pattern of results was similar across outcome categories: cognitive-academic, 24 of 97 associations significant (24.7%; 23 positive); social-emotional, 2 of 7 associations significant (28.6%; all positive).

For the 97 cognitive-academic outcomes, studies finding positive and null associations did not notably differ in quality index as all studies utilised the organisational support subscale of the CLASS or CLASS-PreK, or a factor score that aligns with this dimension (e.g. positive management and routines – also generated from the CLASS instrument). There were differences by study design (only longitudinal studies reported significant associations, albeit only 29.1% did so) and study country (27.9% significant for USA, compared to 13.8% from non-US countries). For the seven social-emotional outcomes, all evaluated the same quality measure and child outcome; however, the mean sample size for associations that were positive was larger (Msig = 1737, range 491–2983) than for those that were null (Mnull = 1053.20, range 422–2237).

Conversations and Questioning

Seven studies investigated 37 associations of conversations and questioning with child outcomes. This was operationalised as the use of conversational features that are conducive to fluency, connectedness and joint engagement (e.g. conversational turns, open-ended questions, cognitive facilitation). Thirteen of the 37 associations (35.1%) were significant and positive, and all were from the cognitive-academic outcome grouping (13 of 31 associations significant, 41.9%). None of the 6 associations tested for social-emotional outcomes were significant.

For cognitive-academic outcomes, there was little clear pattern that differentiated studies yielding positive versus null associations. The studies finding a significant association had lower risk of bias (Mpos = 6.00, range 3–9; Mnull = 5.00, range 3–9), but this was skewed by a single study with higher risk of bias accounting for ~45% of associations considered (n = 14, Wasik et al., 2006).

Supporting Play

Two studies investigated 11 associations between supporting play and child outcomes. This was operationalised as interactions and educator facilitation in the context of play. Seven of the 11 (63.6%) associations were significant, and all positive. The sole social-emotional outcome was significant and positive. For cognitive-academic outcomes, 6 of 10 associations (60.0%) were significant and positive. All 10 of the cognitive-academic associations were reported in a single study (Bigras et al., 2017), and thus there were no differences in quality measure, study design, study context, sample size or risk of bias rating. There was also little differentiation in outcome measured (>75.0% academic).

Quality Interactions Related to Curriculum Content

Fifty-seven studies investigated 321 associations between quality interactions related to curriculum content and child outcomes. These were diverse in sample size (M = 890.98; range 115–4029), child outcomes assessed (academic achievement, attention, prosociality and social-emotional competence) and risk of bias ratings (M = 6.95, range 3–9). Studies were also variable in the dimensions of quality examined (i.e. instructional support, literacy and mathematics instruction, language modelling and reasoning, language stimulation, use of materials, explicit print instruction). There was less variability in the quality measures used (63.2% used CLASS, CLASS-PreK or CLASS-T domains; 17.5% used ECERS-E, -R or -3 subscales) and study context (64.9% conducted in USA, the rest from Germany, Australia, China, Denmark, England, Peru, Portugal, the Netherlands, Turkey, United Arab Emirates, Chile, Africa, Norway and Singapore).

Global Instructional Support

Forty-two studies investigated 130 associations between global instructional support and child outcomes. Global instructional support was operationalised by these studies as the intentional teaching strategies and elements to support children’s learning (e.g. provision of quality feedback, language modelling). Of the 130 associations, 38 (29.2%) were significant, and all except one were positive. This pattern differed between cognitive-academic outcomes (37 of 114 associations significant, 32.5%) and social-emotional outcomes (2 of 16 associations significant, 12.5%).

For the 114 cognitive-academic outcomes there was little that differentiated studies that yielded positive versus null associations. The main differentiation was in quality measure: only CLASS, -PreK or -T indices of global instructional support yielded significant associations (35.0% of the time). Other measures including relevant subscales of COS-K or ECERS-R, or composite indices, did not yield significant associations.

For the 16 social-emotional outcomes, for which only 2 significant associations were found, cross-sectional studies yielded a higher rate of significant associations (25.0%) than did longitudinal studies (8.3%). Moreover, studies yielding significant associations tended to be smaller in sample size (Mpos = 309.00, range 196–422; Mnull = 1288.07, range 120–2983) and have greater risk of bias (Mpos = 6.50, range 6–7; Mnull = 7.57, range 4–9).

Literacy

Twelve studies investigated 45 associations of child outcomes with quality-focused interactions between educator and child to support literacy development (e.g. shared book reading, literacy instruction, use of materials). Fourteen of 45 associations (31.1%) were significant, all positive. This pattern of results differed for cognitive-academic (14 of the 38 significant, 36.8%) and social-emotional outcomes (0 of 7 associations significant).

For the 38 cognitive-academic outcomes, studies yielding significant effects were larger in sample size (Mpos = 1788.00, range 265–2857; Mnull = 1392.29, range 379–2857), more often from the USA (44.0%, compared to 23.1% for non-US studies) and cross-sectional (50.0%, compared to 34.4% for longitudinal studies). The proportion of studies that found significant associations also differed by interaction measure as follows (in descending order): relevant (sub)scales of the WRITE (100.0%) or TBRS (100.0%), SABR (25.0%), relevant (sub)scales of ECERS-E (25.0%), measures specifically designed for that study (20.0%), ELLCO (0.0%) or Emerging Activities Snapshot (0.0%). All significant associations were with academic outcomes, with no significance for the five non-academic associations (e.g. attention control or global cognition) evaluated. All seven social-emotional associations yielded null results.

Language

Ten studies investigated 79 associations between quality interactions focused on language development and child outcomes. Interactions focused on language development were operationalised by these studies as interactions between educator and child to support language development (e.g. language modelling). Thirteen of the 79 associations (16.5%) were significant, all positive. This pattern of results was similar across outcome categories: cognitive-academic outcomes, 11 significant from 62 associations (17.7%); social-emotional outcomes, 2 significant from 17 associations (11.8%).

For the 62 cognitive-academic associations, only longitudinal studies yielded positive findings (albeit only 19.3%). The proportion of studies finding positive associations also differed by quality measure, as follows (in descending order): ORCE (75.0%), relevant subscales of the SSTEW (30.0%), Emerging Activities Snapshot (14.3%), LISn (6.3%), relevant (sub)scales of the ECERS-R (0.0%) and bespoke measures (0.0%). For the 17 social-emotional outcomes, significant associations were only reported for relevant subscales of the ECERS-R (40.0%) and not for other measures (e.g. SSTEW, LISn, ORCE). The sample size for positive associations was also higher (Mpos = 1497.50, range 138–2857) than those reporting null associations (Mnull = 869.13, range 138–2857).

Mathematics

Seven studies investigated 31 associations between quality of interactions focused on mathematics and child outcomes. These included interactions between the educator and child to support learning mathematics (e.g. mathematics instruction and supports). Of the 31 associations, 10 were significant (32.3%), all positive. This differed for the cognitive-academic (9 of 23 associations significant, 39.1%) and social-emotional outcomes (1 of 8 associations significant, 12.5%).

For the 23 cognitive-academic outcomes, studies yielding significant associations were larger in sample size (Msig = 2250.89, range 532–2857; Mnull = 1375, range 325–2857) and from the USA (60.0%, compared to 23.1% in non-US studies). There were also differences by quality measure (in descending order by percent significant): relevant (sub)scales of the TBRS (100.0%) or (sub)scales of the ECERS-E or -3 (17.6%). For the eight social-emotional outcomes, the single reported significant association was from a study with a smaller sample size (N = 491.00; Mnull = 1218.29, range 138–2857), the only one from the USA, and the only study to utilise ECERS-3 subscales (the null associations were found using relevant subscales of ECERS-E).

Science and Environment

Three studies investigated 18 associations between the quality of interactions focused on learning science and environment and child outcomes. Of the 18 associations, three were significant (16.7%) and this pattern of results was similar for cognitive-academic (2 of the 11 significant, 18.2%) and social-emotional outcomes (1 of 7 associations significant, 14.3%). Two positive associations were found for cognitive-academic outcomes and a negative association was found for social-emotional outcomes.

For the cognitive-academic outcomes, the study yielding the two positive associations was smaller in sample size (N = 669.00) than those yielding null associations (Mnull = 2127.67, range 669–2857). Similarly, for the social-emotional outcomes, the study yielding the significant negative association was also smaller in sample size (N = 138) than the studies yielding null associations (Mnull = 1393.33, range 669–2857). All studies utilised the same instrument (subscale of the ECERS-E), although the study yielding the significant association was cross-sectional (the others longitudinal). No other differentiating factors were identified.

Diversity

Three studies investigated 18 associations between the quality of diversity interactions and child outcomes. Diversity was operationalised by these studies as how well educators plan for individual learning needs and promote equality. Of the 18 associations, only 3 were significant (16.7%), all positive and for cognitive-academic outcomes. None of the seven social-emotional associations were significant. All positive and null cognitive-academic associations were from two studies (Howard et al., 2020; Sylva et al., 2006). In this study, positive associations were found for mathematics concepts and non-verbal reasoning, but not for pre-reading, language, spatial awareness or independence and concentration.

Discussion

The current study aimed to review and reconcile the evidence for associations of proposed dimensions of quality early interactions in formal ECEC with child outcomes. Specifically, we sought to uniquely categorise and identify which interaction quality indicators related to what child outcomes, considering the number, direction and consistency of associations. Results provided limited evidence that global ECEC quality indices predicted current or subsequent outcomes, and this increased only marginally when considering dimensions of interaction quality – regardless of whether those indices concerned relational or content interactions. A notable exception was supporting play, which showed the most consistent prediction (albeit within a limited evidence base and varying interpretations defining ‘play’) amongst the ECEC quality dimensions examined. These results raise questions for contemporary ECEC quality assumptions. That is, our results do not challenge that high-quality ECEC is influential to child development and learning – indeed, significant associations were found at a higher rate than would be expected if there were no genuine relationship – yet do challenge approaches that imply indicators of quality (or global indices of quality) will show comparable levels and patterns of prediction across child outcomes.
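The notion of a higher-than-expected rate of significance can be illustrated with a rough calculation. The sketch below is not an analysis conducted in this review. It assumes a per-test false-positive rate of 0.05 and treats associations as independent, which is unlikely to hold given that most studies contributed multiple associations. Under those assumptions, it computes the number of significant associations expected by chance, and the binomial probability of observing at least the 12 significant associations (of 70) reported for ECEC global quality.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_associations = 70  # global quality associations reported in the Results
n_significant = 12   # of which significant
alpha = 0.05         # ASSUMED per-test false-positive rate, not stated in the review

# Number of significant results expected if no association were genuine.
expected_by_chance = n_associations * alpha

# Probability of 12 or more significant results under that scenario,
# ASSUMING independent tests (associations nested within studies violate this).
tail_probability = binomial_upper_tail(n_significant, n_associations, alpha)

print(f"Expected by chance: {expected_by_chance:.1f}")
print(f"P(X >= {n_significant}): {tail_probability:.6f}")
```

Because associations are clustered within studies, the resulting probability is at best indicative; a formal test would need to account for this dependence.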

This was indicated most strongly by the limited evidence for prediction of child outcomes by global quality indicators, wherein only 14% of 50 associations with cognitive-academic and 25% of 20 associations with social-emotional outcomes were significant. This may be related to the fact that global indices necessarily reconcile (and often give equivalent weight to) quality dimensions that are more and less influential to the outcome considered. For instance, ECERS-R scores combine quality ratings for interactions and activities with ratings for space, routines and furnishing (Harms et al., 2005). However, research indicates that measures of structural quality show low direct association with children’s developmental progress (Mashburn et al., 2008). Our finding contrasts common practice in some national ECEC Assessment and Rating programs wherein, for instance, Assessment and Rating outcomes generate, report and support promotion of overall quality ratings for ECEC centres (ACECQA, n.d.-b).

Rates of significance improved when considering ECEC process quality (31% of associations with ECEC global process quality were significant). For quality pedagogical interactions, 27% of cognitive-academic and 20% of social-emotional associations were significant and positive. This was comparable for quality interactions related to content; 29% of the cognitive-academic and 10% of social-emotional associations were significant. The quality indicator that showed most consistent prediction with child outcomes was ‘supporting play’ (60% of associations with cognitive-academic outcomes and 100% with social-emotional outcomes were significant, albeit this is derived from only 11 associations in two studies). The limited corpus of studies notwithstanding, there is theoretical and research support for the role of play in supporting early learning and development (e.g. Mathers et al., 2014), with most significant associations found for language and communication.

Less expected were the low rates of significance for quality dimensions aligned with child outcomes, such as building positive relationships with social-emotional outcomes (19% of 37 associations significant and positive); and global instructional support with cognitive-academic outcomes (32% of 114 associations significant). This may be related to the inclusion of structural quality elements (e.g. the number, breadth and duration of resources available) even within measures that focus on interaction quality. For instance, lower-order quality indicators (which must be satisfied to achieve higher quality rating scores) for ECERS-E Mathematics largely consider availability, diversity and qualities of early mathematics resources, whereas resource use and mathematics interactions are largely concentrated at higher quality levels (Sylva et al., 2010). Alternatively, associations between ECEC quality indices and child outcomes might genuinely be low. This is consistent with previous reports of modest associations (Mashburn et al., 2008), for which proposed explanations include unrealistic expectations and suboptimal scoring of quality rating measures (Thorpe et al., 2022); that quality indices should consider context in identifying and weighting indicators (contrasting treatment of quality as an immutable standard; Burchinal, 2018; Hunkin, 2018; Rentzou, 2017; Thorpe et al., 2022); and limited dose and duration of ECEC (compared to the early home learning environment; Gilley et al., 2015).

However, this pattern of highly inconsistent associations between quality indices and child outcomes was not uniform. Rates of significance differed by measure, whereby significance was more often found with CLASS subscales. This is consistent with prior reviews of this measure (Burchinal et al., 2011; Perlman et al., 2016). CLASS differs from some of the other reviewed quality measures in that there is more exclusive focus on what the educator is (or is not) doing, with less inclusion of structural elements. There is no systematic comparative review of quality measures found in the literature with which to triangulate this speculation, although recent meta-analyses of child outcomes associated with CLASS show significant pooled effects (Burchinal et al., 2011; Perlman et al., 2016), supporting overall predictive validity of this tool.

The rate of significance also tended to differ by: study design, with less rigorous studies (i.e. higher risk of bias, smaller sample size, cross-sectional) often finding higher rates of significance; and geography, with studies in the USA more often finding significance. The study design influences are unsurprising given that smaller samples, reporting omissions and fewer controls can yield less reliable findings (Downes et al., 2016), and that associations are expected to be higher when the outcome is more proximal in time to the predictor (Hilton & Patrick, 1970). It is less clear why higher rates of significance might be found in the USA, although it is notable that both US and non-US studies routinely adopted measures that were designed in (and, arguably, pertaining to) the US context. If this is indeed an influential factor – although our analyses cannot determine this – it would support arguments for quality definitions, indicators and designations as contextually bound, given interaction characteristics may have differential impacts across contexts. It is notable that most of the studies using CLASS (created in the USA) were conducted in the USA (69%), so findings may benefit from this cultural alignment.

While the current review provides an important overview and reconciliation of evidence on the child-level associations with ECEC quality, it does not consider the size of association or meta-analyse these to derive a pooled estimate of the magnitude of prediction that can be reliably ascribed to specific quality dimensions (and how these might be moderated by study characteristics). That is a different question, however, and future efforts to meta-analyse this research can be aided by our characterisations of quality dimensions, summary of evidence, and illustration of inconsistencies in measures, designs, outcomes and findings. Our approach is uniquely well situated, for instance, to characterise the available research and methods, and highlight inconsistencies in direction of associations (e.g. positive, null and negative) that were found for some outcomes. Nevertheless, future meta-analyses will be important to estimate the reliability and magnitude of association between quality dimensions and child-level outcomes. Those studies would do well to consider moderators of effect, such as sample characteristics, country of study and measures.

The issue of country of study is particularly salient in relation to the current study, given that more than half of the studies reviewed derived from the USA and this over-representation was exacerbated by considering multiple associations reported in each of the studies. A related issue for further investigation, which was beyond the aims and scope of this review, is cultural alignment; that is, the possibility that associations between quality indices and child outcomes may be stronger when the quality rating instrument is used in its country of origin (due, e.g., to emphasis of culturally important and socialised expectations). Given a prevailing consensus that there are not universal indicators of quality across cultures and contexts, this gives rise to important questions around which dimensions of quality are beneficial for whom, when and in what circumstances.

This study reviewed research published in academic journals and published quality measures, which may not take into account government quality rating tools. This is necessary, however, as these measures are often less transparent in content and/or degree of association with child outcomes. More transparency is needed to permit emerging insights, public discourse and improvements in the aspects of quality, to guide practices that have the greatest impact on child development and outcomes. There was also an under-representation of non-centre-based ECEC and, where this was included (in eight of the 90 studies), it was integrated into a single analytic sample that also (predominantly) contained children in centre-based ECEC. Yet, it may not be the case that patterns of effect are identical in home-based ECEC settings, and future research should investigate this possibility. Furthermore, given the unparalleled impact of the home learning environment, it would be interesting to understand dimensions of family–child interactions that relate most highly to children’s outcomes.

This review raises important questions of current ECEC quality assumptions and practices. First, it suggests that current collections and/or methods of combining indicators of high-quality interactions in ECEC settings are inconsistent predictors of child outcomes. Additional research is needed to more explicitly indicate which indicators (rather than measures) are most predictive, and optimal methods of quantifying and combining them (e.g., see Thorpe et al., 2022). Appropriate research designs are essential here, although these were not always adopted in the current corpus of studies. Specifically, designs and analyses should concentrate on change in child-level outcomes over the period of ECEC attendance, rather than simply prediction of concurrent or subsequent developmental status (which can be influenced by a range of factors, such as confounding whereby children in higher SES areas are more likely both to have higher achievement and to be enrolled in a higher-quality ECEC service) (Alexandersen et al., 2021). Second, it questions assumptions that quality is an immutable set of environments and experiences that are universally promotive of development. This does not contest that there are widely beneficial practices for child development, but rather that specific manifestations – that is, the nature, delivery, timing, frequency, duration and situation – of these practices may not be equally beneficial across contexts. Last, it challenges practices that imply or confer unwarranted stature to global quality designations. While many of these critiques are not new, the current review is the first to systematically reconcile this substantial literature archive to demonstrate the scope, scale and some of the nuances of these issues.