Introduction

While research has investigated native language abilities of children/adolescents with poor literacy skills quite extensively, less attention has been paid to their success in learning a foreign language in formal educational settings. Relatives, teachers, and allied health professionals often assume that the difficulties children/adolescents with poor literacy skills experience in their native language will transfer to the new language being learned (Sparks 2016). As a consequence, children/adolescents with poor literacy skills receive less support and are even exempted from foreign language instruction in many countries. One example is an Italian law allowing children/adolescents with poor literacy skills to be completely excused from foreign language learning (see Palladino et al. 2013). Researchers in the field of foreign language learning express their concern about these policies, as to date the evidence regarding foreign language difficulties in poor readers/spellers is scarce. Wight (2015) suggests that the policies and practices of exempting students from foreign language study demonstrate that they are often discharged “(1) based on personal beliefs and preferences rather than on the basis of a carefully considered consensus of inclusion, and (2) in the absence of actual data about the potential successes of students with special needs” (Wight 2015, pp. 41–42). Although these policies aim to protect children/adolescents with poor literacy skills from experiencing failure, they also prevent those students from gaining the cognitive and professional advantages associated with foreign language learning (e.g., in inhibitory control, Bialystok and Majumder 1998; in theory of mind development, Kovács 2009; in metalinguistic knowledge, Bialystok 2012). Moreover, access to cultural diversity remains limited without being able to speak an additional language. Despite having a profound impact on students’ future opportunities, these decisions are, to date, not based on a systematic evaluation of the existing evidence.

Thus, this study aimed to address this gap by conducting a systematic review of the available evidence on foreign language attainment in children/adolescents with poor literacy skills. More specifically, we identified, critically appraised and synthesized existing evidence reported by past research studies to address the following two research questions:

  1. How successful are children/adolescents with poor literacy skills in learning a foreign language, as compared with children/adolescents with typical literacy skills?

  2. Is successful foreign language attainment in children/adolescents with poor literacy skills influenced by moderators such as participant characteristics, foreign language instruction, and foreign language assessment?

In this way, we intended to provide a systematic overview of the current state of research and provide useful input for future research in this field by highlighting limitations of past research, as well as research gaps that need to be addressed.

In line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher et al. 2009; see S1 in the Supplemental Materials for a completed checklist of the PRISMA items: https://osf.io/be9x4/), we define the main elements of our research questions according to the PICO acronym (population, intervention, comparison, outcome; Moher et al. 2009) in the following sections.

Population: Children/Adolescents with Poor Literacy Skills

Worldwide, a significant proportion of children/adolescents present with poor literacy skills that cannot be explained by medical, emotional, or neurological difficulties or insufficient literacy instruction. Prevalence rates of children/adolescents with poor literacy skills range from 3.1 to 17.5% across languages (e.g., 3.1–3.2% in Italian, Barbiero et al. 2012; 5% in German, Müller et al. 2014; 17.5% in American English, Shaywitz et al. 2008). The severity and type of difficulties children/adolescents with poor literacy skills experience vary across languages as a function of the complexity of the target writing system they are learning (Goulandris 2003; Landerl et al. 1996; Sprenger-Charolles et al. 2011). For instance, within alphabetic writing systems, “transparent” scripts with simple phoneme–grapheme rules (e.g., German, Italian, Spanish) are easier to learn than “opaque” scripts with complex phoneme–grapheme rules (e.g., English and French; Katz and Frost 1992; Ziegler et al. 2010).

Many different terms have been used to describe poor literacy skills: e.g., developmental dyslexia and/or dysgraphia, specific reading and/or spelling difficulty, and reading and/or spelling impairment, deficit, or disability (Elliott and Grigorenko 2014; Siegel 1988, 2007). In the present review, we use the term “children/adolescents with poor literacy skills” or “poor readers/spellers” to refer to students with low scores on reading and/or spelling tests.

The term poor literacy encompasses both reading and spelling difficulties in the present review. With respect to reading difficulties, we included inaccurate or slow reading of words, nonwords, sentences, or texts (International Dyslexia Association 2012). While some poor readers/spellers may additionally struggle with reading comprehension tasks, students solely facing reading comprehension difficulties and no other reading deficits (e.g., inaccurate or slow word or nonword reading) have been excluded from the group of children/adolescents with poor literacy skills in this study. This is mainly due to the fact that reading comprehension difficulties have been shown to be associated with oral language deficits (e.g., poor vocabulary knowledge, poor sentence comprehension), especially when they occur without additional difficulties in reading accuracy and/or fluency (e.g., Oakhill et al. 2003). Concerning spelling difficulties, children/adolescents with poor literacy skills may struggle in spelling words or nonwords either in dictation tasks and/or in spontaneous text production (Kohnen et al. 2015).

Moreover, we defined poor literacy skills as below-average performance (i.e., at least one standard deviation, one year, or one grade below the expected level) in a reading task, a spelling task, or both (Elliott and Grigorenko 2014). Studies in which participants were included based on self- or teacher reports were excluded (Snowling et al. 2011). We only incorporated studies assessing participants from their first year of formal schooling up to the last year of secondary education, whereas studies investigating students in post-secondary education were excluded.

Comparison: Children/Adolescents with Typical Literacy Skills

To be included in this review, studies had to compare foreign language performance of poor readers and spellers with control participants demonstrating typical literacy skills. Once again, reading or spelling tests had to be used to confirm typical literacy skills. Control participants had to score at the level expected for their age or grade, or less than one standard deviation below the expected level (Elliott and Grigorenko 2014).

Intervention: Foreign Language Instruction

Although the manner in which a foreign language is taught also influences its attainment (Saito and Hanzawa 2016), in this review we considered all types of classroom-based foreign language instruction (e.g., language vs. content-based approaches; instruction vs. immersion approaches). However, participants had to have little or no access to the foreign language outside of the classroom (see Dixon et al. 2012, p. 9, for a similar delimitation of the foreign language context). Thus, all studies stating that participants had additional access to the foreign language, beyond limited exposure through, for example, music, films, or travel, were excluded. Examples of excluded reports are studies focusing on the foreign language learning abilities of heritage speakers (i.e., children/adolescents who are exposed to a minority language at home).

Outcome: Foreign Language Attainment

Foreign language attainment involves mastering distinct subskills, such as discriminating foreign language speech sounds, comprehending spoken words, or reading and spelling words. It is possible that children/adolescents with poor literacy skills show lower performance in only some (e.g., reading and spelling), but not all, foreign language subskills. A detailed investigation of existing research on different foreign language subskills of poor readers/spellers can shed light on this issue. However, as different research traditions in the foreign language learning literature have used different labels to describe these subskills, it is sometimes difficult to reconcile classification systems. For example, some authors distinguish between oral and written language, while others contrast receptive and expressive modalities (Nation 2013). Still others separate tasks focusing on prelexical, lexical, and nonlexical processing mechanisms (de Bot 1992; de Bot et al. 1997). For the purpose of this review, we consider a classification based on the tasks used to measure foreign language attainment (e.g., picture naming, rhyme detection, lexical decision) as the most appropriate one to give the reader a concrete idea of the measures that were used in past research. Table 1 shows this classification.

Table 1 Foreign language outcome measures

As we were mainly interested in exploring the linguistic aspects involved in foreign language attainment, we decided to exclude studies solely focusing on foreign language learning motivation. We included both self-developed and standardized language tests.

Potential Moderators of Foreign Language Attainment in Children/Adolescents with Poor Literacy Skills

Our secondary research question aimed to investigate if successful foreign language attainment in children/adolescents with poor literacy skills might be influenced by moderators such as participant characteristics, foreign language instruction, and foreign language assessment. We anticipated 11 potential moderators and predefined subgroups for each of these variables. This information is summarized in Table 2 and described in the following sections.

Table 2 Moderators and subgroups

Participant Characteristics

A first potential moderator refers to children/adolescents’ native language profile. Past research has shown that children/adolescents with poor literacy skills have very heterogeneous performance patterns in their native language (Bishop and Snowling 2004; Friedmann and Coltheart 2016; McArthur et al. 2013; Moll and Landerl 2009). This heterogeneity may extend to foreign language skills such that only some, but not all poor readers/spellers show a lower attainment than their peers with typical literacy skills. To investigate this issue, we planned to explore the impact of participants’ native language profiles on their foreign language attainment.

Based on past research, we first distinguished between poor readers/spellers who only show difficulties in written native language skills and others who also show impaired oral language abilities (Bishop and Snowling 2004). Within the written language domain, we aimed to explore evidence on three subgroups of children/adolescents with poor literacy skills: poor readers/good spellers, good readers/poor spellers, and poor readers/poor spellers (Moll and Landerl 2009).

Finally, we were interested in the impact of different types of reading deficits on foreign language attainment. More specifically, we differentiated poor readers/spellers with sublexical deficits (i.e., difficulties converting letters to sounds), lexical deficits (i.e., difficulties in recognizing written words leading to inaccurate or slow word reading), mixed deficits (i.e., sublexical and lexical impairments), and other reading deficits (e.g., processing letter order, recognizing letters, ordering letters, and moving letters between words—Friedmann and Coltheart 2016; Friedmann and Lukov 2008; Friedmann and Rahamim 2007; Kohnen et al. 2012, 2018; McArthur et al. 2013; Sotiropoulos and Hanley 2017).

Another moderator that could contribute to variable foreign language performance is the diversity of linguistic backgrounds. Several studies have reported higher foreign language attainment in bilingual as compared with monolingual students with typical literacy skills. For example, Nair et al. (2016) found a significantly better performance in early and late bilinguals as compared with monolinguals in a novel-word-learning task. Similarly, Tremblay and Sabourin (2012) reported significantly higher speech perception abilities in multi- and bilinguals as compared with monolinguals. Therefore, we aimed to distinguish between studies including only monolingual, bilingual, or both monolingual and bilingual participants.

Foreign Language Instruction

The quantity of foreign language input has often been highlighted as an important variable to explain foreign language attainment (Wright 2013). Thus, we decided to explore the impact of frequency and duration of foreign language classes on foreign language attainment in children/adolescents with poor literacy skills. In addition, language pairing between native and foreign language can moderate foreign language attainment (Connor 1996; Odlin 1989). Structural similarities or differences between the native and foreign language, for example between Indo-European and non-Indo-European languages, have been shown to either facilitate or impede the acquisition of a foreign language (Connor 1996; Melby-Lervåg and Lervåg 2014; Odlin 1989).

Moreover, Bialystok et al. (2005) pointed out that the orthographic similarity between two writing systems (e.g., alphabetic vs. ideographic writing systems) could explain the extent to which bilingual children were able to positively transfer literacy skills across languages. Within alphabetic writing systems, it seems that the consistency with which a grapheme is mapped onto a phoneme is an important moderator of literacy performance (e.g., Seymour et al. 2003; Ziegler et al. 2010). This is especially relevant in children/adolescents with poor literacy skills. Although students with poor literacy skills have been identified in many different languages (e.g., Frost 2012; Ziegler et al. 2003), some performance patterns seem to differ across languages (Goulandris 2003; Landerl et al. 1996; Moll et al. 2014). Lastly, onset age of foreign language instruction has been shown to influence foreign language skills in children/adolescents with typical literacy skills and was therefore also included as a potential moderator in this review (e.g., Bialystok 1997; DeKeyser et al. 2010; Friederici et al. 2002; Johnson and Newport 1989).

Foreign Language Assessment

Similar to onset age, the age at which foreign language abilities are assessed could play a role in explaining foreign language attainment in poor readers/spellers. Indeed, Bialystok (1997) highlighted that older learners rely on wider previous knowledge than younger ones and are therefore able to include new information into already existing conceptual categories. In contrast, younger learners tend to create new categories for the input they receive, which sometimes involves a longer learning process. Similarly, DeKeyser (2000) reported that younger learners rely to a greater extent on implicit mechanisms that may no longer be available to older learners. Older learners depend much more on explicit learning mechanisms. Both types of learning mechanisms have been shown to be beneficial in developing different foreign language subskills. For instance, it appears that speech production relies to a greater extent on implicit learning, while grammatical knowledge is acquired faster through explicit teaching (DeKeyser 2000). Thus, the age at which a foreign language assessment is conducted might impact the magnitude of group differences between children/adolescents with poor and typical literacy skills.

Method

The procedures used in this review were predefined in a protocol registered in the International Prospective Register of Systematic Reviews PROSPERO (see https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=69980). Differences between the protocol and this final report are detailed in the Supplemental Materials (see S2).

Eligibility Criteria

We summarized the selection criteria in nine “signaling questions” shown in Table 3.

Table 3 Eligibility criteria

These questions were used during title and abstract as well as full text screening. Furthermore, we only included group comparison studies (as opposed to case studies or case series) that reported on a quantitative comparison of both participant groups on a foreign language outcome measure. We only excluded data reported within doctoral dissertations if the same information was published in a peer-reviewed article.

Information Sources

We searched the following databases: Ovid databases (PsycINFO, PsycARTICLES, Medline, Embase), Wiley Database, PubMed, ProQuest (ERIC, ProQuest dissertations, and Linguistics and Language Behavior Abstracts (LLBA)), and Web of Science. Furthermore, we used PsycExtra and Google Scholar to identify gray literature and hand searched the following journals: International Journal of Bilingual Education and Bilingualism, Bilingualism: Language and Cognition, Second Language Research, TESOL Quarterly, and International Journal of Bilingualism. We also planned to contact authors of more than three independent studies that met inclusion criteria by e-mail and ask for unpublished material, although this did not occur.

Search

Figure 1 details the search strategy used in this review.

Fig. 1 Search strategy

This strategy was entered into the databases detailed above between February 10th and 26th of 2017 with no language or date limits. We adapted this search strategy to the requirements of the different databases (see Supplemental Materials S3).

Study Selection

First, we imported the titles and abstracts of the identified studies to Covidence (www.covidence.org). Duplicates were automatically deleted. Further studies that we clearly identified as duplicates were also dismissed. Using the signaling questions (see Table 3), two reviewers independently determined eligibility of studies based on titles and abstracts. Disagreements between two reviewers were resolved through an additional judgment by a third reviewer. In a second step, we repeated the same procedure for the full texts of the studies included from the first stage. If the study did not meet the eligibility criteria, we registered the reason for exclusion in Covidence.

Data Collection Process

In order to extract relevant data from included studies, we customized a data extraction form in Covidence (see Supplemental Materials S4). The first author completed the data extraction form for each of the included studies. The other two authors double-checked data entry against the original studies. Any incongruities were resolved by returning to the original data in the study. In cases of missing data, the corresponding authors of studies were contacted, and if this information was not available, the study (or measure) was excluded from the review.

Data Items

Overall, we collected data on 15 oral and written foreign language tasks (see Table 1). Similar tasks were grouped together (e.g., auditory discrimination of phonemes and syllables as discrimination of speech sounds). Included studies presented data on foreign language outcome measures as continuous data. Therefore, we extracted the mean, standard deviation, and number of participants of the group of children/adolescents with poor literacy skills and the control group for each outcome measure reported in each study. In relation to our second research question, we also gathered information on 11 moderators (see Table 2 above).

Risk of Bias in Individual Studies

Two reviewers independently assessed the risk of bias of each included study by applying an adaptation of the ROBINS-I rating scale (Sterne et al. 2016; see Supplemental Materials S5). We judged the quality of each study within the following domains: (a) confounding, (b) selection of participants into the study, (c) classification of participant groups, (d) deviations from intended interventions, (e) missing data, (f) measurement of outcomes, and (g) selection of the reported result. For the last three domains, we assessed not only each study but also each outcome measure. Each domain was judged as having a low, moderate, serious, or critical risk of bias or as providing no information in this respect. Again, judgments from two reviewers were compared, and in case of disagreements, decisions were discussed between all three reviewers, leading to a re-evaluation of the risk of bias in some cases. The entire process was completed in Covidence. As Sterne et al. (2016) suggest that most nonrandomized intervention studies will present at least an overall moderate risk of bias, we decided to only exclude studies with serious or critical risk of bias from further analyses.

Summary Measures

We planned separate meta-analyses for the 15 foreign language outcome measures described in Table 1. However, we only completed analyses if information from at least two studies was available (Borenstein et al. 2009). Following common meta-analytic procedures, we used standardized mean differences (SMDs) with Hedges’ g correction for small sample sizes (Borenstein et al. 2009) as the unit of analysis. This allowed us to compare the average foreign language performance between children/adolescents with poor and typical literacy skills.
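As an illustration of this effect size, the following minimal Python sketch computes a bias-corrected standardized mean difference from the M, SD, and n of each group, using the textbook formulas summarized in Borenstein et al. (2009). The function name `hedges_g` and the example values are hypothetical, and the sketch is not the code used in this review.

```python
import math


def hedges_g(m_poor, sd_poor, n_poor, m_ctrl, sd_ctrl, n_ctrl):
    """Return (g, var_g) for the poor-literacy group minus the control group.

    A negative g means that the control group scored higher.
    """
    df = n_poor + n_ctrl - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(
        ((n_poor - 1) * sd_poor ** 2 + (n_ctrl - 1) * sd_ctrl ** 2) / df
    )
    d = (m_poor - m_ctrl) / sd_pooled
    # Approximate small-sample correction factor J
    j = 1 - 3 / (4 * df - 1)
    # Large-sample variance of d, then rescaled by J squared
    var_d = (n_poor + n_ctrl) / (n_poor * n_ctrl) + d ** 2 / (2 * (n_poor + n_ctrl))
    return j * d, j ** 2 * var_d


# Hypothetical word reading scores: poor readers M = 20, controls M = 25
g, var_g = hedges_g(20.0, 5.0, 20, 25.0, 5.0, 20)
print(round(g, 3), round(var_g, 3))
```

With equal standard deviations of 5 and 20 participants per group, the uncorrected difference is exactly one pooled standard deviation, and the correction shrinks it slightly toward zero.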

However, we were concerned that this information might not be representative of the performance of individual children/adolescents with poor literacy skills. Based on the heterogeneous performance of poor readers/spellers documented in past research, the extent to which group averages capture individual performances of children/adolescents with poor literacy skills would be expected to vary. Results from meta-analyses solely based on SMDs might therefore have limited potential to guide practical implications for individual children/adolescents with poor literacy skills.

To address this limitation, we also computed a second overall effect focusing on the variability across participant groups (the natural logarithm of the ratio between the coefficients of variation of both participant groups, CVR; Nakagawa et al. 2015). This allowed us to compare the magnitude of variability in foreign language performance between poor and typical readers/spellers. Such meta-analytic procedures have recently proven useful to determine the magnitude of intersubject variability in the fields of biological evolution and nutrition (Nakagawa et al. 2015; Senior et al. 2016). To our knowledge, our study is the first to apply this procedure to synthesize evidence on children/adolescents with poor literacy skills. Given the well-documented heterogeneity of the study population (Bishop and Snowling 2004; Friedmann and Coltheart 2016; McArthur et al. 2013; Moll and Landerl 2009), adopting this procedure seems justified.
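To make the variability effect size concrete, the sketch below computes the log coefficient of variation ratio from the same summary statistics. It follows the point estimate described by Nakagawa et al. (2015), including the small-sample bias terms. The sampling variance is simplified by assuming no correlation between group means and standard deviations, which is an assumption of this sketch rather than a detail reported here. The function name `ln_cvr` and the example values are hypothetical.

```python
import math


def ln_cvr(m_poor, sd_poor, n_poor, m_ctrl, sd_ctrl, n_ctrl):
    """Return (lnCVR, variance) for poor readers/spellers relative to controls.

    Positive values mean that the poor-literacy group is relatively more
    variable. Means must be positive (ratio-scale scores).
    """
    cv_poor = sd_poor / m_poor
    cv_ctrl = sd_ctrl / m_ctrl
    estimate = (
        math.log(cv_poor / cv_ctrl)
        + 1 / (2 * (n_poor - 1))
        - 1 / (2 * (n_ctrl - 1))
    )
    # Simplified variance: assumes zero mean-SD correlation (sketch assumption)
    variance = (
        sd_poor ** 2 / (n_poor * m_poor ** 2)
        + 1 / (2 * (n_poor - 1))
        + sd_ctrl ** 2 / (n_ctrl * m_ctrl ** 2)
        + 1 / (2 * (n_ctrl - 1))
    )
    return estimate, variance


# Hypothetical example: CV of 0.4 in poor readers versus 0.2 in controls
estimate, variance = ln_cvr(20.0, 8.0, 20, 25.0, 5.0, 20)
print(round(estimate, 3), round(variance, 3))
```

In this example the coefficient of variation of the poor-literacy group is twice that of the controls, so the estimate equals ln(2); the bias terms cancel because both groups have the same size.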

Synthesis of Results

Both types of effect sizes (SMDs and CVRs) were derived from the mean (M), standard deviation (SD), and number (n) of participants for each foreign language task. Some studies reported information on more than one group of children/adolescents with poor literacy skills or more than one control group. In those cases, we combined the M, SD, and n of both groups or excluded one of the groups (Borenstein et al. 2009).

Furthermore, many studies used more than one task for the same outcome measure (e.g., phoneme deletion and substitution tasks to assess phonological awareness). In these cases, SMDs and CVRs were computed separately for each task, and subsequently, values were aggregated for each outcome measure (Borenstein et al. 2009). Such aggregation methods take into account the correlation between aggregated tasks. However, as this information was not available for many studies, we assumed a large correlation of r = 0.50 (Cohen 1988) based on the similarity of the tasks being aggregated (Borenstein et al. 2009). The same procedure was followed for longitudinal studies reporting more than one data-point per outcome measure (Borenstein et al. 2009).
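The aggregation step described above can be sketched as follows, using the formula for the variance of a mean of correlated effect sizes given in Borenstein et al. (2009) and a common correlation of r = 0.50 between tasks. The function name `aggregate_effects` and the example values are hypothetical.

```python
import math


def aggregate_effects(effects, variances, r=0.5):
    """Average several effect sizes from the same sample.

    Returns (mean_effect, variance_of_mean), where the variance accounts for
    an assumed common correlation r between every pair of tasks.
    """
    m = len(effects)
    mean_effect = sum(effects) / m
    total = sum(variances)
    # Add the covariance term for every ordered pair of distinct tasks
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean_effect, total / m ** 2


# Hypothetical example: phoneme deletion and phoneme substitution tasks
mean_effect, var_mean = aggregate_effects([-0.8, -0.6], [0.10, 0.10])
print(round(mean_effect, 3), round(var_mean, 3))
```

With r = 0.50, the variance of the aggregated effect (0.075) lies between the value obtained by treating the tasks as independent (0.05) and the value obtained by treating them as perfectly correlated (0.10).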

Before aggregating SMDs, we ensured that a negative difference indicated that the control group performed better than the group of children/adolescents with poor literacy skills for all comparisons. If measures were based on the occurrence of errors (instead of accuracy rates), the sign of the SMDs was reversed (Borenstein et al. 2009).

Based on all potential moderators that could influence foreign language attainment, we expected to find significant heterogeneity between studies. Therefore, we decided a priori to use random-effects modeling, which weights each study by the inverse of its variance and takes the between-study variance into account. We used Cochran’s Q statistic with a significance level of p < 0.05 to determine the presence of heterogeneity among effect sizes. Furthermore, to quantify heterogeneity, we calculated τ² and I² as indices of the variation between study effect sizes. We followed the guidelines of Higgins et al. (2003) and considered I² values around 25%, 50%, and 75% as low, moderate, and high heterogeneity, respectively. To measure the overall effect, we used Z statistics with a Bonferroni-corrected significance level according to the number of comparisons that we were performing (Borenstein et al. 2009). Overall effects were only interpreted in the absence of significant heterogeneity between study effect sizes (Q statistic p > 0.05).
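The quantities named in this paragraph can be illustrated with a short Python sketch. It uses the DerSimonian-Laird estimator of the between-study variance because it has a closed form; the estimator actually used in this review is not stated here, so this choice is an assumption of the sketch. The function name `random_effects_summary`, the `n_comparisons` parameter, and the example values are hypothetical. The p value of Q, which requires the chi-square distribution with k − 1 degrees of freedom, is not computed.

```python
import math


def random_effects_summary(effects, variances, n_comparisons=1, alpha=0.05):
    """Summarize k study effect sizes under a random-effects model."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # DerSimonian-Laird estimate (assumption)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p value of Z
    return {
        "Q": q, "df": df, "tau2": tau2, "I2": i2,
        "pooled": pooled, "se": se, "Z": z, "p": p,
        "significant": p < alpha / n_comparisons,  # Bonferroni threshold
    }


# Hypothetical example: three studies with equal sampling variances
result = random_effects_summary([-1.0, -0.5, -0.2], [0.04, 0.04, 0.04])
print({name: round(value, 3) for name, value in result.items()})
```

In this example, I² is about 76%, which would count as high heterogeneity under the Higgins et al. (2003) guidelines.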

Risk of Bias Across Studies

In order to assess reporting bias, we completed a funnel plot analysis and applied the trim and fill method by Duval and Tweedie (2000a, b) and Duval (2005), as implemented in the metafor package in R (Viechtbauer 2010). Following the recommendations of Viechtbauer (2010), we selected the estimator “R0,” as it provides a test of the null hypothesis that the number of missing studies on the chosen side of the funnel plot is zero. We tested this for both sides of the funnel plot.

Moderator Analyses

To explore the impact of specific moderators on the foreign language attainment of children/adolescents with poor literacy skills, we planned to compute separate analyses for 11 moderators (see Table 2). However, in line with Littell et al. (2008), these analyses were only computed if data were available from at least 10 studies. We completed separate mixed-effects meta-analyses for each moderator subgroup using the metafor package in R (Viechtbauer 2010). Finally, we used a Z test with a significance level adjusted for the number of comparisons computed to detect significant differences between the overall effects of each moderator subgroup (Borenstein et al. 2009).
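The subgroup comparison can be sketched as a Z test on the difference between two pooled effects, in the form described by Borenstein et al. (2009) for independent subgroups. The function name `subgroup_z_test`, the `n_comparisons` parameter, and the example values are hypothetical, and the sketch is independent of the metafor implementation.

```python
import math


def subgroup_z_test(est_a, se_a, est_b, se_b, n_comparisons=1, alpha=0.05):
    """Compare the pooled effects of two independent moderator subgroups.

    Returns (z, p, significant), where the significance threshold is
    alpha divided by the number of comparisons.
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p value
    return z, p, p < alpha / n_comparisons


# Hypothetical example: alphabetic versus ideographic native writing systems
z, p, significant = subgroup_z_test(-1.0, 0.3, -0.4, 0.4, n_comparisons=2)
print(round(z, 2), round(p, 2), significant)
```

In this example the subgroup effects differ by 0.6 with a standard error of 0.5, so the difference is not significant.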

Results

Study Selection

The study selection process is illustrated in Fig. 2 and involved two rounds.

Fig. 2 Flowchart of the search procedure

Round 1 started with the identification of a total of 1913 study reports, of which 543 duplicates were removed. Titles and abstracts of the remaining 1370 references were screened, and in total, we excluded 1312 studies that did not meet inclusion criteria. We reviewed full texts of the remaining 58 studies and excluded 42 of them on the basis of our inclusion criteria. Therefore, round 1 of the search procedure ended with 16 studies meeting inclusion criteria. In round 2, we additionally checked the reference lists of these 16 included studies and found 44 additional references that were imported into Covidence. We also screened the titles and abstracts of studies that cited the 16 included studies in Google Scholar and found 53 new relevant references that were also added to Covidence. Of these 97 new references, 70 duplicates were removed. We screened titles, abstracts, and full texts of the remaining 27 studies, but none of these studies met inclusion criteria and, thus, did not enter the current review.
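The counts reported for the two rounds can be checked with simple arithmetic. The short Python sketch below reproduces them from the figures given in the text; the function name `study_flow` and the dictionary keys are hypothetical labels.

```python
def study_flow():
    """Recompute the screening counts reported for rounds 1 and 2."""
    # Round 1: database search
    screened = 1913 - 543            # records left after removing duplicates
    full_text = screened - 1312      # records left after title/abstract screening
    included = full_text - 42        # studies left after full-text screening
    # Round 2: reference lists (44) and citing studies (53)
    new_references = 44 + 53
    round2_screened = new_references - 70  # after removing duplicates
    return {
        "screened": screened,
        "full_text": full_text,
        "included": included,
        "new_references": new_references,
        "round2_screened": round2_screened,
    }


print(study_flow())
```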

Study Characteristics

Within the 16 studies included in this review, a total of 968 participants (404 poor readers/spellers and 564 control participants) were assessed. All studies focused on the comparison of children/adolescents with poor and typical literacy skills on at least one oral or written foreign language outcome measure. Studies were completed in the Netherlands (6 studies), Italy (3 studies), Hong Kong (3 studies), Norway (2 studies), Poland (1 study), and China (1 study).

Participants

In the Supplemental Materials (see S6), we provide details on the information extracted from each study. With respect to participants’ native language profile, only three studies reported this information. Bekebrede et al. (2009) and Morfidi et al. (2007) included only children/adolescents with average oral native language skills, but poor written native language skills. None of the studies distinguished between selective reading and spelling deficits, and only two studies listed information on reading difficulty subtypes. While Bekebrede et al. (2009) included children/adolescents with lexical and other reading deficits, Haisma (2009) assessed participants with sublexical and lexical reading difficulties. Information was completely missing for 13 studies.

Concerning linguistic background, four studies reported that their participants had a purely monolingual background. Only Morfidi et al. (2007) stated that their sample also included bilingual children. Information was missing for the remaining 11 studies.

Foreign Language Instruction

In three studies, frequency of foreign language instruction consisted of two to four classes per week, and in two studies, participants received between 30 and 60 min of instruction per class. Information was missing for 13 studies.

No language limits were introduced in the search criteria, yet the foreign language assessed in all studies was English. The native languages of participants were Dutch (6 studies), Italian (3 studies), Cantonese (3 studies), Norwegian (2 studies), Polish (1 study), and Mandarin (1 study). Therefore, in terms of language pairings, 12 studies focused on the combination of two Indo-European languages with alphabetic writing systems. In all of these cases, the native language had a predominantly regular orthography and was paired with a foreign language with a predominantly irregular orthography. In the remaining four studies, the language combination was a non-Indo-European native language with an ideographic writing system combined with a foreign Indo-European language with an alphabetic writing system.

Onset age of foreign language learning was early childhood (onset age before 6 years) for three studies, late childhood (onset age from 6 to 11 years) for five studies, and adolescence (onset age from 12 to 17 years) for three studies. None of the studies reported on participants in the transition from adolescence to adulthood (onset age from 18 years onwards). Lastly, information was missing for five studies.

Foreign Language Assessment

The age at foreign language assessment was late childhood (6–11 years of age) for seven studies and adolescence (12–17 years of age) for nine studies. None of the studies assessed participants in early childhood (before 6 years of age) or the transition from adolescence to adulthood (from 18 years of age onwards).

An overview of the foreign language outcome measures collected by each study can be found in the Supplemental Materials (see S7). No information was available on foreign language speech discrimination and production. Receptive vocabulary knowledge and word production skills were investigated by six and five studies, respectively. One study tested sentence comprehension and production and another one measured short-term memory. Four studies explored phonological awareness, while two studies measured letter knowledge. Word and nonword reading skills were assessed by 14 and seven studies, respectively. Seven studies gathered information on orthographic knowledge and four assessed reading comprehension. Spelling skills were measured by eight studies and two studies tested translation skills.

Excluded Studies

Reasons for excluding studies at the full text screening phase can be found in the Supplemental Materials (see S8).

Risk of Bias within Studies

Figure 3 provides an overview of the risk of bias of the included studies following the ROBINS-I rating scale (Sterne et al. 2016—see S9 for details).

Fig. 3

Overview of risk of bias of included studies (n = 16)

As the information contained in the study reports was not sufficient to assess all of the domains of risk of bias, in many cases, we contacted the authors to obtain additional information.

Confounding

All studies had at least a moderate risk of bias in this domain, as they were all nonrandomized controlled studies. This means that confounding is expected by default, as different baseline characteristics of participants could have influenced the results (Sterne et al. 2016). However, all studies reported information on the measurement and control of other important confounding domains such as age, socioeconomic status (SES), nonverbal reasoning, or oral native language skills in a reliable and valid manner. Studies also employed pairwise matching between experimental and control participants or statistical adjustment with Bayesian analysis or ANCOVAs.

Selection of Participants into the Study

Two conditions had to be fulfilled to reflect a low risk of bias in this domain. First, the start of foreign language instruction and the timing of foreign language assessment had to be the same for most participants (Sterne et al. 2016). All included studies fulfilled this condition. Second, participant selection had to be unrelated to foreign language attainment (Sterne et al. 2016). This condition was only validated for the Helland and Morken (2016) study, as they explicitly stated that parents of all children in the participating schools were contacted. Therefore, this study received a low risk of bias judgment. In contrast, the remaining studies either did not report how they selected participants into the study or mentioned that school staff (e.g., counselors, teachers, etc.) had selected the participants. This indicates a moderate risk of bias, because school staff could have selected participants based on their knowledge of the participants’ foreign language performance. However, all studies, except de Bree and Unsworth (2014), confirmed the literacy status of each of the participants either through an external diagnosis of poor literacy skills or by administering a literacy test. In contrast, de Bree and Unsworth (2014) solely relied on the information of school staff for the selection of participants into the study. Although the authors acknowledged the presence of selection bias in their study in a footnote, this still represents a serious risk of bias. Therefore, we excluded this study from further analysis.

Classification of Participant Groups

Two aspects were crucial in this risk of bias domain. First, participant group status had to be well defined (Sterne et al. 2016). Children/adolescents with poor literacy skills had to perform at least 1 SD, year, or grade below the expected level on one or more of the following measures: word/nonword reading accuracy, reading fluency, and/or spelling. While all studies fulfilled this condition, Ding et al. (2013) and Helland and Morken (2016) still showed a moderate risk of bias. The group of children/adolescents with poor literacy skills in the study of Ding et al. (2013) was selected because they scored 1 SD below the mean of the study sample itself (n = 102). This represents a moderate risk, because the identification of children/adolescents with poor literacy skills could have been biased by the characteristics of the sample of total participants from which they were selected. Helland and Morken (2016) identified poor literacy skills based on a below average performance (< 25th percentile) on at least two out of four literacy measures. While for three of these measures independent standardized test norms were available, one (i.e., text reading) was developed by the authors (see Helland et al. 2011, for additional information to Helland and Morken 2016). For this measure, the cutoff criterion (< 25th percentile) was based on the sample of the study (n = 42). Therefore, similar to Ding et al. (2013), the identification of children/adolescents with poor literacy skills could have been biased by the characteristics of the specific study sample. Based on this information, we assigned a moderate risk of bias to Helland and Morken (2016).

A second crucial aspect pertaining to classification was that the assignment of participant status should not have been determined retrospectively (Sterne et al. 2016). Although two studies were especially vulnerable with respect to this condition, after a careful analysis, we decided that no risk of bias was present. First, Zhou et al. (2014) completed a longitudinal study in which poor literacy status was defined at the last testing point and retrospectively assigned to two previous testing points at earlier developmental stages. With this risk of bias domain in mind, we had already excluded the two earlier testing points during data extraction. Therefore, no risk of bias was present for the third testing point, which was the only one included in this review. Similarly, in their longitudinal study, Helland and Morken (2016) determined poor literacy status at the last testing point and retrospectively assigned it to two previous testing points at earlier developmental stages. However, the authors only reported foreign language outcome measures for the third testing point, when poor literacy status was defined. Therefore, no risk of bias was identified.

Deviation from Intended Interventions

All studies showed a low risk of bias in this domain. Łockiewicz and Jaskulska (2016) were the only authors who reported information on a potential deviation from intended interventions, in the form of extracurricular private foreign language tutoring. Furthermore, foreign language teachers could have provided educational accommodations to children/adolescents with poor literacy skills if they had knowledge of their students’ native language performance (e.g., dyslexia diagnosis). However, both situations reflect usual practice and were therefore assigned a low risk of bias (Sterne et al. 2016).

Missing Data

We were not able to assess this risk of bias domain for most studies, because no explicit information was given in the studies, with the exception of Helland and Morken (2016) and van Viersen et al. (2017). A low risk was assigned to studies where the number of participants in the results matched the number of participants in the methods section. However, other studies did not provide the number of participants when presenting results. These cases were marked as providing no information to assess this risk of bias domain.

Measurement of Outcomes

All studies showed a low risk of bias with the exception of Palladino et al. (2016). In this study, spelling errors were scored according to a predefined grid. Since no information was available regarding the reliability of this measure, we assigned a moderate risk of bias.

Selection of Reported Results

We identified a moderate risk of bias for all studies because no preregistered protocols or statistical analysis plans were available for any of the studies (Sterne et al. 2016). However, the information on outcome measurements in the methods and results section of each study report was consistent.

Overall Risk of Bias

We determined the overall risk of bias as the highest risk of bias judgment received by a study in any of the domains (Sterne et al. 2016). This was a moderate risk of bias for all studies, with the exception of de Bree and Unsworth (2014). In this study, we found a serious risk of bias in the selection of participants and, therefore, assigned an overall serious risk of bias. This led to the exclusion of this study.
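The rule described above (overall judgment equals the most severe domain judgment) can be sketched in a few lines of Python. This is only an illustration, not the procedure used by the review team: the `Risk` enum, the `overall_risk` function, the domain keys, and the example ratings are all hypothetical, and skipping domains rated "no information" (represented here as `None`) is an assumption made for this sketch.

```python
from enum import IntEnum


class Risk(IntEnum):
    """ROBINS-I judgment levels, ordered from least to most severe."""
    LOW = 1
    MODERATE = 2
    SERIOUS = 3
    CRITICAL = 4


def overall_risk(domain_judgments):
    """Return the most severe judgment across all rated domains.

    Domains with no information are passed as None and skipped
    (an assumption of this sketch, not a rule stated in the review).
    """
    rated = [risk for risk in domain_judgments.values() if risk is not None]
    if not rated:
        raise ValueError("no domain could be rated")
    return max(rated)


# Hypothetical ratings for an imaginary study (not data from the review).
example_study = {
    "confounding": Risk.MODERATE,
    "selection_of_participants": Risk.SERIOUS,
    "classification_of_groups": Risk.LOW,
    "deviation_from_interventions": Risk.LOW,
    "missing_data": None,  # no information available
    "measurement_of_outcomes": Risk.LOW,
    "selection_of_reported_results": Risk.MODERATE,
}

print(overall_risk(example_study).name)
```

Because a single serious judgment dominates all others under this rule, one weak domain is enough to move a study out of the main analysis.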

Meta-analyses of Foreign Language Outcome Measures

Before conducting the analyses, we completed data transformations (e.g., merging data from two data points, two control or experimental groups, etc.). Details are specified in the Supplemental Materials (see S10) and the datasets and code used to compute the analyses can be accessed on the Open Science Framework at https://osf.io/np97f/. Overall, we were able to compute separate random-effects meta-analyses for 10 out of 15 foreign language outcome measures. Analyses for the remaining five foreign language outcome measures were not possible due to limited data (fewer than two study reports). Results are presented in Table 4.

Table 4 Meta-analytic results

Meta-analyses yielded overall SMDs and CVRs for 10 foreign language outcome measures. However, only four of the 10 overall SMDs could be interpreted; the remaining six showed significant heterogeneity between study effects (Q statistic p < 0.05). The interpretable effects concerned foreign language receptive vocabulary knowledge, phonological awareness, letter knowledge, and reading comprehension. For foreign language receptive vocabulary knowledge, children/adolescents with poor literacy skills on average showed a similar performance to their peers with typical literacy skills. In contrast, poor readers/spellers scored lower on foreign language phonological awareness, letter knowledge, and reading comprehension.
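To make the role of the Q statistic concrete, the following Python sketch pools study-level effect sizes under a random-effects model and returns Cochran's Q alongside the pooled estimate. It uses the DerSimonian-Laird estimator of between-study variance as a stand-in; the review's analyses were run with the metafor package in R, and the estimator used there is not stated in this section. The function name and the input values are hypothetical and serve only as an illustration.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    effects   -- per-study effect sizes (e.g., SMDs)
    variances -- per-study sampling variances of those effect sizes
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "q": q, "df": df, "tau2": tau2}


# Hypothetical SMDs and variances (not data from the review).
result = dersimonian_laird([-0.9, -0.4, -1.3], [0.05, 0.08, 0.06])
print(result)
```

Under the null hypothesis of homogeneous effects, Q follows a chi-square distribution with k − 1 degrees of freedom (k being the number of studies), which is where the p < 0.05 criterion mentioned above comes from.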

Results based on SMDs are derived from group averages, and they may not take into consideration individual differences within each participant group. Therefore, in addition to overall SMDs, we computed overall CVRs. This complementary analysis allowed us to estimate to what extent results from overall SMDs were likely to vary across individual poor readers/spellers. Due to significant heterogeneity between study effects (Q statistic p < 0.05), CVRs could only be interpreted with respect to two foreign language outcome measures, namely for phonological awareness and reading comprehension. According to these results, poor readers/spellers varied significantly more than control participants in their foreign language phonological awareness performance. In contrast, performance on foreign language reading comprehension varied to a similar extent across participant groups. Figures 4 and 5 depict results for each foreign language outcome measure in the form of forest plots.
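As an illustration of what a CVR captures, the sketch below computes the bias-corrected log coefficient of variation ratio for a single study, following the point estimate described by Nakagawa et al. (2015). The function name, the argument names, and the example values are hypothetical. Placing the poor readers/spellers group in the numerator is an assumption of this sketch, so that positive values indicate more relative variability in that group.

```python
import math


def ln_cvr(mean_prs, sd_prs, n_prs, mean_cg, sd_cg, n_cg):
    """Bias-corrected log coefficient of variation ratio for one study.

    The *_prs arguments describe the poor readers/spellers group and the
    *_cg arguments the control group. Means must be positive, which
    holds for ratio-scale scores such as number of items correct.
    """
    if mean_prs <= 0 or mean_cg <= 0:
        raise ValueError("coefficients of variation require positive means")
    cv_prs = sd_prs / mean_prs
    cv_cg = sd_cg / mean_cg
    # Small-sample correction terms for the log of each standard deviation.
    correction = 1.0 / (2 * (n_prs - 1)) - 1.0 / (2 * (n_cg - 1))
    return math.log(cv_prs / cv_cg) + correction


# Hypothetical group summaries (not data from the review).
print(ln_cvr(mean_prs=30.0, sd_prs=9.0, n_prs=25,
             mean_cg=40.0, sd_cg=8.0, n_cg=25))
```

Dividing each standard deviation by its group mean matters here because the groups differ in average performance; comparing raw standard deviations would confound differences in spread with differences in level.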

Fig. 4

Meta-analyses on foreign language: a receptive vocabulary knowledge, b spoken word production, c phonological awareness, d letter knowledge, and e word reading. Note. SMD = standardized mean difference; CVR = natural logarithm of the ratio of coefficients of variation (Nakagawa et al. 2015)

Fig. 5

Meta-analyses on foreign language: a nonword reading, b orthographic knowledge, c reading comprehension, d spelling, and e translation. Note. SMD = standardized mean difference; CVR = natural logarithm of the ratio of coefficients of variation (Nakagawa et al. 2015)

Moderator Analyses

Due to serious data limitations (fewer than 10 studies), only three of the 11 planned moderator analyses (see Table 2) could be conducted, and only for one of the 15 foreign language outcome measures planned for this review (Littell et al. 2008). Thus, we focused on investigating whether (a) language pairing between native and foreign language, (b) onset age of foreign language instruction, and (c) age at foreign language assessment provided a better understanding of the heterogeneity observed between study effects in foreign language word reading. An adjusted significance level of p < 0.003 was applied to correct for the five additional comparisons (on top of the 10 previous comparisons) that were computed as part of the moderator analyses (Borenstein et al. 2009). Results revealed no significant impact of any of the three moderators mentioned above on either the SMD or the CVR of foreign language word reading performance between children/adolescents with poor and typical literacy skills. The exact values and forest plots reflecting these results are provided in the Supplemental Materials (see S11 to S13).
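The adjusted threshold reported above is consistent with dividing the conventional 0.05 level by the total number of comparisons (10 + 5 = 15), that is, a Bonferroni-type correction. The text does not name the correction method, so treating it as Bonferroni is an assumption of this short Python sketch, and the function name is hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / n_comparisons


# 10 main meta-analytic comparisons plus 5 moderator comparisons.
threshold = bonferroni_alpha(0.05, 10 + 5)
print(f"{threshold:.4f}")
```

The result lies slightly above 0.003, which matches the reported cutoff of p < 0.003 if that value was truncated to three decimal places.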

Reporting Bias

The influence of reporting bias could only be investigated with respect to the results of the meta-analysis on the SMDs for foreign language word reading due to limited data. Results of a funnel plot analysis with the trim and fill method by Duval and Tweedie (2000a, b) and Duval (2005), as implemented in the metafor package in R (Viechtbauer 2010), showed no evidence of reporting bias. The null hypothesis that the number of missing studies of the funnel plot was zero could not be rejected for the right (p = 0.25) or for the left side of the funnel plot (p = 0.50; Viechtbauer 2010). The corresponding funnel plot is attached in the Supplemental Materials (see S14).

Sensitivity Analysis

To assess whether the meta-analyses were influenced by our risk of bias assessment, we repeated them including the data reported by de Bree and Unsworth (2014), the only study that had been excluded due to a serious risk of bias in the selection of participants. This study contributed information to the outcome measures receptive vocabulary knowledge, word reading, nonword reading, and orthographic knowledge. Results revealed a difference between the analyses with and without the data from de Bree and Unsworth (2014) for receptive vocabulary knowledge, but not for word reading, nonword reading, and orthographic knowledge. For receptive vocabulary knowledge, the overall difference in average performance between groups was not significant without de Bree and Unsworth (2014), but reached significance when this study was included. The overall difference in performance variation between groups was not interpretable due to significant heterogeneity, both with and without de Bree and Unsworth (2014). Overall effects for word reading, nonword reading, and orthographic knowledge could not be interpreted due to significant heterogeneity, both with and without de Bree and Unsworth (2014). Results are detailed in the Supplemental Materials (see S15).

Discussion

The structure of this section follows the guidelines provided by the Cochrane Collaboration (Higgins and Green 2008). First, we present a summary of our main results, followed by an analysis of the overall completeness and applicability of the evidence that was summarized. Second, we assess the overall quality of the evidence for each foreign language outcome measure according to the Grades of Recommendation, Assessment, Development and Evaluation guidelines (GRADE, Schünemann et al. 2008). Lastly, we portray implications for practice and future research.

Summary of the Main Results

Figure 6 summarizes the main results of this review.

Fig. 6

Summary of main results of this review. Note. FL = foreign language; SMD = overall standardized mean difference; CVR = overall coefficient of variation ratio; PRS = poor readers/spellers; CG = control group; NL = native language; n/i = not interpretable due to significant between-study heterogeneity

From a total of 2010 study records initially identified, only 16 (< 1%) met inclusion criteria for the current review. Fifteen study reports displayed low to moderate risk of bias and were thus entered into the meta-analyses. Only one study was excluded due to serious risk of bias. We extracted data on 15 foreign language outcome measures for this review. Meta-analyses could not be conducted for the following five measures due to insufficient data: foreign language discrimination of speech sounds, production of speech sounds, sentence comprehension, sentence production, and short-term memory.

In contrast, we computed separate meta-analyses for the remaining ten foreign language outcome measures: receptive vocabulary knowledge, spoken word production, phonological awareness, letter knowledge, word reading, nonword reading, orthographic knowledge, reading comprehension, spelling, and translation. Results on overall SMDs revealed significant between-study heterogeneity for spoken word production, word reading, nonword reading, orthographic knowledge, spelling, and translation, and so the interpretation of these values was not possible. To investigate the source of between-study heterogeneity, we planned to compute moderator analyses on 11 moderators that we assumed could have an impact on foreign language performance in children/adolescents with poor literacy skills. However, this was not possible for foreign language spoken word production, nonword reading, orthographic knowledge, spelling, and translation due to insufficient data. We were able to conduct moderator analyses on foreign language word reading, although the information reported in the included studies was only enough to investigate the impact of three of the 11 moderators addressed in this review (i.e., similarity between native and foreign language, onset age of foreign language instruction, and age at assessment). Information was insufficient for the remaining eight moderators. Results showed that neither the similarity between native and foreign language, onset age of foreign language instruction nor age at assessment could explain the between-study heterogeneity that we found for foreign language word reading.

Overall SMDs for foreign language receptive vocabulary knowledge showed that children/adolescents with poor literacy skills on average achieved a similar performance as the control group. In contrast, their average performance was poorer on foreign language phonological awareness, letter knowledge, and reading comprehension as compared with the control group. Complementary meta-analyses on the difference of performance variation (CVR) revealed that the performance of individual poor readers/spellers in phonological awareness varied significantly more than that of control participants. In contrast, performance in reading comprehension varied to a similar extent across both participant groups. Comparisons of overall performance variation between poor readers/spellers and control participants could not be determined for foreign language receptive vocabulary knowledge and letter knowledge, as CVR results reflected significant between-study heterogeneity.

Overall Completeness and Applicability of Evidence

Overall completeness and applicability of the evidence summarized in this review is limited in three ways. First, the information summarized in this review only refers to some foreign language outcome measures (i.e., receptive vocabulary knowledge, word production, phonological awareness, letter knowledge, word reading, nonword reading, orthographic knowledge, reading comprehension, spelling, and translation). In contrast, insufficient information was found for other foreign language outcome measures (i.e., foreign language discrimination and production of speech sounds, and sentence comprehension and production skills). Hence, the findings are restricted to a small set of foreign language skills.

Second, foreign language attainment of children/adolescents with poor literacy skills was investigated with native speakers of a variety of Indo-European and non-Indo-European native languages. However, information on some of the most widely spoken native languages such as English and Spanish is currently missing. Furthermore, while no language limits were introduced in the search criteria, the foreign language assessed in all studies was English. This limits the conclusions that can be drawn from this review with respect to other languages. Considering that the English orthography has been characterized as an “outlier orthography” due to its irregular grapheme–phoneme mappings (Share 2008), it is unclear how generalizable these results are to more “regular” orthographies.

Lastly, and most importantly, significant heterogeneity between study effects seriously limited the interpretation of available evidence for several foreign language outcome measures (i.e., foreign language spoken word production, word reading, nonword reading, orthographic knowledge, spelling, and translation). While most studies indicated a trend toward a lower foreign language attainment of children/adolescents with poor literacy skills as compared with the control group, the magnitude of this effect varied significantly from study to study. Where does this heterogeneity come from? In the registered protocol of this review, we anticipated 11 moderators that could represent potential sources of heterogeneity. Yet, due to limited data, we were only able to investigate the impact of three moderators (i.e., similarity between native and foreign language, onset age of foreign language instruction, and age at assessment) on only one of 15 foreign language measures (i.e., foreign language word reading). While none of these three moderators contributed to explaining the observed between-study heterogeneity, it is possible that some of the other eight moderators, for which we did not have enough data, influenced the results.

Quality of the Evidence and Potential Biases in the Review Process

To assess the overall quality of the evidence for each of the foreign language outcome measures, we applied the GRADE guidelines (Schünemann et al. 2008). Following the guidelines, we began by judging all studies as “low quality of evidence,” because they were all nonrandomized trials. The evidence for each of the foreign language outcome measures could later be changed (i.e., upgraded or downgraded) following the criteria of Schünemann et al. (2008). We detail the specific reasons for each decision below.

For foreign language reading comprehension, we upgraded the evidence to a level of moderate quality, due to the overall large SMD in the absence of significant heterogeneity between studies. Furthermore, children/adolescents with poor and typical literacy skills varied to a similar extent in their performance. In contrast, we maintained a judgment of low quality for the evidence on foreign language phonological awareness and letter knowledge. Despite obtaining large overall SMDs in the absence of significant between-study variance, it was unclear to what extent SMDs were representative of the performance of individual poor readers/spellers. In the case of foreign language letter knowledge, the reason for this was a heterogeneous CVR. In contrast, for foreign language phonological awareness, results showed a higher performance variation in the group of poor readers/spellers than in the control group.

Finally, we downgraded the quality of the evidence on foreign language receptive vocabulary knowledge, spoken word production, word reading, nonword reading, orthographic knowledge, spelling, and translation to very low. For foreign language receptive vocabulary knowledge, we conducted a sensitivity analysis with and without the data of de Bree and Unsworth (2014). This was the only study that was excluded due to a serious risk of bias in the selection of participants. The results of the sensitivity analysis were inconsistent. Therefore, the evidence on foreign language receptive vocabulary knowledge in children/adolescents with poor literacy skills should be interpreted with caution. For foreign language spoken word production, word reading, nonword reading, orthographic knowledge, spelling, and translation, overall effects were not interpretable due to significant between-study variance. Moderator analyses for foreign language word reading did not contribute to explaining the observed variability, and so results should also be interpreted with caution. No potential publication biases were identified.

Implications for Practice

The results of this review provide evidence that children/adolescents with poor literacy skills on average show a lower attainment than their peers with typical literacy skills in foreign language phonological awareness, letter knowledge, and reading comprehension. However, we also found evidence of higher performance variation in foreign language attainment of poor readers/spellers than of control participants for phonological awareness. The sources of this variability have so far not been addressed by current research. Therefore, although children/adolescents with poor literacy skills seem to have a greater risk of experiencing foreign language learning difficulties, this might not be true for every individual child or adolescent with poor literacy skills. The common belief that all children/adolescents with poor literacy skills show a lower foreign language attainment than their peers with typical literacy skills cannot be confirmed by the results of this review. Parents, teachers, and clinicians should keep in mind that an individual student with poor literacy skills might be just as successful as students with typical literacy skills. Instead of relying on the false common belief that all poor readers/spellers will struggle in learning a foreign language, foreign language attainment should be closely monitored and support put in place when necessary.

Implications for Research

Available evidence on foreign language attainment in children/adolescents with poor literacy skills shows several limitations. Most importantly, future research should aim to better understand individual differences in foreign language attainment of children/adolescents with poor literacy skills. First, an investigation of the impact of moderators related to participant characteristics might aid in better understanding the variability observed in foreign language attainment in children/adolescents with poor literacy skills. Past research has described heterogeneous native language profiles of children/adolescents with poor literacy skills that are likely to contribute to individual differences in foreign language attainment as well (Bishop and Snowling 2004; Friedmann and Coltheart 2016; McArthur et al. 2013; Moll and Landerl 2009). Similarly, participants’ linguistic background has been related to individual differences in foreign language attainment and is likely to be a source of the heterogeneity of foreign language attainment in poor readers/spellers observed in the results of this review (Nair et al. 2016; Tremblay and Sabourin 2012).

Second, information on the frequency, duration, and onset age of foreign language instruction should be reported. This would make it possible to assess the impact of these variables in future meta-analyses. Furthermore, studies are needed to investigate foreign language attainment in children/adolescents with poor literacy skills learning a foreign language other than English. Although English is undoubtedly the most frequently instructed foreign language worldwide, we need to know if the difficulties observed in children/adolescents with poor literacy skills in learning English as a foreign language extend to other foreign languages. Related to this, future studies should also assess children/adolescents with poor and typical literacy skills who speak some of the most frequently spoken languages, such as English and Spanish, as their native language. This would allow for a picture of foreign language attainment in poor readers/spellers that is representative of a larger share of the world’s population.

Finally, future meta-analyses focusing on heterogeneous populations, such as children/adolescents with poor literacy skills, should include computations related to variation of outcome measures to capture individual differences. The common practice of only computing overall effect sizes based on central tendency measures such as the SMD between groups can result in misleading answers to practically relevant research questions. For example, in the current review, only relying on overall SMDs would have led to the conclusion that children/adolescents with poor literacy skills show a lower foreign language attainment than their peers with typical literacy skills. However, the fact that foreign language performance varied more in children/adolescents with poor literacy skills than in control participants emphasizes that this conclusion might not apply to a significant proportion of poor readers/spellers.

Conclusions

This report provides the first systematic synthesis of the available evidence addressing the question of whether children/adolescents with poor literacy skills show a lower foreign language attainment than their peers with typical literacy skills. This represents a unique contribution, because the success achieved by poor readers/spellers in different foreign language outcome measures as compared with typical readers/spellers has so far only been investigated in individual studies. We therefore contribute a basis for understanding the results of individual studies in a broader context and increase statistical power to estimate overall effects (Borenstein et al. 2009). Furthermore, this systematic review provides an overview of the available evidence on this issue, which can be useful to researchers in guiding future studies, as well as to parents, teachers/specialists, and policy makers in guiding educational decision-making.