Introduction

Selective attention is a fundamental cognitive skill that is crucial for listening in real-life situations where multiple irrelevant stimuli exist. Selective attention is thought to be required to inhibit distracting stimuli such as noise and to attend selectively to speech for better understanding of spoken messages (Shinn-Cunningham & Best, 2008). Paradigms such as dichotic listening have been used to study the relationship between individual differences in working memory (WM) and selective attention in adults. For example, results from the dichotic listening paradigm suggest that adults with high working-memory capacity (WMC) have better controlled attention to process task-relevant information than those with low WMC (Colflesh & Conway, 2007; Conway, Cowan, & Bunting, 2001). Using a selective attention paradigm, Conway et al. (2001) asked participants to repeat stimuli presented to one ear while ignoring the stimuli in the other ear, and found that only 20% of individuals with high WMC detected their own name presented in the to-be-ignored ear compared to 65% of individuals with low WMC. Furthermore, in a dichotic divided-attention paradigm, Colflesh and Conway (2007) asked participants to repeat stimuli presented to one ear and also to listen for their own name in the other ear. Interestingly, they found that 66.7% of the participants with high WMC were able to detect their name compared to 34.5% of the participants with low WMC. These results imply that individuals with high WMC are better able to flexibly adjust their focus of attention to attend to relevant stimuli depending on task demands.

Considerable evidence also supports the hypothesis that distracting background noise has a detrimental effect on cognitive task performance such as speech recognition, recall from verbal short-term memory, and reading comprehension (Francart, van Wieringen, & Wouters, 2011; Jones & Macken, 1993; Jones & Morris, 1992; Martin, Wogalter, & Forlano, 1988; Oswald, Tremblay, & Jones, 2000; Salamé & Baddeley, 1982). Often, the effect of noise is examined in terms of cross-modal distraction, called the irrelevant sound/speech effect. Serial recall is one of the most commonly used tasks to study the irrelevant sound effect because encoding and maintaining information in a temporal sequence is fundamental to most cognitive activities (Baddeley, 1990). Findings from a vast body of literature suggest that serial recall of visually or auditorily presented stimuli is impaired in the presence of irrelevant sound, irrespective of whether the sound is presented during the presentation of to-be-remembered stimuli or during the retention interval before recall (Colle & Welsh, 1976; Ellermeier & Zimmer, 1997; Hanley & Broadbent, 1987; Neath, 2000; Salamé & Baddeley, 1982). According to the WM model by Baddeley and colleagues, verbal auditory stimuli have automatic access to the phonological short-term memory store. Therefore, any target stimuli that are verbally coded are subject to interference by irrelevant speech/sound (Hanley & Bakopoulou, 2003; Salamé & Baddeley, 1982, 1989). Theoretically, the magnitude of irrelevant sound effect (i.e., difference in serial recall performance with and without irrelevant sound) is predicted to be negatively related to WMC. That is, individuals with high WMC are expected to be less susceptible to the effects of irrelevant sound than those with low WMC. 
However, irrelevant sound effect studies in adults (Beaman, 2004; Elliott & Briganti, 2012; Sörqvist, 2010; Sörqvist, Marsh, & Nöstl, 2013) have consistently found no significant association between WMC and the magnitude of the irrelevant sound effect.

Based on empirical evidence that WMC reflects individuals’ ability to maintain attention control over task-relevant information while inhibiting task-irrelevant information (Barrouillet & Camos, 2010; Colflesh & Conway, 2007; Garavan, 1998; Kane, Conway, Hambrick, & Engle, 2008; Kane & Engle, 2003; Redick, Heitz, & Engle, 2007), it is reasonable to assume that individuals with better attention control would be less distracted by irrelevant noise. However, the absence of an association between WMC and the irrelevant sound effect in adults contrasts with findings from the dichotic listening study by Conway et al. (2001), in which a positive relation between WMC and inhibition of irrelevant channel information was observed. In addition, findings from irrelevant sound effect studies in children have suggested that as WMC increases with age, the magnitude of the irrelevant sound effect decreases (Elliott, 2002) and that informational masking effects are larger in younger children relative to teenagers and adults (Wightman, Kistler, & O’Bryan, 2010). These results are interpreted as better controlled attention leading to a reduction in the irrelevant sound effect.

Although dichotic listening paradigms have been used to support the controlled attention theory of WM in adults, studies in children have explored the effect of auditory distraction and its association with children’s WMC mainly using the cross-modal irrelevant sound effect (Elliott, 2002; Elliott & Cowan, 2005; Klatte, Lachmann, & Meis, 2010). While Elliott (2002) suggested a developmental reduction in the irrelevant sound effect in school-age children, Klatte et al. (2010) reported that the irrelevant sound effect was independent of age. Klatte et al. (2010) also found that attention generally did not play a role in the irrelevant sound effect when stimuli such as tones, syllables, or narrative speech were used as irrelevant sounds. According to Klatte and colleagues, whereas developmental change occurred for the attention capture mechanism, interference resulting from the obligatory access to verbal short-term memory was independent of age. Similarly, Elliott and Cowan (2005) found that cognitive processes that facilitate WM performance did not appear to prevent irrelevant sound interference in 7- to 9-year-olds. They suggested that processes such as verbal rehearsal that facilitated WM span performance may be counterproductive in preventing irrelevant sound interference, particularly when the distractor was irrelevant speech. Aubuchon, McGill, and Elliott (2018) investigated the influence of auditory distraction on children’s verbal rehearsal during serial recall. Their findings indicated that irrelevant sound disrupted rehearsal processes not only through the obligatory auditory overlap but also by diverting attentional resources from rehearsal. Similarly, the duplex mechanism account of auditory distraction describes two distinct forms of interference, one from changing-state stimuli and the other from deviant auditory sounds (Hughes, 2014).
Relative to adults, interference from under-developed attention control mechanisms in children causes greater distractibility than interference-by-process mechanisms (Elliott et al., 2016; Hughes, 2014). In a study comparing auditory distraction in adults and children, Joseph, Hughes, Sörqvist, and Marsh (2018) found that poorer attention control (represented by the deviant sound effect), and not rehearsal-based processes, led to children’s vulnerability to cross-modal auditory distraction. According to their results, susceptibility to changing-state sounds (typically representing interference-by-process) was the same in children and adults and was not influenced by developmental changes in short-term memory or working-memory ability (Elliott & Cowan, 2005; Elliott et al., 2016; Joseph et al., 2018). That is, rather than differences in rehearsal abilities, susceptibility to auditory distraction was attributed to greater attentional diversion in children regardless of the task (Joseph et al., 2018).

In contrast, Röer, Bell, Körner, and Buchner (2018) reported that children were as susceptible to cross-modal auditory distraction as young and older adults, regardless of whether the distractors were changing-state or deviant sounds. The authors argued that their result contradicted the attention-control theory but was compatible with the renewed view of age-related cross-modal distraction (Guerreiro & Murphy, 2010). According to this view, irrelevant auditory distraction is easily filtered out at both peripheral and central levels of processing when the relevant information is visual. Hence, the irrelevant speech effect is independent of age, especially when the distraction is auditory and the relevant information is visual (Röer et al., 2018). This view also suggests that age-related differences may be more pronounced for unimodal distraction, when relevant and irrelevant information are presented in the same modality. However, there is a paucity of studies that have used unimodal auditory distraction in both children and adults (Guerreiro & Murphy, 2010).

Based on our review of the literature, results on developmental change in irrelevant sound effects are mixed, and most studies have used auditory distraction in cross-modal paradigms. It is still unclear whether WMC, an index of attention control, influences susceptibility to auditory distraction in children. In this study, our aim was to investigate auditory distraction due to informational masking and its association with WMC in a large sample of school-age children. To measure auditory distraction due to informational masking, we used a dichotic listening task. The dichotic digit task, which requires simultaneous presentation of two streams of digits, one to each ear (one to-be-ignored and one to-be-attended), was performed with and without distracting multi-talker babble (MTB). Because of the perceptual overlap between the target speech (digits) and the auditory distractors (MTB), the dichotic listening task emulates real-life conditions such as listening in a classroom. The nature of the dichotic digit task also makes it suitable for investigating two types of auditory distraction simultaneously: children’s susceptibility to intrusion from the to-be-ignored ear and their susceptibility to distracting background MTB.

Based on the attention-control theory of WM, we predicted that controlled attention is crucial for binaural separation during dichotic listening; hence it was hypothesized that intrusion errors would be negatively related to children’s WMC in both conditions (with and without MTB). Furthermore, based on previous cross-modal irrelevant sound-effect studies (Elliott & Cowan, 2005; Elliott et al., 2016; Joseph et al., 2018), we expected no significant correlation between WMC and susceptibility to auditory distraction (the latter indexed by the difference between overall errors in the MTB and no-MTB conditions).

Method

Data were collected at the University of Central Arkansas as part of a larger project. Each parent/caregiver and child gave informed consent for participation. Children received toy prizes, and the parent/caregiver received monetary compensation and a hearing-language screening report of the child’s performance. Each child was tested individually in a quiet room. All experimental tasks were created and delivered using E-Prime 2 software (Schneider, Eschman, & Zuccolotto, 2012) on a PC with an external sound card (SoundDevices, USB pre2) and Sennheiser HD 280 Pro headphones. Children completed multiple cognitive and language tasks for the larger project over two separate visits. Tasks were administered in a fixed order. The dichotic listening tasks without and with MTB were completed in separate sessions. The working-memory tasks were completed in the same session but were separated by other language tasks. A scheduled rest break was built into each session. The total duration of each session was 1.5 hours.

Participants

Exclusionary criteria for participation were global intellectual disabilities, autism, frank neurological disorders, seizure disorder, hearing loss, and stuttering. A total of 125 children (63 girls and 62 boys) participated (mean age = 9.57 years; range = 7.0–12.08 years; SD = 1.48). Figure 1 shows the distribution of the number of children by age. All participating children passed a pure-tone hearing screening that was conducted in a sound-treated audiometric booth using a clinical audiometer (ASHA, 1997). All children had normal-range non-verbal IQ scores as demonstrated on the Test of Nonverbal Intelligence-4 (Brown, Sherbenou, & Johnsen, 2010). The children also had normal or corrected peripheral vision as reported on the parent questionnaire and confirmed by vision screening using the Snellen chart. The majority of the parents reported no concerns regarding their child’s listening and spoken-language abilities. A subset of children was reported to have attention problems, difficulties in listening and recall, or academic difficulties. However, most of these children did not have significant functional impairments that qualified them for special services (except for seven children diagnosed with developmental language disorder or dyslexia). Sixteen children had a diagnosis of ADD/ADHD (attention deficit disorder/attention deficit hyperactivity disorder). The pattern of results remained the same even when the subset of children with a diagnosis was excluded from analysis. Hence, full-sample results are reported.

Fig. 1 Distribution of the number of children by age. Bars are binned by 6 months

Dichotic listening task

Stimuli: Digits one through nine (excluding seven because it is bi-syllabic), spoken by a female speaker of standard American English, were recorded. The recording was done in a double-walled sound-treated booth using a Sennheiser microphone (Sennheiser 845 S) and Avid Pro Tools pre-amplifier (Mbox Pro). Digits were then edited to normalize duration and level, using Adobe Audition 3.0 and Avid Audio Pro Tools 8.1. The duration of each digit was compressed or lengthened to equal 500 ms, and amplitudes were normalized to equal RMS value. Finally, all stimuli were independently checked by four adult listeners, using visual inspection of the spectrogram and waveform along with a listening check, to ensure that there was no audible distortion. Multi-talker speech babble for use as irrelevant background noise was created from a recording of a simultaneous spontaneous conversation between three school-age children (two boys and one girl) and two young adults (female). A single initial instruction and prompt was given for the participants to start their conversation. They were asked to discuss any topic of their choice. Conversations were recorded using a microphone (SoundField SPS 200) placed in the center on a tripod. Recorded babble was edited using Avid Pro Tools and Harpex sound-editing software to eliminate silent pauses and to create continuous-sounding MTB.

Procedure: The dichotic listening task was developed by Cherry (1953) to measure selective attention. This task was performed with and without multi-talker babble in the same order across participants; the MTB condition was always in the second session. The signal-to-noise ratio was set at +8 dB (i.e., digits were 8 dB louder than speech babble) based on pilot data, to ensure that children were able to recognize the closed-set digits with 100% accuracy in the presence of MTB. This ensured that, during the dichotic listening task, speech babble engaged cognitive resources through informational masking (attentional distraction) and not through energetic masking of digits.
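To make the +8 dB signal-to-noise ratio concrete, the following is a minimal Python sketch of how a noise waveform could be rescaled to a target SNR relative to a signal, using RMS amplitudes. The function names (`rms`, `measure_snr_db`, `scale_to_snr`) and the toy sine-wave "stimuli" are assumptions made for illustration; the study's stimuli were prepared in audio-editing software and the procedure is not described at this level of detail.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def measure_snr_db(signal, noise):
    """SNR in dB, defined here as 20*log10(rms(signal) / rms(noise))."""
    return 20 * math.log10(rms(signal) / rms(noise))

def scale_to_snr(signal, noise, target_snr_db):
    """Return a rescaled copy of `noise` so that the SNR equals target_snr_db."""
    target_noise_rms = rms(signal) / (10 ** (target_snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [n * gain for n in noise]

# Toy waveforms (hypothetical, not the study's recordings)
digit = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
babble = [0.3 * math.sin(2 * math.pi * 13 * t / 100 + 1.0) for t in range(100)]

scaled_babble = scale_to_snr(digit, babble, 8.0)
amplitude_ratio = rms(digit) / rms(scaled_babble)  # 10 ** (8 / 20), about 2.5
```

Under this definition, an 8 dB advantage means the digits have roughly 2.5 times the RMS amplitude of the babble.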

Children were presented with digit triplets simultaneously to both ears. Identical digits did not occur at the same time in both ears. Digits were time-aligned such that they began and ended at exactly the same time in each ear. The intensity of the digits presented to the two ears was the same and was set at 75 dB SPL (sound pressure level). Digits were presented using Sennheiser HD 280 Pro headphones. At the beginning of each trial, the child was prompted to pay attention to a randomly selected ear and ignore the digits presented to the opposite ear. The to-be-attended ear was indicated by a beep in that ear simultaneous with an arrow on the screen pointing in that direction. Half of the trials were directed to the right ear. Children recalled the digits by touching a 3 x 3 grid on the computer screen. On each trial, three digit pairs were presented with an inter-stimulus interval of 500 ms, and children were instructed to recall the target digits in the order of presentation. There were three possible errors on each trial. In the irrelevant-noise condition, babble was continuously present (i.e., during the prompt, stimulus presentation, and recall). A total of 30 trials per condition were used to measure errors in dichotic selective attention.

In addition to the total number of recall errors of any type in the MTB and no-MTB conditions, intrusion errors, i.e., recall of digits from the to-be-ignored ear (irrelevant channel interference), were also calculated. An error was counted as an intrusion if the erroneously recalled digit occupied the same serial position in the to-be-ignored ear.
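The scoring rule above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the authors' scoring script: the function name `score_trial` and the example digits are hypothetical, and the sketch assumes that each response contains exactly three digits (how omissions were handled is not stated).

```python
def score_trial(attended, ignored, response):
    """Score one dichotic trial position by position.

    attended: digits presented to the to-be-attended ear, in order
    ignored:  digits presented to the to-be-ignored ear, in order
    response: digits the child recalled, in order
    Returns (total_errors, intrusion_errors).
    """
    total_errors = 0
    intrusion_errors = 0
    for target, distractor, recalled in zip(attended, ignored, response):
        if recalled != target:
            total_errors += 1
            # An intrusion is an error that matches the ignored-ear digit
            # at the same serial position.
            if recalled == distractor:
                intrusion_errors += 1
    return total_errors, intrusion_errors

# Hypothetical trial: position 2 is an intrusion, position 3 is another error
example = score_trial(attended=[2, 5, 9], ignored=[4, 1, 3], response=[2, 1, 6])
# example == (2, 1)
```

Because identical digits never occurred at the same time in both ears, a response cannot match the target and the distractor at once, so every intrusion is also counted among the total errors.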

Working-memory tasks

Two tasks were used to obtain a composite (mean of z scores) measure of WMC (Conway et al., 2005).

Auditory WM task

Children completed the auditory WM subtest of the Woodcock-Johnson III Test of Cognitive Abilities (Woodcock et al., 2007). Children were presented with the standard audio-recorded lists of words and numbers at 75 dB SPL via headphones. Stimuli occurred at an inter-stimulus interval of 1 s (e.g., Stimuli: coat, 5, juice, 9). List length ranged from two to seven items with three trials at each length. The child was asked to repeat the words first in the correct order, followed by the numbers in the correct order (Response: coat, juice, 5, 9). If the child accurately recalled both the words and the numbers, the trial was given a score of 2 points. If either the words or the numbers were inaccurately recalled, the score was 1 point. The examiner discontinued the test when three consecutive zero scores occurred at any list length. The outcome variable was total recall accuracy.

Digit working-memory task (visually presented)

This measure was based on the complex memory-span paradigm (Daneman & Carpenter, 1980) with processing and storage components. Computer-paced (time-controlled) stimuli were presented to the child. Practice for each task component preceded the test trials. First, the child saw a single-digit number on the screen followed by the next screen with two red squares on the top portion of the screen (i.e., small-small; big-big; small-big; or big-small randomly presented). The child was asked to judge if the two squares they saw were the same or different and provide their answer by touching a box labeled “Same” or “Different” on the lower half of the screen (Note: On practice trials it was ensured that each child could read these two words). After a same-different judgment, another single-digit number appeared followed by another pair of squares. After each set of items, the child recalled the numbers on a 3 x 3 grid on the screen. Numbers 1–9 (except 7) were used. The length of each list ranged from two to five items with three trials at each length. The outcome was total digits recalled in correct order and the maximum score was 42.

Analysis

Cronbach’s alpha coefficients of internal consistency for the non-standard experimental tasks (digit WM and dichotic listening) were good (α > .85). The two WM tasks were significantly correlated (r = .55, p < .001), and a composite (mean) of z scores from the two was created and re-standardized as an index of WMC. Outcome variables from the dichotic listening task administered with and without MTB were total errors of any type and total intrusion errors from the to-be-ignored ear. Children’s magnitude of susceptibility to MTB was operationalized in two ways: (1) as a difference score, obtained by subtracting the total number of errors of any type made without MTB from the total number of errors of any type in the MTB condition for each child (Elliott & Cowan, 2005), and (2) as a ratio of the total number of errors of any type in the MTB condition to the total number of errors of any type in the no-MTB condition.
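As an illustration of the derived measures described above, the sketch below computes a re-standardized composite of z scores and the two susceptibility indices in Python. All names and scores are hypothetical. Two details are assumptions because the text does not specify them: the use of the sample standard deviation for standardization, and returning `None` for the ratio when a child made no errors without MTB.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize values using the sample standard deviation (an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def wmc_composite(auditory_wm, digit_wm):
    """Mean of the two tasks' z scores, re-standardized across children."""
    z_auditory = zscores(auditory_wm)
    z_digit = zscores(digit_wm)
    return zscores([(a + d) / 2 for a, d in zip(z_auditory, z_digit)])

def susceptibility(errors_mtb, errors_no_mtb):
    """Return (difference score, ratio score) for one child."""
    difference = errors_mtb - errors_no_mtb
    # Handling of zero no-MTB errors is not described in the text;
    # returning None here is a placeholder choice.
    ratio = errors_mtb / errors_no_mtb if errors_no_mtb > 0 else None
    return difference, ratio

# Hypothetical scores for five children (not study data)
auditory_scores = [14, 20, 17, 25, 11]
digit_scores = [18, 30, 22, 35, 20]
composite = wmc_composite(auditory_scores, digit_scores)
diff_score, ratio_score = susceptibility(errors_mtb=20, errors_no_mtb=8)
```

The difference score expresses absolute change in errors, whereas the ratio score expresses change relative to each child's no-MTB baseline, so the two indices can rank children differently.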

First, exploratory data analysis was conducted, including computation of summary statistics, correlations, and visualization of aggregated data (collapsed across trials, by condition) via scatterplots. Paired t-tests on the aggregated number of intrusions and total errors were then conducted to assess differences between trials conducted with and without MTB. To investigate the association between the magnitude of susceptibility to noise and WMC after controlling for age, multiple linear regression was applied.

Finally, generalized “logistic” linear mixed-effects models (GLMM) were fit separately for intrusions and total errors of any type to further investigate the role of WMC and the potential moderating effect of MTB, while controlling for children’s age. GLMM can be performed without aggregating data across trials and therefore does not lose valuable information. One benefit of the GLMM method is that participants with partial data can be included, rather than being dropped through naïve list-wise deletion. More importantly, the true hierarchical nature of the repeated measures is captured and correctly modeled, thus avoiding inflation of error rates and spurious results (ecological fallacy, Simpson’s paradox). For each of the two outcomes, a series of two-level, random-intercept nested models was fit based on the theoretical foundation previously presented, and the likelihood ratio test was employed to assess the significance of model terms (Hox, Moerbeek, & Van de Schoot, 2018).
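The likelihood ratio test used to compare nested models can be illustrated with a small Python sketch. It is not the authors' analysis code (the models were fit in R) and the log-likelihood values are hypothetical; the function `likelihood_ratio_test` is written for this example and, to stay within the standard library, supports only 1 or 2 degrees of freedom, for which the chi-square tail probability has a closed form.

```python
import math

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Compare two nested models fit by maximum likelihood.

    The statistic 2 * (loglik_full - loglik_reduced) is referred to a
    chi-square distribution with df equal to the number of added parameters.
    Returns (statistic, p_value).
    """
    statistic = max(2 * (loglik_full - loglik_reduced), 0.0)
    if df == 1:
        # Chi-square(1) upper tail: P(Z^2 > x) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(statistic / 2))
    elif df == 2:
        # Chi-square(2) upper tail: exp(-x / 2)
        p_value = math.exp(-statistic / 2)
    else:
        raise ValueError("this sketch supports only df = 1 or df = 2")
    return statistic, p_value

# Hypothetical log-likelihoods for a model without and with one extra term
stat, p = likelihood_ratio_test(loglik_reduced=-1000.0, loglik_full=-994.0, df=1)
term_is_significant = p < 0.05
```

A term is retained when adding it improves the log-likelihood by more than would be expected by chance at the chosen significance level.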

All analyses were conducted in R 3.6.1 (R Core Team, 2019), and the `glmer()` function in the `lme4` package (Bates, Maechler, Bolker, & Walker, 2015) was used for the GLMM analysis. A significance level of .05 was applied unless otherwise stated. Full documentation of all code and output is provided in the Supplemental Material.

Results

One child who was missing both WM measures due to attrition was excluded from all analyses. Similarly, three children completed the dichotic listening task only without MTB and were thus excluded from the paired t-tests and multiple linear regression, but they contributed data to the GLMM analysis. Summary statistics for all measures are presented in Table 1.

Table 1 Summary statistics for participant and aggregated trial measures

As shown in Table 2, participant age was correlated with WMC (r = .50, p < .001), yet scatterplots revealed a quadratic trend, with the initial positive association plateauing around age 10 (see Fig. 7, Supplemental Material). A curved trend was also observed for proportion of total errors of any type (r = -.42, p < .001), such that increasing age was associated with fewer errors up until around age 9, at which point errors leveled off, both with and without MTB (see Fig. 12, Supplemental Material). Interestingly, the overall linear correlation with age was weaker for intrusions (r = -.20, p < .001) than for total errors of any type (r = -.42, p < .001), t (245) = 3.91, p < .001 (Supplemental Material, p. 32). Without MTB, the trend in proportion of intrusions was similar to that for proportion of total errors of any type, with increasing age associated with fewer intrusions up until about age 9, where they plateaued; with MTB, however, the proportion of intrusions was fairly constant across all ages (see Fig. 13, Supplemental Material). Scatterplots further revealed clearly linear trends, with increasing WMC associated with a smaller proportion of total errors of any type regardless of MTB condition and a smaller proportion of intrusions without MTB (see Figs. 14 and 15, Supplemental Material). However, with MTB, the proportion of intrusion errors increased with increasing WMC. Paired-sample t-tests revealed that, as expected, both intrusion errors, t (120) = 7.49, p < .001 (mean difference = .07, 95% confidence interval (CI) [.05, .08], Cohen’s d = 0.68), and total errors of any type, t (120) = 14.5, p < .001 (mean difference = .17, 95% CI [.14, .19], Cohen’s d = 1.32), were significantly greater in the MTB condition than in the no-MTB condition.
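The reported effect sizes can be cross-checked against the paired t statistics. A common convention for paired designs is d = t / √n, where n is the number of pairs (here df + 1 = 121). Whether the authors used this exact formula is an assumption; the Python sketch below, with a hypothetical helper `cohens_d_from_paired_t`, simply shows that the reported values are consistent with it.

```python
import math

def cohens_d_from_paired_t(t_value, df):
    """Cohen's d for a paired design computed as t / sqrt(n), with n = df + 1."""
    n_pairs = df + 1
    return t_value / math.sqrt(n_pairs)

d_intrusions = cohens_d_from_paired_t(7.49, 120)     # about 0.68
d_total_errors = cohens_d_from_paired_t(14.5, 120)   # about 1.32
```

Both values round to the effect sizes given in the text (0.68 and 1.32).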

Table 2 Pearson correlation coefficients for child-specific and aggregated trial measures, values above the diagonal are for trials without MTB (N = 124) and values below the diagonal correspond to trials with MTB (N = 121)

For correlation and regression analysis, children’s susceptibility to auditory distraction was first indexed by the difference in total number of errors of any type between the MTB and no-MTB conditions, as commonly used in studies on the irrelevant speech effect. Susceptibility to auditory distraction was not associated with age (r = -.03, p = .76) or WMC (r = -.07, p = .43). Multiple linear regression confirmed that even after controlling for age, the magnitude of susceptibility to noise was not associated with WMC, b = -0.90, SE = 1.23, p = .46.

Parameter estimates for the best-fitting GLMM for the probability of an intrusion are shown in Table 3, listed as Model 1. The model reveals that after controlling for age, there was a negative association between WMC and intrusions in the no-MTB condition, b = -0.24, p < .001; however, WMC significantly interacted with MTB, b = 0.35, p < .001, such that WMC exhibited a positive association with intrusions in the presence of MTB. Figure 2 illustrates how age and MTB moderated the role of WMC. Without MTB, children of the median age (9.5 years old) with the mean value of WMC had a 0.15 odds or 13% chance of an intrusion on any given digit (0.15/1.15 = .13). This is shown as the height of the dashed line in the center panel of Fig. 2 above the WMC value of zero (grand mean). In the no-MTB condition, after controlling for age, a 1 SD increase in WMC was associated with a decrease of 20% in the odds of intrusion, OR = exp(-0.24) = 0.79. This is evident from the dashed lines consistently declining from left to right in each of the three panels displaying illustrative ages (quartiles). Conversely, the solid lines representing the MTB condition are all increasing and show that a 1 SD increase in WMC was associated with an increase of 11% in the odds of intrusion, OR = exp(-0.24 + 0.35) = 1.11.
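The conversions in this paragraph follow standard logistic-model arithmetic: odds convert to probability as odds / (1 + odds), and coefficients on the log-odds scale are summed and exponentiated to obtain an odds ratio. The Python sketch below reproduces these steps from the rounded values quoted in the text; the helper names are hypothetical, and because the published coefficients are rounded, recomputed values can differ from the reported ones in the last digit.

```python
import math

def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)

def odds_ratio(*log_odds_coefficients):
    """Odds ratio for a 1-unit change: exp of the summed log-odds coefficients."""
    return math.exp(sum(log_odds_coefficients))

# Median-age child at mean WMC, no MTB: odds of 0.15 for an intrusion
p_intrusion = odds_to_probability(0.15)   # about 0.13

# Effect of a 1 SD increase in WMC without MTB (main effect only)
or_no_mtb = odds_ratio(-0.24)             # about 0.79

# Effect of a 1 SD increase in WMC with MTB (main effect plus interaction)
or_mtb = odds_ratio(-0.24, 0.35)          # close to the reported 1.11
```

An odds ratio below 1 indicates that higher WMC goes with lower odds of an intrusion, and an odds ratio above 1 indicates the reverse, which is the crossover pattern described for the no-MTB and MTB conditions.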

Table 3 Parameter estimates for two final generalized “logistic” linear mixed-effects models

Fig. 2 Final (generalized) logistic linear mixed-effects model for the probability of an intrusion error (Model 1), with 95% confidence bands. To illustrate the multiple interactions, quartile breaks were chosen for age (Q1 = 25th percentile, Q3 = 75th percentile). WMC working-memory capacity (standardized composite)

The best-fitting GLMM for total errors of any type is presented as Model 2 in Table 3. This measure exhibited a more complex relationship with age: MTB moderated the quadratic effect of age on the log-odds of making an error. After controlling for this, there was evidence of only a main effect of WMC, b = 0.80, OR = 2.23, p < .001. Without MTB, children of the median age (9.5 years) with the mean value of WMC had 0.32 odds or a 24% chance of making any type of error on any given digit (0.32/1.32 = .24). This is shown as the height of the dashed line in the center panel of Fig. 3 above the WMC value of zero (grand mean). All three age panels in Fig. 3 display roughly similar declining lines, such that a 1 SD increase in WMC was associated with 2.23 times lower odds of making an error. The solid (MTB) and dashed (no-MTB) estimates were roughly parallel across the range of WMC for any given age because there was no significant interaction between WMC and MTB. Figure 4 presents the same model with age, rather than WMC, on the horizontal axis. Interestingly, even with the interaction involving age, the net effect was a constant gap between the probability of making an error in the MTB and no-MTB conditions. This echoes the multiple linear regression results, which failed to find an association between a child’s magnitude of susceptibility to auditory distraction and WMC or age.

Fig. 3 Final (generalized) logistic linear mixed-effects model for the probability of any type of error (Model 2) focusing on effect of working-memory capacity, with 95% confidence bands. To illustrate the multiple interactions, quartile breaks were chosen for age (Q1 = 25th percentile, Q3 = 75th percentile). WMC working-memory capacity (standardized composite)

Fig. 4 Final (generalized) logistic linear mixed-effects model for the probability of total errors of any type (Model 2) focusing on effect of age, with 95% confidence bands. To illustrate the multiple interactions, quartile breaks were chosen for WMC (Q1 = 25th percentile, Q3 = 75th percentile). WMC working-memory capacity (standardized composite)

However, when susceptibility to auditory distraction was operationalized as the ratio of the total errors of any type in the MTB and no-MTB conditions, it was not associated with age (r = .07, p = .45) but was positively related to WMC (r = .21, p = .019). The ratio score measures relative change, i.e., how large or small the errors in the MTB condition are relative to the no-MTB condition. Multiple linear regression analysis confirmed that even after controlling for age, the magnitude of susceptibility to noise was associated with WMC, b = 0.52, SE = 0.23, p = .024. An increase in WMC was associated with a higher proportion of errors in MTB relative to errors without MTB. Figures 5 and 6 illustrate the positive relation between WMC and susceptibility to auditory distraction as measured by the ratio of errors with and without MTB. Children with high WMC made about 2.5 times more intrusion errors in the MTB condition relative to the no-MTB condition, whereas children with low WMC made about the same proportion of errors with or without MTB.

Fig. 5 Ratio of probability of total errors of any type with and without MTB, based on final (generalized) logistic linear mixed-effects model focusing on susceptibility to noise in relation to WMC. To illustrate susceptibility to noise, three ages were chosen. MTB multi-talker babble, WMC working-memory capacity

Fig. 6 Ratio of probability of intrusion error with and without MTB, based on final (generalized) logistic linear mixed-effects model focusing on susceptibility to noise in relation to WMC. To illustrate susceptibility to noise, three ages were chosen. MTB multi-talker babble, WMC working-memory capacity

Discussion

We investigated the association between susceptibility to unimodal auditory distraction and WMC using a dichotic listening task and WM tasks in 125 school-age children. Two main results were observed. First, susceptibility to auditory distraction (i.e., difference in total errors of any type between MTB and no-MTB conditions) did not significantly correlate with age, similar to the findings of Elliott and Cowan (2005). This finding did not replicate Elliott (2002). One potential reason for this difference is that Elliott utilized a visual serial recall task whereas we utilized dichotic listening serial recall. Nevertheless, visual stimuli also require verbal recoding for recall. Also noteworthy is that Elliott (2002) reported a larger susceptibility to auditory distraction only for the youngest children (8-year-olds) but not for ages 9 through 11 years. Our results agree with studies that have examined age-related effects in susceptibility to noise in 6- to 7-year-old children (Klatte et al., 2010) and in 7- to 9-year-olds (Elliott & Cowan, 2005). Elliott and Cowan (2005) demonstrated that there was no strong association between developmental change in WMC and irrelevant speech effect in children. Our results are also in agreement with Röer et al. (2018), who examined fifth graders (M = 10.79 years), third graders (M = 8.03 years), and adults. Using a cross-modal auditory distraction paradigm, Röer et al. (2018) concluded that irrelevant auditory information disrupted WM equivalently in children and adults and that there was no strong or consistent evidence of developmental change in auditory distraction based on the existing literature. Absence of correlation between age and susceptibility to auditory distraction appears counterintuitive given the known cognitive improvements that are typical of elementary school-age children. However, the current data suggest otherwise. 
The MTB used in the current study may be comparable to the changing-state auditory distractors described in the literature (Hughes, 2014). The effect of changing-state distractors is reported to be automatic (Röer et al., 2018), unlike the deviant stimulus effect, which is influenced by developmental change in attentional ability. Perhaps the type of auditory distractor used contributed to the lack of association between susceptibility to auditory distraction and age in the present study. This possibility is also consistent with results based on the ratio of error scores (ratio of errors in the MTB condition relative to the no-MTB condition), which likewise did not correlate with age. There are several studies on age-related vulnerability to distraction that use cross-modal paradigms or visual-modality specific paradigms; however, there are very few studies on unimodal distraction (see Guerreiro & Murphy, 2010, for a review on this topic). The few studies that exist in the auditory domain have shown less consistent effects of age-related vulnerability. In addition, review of this literature in adults has shown that age-related effects on susceptibility to distraction are modality specific (Guerreiro & Murphy, 2010). The absence of a reduction in susceptibility to auditory distraction with age, as seen in this cross-sectional sample of children, has significant implications for children’s ability to listen in noisy situations such as classrooms. We address this at the end of the Discussion.

Second, even though there was a strong negative correlation between WMC and total errors of any type in both MTB and no-MTB conditions (Fig. 2), the magnitude of auditory distraction as quantified by the difference in errors between the two conditions was not related to individual differences in WMC. This finding is consistent with the irrelevant sound effect literature in adults (Beaman, 2004; Elliott & Cowan, 2005), and adds to the limited literature in children (Elliott, 2002; Elliott & Cowan, 2005; Elliott et al., 2016). Children with high WMC made fewer total errors, perhaps because they were more adept at using verbal rehearsal. However, in the presence of the MTB relative to the no-MTB condition, children with high WMC did not have an advantage, as demonstrated by the lack of association between WMC and susceptibility to auditory distraction. This lack of association supports the notion that auditory distraction from MTB is automatic and that the obligatory auditory processing of distracting irrelevant background noise is not effectively suppressed by the attention control mechanism of WM (Macken, Phelps, & Jones, 2009; Sörqvist et al., 2013). Furthermore, evidence demonstrating the lack of a strong association between children’s speech perception in noise and WMC supports the current findings (Magimairaj, Nagaraj, & Benafield, 2018).

The negative association between WMC and intrusion errors in the no-MTB listening condition suggested that children with high WMC were better able to inhibit irrelevant channel information in an optimal listening situation (see Fig. 2). This result aligns with the literature that supports the positive relation between dichotic listening and WMC (Cameron, Glyde, Dillon, & Whitfield, 2016; Hugdahl et al., 2009; Magimairaj et al., 2018; Sharma, Purdy, & Kelly, 2009; Tomlin, Dillon, Sharma, & Rance, 2015). Furthermore, this result is also in line with findings by Conway et al. (2001), where adults with high WMC successfully inhibited, as instructed, even the highly meaningful distractor (own name) presented to the to-be-ignored ear. Interestingly, in the present study, WMC was not a contributing factor in inhibiting intrusion from the to-be-ignored ear in the presence of MTB. It appears that even children with high WMC could not effectively block irrelevant channel information when attention processes were overruled by the obligatory processing of background noise. Perhaps the use of rehearsal strategies by children with high span was counterproductive in the presence of irrelevant speech (Elliott & Cowan, 2005).

The absolute difference in errors between the MTB and no-MTB conditions was comparable for children with low and high WMC, but the ratio of errors in the MTB condition relative to the no-MTB condition was significantly different. When susceptibility to auditory distraction was measured as the ratio of errors to account for the age-related changes, children with high WMC had a greater probability of making an error compared to children with low WMC (see Figs. 5 and 6). There is evidence in the adult literature that individuals with high WMC use more attention to carry out tasks (e.g., Rosen & Engle, 1997). Performing the dichotic listening task in MTB demands cognitive control as attention is needed to ignore irrelevant ear interference, to inhibit concurrent MTB, and to recall digits in correct serial order. Children with high WMC were able to maintain overall better performance (low errors), but their good performance was vulnerable to distraction by concurrent MTB. This was reflected by the fact that the ratio of errors of any type in the MTB condition was higher for children with high WMC (Fig. 5). Children with low WMC exhibited about the same likelihood of making intrusion errors with and without MTB (Fig. 6). It is possible that children with low WMC used their available attentional resources to try to prevent irrelevant channel interference at the expense of making greater overall errors in the MTB condition. As shown in Fig. 7, children with high WMC made fewer overall errors in both the MTB and no-MTB conditions, but their proportion of intrusion errors relative to total errors of any type was greater compared to children with low WMC. In other words, the majority of errors made by children with high WMC in both the MTB and the no-MTB conditions were intrusion errors. That children with high WMC showed more intrusion errors in MTB probably reflects concurrent demands on attentional resources to ignore both the background MTB and the digits in the to-be-ignored ear.
It is possible that under these demanding situations, children with high WMC made intrusion errors because of their stronger ability to monitor digits presented to the to-be-ignored ear, but children with lower spans made more catastrophic errors instead, such as hesitating, selecting no answer, or providing random answers. Elliott and Cowan (2005) proposed that the use of rehearsal strategies by high-span individuals may in fact be harmful in the presence of irrelevant speech as the distractor. This difference in outcomes between children with low and high WMC reflected their differential use of available attentional resources to suppress concurrent interference during task performance.

Fig. 7

Aggregate proportion of errors by WMC, where the height of the bar represents the total proportion of errors and the color represents the type of error. WMC was binned by 1 SE. WMC = working-memory capacity

These findings have important implications for all children, especially those who are reported to have significant difficulty understanding spoken language in noisy environments (e.g., children with weak language systems, attention disorders, or auditory processing deficits; Cameron, Glyde, Dillon, King, & Gillies, 2015; Gokula, Sharma, Cupples, & Valderrama, 2019; Magimairaj & Nagaraj, 2018). Listening difficulties in children are manifested in behaviors such as asking for frequent repetitions, problems following spoken directions, and easy distractibility, particularly in complex auditory situations (ASHA, 2005; BSA, 2018). Listening in complex auditory environments can be taxing for children who already have weak receptive language abilities because it requires them to reconstruct speech quickly by filling in masked, missed, or degraded speech. The prevalence of specific language impairment or developmental language disorder in kindergarteners is estimated to be 7.4% in the upper midwestern USA (Tomblin et al., 1997). Prolonged listening difficulties may negatively affect children’s academic performance and social communication ability (e.g., White-Schwoch et al., 2015; Ziegler, Pech-Georgel, George, & Lorenzi, 2009). Given that WMC is not associated with reduced susceptibility to auditory distraction as measured by the difference in error scores between background MTB and no-MTB conditions, reducing extraneous noise in children’s learning environments becomes crucial. In addition, assistive listening devices (e.g., frequency modulation systems that enhance speech in noise), environmental acoustic modifications, and pedagogical approaches are integral to enhancing target speech input in children’s learning environments.
Furthermore, the results of this study and other studies (Beaman, 2004; Elliott & Cowan, 2005) do not support the use of cognitive training programs such as WM training to improve children’s listening ability in noise. Emerging studies in adults are mixed on the outcomes of WM training to improve speech-perception-in-noise performance (e.g., Escobar, Mussoi, & Silberer, 2019; Ingvalson, Dhar, Wong, & Liu, 2015; Wayne, Hamilton, Huyck, & Johnsrude, 2016). To our knowledge, while there are several studies in children on WM training, none have been implemented with listening-in-noise ability as an outcome. That WMC does not effectively constrain susceptibility to auditory distraction implies caution in expecting far-transfer effects from WM training to listening in noise.

A second implication of the study results is related to the methodological aspect of using absolute difference versus ratio scores to quantify the irrelevant speech effect. Almost all studies on the irrelevant speech effect have traditionally examined the difference score. In the current study, we noted that analyzing relative change using ratio scores revealed an association between WMC and susceptibility to auditory distraction that was not captured using difference scores. This methodological issue may be a potential reason why previous studies observed no association between WMC and susceptibility to cross-modal distraction. A third implication of the study results relates to the use of dichotic listening tests for clinical assessment of auditory processing skills such as binaural separation/integration (ASHA, 2005). As shown in previous research and in the present study, performance on a dichotic listening task (even without distraction) is attention demanding and is influenced by individual differences in WMC (Hugdahl et al., 2009; Magimairaj et al., 2018; Tomlin, Dillon, Sharma, & Rance, 2015). Given the significant influence of top-down factors on this task, interpretations about auditory system functioning based on the dichotic listening task, especially in developmental disorders, must be made with caution. That is, although this test is useful for detecting auditory deficits resulting from well-defined temporal lobe lesions, its use to evaluate developmental auditory processing problems should take into consideration the role of top-down influences.
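The contrast between difference and ratio scores can be made concrete with a small numerical sketch. The error counts below are hypothetical values chosen only for illustration; they are not data from this study, and the simple ratio of counts is a simplification of the model-based ratio of error probabilities reported above. The names `ErrorCounts`, `difference_score`, and `ratio_score` are likewise assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Hypothetical error counts for one child (illustrative values only)."""
    no_mtb: int  # errors without multi-talker babble
    mtb: int     # errors with multi-talker babble


def difference_score(counts: ErrorCounts) -> int:
    """Absolute increase in errors when MTB is added."""
    return counts.mtb - counts.no_mtb


def ratio_score(counts: ErrorCounts) -> float:
    """Errors in MTB relative to errors without MTB."""
    if counts.no_mtb == 0:
        raise ValueError("ratio is undefined when there are no baseline errors")
    return counts.mtb / counts.no_mtb


# Hypothetical children: one with few baseline errors (as might be expected
# with high WMC) and one with many baseline errors (low WMC).
high_wmc = ErrorCounts(no_mtb=5, mtb=15)
low_wmc = ErrorCounts(no_mtb=40, mtb=50)

for label, child in (("high WMC", high_wmc), ("low WMC", low_wmc)):
    print(label, difference_score(child), ratio_score(child))
```

In this made-up example both children make 10 more errors in MTB, so the difference score is identical, yet the ratio score is 3.0 for the child with few baseline errors and 1.25 for the child with many. The same absolute increase is a much larger relative change when baseline errors are low, which is why the two measures can lead to different conclusions about the role of WMC.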

Overall, the study results imply that the cognitive mechanisms recruited during listening in noise are distinct from those recruited during listening under optimal conditions, and that individual differences in WMC influence susceptibility to auditory distraction differently for children with low versus high span. Future studies are needed to identify and explain these mechanisms.

Conclusions

This study adds new information to the limited evidence on unimodal auditory distraction and its relation to WMC in children. Susceptibility to auditory distraction did not show a reduction with age in our cross-sectional sample of school-age children. There was no association between children’s WMC and the magnitude of susceptibility to auditory distraction (difference in total errors of any type with and without MTB), but the ratio of errors in the MTB condition relative to the no-MTB condition was associated with WMC. Children with high WMC had a greater probability of making an error in noise compared to children with low WMC. Given that children’s WMC improvements with age do not appear to confer an advantage for listening in the presence of distracting background noise, it is crucial to enhance target speech in children’s learning environments and to determine what other factors may play an influential role. Further studies are needed to identify and explain mechanisms crucial to listening in noise.