Brain and Language

Volume 222, November 2021, 105022
When it’s not appropriate to adapt: Toddlers’ learning of novel speech patterns is affected by visual information

https://doi.org/10.1016/j.bandl.2021.105022

Highlights

  • English-learning toddlers can adapt to a novel consonant-shifting accent.

  • Adaptation is reduced when context provides an alternative explanation for the change.

  • Use of contextual information may prevent learning of spurious speech patterns.

Abstract

In adults, perceptual learning for speech is constrained, such that learning of novel pronunciations is less likely to occur if the (e.g., visual) context indicates that they are transient. However, adults have had a lifetime of experience with the types of cues that signal stable vs. transient speech variation. We ask whether visual context affects toddlers’ learning of a novel speech pattern. Across conditions, 19-month-olds (N = 117) were exposed to familiar words either pronounced typically or in a novel, consonant-shifting accent. During exposure, some toddlers heard the accented pronunciations without a face present; others saw a video of the speaker producing the words with a lollipop against her cheek or in her mouth. Toddlers showed the weakest learning of the accent when the speaker had the lollipop in her mouth, suggesting that they treated the lollipop as the cause of the atypical pronunciations. These results demonstrate that toddlers’ adaptation to a novel speech pattern is influenced by extra-linguistic context.

Introduction

Our environments are always changing. In some cases, it makes sense to learn the specific patterns of variation; in other cases, it might not. One domain in which this is particularly striking is the processing of spoken language. Speech is highly variable. Across speakers, a single sound or word may be realized in different ways for many reasons, including the speakers’ ages and accents. Given this variation, accessing the correct word is not a straightforward task. The pronunciation of a word by one speaker might map onto a different meaning for another speaker. For example, one speaker might have a “pet” cat and another a “pit” cat. Or two speakers might use the same word form to express different meanings (e.g., “pit” for the animal vs. a deep hole). Although semantic and non-linguistic context can help resolve these ambiguities, a system that learns about such systematic variation across speakers can process it more efficiently in the future.

A growing body of work has shown that adult listeners cope with this variation in speech at least in part through perceptual learning, or adaptation (Samuel and Kraljic, 2009; Kleinschmidt and Jaeger, 2015). For example, if a speaker produces an sh-like sound ([ʃ]) in the word compass, instead of s ([s]), listeners will adjust their /s/ category representation for this speaker to encompass a broader range of sounds (those that typically lie near the boundary between /s/ and /ʃ/; Norris, McQueen, & Cutler, 2003). Adaptation to a speaker’s productions leads to more efficient future processing of speech from that individual talker or from people with similar ways of speaking. This type of process is thought to contribute to improvements in the recognition of unfamiliarly accented speech observed after exposure (Bradlow and Bent, 2008; Clarke and Garrett, 2004; Maye et al., 2008).
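One way to make the idea of category adjustment concrete is a toy belief-updating sketch. It is an illustration only, not a model taken from the studies cited: the single acoustic dimension, the numeric values, and the names `updated_category_mean` and `prior_strength` are assumptions introduced here. The listener's estimate of a speaker's /s/ category mean is treated as a weighted average of a prior expectation and the tokens heard from that speaker.

```python
def updated_category_mean(prior_mean, prior_strength, tokens):
    """Return a listener's updated estimate of a category mean.

    prior_mean     -- expected cue value before hearing this speaker (arbitrary units)
    prior_strength -- hypothetical weight of the prior, in pseudo-observations
    tokens         -- cue values of the tokens heard from this speaker
    """
    n = len(tokens)
    if n == 0:
        # No exposure: the estimate stays at the prior expectation.
        return prior_mean
    sample_mean = sum(tokens) / n
    # Weighted average of the prior and the observed tokens.
    return (prior_strength * prior_mean + n * sample_mean) / (prior_strength + n)


# Ten shifted tokens (value 1.0) against a prior centred on 0.0 with weight 10.
print(updated_category_mean(0.0, 10, [1.0] * 10))  # 0.5
```

Under these assumptions, a larger `prior_strength` makes the same exposure move the estimate less, and a smaller one makes it move more.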

However, adults do not show this kind of adaptation in all situations, and for good reason. Speakers often mispronounce words or transiently produce atypical pronunciations in certain situations (e.g., when they have a cold). Learning that these pronunciations are general characteristics of a speaker and changing category representations as a result would be maladaptive, as it would require subsequent unlearning during future interactions with the speaker. Fortunately, adults consider how strong the evidence is for a novel speech pattern before learning it (Kleinschmidt & Jaeger, 2015). For example, adult listeners use contextual information to judge whether an atypical pronunciation is characteristic of the speaker (i.e., reliable), or incidental (i.e., unreliable), and show more learning in the former case. In situations where there is contextual evidence consistent with an alternative explanation for an atypical pronunciation, such as when a speaker has a pen in her mouth (Kraljic, Samuel, & Brennan, 2008) or when the sound is phonetically motivated (Kraljic, Brennan, & Samuel, 2008), adult listeners do not update their category representations. Part of being an efficient listener, therefore, is using the evidence available to determine whether a novel pattern should be learned.

Like adults, toddlers can adapt to differences in pronunciation across accents, with this ability improving over development (Mulak et al., 2013; Potter and Saffran, 2017; Schmale et al., 2012; van Heugten et al., 2015). For example, toddlers exposed to an accent in which the vowel /a/ is produced as [æ] (block → black) later recognize other words when the same speaker uses [æ] in place of what would be [a] in the toddlers’ native dialect. In contrast, toddlers not exposed to the accent do not recognize these [æ] pronunciations later (White & Aslin, 2011). Toddlers also show improvements in recognizing words transformed by more complex accents following exposure (Potter and Saffran, 2017; Schmale et al., 2012; van Heugten and Johnson, 2014), though what specifically they are learning about the accents in these more complex cases is not known.

Although toddlers can learn a novel speech pattern (like a shift in the realization of /a/ from [a] to [æ]), it is not clear whether their learning is affected by the strength and type of evidence they receive. Are toddlers, like adults, discerning learners, who take other aspects of the situation into account when evaluating how robust a novel speech pattern is likely to be? Although, to our knowledge, this has not been examined in the speech domain, children have been shown to learn differently depending on the nature of the evidence they are given in other domains. For example, during word learning, property extension, and causal reasoning tasks, children draw different conclusions depending on how much data are provided and how they were generated. In one such study, Xu & Tenenbaum (2007b) presented children with either one or three trials in which they saw referents of a novel word. The children were then asked to select other referents of the word. Although children did not make strong assumptions about the extension of a word after a single trial, they did make strong assumptions after three trials. In particular, they adopted the most conservative hypothesis consistent with the data observed across the three trials (see Gweon, Tenenbaum, & Schulz, 2010 for similar behavior in a physical property extension task). For example, if the label was consistently applied to Dalmatians, children assumed it applied to Dalmatians and not to dogs more generally. These inferences are thought to arise because children assume (unless shown otherwise) that the data are not being sampled randomly and so the narrow range of exemplars is meaningful (Gweon et al., 2010; Xu and Tenenbaum, 2007a).

Tasks of physical causal reasoning also reveal the sophisticated ways in which children evaluate different patterns of data. For example, if children are shown that two blocks activate a machine together and that one of those blocks does not activate the machine when it is presented alone, they infer that the other block causes the machine to activate. If they instead see two blocks activate the machine together and later see that one of the blocks does activate the machine alone, they infer that the other block does not (Sobel, Tenenbaum, & Gopnik, 2004). Together, these lines of work demonstrate that children reason about the most likely causes of the data they observe, taking into account how much data there are, how the data were generated, and whether some apparent causes ‘explain away’ others.
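The ‘explaining away’ pattern in the two-block example can be sketched as a small Bayesian calculation. This is a hedged illustration rather than the analysis used in the cited work: it assumes each block is independently a cause with the same prior probability (0.3, chosen arbitrarily), that the machine activates exactly when at least one causal block is on it, and the function name `posterior_b` is hypothetical.

```python
from itertools import product


def posterior_b(trials, prior=0.3):
    """P(block B is a cause | trials), assuming a deterministic-OR machine.

    trials -- list of (blocks, activated) pairs, e.g. ("AB", True)
    prior  -- assumed prior probability that any one block is a cause
    """
    total = 0.0
    b_mass = 0.0
    # Enumerate the four hypotheses about which blocks are causes.
    for a, b in product([False, True], repeat=2):
        weight = (prior if a else 1 - prior) * (prior if b else 1 - prior)
        is_cause = {"A": a, "B": b}
        # A hypothesis survives only if it predicts every observed outcome.
        consistent = all(
            any(is_cause[block] for block in blocks) == activated
            for blocks, activated in trials
        )
        if consistent:
            total += weight
            if b:
                b_mass += weight
    return b_mass / total


together_only = posterior_b([("AB", True)])
a_alone_fails = posterior_b([("AB", True), ("A", False)])
a_alone_works = posterior_b([("AB", True), ("A", True)])
print(together_only, a_alone_fails, a_alone_works)
```

Under these assumptions, belief that B is a cause rises to certainty when A fails alone, and falls back to the prior when A succeeds alone, because A then accounts for the joint activation.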

In the present work, we ask whether 19-month-old toddlers use extra-linguistic information to constrain speech learning. More specifically, we ask whether toddlers, like adults, are conservative learners, and will show less learning of a novel pronunciation when the visual context is consistent with an alternative explanation for it (as in Kraljic, Samuel, & Brennan, 2008). For toddlers, does the presence of one potential cause for the atypical pronunciations (a mouth obstruction) rule out another potential cause (a novel accent)?

There are at least two possible reasons why one might expect toddlers to be less conservative in their learning of speech variation than adults. The first is that young learners might not yet have realized that pronunciation changes occurring in certain contexts are unlikely to persist. For example, a child encountering a new individual for the first time may not realize that the peculiar way they are talking is a result of the fact that they have a cold, because they have not yet learned that colds alter a speaker’s nasality. If the child learns (erroneously) that nasality is a general feature of that individual’s speech, then they may have difficulty understanding that individual the next time they encounter them. Rather than learning that nasality is a general feature of the individual’s speech, it would be better either to simply be more tolerant of the speaker’s atypical pronunciations in the moment (listen “through” the pronunciations by relaxing the criteria for word recognition) or to learn the novel pronunciations, but link them to potential conditioning contexts (like the speaker’s red nose), so that this knowledge can be applied in the future in the same contexts. Based on previous work, it is not clear which of the latter two approaches adult listeners take when they encounter a speaker talking with a pen in her mouth. However, they do not appear to learn that the novel pronunciations are generally characteristic of the speaker (Kraljic, Samuel & Brennan, 2008), likely because they are aware that mouth obstructions can alter a person’s speech. If toddlers are unaware of the relationship between mouth obstructions and atypical productions, then they should perform differently than adults.

A second reason that we might expect toddlers to be less conservative learners of speech variation than adults is that their phonological systems are more flexible overall: their native language speech categories (and the link between those speech categories and the lexicon) are not as well established. Indeed, in both the lab and the real world, infants and young children appear to learn novel speech categories and patterns faster and more readily than adults, who may not be successful at all. For example, although infants show distributional learning of speech categories after only 2 minutes of exposure (Maye, Werker, & Gerken, 2002), studies with adults use longer exposure periods (Chládková and Šimáčková, 2021; Hayes-Harb, 2007; Maye and Gerken, 2000) and sometimes fail to demonstrate learning even after these longer exposures (e.g., Wanrooij, Boersma, & van Zuijen, 2014). In the real world, children are more likely to acquire a new community accent than adults are. Therefore, it is entirely possible that young learners will learn novel speech patterns even in cases where adults do not. That said, a conservative learning strategy (in which the data and potential causes are evaluated) would seem to be particularly adaptive for toddlers, to prevent them from learning unreliable or transient patterns as they are building up knowledge of the native language. Adults, in contrast, could in principle afford to be less conservative, because small amounts of data should not cause significant changes in their representations.

To ask whether toddlers’ learning of novel speech patterns is conservative, we tested toddlers in one of four (between-subject) conditions. In two of these conditions, the Characteristic and Incidental conditions, toddlers heard a consonant-shifting accent during exposure and saw a video of the speaker producing the accented words. In the Characteristic condition, the speaker held a lollipop to her cheek while she produced the words. In the Incidental condition, the lollipop was in her mouth during the pronunciation of the exposure words. Therefore, in the Incidental condition, toddlers had the type of contextual information (a lollipop in the mouth) that could indicate that the novel pronunciations were not necessarily characteristic of the speaker. In other words, the lollipop served as a potential alternative cause for the atypical pronunciations (rather than the speaker having a different phono-lexical system). Importantly, use of this contextual information to constrain learning in only the Incidental condition would require that toddlers understand that it is specifically a mouth obstruction (but not an object on the face) that can cause changes in speech productions.

If, like adult listeners, toddlers learn conservatively (and understand that mouth obstructions can cause pronunciation changes), then we predict that toddlers in the Characteristic condition will show learning of the accent, but that toddlers in the Incidental condition will not. If, on the other hand, learning is driven by the acoustic information alone, then we expect equivalent learning of the accent in these two conditions. We also included two additional conditions. The first was a No-face condition, to establish that toddlers could learn this type of consonant-shifting accent in the absence of visual information about the speaker. Previous work has demonstrated only that English-learning toddlers can learn the specifics of novel accents involving vowel shifts. Finally, a Control condition, in which toddlers heard only standard pronunciations during exposure, was included to examine toddlers’ treatment of the accented test pronunciations in the absence of previous exposure to these pronunciations.

Participants

One hundred seventeen monolingual, English-learning toddlers between 18 and 20 months of age (M = 574 days, SD = 17) were randomly assigned to one of four exposure conditions: accent in the absence of face information (“No-face”; n = 25), characteristic accent (“Characteristic”; n = 30), incidental accent (“Incidental”; n = 31), and no accent

Results

We first computed looking proportions for the pre-naming phase. The mean proportions of looking at the familiar target during pre-naming were 0.491 (SD = 0.051), 0.544 (SD = 0.057), 0.531 (SD = 0.055), and 0.545 (SD = 0.05) for the No-face, Characteristic, Incidental, and Control conditions, respectively. A one-way ANOVA found that there was a significant difference in pre-naming looking across conditions, F(3, 113) = 5.948, p = .001, η² = 0.136. In particular, the average pre-naming proportion
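As a rough check on how the reported effect size relates to the F statistic, the sketch below recovers η² for a one-way between-subjects ANOVA from F and its degrees of freedom. It assumes the error degrees of freedom equal the sample size minus the number of conditions (117 − 4 = 113); that value and the function name `eta_squared` are assumptions made here for illustration.

```python
def eta_squared(f_value, df_effect, df_error):
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom.

    Uses eta^2 = SS_effect / SS_total = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumed degrees of freedom: 4 conditions, 117 toddlers.
df_effect = 4 - 1
df_error = 117 - 4
print(round(eta_squared(5.948, df_effect, df_error), 3))  # 0.136
```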

Discussion

We learn novel speech patterns through exposure to auditory input. The present study explored whether toddlers’ learning of such patterns is based entirely on acoustic information or whether non-linguistic contextual information can influence the learning process. Toddlers were exposed to a novel speech pattern (a consonant-shifting accent) either without or with a visible speaker. Toddlers learned the novel accent (later recognized the accented pronunciations) when there was no visible

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors thank Ashley Blayney-Hoffer, Joel LeForestier, Emily McIntosh, Shaquille Sealy, and Erin Kim for help with recruitment and testing, as well as Carragh Erhardt for appearing in the stimulus videos, and Ayah Taji, Chloe Whitham, Catherine Rocha, Katelyn Rowe, Laura Brooks, Justin Leung, and Kristen Thompson for assistance with data coding. We also thank the many families who volunteered their time to participate. This work was funded by an operating Grant from the Natural Sciences and

References (38)

  • C.T. Best et al. Articulating what infants attune to in native speech. Ecological Psychology (2016)
  • P. Boersma et al. Praat: Doing phonetics by computer [computer program]. Retrieved from (2014)
  • K. Chládková et al. Distributional learning of speech sounds: An exploratory study into the effects of prior language experience. Language Learning (2021)
  • C.M. Clarke et al. Rapid adaptation to foreign accented speech. Journal of the Acoustical Society of America (2004)
  • P.S. Dale et al. Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers (1996)
  • H. Gweon et al. Infants consider both the sample and the sampling process in inductive generalization. Proceedings of the National Academy of Sciences (2010)
  • R. Hayes-Harb. Lexical and statistical evidence in the acquisition of second language phonemes. Second Language Research (2007)
  • D.F. Kleinschmidt et al. Robust speech perception: Recognize the familiar, generalize to the similar, and adapt to the novel. Psychological Review (2015)
  • T. Kraljic et al. First impressions and last resorts: How listeners adjust to speaker variability. Psychological Science (2008)