
Brain and Language

Volume 198, November 2019, 104692

Brain-behavior relationships in incidental learning of non-native phonetic categories

https://doi.org/10.1016/j.bandl.2019.104692

Highlights

  • Incidental paradigms show promise in promoting non-native speech sound learning.

  • Subjects completed fMRI sessions before and after incidental learning sessions.

  • Frontal brain regions differentiated non-native Hindi sounds even before training.

  • Individual success in learning predicted increases in frontal activation over time.

  • Findings support a role for frontal areas in non-native phonetic category learning.

Abstract

Research has implicated the left inferior frontal gyrus (LIFG) in mapping acoustic-phonetic input to sound category representations, both in native speech perception and non-native phonetic category learning. At issue is whether this sensitivity reflects access to phonetic category information per se or to explicit category labels, the latter often being required by experimental procedures. The current study employed an incidental learning paradigm designed to increase sensitivity to a difficult non-native phonetic contrast without inducing explicit awareness of the categorical nature of the stimuli. Functional MRI scans revealed frontal sensitivity to phonetic category structure both before and after learning. Additionally, individuals who succeeded most on the learning task showed the largest increases in frontal recruitment after learning. Overall, results suggest that processing novel phonetic category information entails a reliance on frontal brain regions, even in the absence of explicit category labels.

Introduction

Speech sounds have a complex internal structure, and in general, processing the fine-grained detail of these sounds relies on temporal brain regions such as the left superior temporal gyrus (LSTG; Desai et al., 2008, Liebenthal et al., 2005, Mesgarani et al., 2014, Myers, 2007). These temporal areas show tuning that is specific and structured according to the acoustic details of one’s native language phonetic categories. However, a number of studies suggest that the perception of phonetic detail, even if largely supported by superior temporal cortex, is not entirely divorced from frontal brain regions. Individuals with Broca’s aphasia, for instance, have shown subtle deficits in phoneme discrimination, though they make fewer errors than individuals with posterior brain damage (Blumstein, Baker, & Goodglass, 1977). This notion has also been supported by functional neuroimaging studies of native language perception, with frontal brain regions implicated in different aspects of acoustic-phonetic processing (Lee et al., 2012, Myers, 2007, Rogers & Davis, 2017, Xie and Myers, 2018). In particular, the left inferior frontal gyrus (LIFG) is sensitive to the proximity between an acoustic token and a phonetic category boundary (Myers, 2007, Myers et al., 2009) and responds to phonetic ambiguity in naturally-produced, continuous speech (Xie & Myers, 2018). While there are likely differentiable roles for frontal structures in the perception of speech, in general inferior frontal regions show evidence of abstraction away from low-level acoustic details in order to access category-level information about speech tokens (Chevillet et al., 2013, Lee et al., 2012, Myers et al., 2009).

Further evidence for a role of frontal brain regions in speech perception comes from studies examining the acquisition of non-native phoneme categories. Non-native speech distinctions, especially those that are perceptually similar to existing native language categories, are very difficult to acquire in adulthood (Best & Tyler, 2007), with most adults falling short of native-like perceptual performance, even with targeted training (Golestani and Zatorre, 2009, Pruitt, Strange, Polka, & Aguilar, 1990, Strange and Dittmann, 1984). The extant research suggests that acquisition of new speech categories invokes processes in left frontal areas, among other neural systems. For instance, Golestani and Zatorre (2004) showed that newly-learned non-native stimuli activated the bilateral IFG (pars opercularis) and LSTG relative to a noise baseline, and Myers and Swan (2012) showed that an area of the left middle frontal gyrus (MFG) immediately adjacent to Broca’s area was sensitive to newly-acquired non-native category structure. One interpretation of these patterns is that non-native tokens activate emerging perceptual category information stored in the frontal lobe.

While several studies have shown frontal recruitment for non-native learning, evidence points to increased reliance on temporoparietal structures as listeners become more proficient (see Myers, 2014 for review). For instance, individual success in learning has been associated with reduced activation of LIFG (Golestani and Zatorre, 2004, Myers and Swan, 2012) and increased recruitment of temporoparietal regions such as the bilateral angular gyri (AG) (Golestani & Zatorre, 2004). These findings can be taken as evidence that listeners may initially recruit frontal regions to process non-native sounds but that as listeners develop better-elaborated representations of the novel phonetic categories, processing of these sounds may increasingly recruit temporal regions associated with sensory perception. Under such a view, the early reliance on frontal regions may reflect access to articulatory codes or abstract category-level representations that can be used to guide perception, or else may reflect high demands on phonological working memory (Callan et al., 2004, Golestani and Zatorre, 2004, Myers, 2014).

The interpretation of the role of frontal areas for native as well as non-native speech perception is complicated because many studies examining phonetic learning have used explicit tasks during scanning, such as phoneme categorization (Callan et al., 2004, Golestani and Zatorre, 2004). What is not clear is whether category-relevant neural activation is driven by the metalinguistic demands of the task or by speech perception per se. Indeed, Hickok and Poeppel (2000, 2004) have argued that the involvement of frontal brain structures in perceiving acoustic-phonetic detail is limited to situations in which participants must explicitly attend to sub-lexical details of the stimulus, as is required in phoneme identification tasks.

Nonetheless, frontal recruitment for phonetic learning has been observed in the absence of an explicit task. In a study by Myers and Swan (2012), participants were exposed to a dental-retroflex-velar continuum (i.e., d̪a-ɖa-ga) and trained to categorize stimuli into two categories. Half of the participants learned that the category boundary was between the dental and retroflex tokens, and for the other half of the participants, the category boundary was between the retroflex and velar tokens. A short-interval habituation design (Zevin & McCandliss, 2005) was used during scanning: On every trial, participants heard a train of identical stimuli followed by a distinct stimulus, which either came from the same phonetic category as the preceding stimuli or came from the other category. Notably, participants were not asked to identify the category for the tokens they heard and instead only responded to occasional high-pitched catch trials. The bilateral MFG showed sensitivity to the learned category structure, suggesting a role for frontal regions in perceiving non-native phonemic distinctions even in the absence of an explicit identification task. However, it is important to note that the Myers and Swan (2012) study did use an explicit categorization task during training, so it is possible that participants were categorizing stimuli during the fMRI scan, despite not being required to do so.
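To make the trial structure of this short-interval habituation design concrete, the sketch below builds a single trial. This is an illustrative Python sketch, not the authors' stimulus code: the train length, file names, and token inventory are hypothetical placeholders, and the actual parameters used by Myers and Swan (2012) may differ.

```python
import random

# Hypothetical token inventory; the real stimuli were steps along a
# dental-retroflex(-velar) continuum, and these file names are placeholders.
STIMULI = {
    "dental": ["dental_step1.wav", "dental_step2.wav"],
    "retroflex": ["retroflex_step1.wav", "retroflex_step2.wav"],
}

def build_trial(habituated_category, within_category, train_length=4):
    """Build one short-interval habituation trial: a train of identical tokens
    followed by a final, acoustically distinct token drawn either from the same
    learned category or from the other category."""
    standard = random.choice(STIMULI[habituated_category])
    sequence = [standard] * train_length
    if within_category:
        # Deviant differs acoustically but belongs to the habituated category.
        deviant_pool = [t for t in STIMULI[habituated_category] if t != standard]
    else:
        # Deviant comes from the other category.
        other = [c for c in STIMULI if c != habituated_category][0]
        deviant_pool = STIMULI[other]
    sequence.append(random.choice(deviant_pool))
    return sequence

# Example: an across-category trial habituating to the retroflex tokens
print(build_trial("retroflex", within_category=False))
```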

Indeed, the vast majority of studies examining the perception of non-native phonemes have used training tasks in which participants are explicitly taught a category label that corresponds to each stimulus. This explicit information about category identity may reinforce the early, frontally-mediated stages of non-native phonetic learning (Myers, 2014). That is, the frontal activation associated with non-native phonetic learning may specifically reflect a mapping between stimuli and category labels, rather than reflecting (bottom-up) sensitivity to the underlying acoustic-phonetic category structure. As such, a more stringent test of a role for frontal regions in non-native phonetic learning would require the use of implicit paradigms during both the training and fMRI portions of the study, such that participants do not have labels for the categories being learned and therefore cannot categorize the stimuli, even implicitly.

In recent years, researchers have increasingly utilized implicit paradigms to train participants on novel categories. For instance, Leech, Holt, Devlin, and Dick (2009) examined the neural underpinnings of implicit auditory learning using complex non-speech stimuli. Over the course of several training sessions, participants played a video game where auditory cues were diagnostic of whether an upcoming visual exemplar was a member of one category (aliens to be captured) or another (aliens to be shot). Pre- and post-training fMRI sessions utilized an implicit oddball detection task, meaning that neither behavioral training nor the scanner task entailed explicit categorization. Results showed that better auditory learning was associated with increased reliance on the superior temporal sulcus (STS) post-training. More recently, Lim, Fiez, and Holt (2019) measured BOLD activity while participants played this incidental learning video game in the MRI scanner. The authors manipulated whether the non-speech auditory exemplars were organized into linearly separable categories (structured categories) or not (unstructured categories). Critically, the time course of activation in the basal ganglia – and more specifically, in the striatum – differed between structured and unstructured categories, consistent with a proposed role for the striatum in acquiring new behaviorally-relevant sound categories (Lim et al., 2014, Yi et al., 2016). While the authors focused their discussion on the striatum, this same pattern was also observed in a number of additional regions including the bilateral IFG. Further, striatal activity was positively correlated with changes in behavior and functionally connected to the STS. Taken together, such results suggest the involvement of a coordinated network of frontal, striatal, and temporal areas in auditory category learning, at least for non-speech sounds.
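The structured-versus-unstructured manipulation can be illustrated with a toy sketch. The Python snippet below is not Lim et al.'s stimulus-generation code; the two-dimensional feature space and numeric ranges are hypothetical stand-ins for whatever acoustic parameters defined the actual exemplars, and serve only to show what "linearly separable" versus "not separable" means here.

```python
import random

def make_structured(n=20):
    """Two categories separated along one feature dimension, so a single
    linear boundary can divide them (i.e., 'structured' categories)."""
    cat_a = [(random.uniform(200, 400), random.uniform(0, 10)) for _ in range(n)]
    cat_b = [(random.uniform(600, 800), random.uniform(0, 10)) for _ in range(n)]
    return cat_a, cat_b

def make_unstructured(n=20):
    """Exemplars for both categories drawn from the same overall range, so no
    single linear boundary separates them (i.e., 'unstructured' categories)."""
    pool = [(random.uniform(200, 800), random.uniform(0, 10)) for _ in range(2 * n)]
    random.shuffle(pool)
    return pool[:n], pool[n:]
```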

In general, incidental or implicit learning paradigms can yield successful non-native learning (Gabay and Holt, 2015, Lim and Holt, 2011), showing that consistent associations between category information and behaviorally relevant stimulus properties can increase sensitivity to novel sound distinctions. Vlahou, Protopapas, and Seitz (2012) used an incidental training paradigm to examine learning of two different sound categories. Native speakers of Greek heard two pairs of speech sounds (four sounds total) on every trial and were asked to identify whether tokens within the first pair or second pair differed in volume. Unbeknownst to subjects, one pair always consisted of two Hindi dental sounds while the other consisted of two Hindi retroflex sounds. Critically, the volume difference emerged only within the retroflex pair (i.e., the correct response always corresponded to the retroflex category). To ensure the task was appropriately challenging, the size of the volume difference within the retroflex pair was set adaptively, such that the task got harder (i.e., the volume difference got smaller) if participants succeeded on easier levels. Following training, subjects’ discrimination and identification abilities were tested explicitly. Vlahou and colleagues found that participants who completed the incidental learning task performed as well as or better than a group who received explicit training on the speech sounds, and both groups performed better than a group of naïve listeners. Thus, even though the incidental learning task itself did not require learning of the non-native phonemic contrast, the consistent temporal yoking of category-level information (the phonetic category difference) with a behaviorally relevant dimension (the volume difference) resulted in learning, consistent with other similarly structured studies of incidental learning (Seitz & Watanabe, 2005).
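As a rough illustration of such an adaptive procedure, the Python sketch below tracks the within-pair volume difference with a simple 2-down/1-up staircase. The rule, starting value, and step size are assumptions made for illustration; Vlahou et al. (2012) may have used a different adaptive algorithm.

```python
def run_staircase(responses, start_db=6.0, step_db=1.0, floor_db=0.5):
    """Track the within-pair volume difference (in dB) across trials.
    `responses` is a sequence of booleans (True = correct pair identified).
    Assumed 2-down/1-up rule: two correct responses in a row shrink the
    difference (harder task); any error enlarges it (easier task)."""
    diff = start_db
    correct_streak = 0
    history = []
    for correct in responses:
        history.append(diff)
        if correct:
            correct_streak += 1
            if correct_streak == 2:
                diff = max(floor_db, diff - step_db)
                correct_streak = 0
        else:
            diff += step_db
            correct_streak = 0
    return history

# Example: the volume difference shrinks as the listener keeps responding correctly
print(run_staircase([True, True, True, True, False, True, True]))
```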

The aim of the current study is to examine the neural systems underlying the learning of a non-native phonetic category distinction using an incidental speech sound learning paradigm, specifically testing whether frontal regions are involved in non-native phonetic category learning in the absence of explicit category labels. In Experiment 1, we leveraged the incidental learning paradigm used by Vlahou et al. (2012) to promote non-native learning of the Hindi dental-retroflex contrast. Functional activation was measured with fMRI both before and after three days of incidental learning, allowing us to examine whether frontal brain regions are recruited for processing phonetic detail when participants are not explicitly aware that they are being exposed to two novel speech sound categories. In Experiment 2, we examined the extent to which behavioral gains over the course of the incidental learning sessions depend on consistent associations between the phonetic category structure and the task-relevant changes in volume.

Section snippets

Experiment 1

In Experiment 1, we collected fMRI data to measure changes in brain activity that occur after three days of an incidental learning task designed to induce sensitivity to a non-native phonetic category difference. Crucially, participants were not informed of the categorical structure of the stimuli until after all scanning was completed, at which point their sensitivity to the non-native phonetic category structure was assessed explicitly.

Experiment 2

While Experiment 1 supports a role for frontal brain regions in the development of sensitivity to non-native phonetic category structure, it is unclear how much of this is attributable to learning per se. The incidental learning paradigm used in Experiment 1 was adapted from a study conducted by Vlahou et al. (2012), who demonstrated that subjects who had completed incidental learning sessions were more sensitive to phonetic category structure than a group of naïve participants. Learning

Conclusions

Non-native phonetic category learning offers a model system for auditory category learning in general. Recent attention to the learning systems underlying this process suggests that multiple learning systems can be recruited for novel speech sound learning (Chandrasekaran et al., 2014), and incidental paradigms that allow listeners to discover the nature of the phonetic category without explicit feedback have shown promise, especially insofar as these paradigms may recruit systems that more

Author statement

All authors declare that they have no conflict of interest with respect to the theoretical questions or results of this research. All experiments were conducted following ethical guidelines put forth by the University of Connecticut’s Institutional Review Board. All authors are in agreement regarding the content of this manuscript, which has not been published elsewhere and is not under consideration by any other journal.

Statement of significance

Research on the acquisition of non-native speech sound categories has suggested an important role for frontal areas such as the left inferior frontal gyrus. Here, we investigate whether such frontal recruitment is a consequence of experimental procedures (e.g., task demands to map non-native sounds to explicit category labels).

Acknowledgments

This research was supported by NIH grant R01 DC013064 to EBM and NIH NIDCD Grant R01 DC006220 to SEB. The authors thank F. Sayako Earle for assistance with stimulus development; members of the Language and Brain lab for help with data collection and their feedback throughout the project; Elisa Medeiros for assistance with collection of fMRI data; Paul Taylor for assistance with neuroimaging analyses; and attendees of the 2016 Meeting of the Psychonomic Society and the 2017 Meeting of the

References (56)

  • G. Hickok et al. Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition (2004).

  • E.B. Myers. Dissociable effects of phonetic competition and category typicality in a phonetic categorization task: An fMRI investigation. Neuropsychologia (2007).

  • Z.S. Saad et al. SUMA. NeuroImage (2012).

  • A. Seitz et al. A unified model for perceptual learning. Trends in Cognitive Sciences (2005).

  • M. Ahissar et al. Reverse hierarchies and sensory learning. Philosophical Transactions of the Royal Society of London B: Biological Sciences (2009).

  • D. Bates et al. Fitting linear mixed-effects models using lme4. Journal of Statistical Software (2015).

  • C.T. Best et al. Nonnative and second-language speech perception: Commonalities and complementarities. Language Experience in Second Language Speech Learning: In Honor of James Emil Flege (2007).

  • J.R. Binder et al. Neural correlates of sensory and decision processes in auditory object identification. Nature Neuroscience (2004).

  • P. Boersma & D. Weenink. Praat: Doing phonetics by computer (Version 6.0.21) (2017). Available from...

  • D.H. Brainard et al. The Psychophysics Toolbox. Spatial Vision (1997).

  • B. Chandrasekaran et al. Dual-learning systems during speech category learning. Psychonomic Bulletin & Review (2014).

  • M.A. Chevillet et al. Automatic phoneme category selectivity in the dorsal auditory stream. Journal of Neuroscience (2013).

  • R.W. Cox et al. fMRI clustering and false-positive rates. Proceedings of the National Academy of Sciences (2017).

  • R. Desai et al. Left posterior temporal regions are sensitive to auditory categorization. Journal of Cognitive Neuroscience (2008).

  • F.S. Earle et al. Overnight consolidation promotes generalization across talkers in the identification of non-native speech sounds. The Journal of the Acoustical Society of America (2015).

  • W.B. Edmister et al. Improved auditory cortex imaging using clustered volume acquisitions. Human Brain Mapping (1999).

  • A. Eklund et al. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences (2016).

  • A.L. Francis et al. Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance (2002).