Brain-behavior relationships in incidental learning of non-native phonetic categories
Introduction
Speech sounds have a complex internal structure, and in general, processing the fine-grained detail of these sounds relies on temporal brain regions such as the left superior temporal gyrus (LSTG; Desai et al., 2008; Liebenthal et al., 2005; Mesgarani et al., 2014; Myers, 2007). These temporal areas show tuning that is specific and structured according to the acoustic details of one's native-language phonetic categories. However, a number of studies suggest that the perception of phonetic detail, even if largely supported by superior temporal cortex, is not entirely divorced from frontal brain regions. Individuals with Broca's aphasia, for instance, have shown subtle deficits in phoneme discrimination, though they make fewer errors than individuals with posterior brain damage (Blumstein, Baker, & Goodglass, 1977). This notion has also been supported by functional neuroimaging studies of native-language perception, with frontal brain regions implicated in different aspects of acoustic-phonetic processing (Lee et al., 2012; Myers, 2007; Rogers & Davis, 2017; Xie & Myers, 2018). In particular, the left inferior frontal gyrus (LIFG) is sensitive to the proximity between an acoustic token and a phonetic category boundary (Myers, 2007; Myers et al., 2009) and responds to phonetic ambiguity in naturally produced, continuous speech (Xie & Myers, 2018). While there are likely differentiable roles for frontal structures in the perception of speech, in general inferior frontal regions show evidence of abstraction away from low-level acoustic details in order to access category-level information about speech tokens (Chevillet et al., 2013; Lee et al., 2012; Myers et al., 2009).
Further evidence for a role of frontal brain regions in speech perception comes from studies examining the acquisition of non-native phoneme categories. Non-native speech distinctions, especially those that are perceptually similar to existing native-language categories, are very difficult to acquire in adulthood (Best & Tyler, 2007), with most adults falling short of native-like perceptual performance even with targeted training (Golestani & Zatorre, 2009; Pruitt, Strange, Polka, & Aguilar, 1990; Strange & Dittmann, 1984). The extant research suggests that acquisition of new speech categories invokes processes in left frontal areas, among other neural systems. For instance, Golestani and Zatorre (2004) showed that newly-learned non-native stimuli activated the bilateral IFG (pars opercularis) and LSTG relative to a noise baseline, and Myers and Swan (2012) showed that an area of the left middle frontal gyrus (MFG) immediately adjacent to Broca's area was sensitive to newly-acquired non-native category structure. One interpretation of these patterns is that non-native tokens activate emerging perceptual category information stored in the frontal lobe.
While several studies have shown frontal recruitment for non-native learning, evidence points to increased reliance on temporoparietal structures as listeners become more proficient (see Myers, 2014 for review). For instance, individual success in learning has been associated with reduced activation of LIFG (Golestani & Zatorre, 2004; Myers & Swan, 2012) and increased recruitment of temporoparietal regions such as the bilateral angular gyri (AG) (Golestani & Zatorre, 2004). These findings can be taken as evidence that listeners may initially recruit frontal regions to process non-native sounds but that as listeners develop better-elaborated representations of the novel phonetic categories, processing of these sounds may increasingly recruit temporal regions associated with sensory perception. Under such a view, the early reliance on frontal regions may reflect access to articulatory codes or abstract category-level representations that can be used to guide perception, or else may reflect high demands on phonological working memory (Callan et al., 2004; Golestani & Zatorre, 2004; Myers, 2014).
The interpretation of the role of frontal areas for native as well as non-native speech perception is complicated because many studies examining phonetic learning have used explicit tasks during scanning, such as phoneme categorization (Callan et al., 2004; Golestani & Zatorre, 2004). What is not clear is whether category-relevant neural activation is driven by the metalinguistic demands of the task or by speech perception per se. Indeed, Hickok and Poeppel (2000, 2004) have argued that the involvement of frontal brain structures in perceiving acoustic-phonetic detail is limited to situations in which participants must explicitly attend to sub-lexical details of the stimulus, as is required in phoneme identification tasks.
Nonetheless, frontal recruitment for phonetic learning has been observed in the absence of an explicit task. In a study by Myers and Swan (2012), participants were exposed to a dental-retroflex-velar continuum (i.e., d̪a-ɖa-ga) and trained to categorize stimuli into two categories. Half of the participants learned that the category boundary was between the dental and retroflex tokens, and for the other half of the participants, the category boundary was between the retroflex and velar tokens. A short-interval habituation design (Zevin & McCandliss, 2005) was used during scanning: On every trial, participants heard a train of identical stimuli followed by a distinct stimulus, which either came from the same phonetic category as the preceding stimuli or came from the other category. Notably, participants were not asked to identify the category for the tokens they heard and instead only responded to occasional high-pitched catch trials. The bilateral MFG showed sensitivity to the learned category structure, suggesting a role for frontal regions in perceiving non-native phonemic distinctions even in the absence of an explicit identification task. However, it is important to note that the Myers and Swan (2012) study did use an explicit categorization task during training, so it is possible that participants were categorizing stimuli during the fMRI scan, despite not being required to do so.
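The short-interval habituation design described above can be made concrete with a small sketch. The following is an illustrative reconstruction, not the authors' actual stimulus code: the token labels, inventory, and train length are hypothetical placeholders, and only the trial structure (a train of identical standards followed by a within-category or across-category deviant) comes from the text.

```python
import random

def make_trial(categories, train_len=4, across=False, rng=random):
    """Build one habituation trial: a train of identical standard tokens
    followed by a deviant that is acoustically distinct from the standard
    and drawn either from the same category or from a different one."""
    std_cat = rng.choice(list(categories))
    standard = rng.choice(categories[std_cat])
    if across:
        # across-category deviant: pick a different category
        dev_cat = rng.choice([c for c in categories if c != std_cat])
    else:
        # within-category deviant: same category, different token
        dev_cat = std_cat
    deviant = rng.choice([t for t in categories[dev_cat] if t != standard])
    return [standard] * train_len + [deviant]

# Hypothetical dental/retroflex token inventories (two tokens per category)
cats = {"dental": ["d̪a-1", "d̪a-2"], "retroflex": ["ɖa-1", "ɖa-2"]}
trial = make_trial(cats, across=True)  # e.g., four dentals then a retroflex
```

On catch trials, participants in the original design responded only to occasional high-pitched tokens, so no categorization response is attached to trials like these.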
Indeed, the vast majority of studies examining the perception of non-native phonemes have used training tasks in which participants are explicitly taught a category label that corresponds to each stimulus. This explicit information about category identity may reinforce the early, frontally-mediated stages of non-native phonetic learning (Myers, 2014). That is, the frontal activation associated with non-native phonetic learning may specifically reflect a mapping between stimuli and category labels, rather than reflecting (bottom-up) sensitivity to the underlying acoustic-phonetic category structure. As such, a more stringent test of a role for frontal regions in non-native phonetic learning would require the use of implicit paradigms during both the training and fMRI portions of the study, such that participants do not have labels for the categories being learned and therefore cannot categorize the stimuli, even implicitly.
In recent years, researchers have increasingly utilized implicit paradigms to train participants on novel categories. For instance, Leech, Holt, Devlin, and Dick (2009) examined the neural underpinnings of implicit auditory learning using complex non-speech stimuli. Over the course of several training sessions, participants played a video game in which auditory cues were diagnostic of whether an upcoming visual exemplar was a member of one category (aliens to be captured) or another (aliens to be shot). Pre- and post-training fMRI sessions utilized an implicit oddball detection task, meaning that neither behavioral training nor the scanner task entailed explicit categorization. Results showed that better auditory learning was associated with increased reliance on the superior temporal sulcus (STS) post-training. More recently, Lim, Fiez, and Holt (2019) measured BOLD activity while participants played this incidental learning video game in the MRI scanner. The authors manipulated whether the non-speech auditory exemplars were organized into linearly separable categories (structured categories) or not (unstructured categories). Critically, the time course of activation in the basal ganglia – and more specifically, in the striatum – differed between structured and unstructured categories, consistent with a proposed role for the striatum in acquiring new behaviorally relevant sound categories (Lim et al., 2014; Yi et al., 2016). While the authors focused their discussion on the striatum, this same pattern was also observed in a number of additional regions, including the bilateral IFG. Further, striatal activity was positively correlated with changes in behavior and functionally connected to the STS. Taken together, such results suggest the involvement of a coordinated network of frontal, striatal, and temporal areas in auditory category learning, at least for non-speech sounds.
In general, incidental or implicit learning paradigms can yield successful non-native learning (Gabay & Holt, 2015; Lim & Holt, 2011), showing that consistent associations between category information and behaviorally relevant stimulus properties can increase sensitivity to novel sound distinctions. Vlahou, Protopapas, and Seitz (2012) used an incidental training paradigm to examine learning of two different sound categories. Native speakers of Greek heard two pairs of speech sounds (four sounds total) on every trial and were asked to identify whether tokens within the first pair or second pair differed in volume. Unbeknownst to subjects, one pair always consisted of two Hindi dental sounds while the other consisted of two Hindi retroflex sounds. Critically, the volume difference emerged only within the retroflex pair (i.e., the correct response always corresponded to the retroflex category). To ensure the task was appropriately challenging, the size of the volume difference within the retroflex pair was set adaptively, such that the task got harder (i.e., the volume difference got smaller) if participants succeeded on easier levels. Following training, subjects' discrimination and identification abilities were tested explicitly. Vlahou and colleagues found that participants who completed the incidental learning task performed as well as or better than a group who received explicit training on the speech sounds, and both groups performed better than a group of naïve listeners. Thus, even though the incidental learning task itself did not require learning of the non-native phonemic contrast, the consistent temporal yoking of category-level information (the phonetic category difference) with a behaviorally relevant dimension (the volume difference) resulted in learning, consistent with other similarly structured studies of incidental learning (Seitz & Watanabe, 2005).
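The adaptive manipulation of the volume difference can be sketched as a staircase procedure. The specific 2-down/1-up rule, starting level, step size, and floor below are assumptions for illustration only; the text specifies merely that the difference shrank as participants succeeded.

```python
class Staircase:
    """Illustrative adaptive staircase: shrink the volume difference (in dB)
    after consecutive correct responses, enlarge it after an error.
    The 2-down/1-up rule and all parameter values are assumed, not taken
    from Vlahou et al. (2012)."""

    def __init__(self, start_db=6.0, step_db=1.0, floor_db=0.5, n_down=2):
        self.delta_db = start_db   # current volume difference
        self.step_db = step_db
        self.floor_db = floor_db   # smallest allowed difference
        self.n_down = n_down       # correct responses needed to step down
        self._streak = 0

    def update(self, correct):
        if correct:
            self._streak += 1
            if self._streak >= self.n_down:   # task gets harder
                self.delta_db = max(self.floor_db, self.delta_db - self.step_db)
                self._streak = 0
        else:                                  # task gets easier
            self.delta_db += self.step_db
            self._streak = 0
        return self.delta_db

stair = Staircase()
for resp in [True, True, True, True, False]:
    stair.update(resp)
print(stair.delta_db)  # → 5.0 (stepped down twice, then back up once)
```

A rule of this family converges on a fixed accuracy level (roughly 71% correct for 2-down/1-up), which is one common way to keep such a task "appropriately challenging."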
The aim of the current study is to examine the neural systems underlying the learning of a non-native phonetic category distinction using an incidental speech sound learning paradigm, specifically testing whether frontal regions are involved in non-native phonetic category learning in the absence of explicit category labels. In Experiment 1, we leveraged the incidental learning paradigm used by Vlahou et al. (2012) to promote non-native learning of the Hindi dental-retroflex contrast. Functional activation was measured with fMRI both before and after three days of incidental learning, allowing us to examine whether frontal brain regions are recruited for processing phonetic detail when participants are not explicitly aware that they are being exposed to two novel speech sound categories. In Experiment 2, we examined the extent to which behavioral gains over the course of the incidental learning sessions depend on consistent associations between the phonetic category structure and the task-relevant changes in volume.
Section snippets
Experiment 1
In Experiment 1, we collected fMRI data to measure changes in brain activity that occur after three days of an incidental learning task designed to induce sensitivity to a non-native phonetic category difference. Crucially, participants were not informed of the categorical structure of the stimuli until after all scanning was completed, at which point their sensitivity to the non-native phonetic category structure was assessed explicitly.
Experiment 2
While Experiment 1 supports a role for frontal brain regions in the development of sensitivity to non-native phonetic category structure, it is unclear how much of this is attributable to learning per se. The incidental learning paradigm used in Experiment 1 was adapted from a study conducted by Vlahou et al. (2012), who demonstrated that subjects who had completed incidental learning sessions were more sensitive to phonetic category structure than a group of naïve participants. Learning
Conclusions
Non-native phonetic category learning offers a model system for auditory category learning in general. Recent attention to the learning systems underlying this process suggests that multiple learning systems can be recruited for novel speech sound learning (Chandrasekaran et al., 2014), and incidental paradigms that allow listeners to discover the nature of the phonetic category without explicit feedback have shown promise, especially insofar as these paradigms may recruit systems that more
Author statement
All the authors declare that they have no conflict of interest with respect to the theoretical questions or results of this research. All experiments were conducted following ethical guidelines put forth by the University of Connecticut's Institutional Review Board. The authors are all in agreement on the content of this manuscript, which has not been published elsewhere and is not under consideration at any other journal.
Statement of significance
Research on the acquisition of non-native speech sound categories has suggested an important role for frontal areas such as the left inferior frontal gyrus. Here, we investigate whether such frontal recruitment is a consequence of experimental procedures (e.g., task demands to map non-native sounds to explicit category labels).
Acknowledgments
This research was supported by NIH grant R01 DC013064 to EBM and NIH NIDCD Grant R01 DC006220 to SEB. The authors thank F. Sayako Earle for assistance with stimulus development; members of the Language and Brain lab for help with data collection and their feedback throughout the project; Elisa Medeiros for assistance with collection of fMRI data; Paul Taylor for assistance with neuroimaging analyses; and attendees of the 2016 Meeting of the Psychonomic Society and the 2017 Meeting of the
References (56)
- et al. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language (2008)
- et al. Phonological factors in auditory comprehension in aphasia. Neuropsychologia (1977)
- et al. Phonetic perceptual identification by native- and second-language speakers differentially activates brain regions involved with acoustic phonetic processing and those involved with articulatory–auditory/orosensory internal models. NeuroImage (2004)
- et al. Applications of multivariate modeling to neuroimaging group analysis: A comprehensive alternative to univariate general linear model. NeuroImage (2014)
- AFNI: Software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research (1996)
- FreeSurfer. NeuroImage (2012)
- et al. Incidental learning of sound categories is impaired in developmental dyslexia. Cortex (2015)
- et al. Learning new sounds of speech: Reallocation of neural substrates. NeuroImage (2004)
- et al. Individual differences in the acquisition of second language phonology. Brain and Language (2009)
- et al. Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences (2000)
- Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition
- Dissociable effects of phonetic competition and category typicality in a phonetic categorization task: An fMRI investigation. Neuropsychologia
- SUMA. NeuroImage
- A unified model for perceptual learning. Trends in Cognitive Sciences
- Reverse hierarchies and sensory learning. Philosophical Transactions of the Royal Society of London B: Biological Sciences
- Fitting linear mixed-effects models using lme4. Journal of Statistical Software
- Nonnative and second-language speech perception: Commonalities and complementarities. Language Experience in Second Language Speech Learning: In Honor of James Emil Flege
- Neural correlates of sensory and decision processes in auditory object identification. Nature Neuroscience
- The Psychophysics Toolbox. Spatial Vision
- Dual-learning systems during speech category learning. Psychonomic Bulletin & Review
- Automatic phoneme category selectivity in the dorsal auditory stream. Journal of Neuroscience
- fMRI clustering and false-positive rates. Proceedings of the National Academy of Sciences
- Left posterior temporal regions are sensitive to auditory categorization. Journal of Cognitive Neuroscience
- Overnight consolidation promotes generalization across talkers in the identification of non-native speech sounds. The Journal of the Acoustical Society of America
- Improved auditory cortex imaging using clustered volume acquisitions. Human Brain Mapping
- Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences
- Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance