How top-down processing enhances comprehension of noise-vocoded speech: Predictions about meaning are more important than predictions about form

https://doi.org/10.1016/j.jml.2020.104114

Highlights

  • Listeners use higher-level information to learn to understand distorted speech.

  • We investigate how top-down processing aids comprehension of noise-vocoded speech.

  • Predictions of higher-level meaning facilitated comprehension of distorted speech.

  • But form predictions played a minimal role in comprehension.

  • Results suggest top-down processing influences semantic levels of processing during learning.

Abstract

Listeners quickly learn to understand speech that has been distorted, and this process is enhanced when comprehension is constrained by higher-level knowledge. In three experiments, we investigated whether this knowledge enhances comprehension of distorted speech because it allows listeners to predict (1) the meaning of the distorted utterance, or (2) the lower-level wordforms. Participants listened to question-answer sequences, in which questions were clearly-spoken but answers were noise-vocoded. Comprehension (Experiment 1) and learning (Experiment 2) were enhanced when listeners could use the question to predict the semantics of the distorted answer, but were not enhanced by predictions of answer form. Form predictions enhanced comprehension only when questions and answers were separated in time and by intervening linguistic material (Experiment 3). Together, these results suggest that high-level semantic predictions enhance comprehension and learning, with form predictions playing only a minimal role.

Introduction

Speech perception is robust and resilient, such that we are able to comprehend utterances across a variety of noisy situations and adverse circumstances. For example, listeners can comprehend speech produced by different talkers at different rates (e.g., Miller & Liberman, 1979) and with different accents (e.g., Clarke & Garrett, 2004). Even when faced with novel acoustic distortions that might make the speech unintelligible at first, listeners can quickly adapt, such that repeated exposure to the distorted speech leads to rapid improvement in comprehension (e.g., Dupoux & Green, 1997). This adaptation is a form of perceptual learning – “relatively long-lasting changes to an organism’s perceptual system that improve its ability to respond to its environment and are caused by its environment” (e.g., Goldstone, 1998, p. 586).

An ongoing debate for theories of how we understand spoken language has concerned the interaction between higher-level knowledge and lower-level input, and whether listeners immediately use what they know to interpret what they hear (e.g., Magnuson et al., 2018, McClelland and Elman, 1986, Norris et al., 2000). But for the case of how we learn to perceive speech, it is generally accepted that higher-level knowledge plays an important role in reorganizing lower-level processing. For example, even for theories that minimize the role of interactivity in speech perception, it is still assumed that high-level knowledge (e.g., of words) influences how people learn to process speech, such as for setting categorical phonetic boundaries and for learning to understand ambiguous fricatives (e.g., McQueen et al., 2006, Norris et al., 2000).

There is ample evidence that as listeners process language, they use their high-level knowledge to predict upcoming linguistic information, from the topic of discourse under discussion to the forms of particular words (see Pickering & Gambi, 2018, for a review). A number of theories, such as predictive coding accounts (e.g., Arnal and Giraud, 2012, Sohoglu and Davis, 2016), assume that the process of perceptual learning is facilitated by prediction. But what types of prediction facilitate learning? In this paper, we address this question by investigating whether high-level knowledge facilitates learning because listeners can predict the semantic content of the distorted words they will hear, or because they can predict the detailed form of these distorted words.

We distinguish between these two possibilities in three experiments that test how people learn to understand noise-vocoded speech, which is an acoustic distortion that smooths large portions of the speech signal’s spectral information while preserving temporal cues (Davis et al., 2005, Shannon et al., 1995). With the appropriate degree of distortion, vocoded speech can be made approximately 50% intelligible for naïve listeners (Shannon, Fu, & Galvin, 2004), but importantly, the ability to comprehend vocoded speech improves quite quickly with exposure (e.g., Davis et al., 2005), thus making it ideal for investigating the phenomenon of perceptual learning.
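For readers unfamiliar with the technique, the following is a minimal sketch of a standard noise vocoder in the spirit of Shannon et al. (1995): the speech is band-pass filtered into a small number of channels, each channel's amplitude envelope is extracted and smoothed, and the envelopes are used to modulate band-limited noise, which is then summed. The band edges, filter orders, and envelope cutoff are illustrative assumptions, not the parameters used in the present experiments.

    # Minimal noise-vocoder sketch (after Shannon et al., 1995); parameters are
    # illustrative assumptions, not those used in the experiments reported here.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(speech, fs, n_channels=6, f_lo=100.0, f_hi=7000.0):
        # f_hi must stay below the Nyquist frequency (fs / 2).
        edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges
        noise = np.random.randn(len(speech))                # broadband carrier noise
        out = np.zeros(len(speech))
        for lo, hi in zip(edges[:-1], edges[1:]):
            band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(band_sos, speech)            # band-limit the speech
            env = np.abs(hilbert(band))                     # amplitude envelope
            env_sos = butter(2, 30.0, btype="lowpass", fs=fs, output="sos")
            env = sosfiltfilt(env_sos, env)                 # smooth envelope (temporal cues kept)
            out += env * sosfiltfilt(band_sos, noise)       # envelope-modulated noise in same band
        return out / np.max(np.abs(out))                    # normalize to avoid clipping

Spectral detail within each band is discarded while the slow temporal (envelope) cues are preserved, which is why intelligibility rises with the number of channels.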

In the rest of the Introduction, we review research investigating how high-level knowledge (particularly prediction) aids perceptual learning of noise-vocoded speech. We then describe the current study and formulate our predictions in more detail.

A number of studies have shown that noise-vocoded speech is easier to understand and perceived to be clearer if listeners have high-level knowledge of what they are going to hear. For example, Giraud et al. (2004) found that participants could more easily recognize the words in noise-vocoded sentences when they had first been presented with a clear version of the spoken sentence. Similar results were reported by Sohoglu, Peelle, Carlyon, and Davis (2012; see also Wild, Davis, & Johnsrude, 2012): Participants gave higher clarity ratings to noise-vocoded words when they were presented with matching text prior to hearing the vocoded stimulus. Together, these results suggest that the top-down use of lexical and sentential information in memory can facilitate subsequent processing of that same information when distorted, a phenomenon known as perceptual pop-out.

Importantly, this form of top-down processing not only influences perceptual pop-out and listeners’ ability to understand particular tokens of noise-vocoded speech, but also affects perceptual learning and the ability to understand novel vocoded sentences. On each trial of a study by Davis et al. (2005), participants first listened to a noise-vocoded sentence and then transcribed what they had heard. After transcribing this distorted sentence, participants then heard (Experiment 2) or read (Experiment 3) a clear version of the sentence followed by the same distorted version a second time (distorted(D)-clear(C)-distorted(D) condition), or they instead heard the distorted sentence twice before hearing the clear version (DDC condition). The authors found that listeners who knew the identity of the distorted sentence prior to its second presentation (DCD condition) could report more words during the first presentation of subsequent vocoded sentences than participants who heard both versions of the distorted sentence before the clear version (DDC condition). In other words, listeners showed more rapid perceptual learning when they knew the identity of the distorted sentence (and could use information from the clear sentence, in a top-down fashion, to process the distorted sentence) prior to its second presentation.

This learning effect did not occur when participants were trained with non-word sentences (Davis et al., 2005; Experiment 4). However, subsequent work by Hervais-Adelman, Davis, Johnsrude, and Carlyon (2008) found that participants trained with single non-words using a DCD procedure did show comparable perceptual learning to participants trained with words. This discrepancy may be explained by differences in the memorability of the stimuli in the two studies. Specifically, learning may not have occurred for non-word sentences because participants had difficulty maintaining a string of clear non-words in capacity-limited phonological memory (cf. Gathercole, Willis, Baddeley, & Emslie, 1994) and so they could not make comparisons between the clear and distorted versions of the stimulus. When participants were trained with single non-words, however, the phonological representation of the clear form was likely still active in memory when the subsequent distorted version was presented. Thus, top-down facilitation of perceptual learning can occur for non-words if listeners can easily make comparisons between the clear and distorted stimuli.

One open question, however, concerns what mechanisms and information support the top-down facilitation of learning. The most prominent account for how high-level knowledge facilitates learning relies on the prediction of upcoming information, and predictive coding in particular (e.g., Arnal and Giraud, 2012, Sohoglu and Davis, 2016). Predictive coding theories postulate that listeners use their prior knowledge (e.g., from context) to make highly specified moment-to-moment predictions about upcoming events. These predictive coding theories are consistent with a number of other recent theories of language processing, which assume that listeners rapidly use linguistic context to generate predictions about the words that they will hear next (e.g., Christiansen & Chater, 2016). These predictions are immediately compared with incoming linguistic information, and the difference between the two (the prediction error) is carried forward to adjust future processing. For example, listeners presented with a clear version of the stimulus prior to distortion (i.e., in the DCD training condition in Davis et al.’s study) could use this clear representation to precisely predict the form of the distorted input. Any difference between the predicted and actual form yields an error signal, which is used to adjust future predictions, so that they more closely match the incoming speech input.

The best current evidence for the predictive coding account comes from cognitive neuroscience work. For example, Blank and Davis (2016; see also Sohoglu and Davis, 2016, Wild et al., 2012) investigated how manipulations of bottom-up and top-down processing influence the neural responses to distorted speech using fMRI. Participants listened to words (e.g., sing) vocoded using either four or twelve bands. These words were preceded by matching written text (e.g., “sing”), partially mismatching written text (e.g., “sit”), or totally mismatching written text (e.g., “doom”). The authors found that presenting matching text and increasing the sensory detail in the auditory stimuli both improved word report scores and reduced BOLD signals in the lateral temporal lobe, a region of the brain that is associated with hearing and comprehending speech (e.g., Davis & Johnsrude, 2003). But multivariate analyses suggested that sensory detail also interacted with the degree of match in the text. In particular, when prior knowledge was uninformative (i.e., mismatching or neutral text), then increases in sensory detail led to an increase in the amount of syllabic information represented in the lateral temporal lobe (quantified using representational similarity analysis; Kriegeskorte, Mur, & Bandettini, 2008), but when prior knowledge was informative (i.e., matching text), then increases in sensory detail reduced the amount of syllabic information represented in the same area. These results are consistent with a predictive coding account, in which deviations from the predicted input are represented as prediction error. When sensory input mismatches with predictions (i.e., in the mismatching conditions), then prediction error is increased, and so more information about the bottom-up signal is represented. But when sensory input matches with predictions, then the bottom-up input can be explained away, and that information can be discarded.

The studies described so far suggest that top-down processing facilitates perceptual pop-out and perceptual learning by providing participants with the opportunity to generate extremely precise moment-to-moment predictions about the form of what they will hear. But prediction in these studies has been operationalized using stimulus repetition, such that participants always listened to or saw a clear version of the stimulus before hearing the distorted version, which leaves the characteristics of these predictions somewhat unclear. In particular, it is unclear precisely what information needs to be predicted for enhanced processing and learning to occur. In studies using repetition, a (perhaps implicit) assumption is that by repeating a stimulus, participants should be able to make precise predictions about the form of the distorted words that they will hear, and it is these predictions about form that then facilitate learning by minimizing prediction error.

But a distorted stimulus that is identical to a previously presented clear version is also identical in semantic content, and so top-down effects on perceptual pop-out and learning could also be driven solely by predictions concerning high-level semantic input. For example, if listeners hear or see the clear word dog then they could activate the semantic units associated with dog (e.g., four legs, barks). These semantic predictions need not directly inform perceptual states, as form predictions do in predictive coding accounts, but could instead constrain the processing of ambiguous input and support subsequent learning through feedback connections from semantics to the lexicon (as in a TRACE account of speech perception; e.g., McClelland & Elman, 1986). In fact, when linguistic prediction has been studied outside of the context of stimulus repetition, there has been some controversy about the degree to which listeners and readers tend to generate predictions about the forms of upcoming words (see Pickering & Gambi, 2018, for a review).

Consistent with this argument, some studies show that the intelligibility of noise-vocoded speech is affected by semantic coherence (in the absence of repetition), suggesting that precise sensory predictions are not necessary for perceptual pop-out and learning. For example, Signoret, Johnsrude, Classon, and Rudner (2018; see also Davis, Ford, Kherif, & Johnsrude, 2011) found that clarity ratings were higher for noise-vocoded sentences that were semantically coherent, and thus constrained the number of potential continuations (e.g., Her daughter was too young for the disco), than those that were semantically incoherent, and did not provide any information about the content of the speaker’s forthcoming words (e.g., Her hockey was too tight to walk on cotton). These findings suggest that participants used the semantic content of the previous words to predict the semantics of forthcoming words, which made them easier to understand in their distorted form. Consistent with previous research showing perceptual pop-out, clarity ratings were also higher when these sentences were preceded by matching rather than mismatching written text. Based on these results, Signoret et al. concluded that both semantic and form-based predictions provide independent aid to perceptual clarity.

However, it is not clear that these two sources of information are truly independent. Participants could use the prior semantic context to generate both content and form-based predictions, for example using the sentence The boy would like to eat… to predict that the speaker will refer to an edible object (a semantic content prediction; Altmann & Kamide, 1999) and thus to predict the phonetic features of cake (a form prediction). Conversely, they could use the matching text to generate both form and content-based predictions, for example predicting the phonetic features of cake, which activates high-level lexical information. Furthermore, Signoret et al. (2018) did not assess perceptual learning, and so it is unclear whether sentence constraint enhances learning in the same way as stimulus repetition (e.g., Davis et al., 2005). Although semantic coherence enhanced perceptual clarity, meaning that it was easier for participants to understand the words in distorted sentences for which they had high-level knowledge, it may not make it easier for them to understand novel distorted sentences for which they have no such knowledge. In fact, there is also a possibility that apparent perceptual pop-out could partly reflect response bias: When listening to semantically coherent sentences, it may be easier to guess subsequent words, which may make participants more likely to give higher clarity ratings to distorted sentences.

In sum, it is unclear whether high-level knowledge enhances perceptual learning because listeners can use this knowledge to (1) make highly specified predictions about the form of the distorted input, or (2) predict the likely semantic space of possible upcoming words. We tested between these two possibilities in three experiments administered online using Prolific Academic. In these experiments, participants listened to question-answer sequences and were asked to type what they thought the answerer said. Using question-answer sequences allowed us to investigate how top-down processes aid perceptual learning without using stimulus repetition.

In all experiments, questions were clearly spoken while answers were noise-vocoded using six channels, which typically produces around 50% intelligibility (e.g., Shannon et al., 2004). Questions were always semantically constraining, and so listeners could use the question to guide their interpretation of the distorted answer. To test whether perception and learning were enhanced by specific form predictions, we manipulated the form constraint of questions so they were either form constraining and predicted a particular answer form (e.g., What colors are pandas?; see Table 1), or form unconstraining and did not predict a particular answer form (e.g., What colors should I paint the wall?). To test whether listeners used semantic predictions, we also manipulated the semantic consistency of the noise-vocoded answers, so that they were either semantically consistent and made complete sense as a possible answer given the semantic space of the question (e.g., Black and white) or semantically inconsistent and made no sense (e.g., Tom Hanks).

Experiment 1 assessed whether clear questions could enhance participants’ perception of distorted answers, in the same way that stimulus repetition is known to induce perceptual pop-out (e.g., Davis et al., 2005). We refer to the effects we measure as perceptual enhancement, rather than pop-out, because they are not generated by repetition. Experiment 2 then tested perceptual learning effects, using a manipulation similar to Davis et al.’s DCD condition, to determine whether perceptually enhanced comprehension generalized to novel distorted stimuli. Finally, predictive coding accounts postulate that listeners use in-the-moment predictions to learn, consistent with theories arguing that language processing is “now or never” (Christiansen & Chater, 2016). Experiment 3 investigated the time-course of learning effects to determine whether perceptual enhancement depends on predictions made using the immediate linguistic context.

If perceptual enhancement and perceptual learning effects occur because listeners use high-level knowledge to generate highly specific predictions about form, as would be expected under a prediction error account, then we expect an interaction between question constraint and answer consistency. When the question is form constraining, listeners can predict the precise form of the answer and can use this prediction to guide their interpretation of the distorted speech. These form predictions are more likely to be accurate when the answer is semantically consistent and makes sense as a response, but inaccurate when the answer is semantically inconsistent. As a result, listeners are likely to correctly report more words in the constraining consistent than the constraining inconsistent condition. In the form unconstraining conditions, however, listeners cannot make highly specified form predictions of the likely answer, and so we expect a smaller difference in the accuracy of word report scores for the semantically consistent and inconsistent answer conditions.

But if top-down effects on perceptual learning are driven by semantic predictions, then we expect listeners to be better at reporting words in distorted answers when these answers are semantically consistent rather than inconsistent, regardless of whether questions are form constraining or unconstraining. In other words, we do not expect an interaction between question constraint and answer consistency. Under this account, participants should use the question (e.g., What colors should I paint the wall?) to activate high-level semantic information (e.g., about colors), which should make it easier to integrate the distorted answer when it is semantically consistent (e.g., Black and white) than when it is inconsistent (e.g., if participants hear the answer Tom Hanks). Given that support for this hypothesis rests on a null interaction, we computed Bayes Factors for all predictors.
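As an illustration of how evidence for such a null interaction can be quantified, the sketch below compares models with and without the constraint-by-consistency interaction using a BIC-based approximation to the Bayes factor. The column names (report, constraint, consistency) and the simplified regression without random effects are assumptions for illustration only, not the analysis pipeline reported in this paper.

    # Hedged sketch: evidence against the constraint-by-consistency interaction via a
    # BIC-based Bayes factor approximation. `data` is a hypothetical data frame with
    # columns `report`, `constraint`, and `consistency`; random effects are omitted
    # for simplicity, so this is not the analysis reported here.
    import numpy as np
    import statsmodels.formula.api as smf

    def bf_against_interaction(data):
        full = smf.ols("report ~ constraint * consistency", data).fit()
        reduced = smf.ols("report ~ constraint + consistency", data).fit()
        # Values > 1 favour the model WITHOUT the interaction (evidence for the null).
        return np.exp((full.bic - reduced.bic) / 2)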

Section snippets

Experiment 1

In Experiment 1, participants listened to question-answer sequences, in which the question was clearly spoken while the answer was noise-vocoded, and were asked to type what they thought the answerer said. Importantly, we manipulated the form constraint of questions, so they either predicted a particular answer form (e.g., What colors are pandas?) or did not predict an answer form (e.g., What colors should I paint the wall?). These questions were combined with answers that were either

Experiment 2

In Experiment 2, we used a design similar to Davis et al.’s (2005) Distorted-Clear-Distorted condition (described in the Introduction) to investigate whether learning is enhanced by predictions of form or meaning. Participants first heard a distorted phrase and reported what they heard. They then heard a clear question followed by the same distorted phrase, this time used as an answer to the question (see Fig. 1b). As in Experiment 1, we varied the relationship between questions and answers (see

Experiment 3

Experiments 1 and 2 suggest that high-level knowledge enhances perception and learning because it allows listeners to predict the semantic features associated with the distorted input, which then presumably facilitates integration of novel distorted speech representations into pre-existing higher-level representations. In these experiments, we focused on the effects of high-level knowledge when listeners could predict the distorted input using the immediately surrounding linguistic context

General discussion

Previous research demonstrates that high-level knowledge enhances perception of and learning about distorted speech. For example, listeners are better able to understand, and learn to understand, noise-vocoded sentences if they have previously heard or read a clear version of that sentence (e.g., Davis et al., 2005). In three experiments, we investigated how high-level knowledge enhances perception by presenting participants with question-answer sequences, in which the answer was noise-vocoded

CRediT authorship contribution statement

Ruth E. Corps: Conceptualization, Methodology, Software, Data curation, Formal analysis, Visualization, Writing - original draft. Hugh Rabagliati: Conceptualization, Methodology, Supervision, Writing - review & editing.

Acknowledgements

Ruth Corps was supported by the Economic and Social Research Council [grant number ES/J500136/1]. Hugh Rabagliati was supported by grants from the Economic and Social Research Council [ES/L01064X/1] and the Leverhulme Trust [RPG-2014-253]. We thank Matthew Davis for sharing Matlab scripts used for vocoding.

References (38)

  • M.H. Christiansen et al. (2016). The now-or-never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences.

  • C.M. Clarke et al. (2004). Rapid adaptation to foreign-accented English. The Journal of the Acoustical Society of America.

  • M.H. Davis et al. (2011). Does semantic context benefit speech understanding through “top–down” processes? Evidence from time-resolved sparse fMRI. Journal of Cognitive Neuroscience.

  • M.H. Davis et al. (2003). Hierarchical processing in spoken language comprehension. Journal of Neuroscience.

  • M.H. Davis et al. (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. Journal of Experimental Psychology: General.

  • L.T. DeCarlo (1998). Signal detection theory and generalized linear models. Psychological Methods.

  • S. Deerwester et al. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science.

  • J.R. De Leeuw (2015). jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods.

  • E. Dupoux et al. (1997). Perceptual adjustment to highly compressed speech: Effects of talker and rate changes. Journal of Experimental Psychology: Human Perception and Performance.