
Cognition

Volume 197, April 2020, 104163

Orofacial somatosensory inputs modulate word segmentation in lexical decision

https://doi.org/10.1016/j.cognition.2019.104163

Abstract

There is accumulating evidence that articulatory/motor knowledge plays a role in phonetic processing, such as the recent finding that orofacial somatosensory inputs may influence phoneme categorization. Here we show that somatosensory inputs also contribute at a higher level of the speech perception chain, namely in the context of word segmentation and lexical decision. We carried out an auditory identification test using a set of French phrases consisting of a definite article “la” followed by a noun, which may be segmented differently according to the placement of accents within the phrase. Somatosensory stimulation was applied to the facial skin at various positions within the acoustic utterances corresponding to these phrases, which had been recorded with neutral accent, that is, with all syllables given similar emphasis. We found that lexical decisions reflecting word segmentation were significantly and systematically biased depending on the timing of the somatosensory stimulation. This bias was not induced when the somatosensory stimulation was applied to skin elsewhere than the face. These results provide evidence that the orofacial somatosensory system contributes to lexical perception in situations that would be disambiguated by different articulatory movements, and suggest that articulatory/motor knowledge might be involved in speech segmentation.

Introduction

A long-standing question about speech perception concerns the potential role of articulatory knowledge in the phonetic decoding process. Coarticulatory phenomena classically modify the acoustic content of a given phonemic unit, an observation that led to the development of the Motor Theory of Speech Perception, which argues that speech decoding is based on the recovery of the motor cause of speech stimuli, and that articulatory/motor representations provide the basis of speech communication (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman & Mattingly, 1985). Listening to speech sounds activates cortical areas related to speech production in the motor and premotor cortex (e.g. Fadiga, Craighero, Buccino, & Rizzolatti, 2002; Grabski et al., 2013; Pulvermüller et al., 2006; Tremblay & Small, 2011; Watkins, Strafella, & Paus, 2003; Wilson, Saygin, Sereno, & Iacoboni, 2004). A number of behavioral studies show that articulatory movements preceding or accompanying the presentation of auditory stimuli modify speech perception, e.g. through motor stimulation (Sato et al., 2011) or articulatory suppression (Stokes, Venezia, & Hickok, 2019). Articulatory training by imitation appears to improve the auditory comprehension of an unfamiliar accent (Adank, Hagoort, & Bekkering, 2010) or a dysarthric speaker (Borrie & Schäfer, 2015), and training articulation with altered auditory feedback changes the subsequent perception of speech sounds (Lametti, Rochet-Capellan, Neufeld, Shiller, & Ostry, 2014; Shiller, Sato, Gracco, & Baum, 2009). Importantly, however, the effects of articulation on perception are generally small and mostly obtained in configurations in which auditory decoding is made difficult by noise, natural or induced degradation, or stimulus ambiguity (see e.g. D'Ausilio, Jarmolowska, Busan, Bufalari, & Craighero, 2011; Stokes et al., 2019).

Somatosensory information associated with speech articulation is likely to play an important role in this process. The orofacial somatosensory system differs from the limb system and other body parts in terms of proprioceptive function, since muscle proprioceptors, which play a predominant role in proprioception, have not been found in orofacial muscles other than the jaw-closing muscles (Stål, Eriksson, Eriksson, & Thornell, 1990). Given that the facial skin is deformed during orofacial movements, including speaking (Connor & Abbs, 1998), cutaneous mechanoreceptors in the facial skin can serve as an alternative source of proprioceptive information. Previous neural recordings confirmed that cutaneous mechanoreceptors lateral to the oral angle are activated during jaw motion (Johansson, Trulsson, Olsson, & Abbs, 1988; Nordin & Hagbarth, 1989). This idea has also been demonstrated in somatosensory perturbation studies applying facial skin deformation externally. Ito and Gomi (2007) showed that downward skin stretches lateral to the oral angle, related to downward jaw movements, induced a compensatory reflex response in the upper lip. Stretching the skin backwards also induced adaptive movement changes in the upper lip for utterances requiring lip protrusion (Ito & Ostry, 2010). Accordingly, stretching the facial skin in a specific direction can provide somatosensory information related to lip and jaw articulatory motion, and can be used as an effective tool to investigate orofacial somatosensory function in the processing of speech sounds.

Indeed, the role of somatosensory inputs arising from the facial skin in speech perception has been demonstrated by Ito, Tiede, and Ostry (2009). These authors reported that when the facial skin was pulled in the upward direction, an auditory stimulus ambiguous between “head” and “had” was identified as “head” rather than “had”. Their interpretation was that articulatory motion for “head” and “had” involves vertical movements of the jaw and tongue, so that the perception of speech sounds in this region can be modulated by applying an adequate somatosensory input. Such studies suggest a potential role of the somatosensory system in speech perception, in line with theoretical proposals associating auditory processes and articulatory inferences in multisensory theories of speech perception (Schwartz, Basirat, Ménard, & Sato, 2012; Skipper, Van Wassenhove, Nusbaum, & Small, 2007).

Coarticulatory processes not only make the acoustic content of a phonemic unit context-dependent, but may also intervene to blur or enhance the segmentation process, which is crucial for lexical access (Spinelli, Grimault, Meunier, & Welby, 2010; Spinelli, Welby, & Schaegis, 2007). Since coarticulatory processes are based on articulatory mechanisms related to anticipation and perseveration in gestural dynamics, it is likely that the structure of articulatory motion plays a role in the segmentation process as well as in the decoding process. Considering the role of the somatosensory system in phonetic decoding, the question we asked in this study is whether it could also intervene at the level of word segmentation for lexical access. This would provide a hint that perceptuo-motor relationships are more pervasive in speech perception than currently envisioned, and that they actually structure the processing chain relating incoming speech signals to the lexicon in the human brain.

For this purpose, we capitalized on the paradigm of Spinelli et al. (2010) on the role of prosodic cues in the disambiguation of ambiguous acoustic structures in French. That study tested French phrases consisting of a definite article “la” followed by a noun, which are pronounced in the same way because of the “elision” phenomenon, e.g. “l'attache”, /l#ataʃ/ [“the string” in English] vs. “la tache”, /la#taʃ/ [“the stain” in English], “#” indicating the word boundary. The authors found that acoustic prosodic cues (e.g. a local F0 increase) made it possible to switch the percept from one structure to the other, and suggested that the phrases can be disambiguated and segmented differently according to the placement of the accents in the utterance, in line with articulatory strategies displayed in the production of this kind of material (Spinelli et al., 2007).
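As a purely illustrative sketch (not the authors' stimulus list), the following Python snippet makes the ambiguity explicit: a single phone string such as /lataʃ/ supports two candidate parses depending on where the word boundary is placed. Only the “l'attache”/“la tache” pair comes from the text; the helper function name is hypothetical.

```python
# Illustrative sketch of the segmentation ambiguity exploited in the study.
# Only the pair "l'attache" / "la tache" is taken from the text; this helper
# is hypothetical and simply enumerates the two candidate parses of an
# /l a .../ phone string, with "#" marking the word boundary.

def parses(phones: str) -> list[str]:
    """Return the two candidate segmentations of a phrase starting with /la/."""
    assert phones.startswith("la")
    elided = f"l#{phones[1:]}"       # "l' + noun" parse, e.g. l#ataʃ (l'attache)
    non_elided = f"la#{phones[2:]}"  # "la + noun" parse, e.g. la#taʃ (la tache)
    return [elided, non_elided]

print(parses("lataʃ"))  # ['l#ataʃ', 'la#taʃ']
```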

Since putting an accent in a phrase or changing the acoustic prosodic cues can be achieved by hyper-articulation (Fougeron, 2001; Spinelli et al., 2007), cues for word segmentation may be obtained not only from acoustic information, but also from articulatory information provided by other sensory modalities. It has long been known that the visual modality contributes to speech perception, not only for phonetic decoding (e.g. for speech in noise, Erber, 1969; Sumby & Pollack, 1954; or with incongruent auditory and visual inputs, McGurk & MacDonald, 1976) but also in prosodic processing (Dohen, Lœvenbruck, Cathiard, & Schwartz, 2004), lexical access (Fort et al., 2013) and word segmentation (Mitchel & Weiss, 2014; Sell & Kaschak, 2009). A recent study (Strauß, Savariaux, Kandel, & Schwartz, 2015), using the same type of material as Spinelli et al. (2010), confirmed that accentuated visual lip movements at a given position in the phonetic input may attract the perceptual placement of word segmentation, suggesting that visual lip information can play a role similar to that of acoustic prosody. Given that facial skin deformation has already been shown to provide articulatory information able to modify the phonetic decoding process, it might also contribute to modifying the segmentation process that precedes lexical access in the processing of a continuous speech stream.

The present study aims to explore whether somatosensory inputs associated with facial skin deformation could also intervene in the segmentation process and hence modify lexical decision in French. To test this hypothesis, we carried out an auditory identification test of word segmentation similar to those of Spinelli et al. (2010) and Strauß et al. (2015), using French lexical material characteristic of the elision phenomenon introduced above. We examined how perceptual performance in an auditory identification test was modulated depending on when somatosensory inputs were applied during listening to the target auditory phrases. We speculated that a somatosensory stimulation pulling the facial skin upwards (as in Ito et al., 2009) at a given instant would lead participants to infer the presence of an accent around the corresponding position in time, and that this would modify the result of the segmentation process. We further speculated that a somatosensory stimulation applied elsewhere on the body (here, on the forearm) would be less effective, or not effective at all. Finally, since multisensory interaction requires adequate matching of the various sources of information between the involved modalities, we reasoned that the vertical facial skin deformation would be more effective for utterances containing vowels realized with vertical articulatory movements of the jaw and tongue (e.g. /a/) than for utterances containing vowels realized with horizontal tongue or lip movements (e.g. /i/ or /o/).
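To make these predictions concrete, here is a minimal, purely qualitative Python sketch of the three hypotheses. It is not the authors' analysis code; the function name and labels are ours, and the direction of the predicted bias is deliberately left unspecified.

```python
# Hedged, qualitative encoding of the three hypotheses from the Introduction.
# This is not the authors' analysis code; names and labels are illustrative.

def predicted_effect(stimulation_site: str, first_noun_vowel: str) -> str:
    """Qualitative prediction for the effect of somatosensory stimulation
    timing on the segmentation response ("l' + noun" vs. "la + noun")."""
    if stimulation_site != "face":
        # Hypothesis 2: stimulation elsewhere on the body (forearm) should be
        # less effective, or not effective at all.
        return "little or no timing effect expected (control site)"
    if first_noun_vowel in {"i", "o"}:
        # Hypothesis 3: vertical skin stretch should match vowels produced with
        # vertical jaw/tongue movements (/a/) better than /i/ or /o/.
        return "weaker timing effect expected (gesture/stimulation mismatch)"
    # Hypothesis 1: the timing of the facial stretch should lead listeners to
    # infer an accent near that position, biasing the segmentation decision.
    return "segmentation bias expected, depending on stimulation timing"

# Example: facial stimulation during a phrase whose noun starts with /a/
print(predicted_effect("face", "a"))
```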


Participants

Forty native French speakers (mean age = 27.10 years, SD = 6.56 years; 11 males, 29 females) participated in the experiment. They reported no history of neurophysiological issues with hearing or orofacial sensation. The protocol of this experiment was approved by the Comité d'Ethique pour la Recherche, Grenoble Alpes (CERGA: Avis-2018-12-11-4). All participants signed the corresponding consent form.

Acoustic material

The acoustic material was directly inspired by Spinelli et al. (2010). It consisted of sequences of a

Face vs. forearm

Fig. 2 presents the average judgement probability in the word identification task across somatosensory conditions. In the Face condition, the judgement probability appears to vary with the timing of the somatosensory stimulation (Fig. 2A). When the somatosensory stimulation leads the first vowel (position P3), the judgement probability reaches its smallest value overall. When the somatosensory stimulation is close to the second vowel (position P6), the judgement probability is higher. In the Forearm condition,

Discussion

The results in Section 3 provide clear evidence relative to the hypotheses introduced in Section 1.2 (Assessing the role of the somatosensory system in lexical access for speech perception) and Section 2.5 (Hypotheses and data analysis). Firstly, there was indeed an effect of the timing of the somatosensory stimulation on lexical decision when the stimulation was applied to the face, though not to the forearm. Furthermore, the effect appeared to be significantly different depending on the articulatory nature of the

Conclusion

This study showed that the lexical perception of a given sequence in French, involving ambiguous word segmentation, can be significantly modified by applying a somatosensory input to the facial skin. The judgement was systematically biased towards one or the other segmentation depending on the timing of the somatosensory input. Importantly, this effect was specifically induced by stimulation of the facial skin, and not of skin elsewhere on the body (the forearm). In a follow-up analysis, we

Authors' contributions

R.O., J.-L.S. and T.I. designed research. R.O. and T.I. performed the experiment. R.O., J.-L.S. and T.I. analyzed data. R.O., J.-L.S. and T.I. wrote the paper. R.O., J.-L.S. and T.I. confirmed the final version.

Acknowledgements

We thank Nathan Mary and Dorian Deliquet for data collection and analysis and Silvain Gerber for statistical analysis.

Funding information

This work was supported by the European Research Council under the European Community's Seventh Framework Program (FP7/2007-2013, Grant Agreement no. 339152), by grant ANR-15-IDEX-02 CDP NeuroCoG, and by National Institute on Deafness and Other Communication Disorders Grant R01-DC017439.

Declaration of competing interest

The authors have no competing interests to declare.

References (55)

  • Tremblay, P., et al. (2011). On the context-dependent nature of the contribution of the ventral premotor cortex to speech perception. NeuroImage.
  • van Wassenhove, V., et al. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia.
  • Watkins, K.E., et al. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia.
  • Aboitiz, F. (2018). A brain for speech. Evolutionary continuity in primate and human auditory-vocal processing. Frontiers in Neuroscience.
  • Adank, P., et al. (2010). Imitation improves language comprehension. Psychological Science.
  • Basirat, A., et al. (2012). Perceptuo-motor interactions in the perceptual organization of speech: Evidence from the verbal transformation effect. Philosophical Transactions of the Royal Society B: Biological Sciences.
  • Borrie, S.A., et al. (2015). The role of somatosensory information in speech perception: Imitation improves recognition of disordered speech. Journal of Speech, Language, and Hearing Research.
  • Connor, N.P., et al. (1998). Movement-related skin strain associated with goal-oriented lip actions. Experimental Brain Research.
  • Erber, N.P. (1969). Interaction of audition and vision in the recognition of oral speech stimuli. Journal of Speech and Hearing Research.
  • Fadiga, L., et al. (2002). Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience.
  • Fort, M., et al. (2013). Seeing the initial articulatory gestures of a word triggers lexical access. Language and Cognitive Processes.
  • Gick, B., et al. (2009). Aero-tactile integration in speech perception. Nature.
  • Giraud, A.-L., et al. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience.
  • Hothorn, T., et al. (2008). Simultaneous inference in general parametric models. Biometrical Journal.
  • Ito, T., et al. (2007). Cutaneous mechanoreceptors contribute to the generation of a cortical reflex in speech. NeuroReport.
  • Ito, T., et al. (2014). Temporal factors affecting somatosensory-auditory interactions in speech processing. Frontiers in Psychology.
  • Ito, T., et al. (2013). Left lateralized enhancement of orofacial somatosensory processing due to speech sounds. Journal of Speech, Language, and Hearing Research.