
Brain and Language

Volume 206, July 2020, 104812

Effects of age and left hemisphere lesions on audiovisual integration of speech

https://doi.org/10.1016/j.bandl.2020.104812

Highlights

  • Audiovisual integration of speech is positively related to age in two older groups.

  • Left hemisphere strokes reduce sensitivity to timing offsets of audiovisual signals.

  • A broadened window of integration suggests damage to an active temporal filter.

  • Lesion mapping localizes this function to left supramarginal gyrus and planum temporale.

Abstract

Neuroimaging studies have implicated left temporal lobe regions in audiovisual integration of speech and inferior parietal regions in temporal binding of incoming signals. However, it remains unclear which regions are necessary for audiovisual integration, especially when the auditory and visual signals are offset in time. Aging also influences integration, but the nature of this influence is unresolved. We used a McGurk task to test audiovisual integration and sensitivity to the timing of audiovisual signals in two older adult groups: left hemisphere stroke survivors and controls. We observed a positive relationship between age and audiovisual speech integration in both groups, and an interaction indicating that lesions reduce sensitivity to timing offsets between signals. Lesion-symptom mapping demonstrated that damage to the left supramarginal gyrus and planum temporale reduces temporal acuity in audiovisual speech perception. This suggests that a process mediated by these structures identifies asynchronous audiovisual signals that should not be integrated.

Introduction

Typical speech perception results from neural processes that integrate speech sounds with visual information from corresponding facial movements. While the auditory component of speech is typically considered the dominant modality, visual speech information can influence the listener’s perception in meaningful ways. Prior studies have shown that visual speech increases the listener’s ability to decipher speech in noisy environments, and the predictive value of visual speech cues can speed up speech processing overall (Sumby and Pollack, 1954, van Wassenhove et al., 2005). One paradigm for studying the interaction between visual and auditory speech components is the McGurk effect (MacDonald and McGurk, 1978, McGurk and MacDonald, 1976). In the McGurk task, the content of the audiovisual stimulus is manipulated by pairing an audio recording of one syllable (e.g., “pa”) with a video recording of a person saying a conflicting syllable (e.g., “ka”). The McGurk effect occurs when this incongruent information leads to the perception of something other than the dominant auditory signal, typically either a percept that matches the visual input or a fusion percept that matches neither of the presented stimuli (e.g., “ta”). While there is considerable inter-individual variability in perception of the McGurk effect, it provides a sufficiently stable measure for investigating the neural mechanisms of audiovisual integration (Basu Mallick et al., 2015, Burnham and Dodd, 2013).

Most studies on the brain basis of audiovisual speech integration have used functional neuroimaging methods in healthy populations and the results have consistently shown a relationship between audiovisual speech processes and activity in the left posterior superior temporal gyrus and sulcus (STG/STS) (Benoit et al., 2010, Erickson et al., 2014, Nath and Beauchamp, 2012, Sams et al., 1991, Skipper et al., 2007, Szycik et al., 2012). In the dual stream speech processing model originally described by Rauschecker and Scott, the posterior STG/STS is one component of the auditory/language dorsal stream, which also includes the inferior parietal lobule (IPL), premotor cortex, and portions of inferior frontal cortex (Bornkessel-Schlesewsky et al., 2015, Rauschecker, 2011, Rauschecker and Scott, 2009). In the context of audiovisual integration of speech cues, it has been suggested that these dorsal stream areas are involved in resolving conflicting auditory and visual cues or aiding in the perception of ambiguous speech inputs (Erickson et al., 2014, Matchin et al., 2014, Skipper et al., 2007, Skipper et al., 2005). A critical component of audiovisual perception is the temporal relationship between the presented signals. Activity in dorsal stream regions has been associated with asynchrony of auditory and visual speech (Macaluso et al., 2004, Stevenson et al., 2010, Stevenson et al., 2011) and non-speech cues (Powers, Hevey, & Wallace, 2012). The likelihood that two incoming inputs will be integrated depends in part on how close together in time they occur, and while exact synchrony is not necessary for integration, the closer two stimuli are in time, the more likely they are to be bound into a unified audiovisual percept (Andersen et al., 2004, Calvert et al., 2004, Vroomen and Keetels, 2010).
Prior work by Powers et al., 2009, Powers et al., 2012 found that the length of the temporal window of integration for non-speech audiovisual stimuli (tones/flashes) can be changed through perceptual training (Powers et al., 2012, Powers et al., 2009), and that changes in the length of the window are related to activity in and connectivity with the posterior STS (Powers et al., 2012). In addition, research using speech stimuli has shown that the supramarginal gyrus (SMG) and the anterior intraparietal sulcus (IPS) are also involved in the processing of asynchronous audiovisual inputs (Miller, 2005, Wiersinga-Post et al., 2010). Given that some of the same brain regions are involved in the processing of incongruent stimuli and asynchronous stimuli, investigating timing manipulations of the signals in the McGurk paradigm may offer additional insight into audiovisual speech processes.
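The temporal binding window described above can be illustrated with a toy model in which the probability of fusing auditory and visual signals falls off with their onset asynchrony. The Gaussian form and all parameter values below are illustrative assumptions, not estimates from the cited studies:

```python
import math

def integration_probability(soa_ms, peak=0.9, center=0.0, width=150.0):
    """Illustrative Gaussian model of a temporal binding window.

    soa_ms: stimulus onset asynchrony in milliseconds.
    peak, center, width: hypothetical parameters -- the maximal fusion
    rate, the SOA at which integration peaks, and the window's standard
    deviation. A larger `width` models a broadened binding window,
    i.e., reduced sensitivity to timing offsets.
    """
    return peak * math.exp(-((soa_ms - center) ** 2) / (2 * width ** 2))

# A broader window keeps integration probability high even at large
# offsets, mimicking the reduced temporal sensitivity discussed above.
narrow = integration_probability(300, width=150.0)
broad = integration_probability(300, width=400.0)
```

Under this sketch, perceptual training that narrows the window corresponds to shrinking `width`, which lowers fusion probability at large SOAs while leaving synchronous stimuli unaffected.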

Because prior studies have almost exclusively used correlational methods, causal evidence regarding the neural basis of audiovisual speech processes is very limited. Virtual lesion experiments using non-invasive brain stimulation provide one method for testing causal relationships between structure and function. In one such study, Beauchamp et al. (2010) delivered single-pulse transcranial magnetic stimulation (TMS) to the left STS while subjects performed a McGurk task and found that disrupting left STS produced a decrease in the fusion percept (Beauchamp, Nath, & Pasalar, 2010). Marques et al. (2014) performed a similar experiment using three levels of tDCS stimulation: cathodal, anodal, and sham (Marques, Lapenta, Merabet, Bolognini, & Boggio, 2014). They found that cathodal (inhibitory) stimulation of bilateral STS compared to anodal (excitatory) stimulation produced a decrease in fusion percepts, although the comparison to sham only trended toward significance. Furthermore, they showed that while anodal (excitatory) stimulation of STS had no effect, anodal stimulation of nearby bilateral posterior parietal cortex (PPC) produced significantly increased fusion percepts compared to sham. Together, these studies suggest that the left STS plays a causal role in the integration of auditory and visual speech stimuli, and that bilateral PPC regions may assist in the process.

In addition to research using brain stimulation, causal evidence of structure-function relationships can be provided through studies of patients with damage to the brain. However, while several case studies and case series have examined the effects of lesions on audiovisual speech perception, there is currently a lack of systematic evidence regarding relationships between lesion location and audiovisual speech processing. A number of behavioral studies have examined the McGurk effect in people with aphasia (Andersen and Starrfelt, 2015, Campbell et al., 1990, Hessler et al., 2012, Schmid et al., 2009, Youse et al., 2004), but in the majority of these, accuracy on congruent audiovisual speech trials (i.e., matched auditory and visual speech signals) is low or inconsistent (Andersen and Starrfelt, 2015, Hessler et al., 2012, Youse et al., 2004). This leaves open the possibility that subjects did not understand the task or may have experienced general sensory impairments that prevented perception of the stimuli, rather than deficits specific to audiovisual integration. Critically, these studies only assessed behavioral outcomes of audiovisual speech perception. Without a quantitative examination of the lesion-behavior relationship, little is revealed about the exact structures necessary for audiovisual speech perception. While a number of studies have assessed the McGurk paradigm in cases with various lesions and diagnoses (Campbell et al., 1990, Champoux et al., 2006, Freeman et al., 2013, Hamilton et al., 2006, Nicholson et al., 2002, Soroker et al., 1995), it is hard to establish an exact brain-behavior relationship across various case studies, particularly given the possibility of remapping of function (Baum, Martin, Hamilton, & Beauchamp, 2012) and the individual variability in McGurk perception in controls (Benoit et al., 2010, Nath and Beauchamp, 2012).
As a result of these factors, it is unclear whether the absence of the McGurk effect in any given person reflects the specific lesion or whether that person was simply a low McGurk perceiver prior to the stroke. These issues can be minimized by comparing behavioral responses of stroke survivors and controls on the same McGurk stimuli and by applying systematic lesion-behavior assessments across a larger cohort with varied lesion distributions.

A recent study assessed McGurk fusion rates in a large cohort of LH stroke survivors and used lesion-symptom mapping to identify structure-function relationships (Hickok et al., 2018). The results showed a significant relationship between McGurk fusion rate and lesions in the posterior STS and the posterior superior temporal plane, as well as a posterior middle temporal gyrus (MTG) region that was specifically implicated when the contribution of auditory-only processing was factored out. These findings corroborate the evidence from fMRI and TMS investigations implicating the posterior lateral temporal lobe in audiovisual integration of speech. There has not yet been a lesion-symptom mapping study using timing manipulations of the McGurk effect, but several studies have shown that patients with LH damage and aphasia also exhibit deficits in temporal sequence processing in speech (Schirmer, 2004) and non-speech contexts (Efron, 1963, Swisher and Hirsh, 1972).

Since most stroke survivors are older adults, it is important to note that both audiovisual integration processes and sensitivity to the timing of perceptual cues have been shown to change across the lifespan (for overview, see Baum & Stevenson, 2017; for speech stimuli see Gordon-Salant et al., 2017, Huyse et al., 2014, Stevenson et al., 2015, Tye-Murray et al., 2016; for non-speech stimuli see Bedard and Barnett-Cowan, 2016, Diederich et al., 2008). However, the results of the McGurk task in older populations have been mixed, with some studies showing increased integration with age (Sekiyama et al., 2014, Setti et al., 2013) and others showing decreased integration with age (Stevenson, Baum, Krueger, Newhouse, & Wallace, 2018). Though most of these studies compare separate groups of older and younger adults, it is possible that there may be an effect of age even within an older adult population. Thus, in order to disentangle the effects of a lesion from those of age, investigations of lesion-behavior relationships in older adult stroke patients would benefit from including an age-matched control group and considering possible effects of age in statistical analyses.

In the present study we examined audiovisual speech perception in LH stroke survivors and matched controls using a McGurk paradigm that manipulated the timing of the auditory and visual signals. We first assessed behavioral differences in audiovisual speech integration between the two groups. Then, using a multivariate lesion-symptom mapping method in the LH stroke cohort, we tested for relationships between LH lesion location and audiovisual speech integration, including sensitivity to the timing of audiovisual speech signals (Ghaleh et al., 2018, Zhang et al., 2014). We expected that audiovisual fusion would increase with age in both groups and be reduced overall by left hemisphere strokes, particularly those affecting the STS/STG. We also predicted that inferior parietal lobe lesions would reduce sensitivity to the timing of auditory and visual speech cues, such that fusion would be less affected by differences in the timing of the two signals. This study is novel in that it is the first to examine the audiovisual integration of speech signals in a cohort of stroke survivors, while also investigating the role of participant age and stimulus timing, two factors that have been shown to influence audiovisual integration.
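Conceptually, lesion-symptom mapping relates damage at each brain location to behavioral scores across patients. The study used a multivariate method; the sketch below is a deliberately simplified mass-univariate version (a per-voxel mean difference on toy 0/1 lesion masks), with all names and data hypothetical:

```python
import statistics

def univariate_lsm(lesion_masks, scores):
    """Simplified mass-univariate lesion-symptom mapping sketch
    (illustrative only; the study used a multivariate method).

    lesion_masks: list of equal-length 0/1 lists, one per patient
                  (1 = voxel is lesioned in that patient).
    scores: one behavioral score per patient.
    Returns, per voxel, the mean score of spared patients minus the
    mean score of lesioned patients; a real analysis would use a
    test statistic with multiple-comparison correction.
    """
    n_vox = len(lesion_masks[0])
    diffs = []
    for v in range(n_vox):
        lesioned = [s for m, s in zip(lesion_masks, scores) if m[v] == 1]
        spared = [s for m, s in zip(lesion_masks, scores) if m[v] == 0]
        if lesioned and spared:
            diffs.append(statistics.mean(spared) - statistics.mean(lesioned))
        else:
            diffs.append(0.0)  # voxel uninformative in this cohort
    return diffs
```

A large positive value at a voxel flags a region where damage co-occurs with worse behavior; multivariate methods improve on this by considering all voxels jointly rather than one at a time.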


Participants

Thirty-three LH stroke survivors (mean age = 59.8, SD = 10.4) and 39 matched controls (mean age = 58.6, SD = 12.6) met the inclusion criteria for this study as described below. Participants provided informed consent and were compensated as approved by the Georgetown University Institutional Review Board. Table 1 contains demographic information for each participant, including aphasia diagnoses and Western Aphasia Battery Scores. Forty-three LH stroke survivors were tested. Stroke survivors were

Behavioral results

Full cohort model: The first logistic mixed effects model (without interaction terms) in the full cohort of participants found a significant effect of age (OR: 1.05, p < .001), but no significant difference between stroke survivors and controls (OR: 0.68, p = .22). We also found that in comparison to the AL bin, there was significantly greater fusion in the MID (OR: 3.27, p < .0001) and VL bins (OR: 2.66, p < .0001). In the next model, we considered whether interaction effects between group and
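For readers less familiar with logistic mixed models, the reported odds ratios are exponentiated regression coefficients: an OR above 1 (e.g., the age effect) means the predictor increases the odds of a fusion response. A minimal sketch, with hypothetical coefficient values chosen only to mirror the direction of the reported effects:

```python
import math

def odds_ratio(beta):
    """Convert a logistic regression coefficient to an odds ratio.
    OR > 1: predictor increases the odds of fusion; OR < 1: decreases them."""
    return math.exp(beta)

# Hypothetical per-year age coefficient corresponding to an OR near 1.05,
# i.e., each additional year of age multiplies the fusion odds by ~1.05.
beta_age = math.log(1.05)
or_age = odds_ratio(beta_age)
```

Note that an OR of 1.05 per year compounds across a cohort spanning decades, so even a small per-year effect can produce substantial differences between the youngest and oldest participants.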

Discussion

We report two main findings. First, across both stroke survivors and controls we find a strong relationship between age and the propensity to perceive the illusory fusion percept. Second, we find that lesions in the left SMG and planum temporale are related to reduced temporal specificity in audiovisual integration, suggesting that these regions are integral for neural processes related to audiovisual temporal binding windows.

Acknowledgements

Funding: This work was supported by a National Science Foundation (NSF) Graduate Research Fellowship (United States, grant numbers DGE-0903443 and DGE-1444316 to LCE, and 1444316 to KM); the Achievement Rewards for College Scientists Metropolitan Washington Chapter, United States (ARCS/MWC) 2015-2016 Noama Wheeler Scholar (to LCE); National Institutes of Health, United States (grant numbers KL2TR000102 and R01DC014960 to PET, 2RO1 EY018923-03A1 to JPR, 2 R56 NS052494-06A1 to JPR, RO1 DC014989

References (94)

  • R.H. Hamilton et al.

    An acquired deficit of audiovisual speech processing

    Brain and Language

    (2006)
  • G. Hickok et al.

    Neural networks supporting audiovisual integration for speech: A large-scale lesion study

    Cortex

    (2018)
  • E. Macaluso et al.

    Spatial and temporal factors during processing of audiovisual speech: A PET study

    NeuroImage

    (2004)
  • A.R. Nath et al.

    A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion

    NeuroImage

    (2012)
  • O. Profant et al.

    Diffusion tensor imaging and MR morphometry of the central auditory pathway and auditory cortex in aging

    Neuroscience

    (2014)
  • J.P. Rauschecker

    An expanded role for the dorsal auditory pathway in sensorimotor control and integration

    Hearing Research

    (2011)
  • C. Rorden et al.

    Age-specific CT and MRI templates for spatial normalization

    NeuroImage

    (2012)
  • M. Sams et al.

    Seeing speech: Visual information from lip movements modifies activity in the human auditory cortex

    Neuroscience Letters

    (1991)
  • M. Sams et al.

    McGurk effect in Finnish syllables, isolated words, and words in sentences: Effects of word meaning and sentence context

    Speech Communication

    (1998)
  • A. Schirmer

    Timing speech: A review of lesion and neuroimaging findings

    Cognitive Brain Research

    (2004)
  • J.I. Skipper et al.

    Speech-associated gestures, Broca’s area, and the human mirror system

    Brain and Language

    (2007)
  • J.I. Skipper et al.

    Listening to talking faces: Motor cortical activation during speech perception

    NeuroImage

    (2005)
  • N. Soroker et al.

    “McGurk illusion” to bilateral administration of sensory stimuli in patients with hemispatial neglect

    Neuropsychologia

    (1995)
  • R.A. Stevenson et al.

    Neural processing of asynchronous audiovisual speech perception

    NeuroImage

    (2010)
  • R.A. Stevenson et al.

    Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition

    Neurobiology of Aging

    (2015)
  • R.A. Stevenson et al.

    Discrete neural substrates underlie complementary audiovisual speech integration processes

    NeuroImage

    (2011)
  • L. Swisher et al.

    Brain damage and the ordering of two temporally successive stimuli

    Neuropsychologia

    (1972)
  • M. Wiener et al.

    The image of time: A voxel-wise meta-analysis

    NeuroImage

    (2010)
  • M. Wiener et al.

    Implicit timing activates the left inferior parietal cortex

    Neuropsychologia

    (2010)
  • P.A. Yushkevich et al.

    User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability

    NeuroImage

    (2006)
  • P. Adank et al.

    The role of planum temporale in processing accent variation in spoken language comprehension

    Human Brain Mapping

    (2012)
  • M. Alm et al.

    Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech

    The Journal of the Acoustical Society of America

    (2013)
  • T.S. Andersen et al.

    Audiovisual integration of speech in a patient with Broca’s Aphasia

    Frontiers in Psychology

    (2015)
  • D. Basu Mallick et al.

    Variability and stability in the McGurk effect: contributions of participants, stimuli, time, and response type

    Psychonomic Bulletin & Review

    (2015)
  • S.H. Baum et al.

Shifts in audiovisual processing in healthy aging

    Current Behavioral Neuroscience Reports

    (2017)
  • M.S. Beauchamp et al.

    fMRI-guided transcranial magnetic stimulation reveals that the superior temporal sulcus is a cortical locus of the McGurk effect

    Journal of Neuroscience

    (2010)
  • G. Bedard et al.

    Impaired timing of audiovisual events in the elderly

    Experimental Brain Research

    (2016)
  • M.M.K. Benoit et al.

    Primary and multisensory cortical activity is correlated with audiovisual percepts

    Human Brain Mapping

    (2010)
  • J.R. Binder et al.

    Function of the left planum temporale in auditory and linguistic processing

    Brain

    (1996)
  • V.A. Brown et al.

    What accounts for individual differences in susceptibility to the McGurk effect?

    PloS One

    (2018)
  • D. Burnham et al.

    Auditory-visual speech perception as a direct process: The McGurk effect in infants and across languages

    (2013)
  • G. Calvert et al.

    The handbook of multisensory processes

    (2004)
  • F. Champoux et al.

    A role for the inferior colliculus in multisensory speech integration

    NeuroReport

    (2006)
  • K.M. Cienkowski et al.

    Auditory-visual speech perception and aging

    Ear and Hearing

    (2002)
  • C. Colin et al.

    Top-down and bottom-up modulation of audiovisual integration in speech

    European Journal of Cognitive Psychology

    (2010)
  • A.T. DeMarco et al.

    A multivariate lesion symptom mapping toolbox and examination of lesion-volume biases and correction methods in lesion-symptom mapping

    Human Brain Mapping

    (2018)
  • R. Efron

Temporal perception, aphasia and déjà vu

    Brain

    (1963)