Effects of age and left hemisphere lesions on audiovisual integration of speech
Introduction
Typical speech perception results from neural processes that integrate speech sounds with visual information from corresponding facial movements. While the auditory component of speech is typically considered the dominant modality, the presence of visual speech information can influence the listener’s perception in meaningful ways. Prior studies have shown that visual speech increases the listener’s ability to decipher speech in noisy environments, and the predictive value of visual speech cues can speed up speech processing overall (Sumby and Pollack, 1954, van Wassenhove et al., 2005). One paradigm for studying the interaction between visual and auditory speech components is the McGurk effect (MacDonald and McGurk, 1978, McGurk and MacDonald, 1976). In the McGurk task, the content of the audiovisual stimulus is manipulated by pairing an audio recording of one syllable (e.g. “pa”) with a video recording of a person saying a conflicting syllable (e.g. “ka”). The McGurk effect occurs when this incongruent information leads to the perception of something other than the dominant auditory signal, typically either a percept that matches the visual input or a fusion percept that matches neither of the presented stimuli (e.g. “ta”). While there is considerable inter-individual variability in the perception of the McGurk effect, it provides a sufficiently stable measure for investigating the neural mechanisms of audiovisual integration (Basu Mallick et al., 2015, Burnham and Dodd, 2013).
Most studies on the brain basis of audiovisual speech integration have used functional neuroimaging methods in healthy populations and the results have consistently shown a relationship between audiovisual speech processes and activity in the left posterior superior temporal gyrus and sulcus (STG/STS) (Benoit et al., 2010, Erickson et al., 2014, Nath and Beauchamp, 2012, Sams et al., 1991, Skipper et al., 2007, Szycik et al., 2012). In the dual stream speech processing model originally described by Rauschecker and Scott, the posterior STG/STS is one component of the auditory/language dorsal stream, which also includes the inferior parietal lobule (IPL), premotor cortex, and portions of inferior frontal cortex (Bornkessel-Schlesewsky et al., 2015, Rauschecker, 2011, Rauschecker and Scott, 2009). In the context of audiovisual integration of speech cues, it has been suggested that these dorsal stream areas are involved in resolving conflicting auditory and visual cues or aiding in the perception of ambiguous speech inputs (Erickson et al., 2014, Matchin et al., 2014, Skipper et al., 2007, Skipper et al., 2005). A critical component of audiovisual perception is the temporal relationship between the presented signals. Activity in dorsal stream regions has been associated with asynchrony of auditory and visual speech (Macaluso et al., 2004, Stevenson et al., 2010, Stevenson et al., 2011) and non-speech cues (Powers, Hevey, & Wallace, 2012). The likelihood that two incoming inputs will be integrated depends in part on how close together in time they occur, and while exact synchrony is not necessary for integration, the closer two stimuli are in time, the more likely they are to be bound into a unified audiovisual percept (Andersen et al., 2004, Calvert et al., 2004, Vroomen and Keetels, 2010).
Prior work by Powers and colleagues found that the length of the temporal window of integration for non-speech audiovisual stimuli (tones/flashes) can be changed through perceptual training (Powers et al., 2009, Powers et al., 2012), and that changes in the length of the window are related to activity in and connectivity with the posterior STS (Powers et al., 2012). In addition, research using speech stimuli has shown that the supramarginal gyrus (SMG) and the anterior intraparietal sulcus (IPS) are also involved in the processing of asynchronous audiovisual inputs (Miller, 2005, Wiersinga-Post et al., 2010). Given that some of the same brain regions are involved in the processing of incongruent stimuli and asynchronous stimuli, investigating timing manipulations of the signals in the McGurk paradigm may offer additional insight into audiovisual speech processes.
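As a toy illustration of the temporal-window idea (not the model used in these studies), the probability of binding two cues can be sketched as a window centered near zero audiovisual asynchrony; a Gaussian-shaped window is one common simplification, with perceptual training corresponding to a narrower width:

```python
import numpy as np

def p_integrate(soa_ms, width_ms, p_max=0.95):
    """Toy Gaussian temporal binding window: probability that auditory
    and visual cues separated by `soa_ms` milliseconds are bound into
    a single percept. `width_ms` (the Gaussian SD) sets the window's
    breadth; both parameters are illustrative, not fitted values."""
    return p_max * np.exp(-0.5 * (soa_ms / width_ms) ** 2)

# A narrower window (as after the perceptual training reported by
# Powers et al.) penalizes asynchrony more steeply than a wide one.
soas = np.array([0.0, 100.0, 300.0])
wide = p_integrate(soas, width_ms=250.0)    # binding stays high at 300 ms
narrow = p_integrate(soas, width_ms=120.0)  # binding falls off quickly
```

Under this simplification, narrowing the window leaves synchronous binding untouched while sharply reducing integration of asynchronous pairs, which is the behavioral signature the training studies report.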
Because prior studies have almost exclusively used correlational methods, causal evidence regarding the neural basis of audiovisual speech processes is very limited. Virtual lesion experiments using non-invasive brain stimulation provide one method for testing causal relationships between structure and function. In one such study, Beauchamp et al. (2010) delivered single-pulse transcranial magnetic stimulation (TMS) to the left STS while subjects performed a McGurk task and found that disrupting left STS produced a decrease in the fusion percept (Beauchamp, Nath, & Pasalar, 2010). Marques et al. (2014) performed a similar experiment using three transcranial direct current stimulation (tDCS) conditions: cathodal, anodal, and sham (Marques, Lapenta, Merabet, Bolognini, & Boggio, 2014). They found that cathodal (inhibitory) stimulation of bilateral STS compared to anodal (excitatory) stimulation produced a decrease in fusion percepts, although the comparison to sham only trended toward significance. Furthermore, they showed that while anodal (excitatory) stimulation of STS had no effect, anodal stimulation of nearby bilateral posterior parietal cortex (PPC) produced significantly increased fusion percepts compared to sham. Together, these studies suggest that the left STS plays a causal role in the integration of auditory and visual speech stimuli, and that bilateral PPC regions may assist in the process.
In addition to research using brain stimulation, causal evidence of structure-function relationships can be provided through studies of patients with damage to the brain. However, while several case studies and series have examined the effects of lesions on audiovisual speech perception, there is currently a lack of systematic evidence regarding relationships between lesion location and audiovisual speech processing. A number of behavioral studies have examined the McGurk effect in people with aphasia (Andersen and Starrfelt, 2015, Campbell et al., 1990, Hessler et al., 2012, Schmid et al., 2009, Youse et al., 2004), but in the majority of these, accuracy on congruent audiovisual speech trials (e.g., matched auditory and visual speech signals) is low or inconsistent (Andersen and Starrfelt, 2015, Hessler et al., 2012, Youse et al., 2004). This leaves open the possibility that subjects did not understand the task or may have experienced general sensory impairments that prevented perception of the stimuli, rather than deficits specific to audiovisual integration. Critically, these studies only assessed behavioral outcomes of audiovisual speech perception. Without a quantitative examination of the lesion-behavior relationship, little is revealed about the exact structures necessary for audiovisual speech perception. While a number of studies have assessed the McGurk paradigm in cases with various lesions and diagnoses (Campbell et al., 1990, Champoux et al., 2006, Freeman et al., 2013, Hamilton et al., 2006, Nicholson et al., 2002, Soroker et al., 1995), it is hard to establish an exact brain-behavior relationship across various case studies, particularly given the possibility of remapping of function (Baum, Martin, Hamilton, & Beauchamp, 2012) and the individual variability in McGurk perception in controls (Benoit et al., 2010, Nath and Beauchamp, 2012). 
As a result of these factors, it is unclear if a lack of the McGurk effect for any given person relates to the specific lesion or if that person was simply a low McGurk perceiver prior to his or her stroke. One way the impact of these issues could be minimized is by comparing the behavioral responses in stroke survivors and controls using the same McGurk stimuli and using systematic lesion-behavior assessments across a larger cohort with various lesion distributions.
A recent study assessed McGurk fusion rates in a large cohort of left hemisphere (LH) stroke survivors and used lesion-symptom mapping to identify structure-function relationships (Hickok et al., 2018). The results showed a significant relationship between McGurk fusion rate and lesions in the posterior STS and the posterior superior temporal plane, as well as a posterior middle temporal gyrus (MTG) region that was specifically implicated when the contribution of auditory-only processing was factored out. These findings corroborate the evidence from fMRI and TMS investigations implicating the posterior lateral temporal lobe in audiovisual integration of speech. There has not yet been a lesion-symptom mapping study using timing manipulations of the McGurk effect, but several studies have shown that patients with LH damage and aphasia also exhibit deficits in temporal sequence processing in speech (Schirmer, 2004) and non-speech contexts (Efron, 1963, Swisher and Hirsh, 1972).
Since most stroke survivors are older adults, it is important to note that both audiovisual integration processes and sensitivity to the timing of perceptual cues have been shown to change across the lifespan (for overview, see Baum & Stevenson, 2017; for speech stimuli see Gordon-Salant et al., 2017, Huyse et al., 2014, Stevenson et al., 2015, Tye-Murray et al., 2016; for non-speech stimuli see Bedard and Barnett-Cowan, 2016, Diederich et al., 2008). However, the results of the McGurk task in older populations have been mixed, with some studies showing increased integration with age (Sekiyama et al., 2014, Setti et al., 2013) and others showing decreased integration with age (Stevenson, Baum, Krueger, Newhouse, & Wallace, 2018). Though most of these studies compare separate groups of older and younger adults, it is possible that there may be an effect of age even within an older adult population. Thus, in order to disentangle the effects of a lesion from those of age, investigations of lesion-behavior relationships in older adult stroke patients would benefit from including an age-matched control group and considering possible effects of age in statistical analyses.
In the present study we examined audiovisual speech perception in LH stroke survivors and matched controls using a McGurk paradigm that manipulated the timing of the auditory and visual signals. We first assessed behavioral differences in audiovisual speech integration between the two groups. Then, using a multivariate lesion-symptom mapping method in the LH stroke cohort, we tested for relationships between LH lesion location and audiovisual speech integration, including sensitivity to the timing of audiovisual speech signals (Ghaleh et al., 2018, Zhang et al., 2014). We expected that audiovisual fusion would increase with age in both groups and be reduced overall by left hemisphere strokes, particularly those affecting the STS/STG. We also predicted that inferior parietal lobe lesions would reduce sensitivity to the timing of auditory and visual speech cues, such that fusion would be less affected by differences in the timing of the two signals. This study is novel in that it is the first to examine the audiovisual integration of speech signals in a cohort of stroke survivors, while also investigating the role of participant age and stimulus timing, two factors that have been shown to influence audiovisual integration.
Participants
Thirty-three LH stroke survivors (mean age = 59.8, SD = 10.4) and 39 matched controls (mean age = 58.6, SD = 12.6) met the inclusion criteria for this study as described below. Participants provided informed consent and were compensated as approved by the Georgetown University Institutional Review Board. Table 1 contains demographic information for each participant, including aphasia diagnoses and Western Aphasia Battery Scores. Forty-three LH stroke survivors were tested. Stroke survivors were
Behavioral results
Full cohort model: The first logistic mixed effects model (without interaction terms) in the full cohort of participants found a significant effect of age (OR: 1.05, p < .001), but no significant difference between stroke survivors and controls (OR: 0.68, p = .22). We also found that in comparison to the AL bin, there was significantly greater fusion in the MID (OR: 3.27, p < .0001) and VL bins (OR: 2.66, p < .0001). In the next model, we considered whether interaction effects between group and
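To make the odds ratios above concrete, the following is a simplified sketch on simulated data, with fixed effects only (the actual analysis was a logistic mixed-effects model with per-subject random effects and additional predictors); the simulated coefficients merely echo the reported ORs of roughly 1.05 per year of age and 3.3 for the MID versus AL timing bin:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated trial-level data: fusion odds rise ~5% per year of age and
# are ~3.3x higher in the MID bin than the AL bin (illustrative values).
age = rng.uniform(40, 80, n)
mid = rng.integers(0, 2, n).astype(float)  # 1 = MID bin, 0 = AL bin
p_true = 1 / (1 + np.exp(-(-4.0 + np.log(1.05) * age + np.log(3.3) * mid)))
fusion = rng.binomial(1, p_true).astype(float)

# Fixed-effects logistic regression fit by Newton-Raphson (IRLS);
# exponentiated coefficients are the odds ratios.
X = np.column_stack([np.ones(n), age, mid])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    w = p * (1 - p)
    beta += np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (fusion - p))

odds_ratios = np.exp(beta)
print(np.round(odds_ratios[1:], 2))  # age OR near 1.05, MID-bin OR near 3.3
```

An OR of 1.05 for age means each additional year multiplies the odds of a fusion response by about 1.05; the bin ORs are interpreted the same way relative to the AL reference bin.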
Discussion
We report two main findings. First, across both stroke survivors and controls we find a strong relationship between age and propensity to perceive the illusory fusion percept. Second, we find that lesions in the left SMG and planum temporale are related to reduced temporal specificity in audiovisual integration, suggesting that these regions are integral for neural processes related to audiovisual temporal binding windows.
Acknowledgements
Funding: This work was supported by a National Science Foundation (NSF) Graduate Research Fellowship (United States, grant numbers DGE-0903443 and DGE-1444316 to LCE, and 1444316 to KM); the Achievement Rewards for College Scientists Metropolitan Washington Chapter, United States (ARCS/MWC) 2015-2016 Noama Wheeler Scholar (to LCE); National Institutes of Health, United States (grant numbers KL2TR000102 and R01DC014960 to PET, 2RO1 EY018923-03A1 to JPR, 2 R56 NS052494-06A1 to JPR, RO1 DC014989
References (94)
- et al. Factors influencing audiovisual fission and fusion illusions. Cognitive Brain Research (2004)
- et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage (2011)
- et al. Multisensory speech perception without the left superior temporal sulcus. NeuroImage (2012)
- et al. Spatiotemporal dynamics of audiovisual speech processing. NeuroImage (2008)
- et al. Multisensory processing after a brain damage: Clues on post-injury crossmodal plasticity from neuropsychology. Neuroscience & Biobehavioral Reviews (2013)
- et al. Neurobiological roots of language in primate audition: Common computational properties. Trends in Cognitive Sciences (2015)
- et al. Neuropsychological studies of auditory-visual fusion illusions. Four case studies and their implications. Neuropsychologia (1990)
- et al. Assessing age-related multisensory enhancement with the time-window-of-integration model. Neuropsychologia (2008)
- et al. Sight and sound out of synch: Fragmentation and renormalisation of audiovisual integration and subjective timing. Cortex (2013)
- et al. Phonotactic processing deficit following left-hemisphere stroke. Cortex (2018)