Elsevier

Vision Research

Volume 180, March 2021, Pages 51-62

Degraded visual and auditory input individually impair audiovisual emotion recognition from speech-like stimuli, but no evidence for an exacerbated effect from combined degradation

https://doi.org/10.1016/j.visres.2020.12.002
Under a Creative Commons license
open access

Highlights

  • Degraded video and audio in isolation impair emotion perception equally.

  • Combining degraded video and audio does not exacerbate their individual effects.

  • Observers can compensate well for degraded audio, but barely for degraded video.

  • Gaze behavior adapts to degraded video and absent audio, but not to degraded audio.

Abstract

Emotion recognition requires optimal integration of the multisensory signals from vision and hearing. A sensory loss in either or both modalities can lead to changes in integration and related perceptual strategies. To investigate potential acute effects of combined impairments due to sensory information loss only, we degraded the visual and auditory information in audiovisual video recordings and presented these to a group of healthy young volunteers. These degradations were intended to approximate some aspects of vision and hearing impairment in simulation. Other aspects, related to advanced age, potential health issues, but also long-term adaptation and cognitive compensation strategies, were not included in the simulations. Besides accuracy of emotion recognition, eye movements were recorded to capture perceptual strategies. Our data show that emotion recognition performance decreases when degraded visual and auditory information are presented in isolation, but simultaneously degrading both modalities does not exacerbate these isolated effects. Moreover, degrading the visual information strongly impacts both recognition performance and viewing behavior. In contrast, degrading auditory information alongside normal or degraded video had little (additional) effect on performance or gaze. Nevertheless, our results hold promise for visually impaired individuals, because the addition of any audio to any video greatly facilitates performance, even though adding audio does not completely compensate for the negative effects of video degradation. Additionally, observers adapted their viewing behavior to degraded video in order to maximize their performance. Therefore, optimizing the hearing of visually impaired individuals and teaching them such optimized viewing behavior could be worthwhile endeavors for improving emotion recognition.

Keywords

Emotion perception
Eye-tracking
Central scotoma
Age-related hearing loss
Audiovisual
Dynamic

1

These authors contributed equally.