Deep Representation Learning for Affective Speech Signal Analysis and Processing: Preventing unwanted signal disparities, IEEE Signal Processing Magazine

IEEE Signal Processing Magazine (IF 14.9), Pub Date: 2021-10-27, DOI: 10.1109/msp.2021.3105939
Chi-Chun Lee, Kusha Sridhar, Jeng-Lin Li, Wei-Cheng Lin, Bo-Hao Su, Carlos Busso

Speech emotion recognition (SER) is an important research area, with direct impacts in applications of our daily lives, spanning education, health care, security and defense, entertainment, and human–computer interaction. The advances in many other speech signal modeling tasks, such as automatic speech recognition, text-to-speech synthesis, and speaker identification, have led to the current proliferation of speech-based technology. Incorporating SER solutions into existing and future systems can take these voice-based solutions to the next level. Speech is a highly nonstationary signal, with dynamically evolving spatial-temporal patterns. It often requires a sophisticated representation modeling framework to develop algorithms capable of handling real-life complexities.

Updated: 2021-10-29