Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech. Frontiers in Neurorobotics



Multi-Head Attention-Based Long Short-Term Memory for Depression Detection From Speech.
Frontiers in Neurorobotics (IF 2.6), Pub Date: 2021-08-26, DOI: 10.3389/fnbot.2021.684037
Yan Zhao¹, Zhenlin Liang¹, Jing Du¹, Li Zhang²˒³, Chengyu Liu⁴, Li Zhao¹


Depression is a mental disorder that threatens people's health and daily lives, so an effective method for detecting it is essential. However, research on depression detection has mainly focused on combining parallel features from audio, video, and text to improve performance, without making full use of the information inherent in speech. To focus on the more emotionally salient regions of depressed speech, we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features to preserve the original temporal relationships within a speech sequence, and then analyze how these features differ between depressed and healthy speech. Next, we study the performance of various features and use a modified feature set as the input to the LSTM layer. Instead of using the output of a traditional LSTM directly, we apply multi-head time-dimension attention, which projects the LSTM output into different subspaces to capture more of the key temporal information related to depression detection. Experimental results show that the proposed model improves on the LSTM baseline by 2.3% and 10.3% on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.
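The abstract describes the attention step only at a high level: the LSTM outputs are projected into several subspaces and weighted along the time axis. The NumPy sketch below is one plausible reading of that idea, not the authors' implementation. The additive tanh scoring, the per-head projection matrices `W_proj` and scoring vectors `v_score`, the layer sizes, and the logistic output layer are all assumptions, and random arrays stand in for trained weights and for real frame-level LSTM outputs.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_time_attention(H, W_proj, v_score):
    """Pool a sequence of LSTM outputs over time with several attention heads.

    Assumed shapes (hypothetical, not taken from the paper):
      H:       (T, d)             one LSTM output vector per speech frame
      W_proj:  (n_heads, d, d_k)  per-head projection into its own subspace
      v_score: (n_heads, d_k)     per-head scoring vector (additive-style scoring)
    Returns the concatenated utterance vector (n_heads * d_k,) and the
    attention weights (n_heads, T); each head's weights sum to 1 over time.
    """
    # Project every frame into each head's subspace: (n_heads, T, d_k)
    P = np.einsum("td,hdk->htk", H, W_proj)
    # One scalar relevance score per head and frame: (n_heads, T)
    scores = np.einsum("htk,hk->ht", np.tanh(P), v_score)
    # Normalise along the time axis so each head selects its own salient frames
    alpha = softmax(scores, axis=1)
    # Attention-weighted sum over time inside each subspace: (n_heads, d_k)
    context = np.einsum("ht,htk->hk", alpha, P)
    return context.reshape(-1), alpha


# Usage with random stand-ins (all sizes are hypothetical)
rng = np.random.default_rng(0)
T, d, n_heads, d_k = 50, 32, 4, 8
H = rng.standard_normal((T, d))                     # stand-in for LSTM outputs
W_proj = rng.standard_normal((n_heads, d, d_k)) * 0.1
v_score = rng.standard_normal((n_heads, d_k))
utt_vec, alpha = multi_head_time_attention(H, W_proj, v_score)

# Hypothetical binary output layer: probability of the "depressed" class
w_out = rng.standard_normal(utt_vec.shape[0]) * 0.1
p_depressed = 1.0 / (1.0 + np.exp(-(utt_vec @ w_out)))
print(utt_vec.shape, alpha.shape)  # prints (32,) (4, 50)
```

Because each head normalises its scores over time independently, different heads can emphasise different frames, which is the motivation the abstract gives for projecting into separate subspaces. In a trained model these weights would be learned jointly with the LSTM rather than drawn at random.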


Updated: 2021-08-26
