What You Say or How You Say It? Depression Detection Through Joint Modeling of Linguistic and Acoustic Aspects of Speech

Cognitive Computation (IF 5.4), Pub Date: 2021-02-24, DOI: 10.1007/s12559-020-09808-3
Nujud Aloshban, Anna Esposito, Alessandro Vinciarelli

Depression is one of the most common mental health issues. (It affects more than 4% of the world’s population, according to recent estimates.) This article shows that the joint analysis of linguistic and acoustic aspects of speech allows one to discriminate between depressed and nondepressed speakers with an accuracy above 80%. The approach used in the work is based on networks designed for sequence modeling (bidirectional Long Short-Term Memory networks) and multimodal analysis methodologies (late fusion, joint representation, and gated multimodal units). The experiments were performed over a corpus of 59 interviews (roughly 4 hours of material) involving 29 individuals diagnosed with depression and 30 control participants. In addition to an accuracy of 80%, the results show that multimodal approaches perform better than unimodal ones owing to people’s tendency to manifest their condition through one modality only, a source of diversity across unimodal approaches. In addition, the experiments show that it is possible to measure the “confidence” of the approach and automatically identify a subset of the test data in which the performance is above a predefined threshold. It is possible to effectively detect depression by using unobtrusive and inexpensive technologies based on the automatic analysis of speech and language.
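
The abstract names late fusion, gated multimodal units, and a confidence measure but gives no implementation details. The following is a minimal Python sketch of what such components can look like, not the authors' implementation. All function names, the equal fusion weights, the single scalar gate, and the definition of confidence as distance from the 0.5 decision boundary are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def late_fusion(p_text: float, p_audio: float, w_text: float = 0.5) -> float:
    """Weighted average of two unimodal posteriors P(depressed | modality).

    Assumption: each unimodal model outputs a probability in [0, 1].
    """
    return w_text * p_text + (1.0 - w_text) * p_audio


def gated_unit(h_text: Sequence[float], h_audio: Sequence[float],
               gate_weights: Sequence[float], gate_bias: float = 0.0) -> List[float]:
    """Simplified gated multimodal unit with a single scalar gate (an assumption).

    z   = sigmoid(w . [h_text; h_audio] + b)
    out = z * tanh(h_text) + (1 - z) * tanh(h_audio)
    Both feature vectors must have the same length; gate_weights must be
    as long as their concatenation.
    """
    concat = list(h_text) + list(h_audio)
    if len(h_text) != len(h_audio) or len(gate_weights) != len(concat):
        raise ValueError("dimension mismatch")
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    z = 1.0 / (1.0 + math.exp(-score))
    return [z * math.tanh(t) + (1.0 - z) * math.tanh(a)
            for t, a in zip(h_text, h_audio)]


def confidence(p: float) -> float:
    """Hypothetical confidence: distance from the 0.5 boundary, scaled to [0, 1]."""
    return abs(p - 0.5) * 2.0


def select_confident(posteriors: Sequence[float], threshold: float) -> List[int]:
    """Indices of test items whose confidence reaches the threshold."""
    return [i for i, p in enumerate(posteriors) if confidence(p) >= threshold]


if __name__ == "__main__":
    fused = [late_fusion(t, a) for t, a in [(0.9, 0.7), (0.6, 0.5), (0.2, 0.1)]]
    print("fused posteriors:", fused)
    print("confident items:", select_confident(fused, threshold=0.5))
```

Late fusion combines decisions made separately per modality, whereas the gated unit combines feature vectors before a decision is made, letting the gate weight whichever modality carries more signal for a given speaker. Restricting evaluation to the items returned by `select_confident` is one simple way to trade coverage for accuracy on the retained subset.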


Updated: 2021-02-24
