Natural Language Processing Methods for Acoustic and Landmark Event-based Features in Speech-based Depression Detection

IEEE Journal of Selected Topics in Signal Processing (IF 8.7), Pub Date: 2020-02-01, DOI: 10.1109/jstsp.2019.2949419
Zhaocheng Huang, Julien Epps, Dale Joachim, Vidhyasaharan Sethu

The processing of speech as an explicit sequence of events is common in automatic speech recognition (linguistic events), but has received relatively little attention in paralinguistic speech classification despite its potential for characterizing broad acoustic event sequences. This paper proposes a framework for analyzing speech as a sequence of acoustic events, and investigates its application to depression detection. In this framework, acoustic space regions are tokenized into ‘words’ representing speech events at fixed or irregular intervals. This tokenization allows the exploitation of acoustic word features using proven natural language processing methods. A key advantage of this framework is its ability to accommodate heterogeneous event types: herein we combine acoustic words and speech landmarks, which are articulation-related speech events. Another advantage is the option to fuse such heterogeneous events at various levels, including the embedding level. Evaluation of the proposed framework on both controlled laboratory-grade supervised audio recordings and unsupervised self-administered smartphone recordings highlights its merits across both datasets, with the proposed landmark-dependent acoustic words achieving improvements in F1(depressed) of up to 15% and 13% for SH2-FS and DAIC-WOZ respectively, relative to acoustic speech baseline approaches.
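
As a rough illustration of the tokenization idea described in the abstract, the sketch below maps frame-level feature vectors to their nearest codebook entry ("acoustic words"), counts n-grams over the resulting token sequence, and pairs each word with a landmark label. It is not the authors' implementation: the 2-D codebook, the feature values, the landmark labels, and the function names `tokenize`, `ngram_counts`, and `landmark_dependent` are all assumptions made for this example, and a real system would presumably learn the codebook from data.

```python
from collections import Counter
from math import dist

# Hypothetical 2-D codebook. In practice the centroids would be learned
# from training data (for example by clustering), not hard-coded.
CODEBOOK = {
    "w0": (0.0, 0.0),
    "w1": (1.0, 0.0),
    "w2": (0.0, 1.0),
}


def tokenize(frames, codebook=CODEBOOK):
    """Map each feature vector to the label of its nearest centroid."""
    return [min(codebook, key=lambda w: dist(codebook[w], f)) for f in frames]


def ngram_counts(tokens, n=2):
    """Count contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def landmark_dependent(tokens, landmarks):
    """Prefix each acoustic word with a landmark label at the same position.

    This pairing rule is an assumption for illustration only.
    """
    return [f"{lm}:{w}" for lm, w in zip(landmarks, tokens)]


# Made-up feature vectors standing in for frame-level acoustic features.
frames = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.8), (0.9, 0.1), (0.1, 0.8)]
words = tokenize(frames)
bigrams = ngram_counts(words, n=2)

# Placeholder landmark labels, one per frame.
landmarks = ["g+", "g-", "g+", "g-", "g+"]
combined = landmark_dependent(words, landmarks)

print(words)
print(bigrams.most_common())
print(combined)
```

Once speech is represented as a token sequence like this, count-based or embedding-based text features can be computed over it in the same way as for ordinary words, which is the property the abstract relies on.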

Updated: 2020-02-01
