Depressive semantic awareness from vlog facial and vocal streams via spatio-temporal transformer, Digital Communications and Networks

Digital Communications and Networks (IF 7.5), Pub Date: 2023-03-27, DOI: 10.1016/j.dcan.2023.03.007
Yongfeng Tao, Minqiang Yang, Yushan Wu, Kevin Lee, Adrienne Kline, Bin Hu

With the rapid growth of information transmission over the Internet, efforts have been made to reduce network load and improve efficiency. One such approach is semantic computing, which extracts and processes the semantic content of communication. Social media enables users to share their current emotions, opinions, and life events from their mobile devices, and people with mental health problems are notably more willing to share their feelings on social networks. It is therefore necessary to extract semantic information from social media vlog data to identify abnormal emotional states and so facilitate early identification and intervention. Most existing studies, however, ignore spatio-temporal information when fusing multimodal signals to detect abnormal emotional states such as depression. To address this gap, this paper proposes a spatio-temporal squeeze transformer that extracts depression-related semantic features from the facial and vocal streams of vlogs. First, a spatio-temporal module is embedded into the transformer encoder to obtain a representation of spatio-temporal features. Second, a classifier with a voting mechanism is designed to help the model distinguish depression from non-depression. Experiments on the D-Vlog dataset show that the method is effective, reaching an accuracy of 70.70%. This work provides a foundation for future research on affect recognition in semantic communication based on social media vlog data.
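The abstract does not describe how the voting classifier combines its inputs. The following is only a minimal sketch of one common reading, plain majority voting, assuming several classifier heads each emit a depression probability for a clip. The function names (`threshold`, `majority_vote`, `accuracy`), the 0/1 label encoding, the 0.5 cutoff, the example heads, and the tie-breaking rule are hypothetical choices made for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import Sequence

# Hypothetical label encoding; the abstract does not say how classes are coded.
NON_DEPRESSION = 0
DEPRESSION = 1


def threshold(probabilities: Sequence[float], cutoff: float = 0.5) -> list[int]:
    """Turn per-head depression probabilities into hard 0/1 votes."""
    return [DEPRESSION if p >= cutoff else NON_DEPRESSION for p in probabilities]


def majority_vote(votes: Sequence[int]) -> int:
    """Return the label chosen by most heads.

    Ties go to DEPRESSION, an assumption made here so ambiguous clips are
    flagged rather than dismissed; it is not a rule stated in the paper.
    """
    if not votes:
        raise ValueError("need at least one vote")
    unknown = set(votes) - {NON_DEPRESSION, DEPRESSION}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(votes)  # missing labels count as 0
    if counts[DEPRESSION] >= counts[NON_DEPRESSION]:
        return DEPRESSION
    return NON_DEPRESSION


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of clips whose predicted label matches the ground truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Three hypothetical heads (e.g. facial, vocal, fused) score one clip.
    clip_votes = threshold([0.8, 0.4, 0.6])
    print(clip_votes, majority_vote(clip_votes))  # [1, 0, 1] 1
```

A hard majority vote is the simplest combination rule; averaging the heads' probabilities before thresholding (soft voting) is a common alternative, and the paper may use either or a different scheme entirely.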

Updated: 2023-03-27