Black-box adversarial attacks through speech distortion for speech emotion recognition
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4) Pub Date: 2022-08-17, DOI: 10.1186/s13636-022-00254-7
Jinxing Gao, Diqun Yan, Mingyu Dong

Speech emotion recognition is a key branch of affective computing, and it is now common to screen for emotional disorders through speech emotion recognition. Various emotion recognition models, such as LSTM, GCN, and CNN, show excellent performance. However, because these models lack robustness, their recognition results can deviate significantly under perturbed inputs. In this article, we therefore use black-box adversarial-example attacks to probe the robustness of such models. After applying three different black-box attacks, the accuracy of the CNN-MAA model decreased by 69.38% in the best attack scenario, while the word error rate (WER) of the speech decreased by only 6.24%, indicating that the model's robustness does not hold up under our black-box attack method. After adversarial training, the model accuracy decreased by only 13.48%, which demonstrates the effectiveness of adversarial training against adversarial-example attacks. Our code is available on GitHub.
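The abstract does not specify the paper's exact attack algorithms, but the general idea of a query-based black-box attack can be sketched as follows: the attacker has no access to model gradients and only queries the model's output scores, greedily keeping small random distortions of the waveform that lower the model's confidence in the true emotion label. The function names (`black_box_attack`, `score_fn`) and the random-search strategy are illustrative assumptions, not the authors' method.

```python
import numpy as np

def black_box_attack(score_fn, audio, true_label, eps=0.01, queries=300, seed=0):
    """Generic query-based black-box attack (illustrative sketch only).

    score_fn : callable mapping a waveform to a probability vector; the
               attacker observes only these outputs, never gradients.
    audio    : 1-D float waveform in [-1, 1].
    eps      : per-step distortion strength (kept small to limit audible change).
    """
    rng = np.random.default_rng(seed)
    adv = audio.copy()
    best = score_fn(adv)[true_label]
    for _ in range(queries):
        # Propose a small random distortion, clipped to the valid sample range.
        candidate = np.clip(adv + eps * rng.standard_normal(audio.shape), -1.0, 1.0)
        score = score_fn(candidate)[true_label]
        # Greedy acceptance: keep the candidate only if it lowers the
        # model's confidence in the true label.
        if score < best:
            adv, best = candidate, score
    return adv
```

In practice, the attack succeeds when the accumulated distortion flips the predicted emotion label while the waveform remains intelligible (which is what the low WER change in the abstract measures).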

Updated: 2022-08-17