Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM
Neurocomputing (IF 6), Pub Date: 2021-03-16, DOI: 10.1016/j.neucom.2021.03.025
You Wang, Ming Zhang, Rumeng Wu, Hengyang Wang, Zhiyuan Luo, Guang Li

The relationship between muscle movements and neural signals makes it possible to decode silent speech from neuromuscular activity. The decoding can be formulated as a supervised classification task. Electromyography (EMG) signals captured from surface articulatory muscles contain information that can assist speech decoding. Spectrograms obtained from EMG carry a wealth of information relevant to the decoding, but have not yet been fully explored. In addition, decoding results are often uncertain, so it is important to quantify prediction confidence. This paper aims to improve decoding performance by representing time-series signals as spectrograms and using Inductive Conformal Prediction (ICP) to provide predictions with confidence. All EMG data are recorded from six dedicated facial muscles while participants subvocally recite the displayed words. Three pre-trained convolutional models, MobileNet-V1, ResNet18 and Xception, are used to extract features from the spectrograms for classification. Both bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU) classifiers are used for prediction. Furthermore, an ICP decoder based on Bi-LSTM is built to provide guaranteed predictions for each example at a specified confidence level. The proposed combination of Xception-based feature extraction and Bi-LSTM classification achieves an accuracy of 0.87, higher than the other methods. ICP outputs confidence measures for each example, which can help users evaluate the reliability of new predictions. Experimental results demonstrate the practical usefulness of the approach in decoding articulatory neuromuscular activity and the advantages of applying ICP.
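
To illustrate how an inductive conformal predictor can attach confidence to a classifier's output, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state which nonconformity measure is used, so the score below (one minus the class probability) is an assumption, and all function names and the toy probabilities are hypothetical. The probability vectors stand in for the softmax outputs of a trained classifier such as a Bi-LSTM, evaluated on a held-out calibration set.

```python
from typing import Dict, List, Sequence


def nonconformity(probs: Sequence[float], label: int) -> float:
    """Assumed nonconformity score: 1 minus the probability given to `label`."""
    return 1.0 - probs[label]


def icp_p_values(
    cal_probs: Sequence[Sequence[float]],
    cal_labels: Sequence[int],
    test_probs: Sequence[float],
) -> List[float]:
    """Return one p-value per candidate label for a single test example."""
    # Scores of the calibration examples, each computed with its true label.
    cal_scores = [nonconformity(p, y) for p, y in zip(cal_probs, cal_labels)]
    n = len(cal_scores)
    p_values = []
    for label in range(len(test_probs)):
        score = nonconformity(test_probs, label)
        # Count calibration examples at least as nonconforming as the test one.
        at_least = sum(1 for s in cal_scores if s >= score)
        p_values.append((at_least + 1) / (n + 1))
    return p_values


def prediction_set(p_values: Sequence[float], significance: float) -> List[int]:
    """Labels whose p-value exceeds the significance level (1 - confidence)."""
    return [label for label, p in enumerate(p_values) if p > significance]


def forced_prediction(p_values: Sequence[float]) -> Dict[str, float]:
    """Single best label with its credibility and confidence."""
    ranked = sorted(range(len(p_values)), key=lambda k: p_values[k], reverse=True)
    best, second = ranked[0], ranked[1]
    return {
        "label": best,
        "credibility": p_values[best],         # largest p-value
        "confidence": 1.0 - p_values[second],  # 1 minus second-largest p-value
    }


# Toy calibration data (hypothetical): 4 examples, 3 classes.
cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
cal_labels = [0, 1, 2, 1]

p_vals = icp_p_values(cal_probs, cal_labels, [0.75, 0.15, 0.10])
print(p_vals)
print(prediction_set(p_vals, significance=0.25))
print(forced_prediction(p_vals))
```

The guarantee ICP offers is that, provided calibration and test examples are exchangeable, the true label is excluded from the prediction set with probability at most the chosen significance level. A lower significance level (higher confidence) therefore produces larger, less specific prediction sets.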




Updated: 2021-05-05