Voice Keyword Retrieval Method Using Attention Mechanism and Multimodal Information Fusion
Scientific Programming (IF 1.672) Pub Date: 2021-01-25, DOI: 10.1155/2021/6662841
Hongli Zhang

A cross-modal speech-text retrieval method using an interactive-learning convolutional autoencoder (CAE) is proposed. First, an interactive-learning autoencoder architecture is designed, with two inputs (speech and text) and processing stages such as encoding, hidden-layer interaction, and decoding, to model cross-modal speech-text retrieval. Then, the raw audio signal is preprocessed and Mel-frequency cepstral coefficient (MFCC) features are extracted. In addition, a bag-of-words model is used to extract text features, and an attention mechanism then combines the text and speech features. Through the interactive-learning CAE, shared features of the speech and text modalities are obtained and passed to a modal classifier that identifies the modality information, thereby realizing cross-modal speech-text retrieval. Finally, experiments show that the proposed algorithm outperforms the comparison algorithms in recall, precision, and false-recognition rate.
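The abstract does not give implementation details, but the feature-extraction and fusion steps it names can be illustrated with a minimal numpy sketch. The bag-of-words function and the toy scalar-attention scoring below are illustrative assumptions, not the paper's method: the speech vector stands in for pooled MFCC features already projected to the same dimension as the text vector, and the attention weights are a simple softmax over per-modality scores.

```python
import numpy as np

def bag_of_words(tokens, vocab):
    """Bag-of-words text feature: count occurrences of each vocabulary word."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for t in tokens:
        if t in index:
            vec[index[t]] += 1.0
    return vec

def attention_fuse(speech_feat, text_feat):
    """Toy attention over the two modality vectors (illustrative, not the
    paper's exact mechanism): score each modality, softmax the scores,
    and return the attention-weighted sum of the feature vectors."""
    feats = np.stack([speech_feat, text_feat])    # shape (2, d)
    scores = feats.sum(axis=1)                    # toy per-modality scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over modalities
    return weights @ feats                        # fused feature, shape (d,)

# Hypothetical usage: a 3-word vocabulary and a stand-in speech vector.
vocab = ["voice", "keyword", "retrieval"]
text_feat = bag_of_words(["voice", "keyword", "keyword"], vocab)
speech_feat = np.array([0.5, 0.1, 0.3])           # placeholder pooled MFCCs
fused = attention_fuse(speech_feat, text_feat)
```

In the paper's full pipeline, `fused` would be the input to the interactive-learning CAE, whose shared hidden representation is then sent to the modal classifier.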
