A study of continuous space word and sentence representations applied to ASR error detection, Speech Communication

Speech Communication (IF 2.4), Pub Date: 2020-03-07, DOI: 10.1016/j.specom.2020.03.002
Sahar Ghannay, Yannick Estève, Nathalie Camelin

This paper presents a study of continuous word representations applied to the automatic detection of speech recognition errors. A neural network architecture well suited to handling continuous word representations, such as word embeddings, is proposed. We explore several types of word representations: simple and combined linguistic embeddings, and acoustic embeddings associated with prosodic features, both extracted from the audio signal. To compensate for certain phenomena highlighted by an analysis of the average error span, we propose to model errors at the sentence level through sentence embeddings. An approach to building continuous sentence representations dedicated to ASR error detection is also proposed and compared to the Doc2vec approach. Experiments are performed on automatic transcriptions generated by the LIUM ASR system on the French ETAPE corpus. They show that combining linguistic embeddings, acoustic embeddings, prosodic features, and sentence embeddings with more classical features yields very competitive results. In particular, these results show that acoustic embeddings and prosodic information are complementary, and that the proposed sentence embeddings dedicated to ASR error detection outperform generic sentence embeddings.
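The abstract describes the detector only at a high level. As a rough illustration of how heterogeneous per-word features might be fed to a neural classifier, the following sketch concatenates a linguistic embedding, an acoustic embedding, prosodic features, and a sentence-level vector for each word of an ASR hypothesis, then scores the word with a one-hidden-layer network. Everything here is an assumption made for illustration and is not the authors' architecture: the dimensions, the class name `TinyErrorDetector`, the helpers `mean_vector` and `word_features`, and the use of a plain mean of word vectors as a stand-in sentence embedding (the paper proposes its own sentence embeddings and compares them to Doc2vec). The weights are random and untrained, so the printed labels carry no meaning.

```python
import math
import random

# Hypothetical feature sizes chosen for illustration only.
LING_DIM, ACOUSTIC_DIM, PROSODIC_DIM = 4, 3, 2


def mean_vector(vectors):
    """Placeholder sentence embedding: element-wise mean of the word vectors.

    NOT the sentence-embedding method proposed in the paper; just a simple
    stand-in so the sketch is self-contained.
    """
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def word_features(ling, acoustic, prosodic, sentence):
    """Concatenate the per-word feature groups with the sentence-level vector."""
    return list(ling) + list(acoustic) + list(prosodic) + list(sentence)


class TinyErrorDetector:
    """Hypothetical one-hidden-layer network scoring P(word is an ASR error)."""

    def __init__(self, input_dim, hidden_dim=5, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights: a real detector would learn these.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def predict_proba(self, x):
        # tanh hidden layer followed by a sigmoid output unit.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))


# Toy usage with random features for a three-word hypothesis.
rng = random.Random(42)
hypothesis = ["le", "chat", "dort"]
ling = [[rng.gauss(0, 1) for _ in range(LING_DIM)] for _ in hypothesis]
acoustic = [[rng.gauss(0, 1) for _ in range(ACOUSTIC_DIM)] for _ in hypothesis]
prosodic = [[rng.gauss(0, 1) for _ in range(PROSODIC_DIM)] for _ in hypothesis]
sentence = mean_vector(ling)

detector = TinyErrorDetector(input_dim=2 * LING_DIM + ACOUSTIC_DIM + PROSODIC_DIM)
for i, word in enumerate(hypothesis):
    x = word_features(ling[i], acoustic[i], prosodic[i], sentence)
    p = detector.predict_proba(x)
    print(word, "ERROR" if p >= 0.5 else "CORRECT", round(p, 3))
```

In practice, a detector of this kind would be trained with a binary cross-entropy loss on words labeled correct or erroneous, typically by aligning the ASR hypothesis with the reference transcription; that training step is omitted from this sketch.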

Updated: 2020-03-07
