Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process, Engineering Applications of Artificial Intelligence



Engineering Applications of Artificial Intelligence (IF 8), Pub Date: 2021-02-15, DOI: 10.1016/j.engappai.2021.104189
César Montenegro, Roberto Santana, Jose A. Lozano

An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user’s utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance.
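
The idea of evaluating or training an EOTD-M on simulated transcription errors can be illustrated with a minimal sketch. The abstract does not describe the internals of the ASR-SIM, so the code below is not the authors' simulator: it is a generic word-level noise injector, and the function name `corrupt_transcript`, the three rate parameters, and the `vocab` list are all assumptions made for illustration. It does not model semantic mistakes or speech profiles.

```python
import random


def corrupt_transcript(words, sub_rate, del_rate, ins_rate, vocab, rng):
    """Return a noisy copy of `words` (hypothetical helper, not the paper's ASR-SIM).

    Each word is deleted with probability `del_rate`, otherwise replaced by a
    random word from `vocab` with probability `sub_rate`. After every kept or
    substituted word, a random word is inserted with probability `ins_rate`.
    """
    noisy = []
    for word in words:
        r = rng.random()
        if r < del_rate:
            continue  # deletion: drop the word entirely
        if r < del_rate + sub_rate:
            # substitution: may occasionally pick the same word again
            noisy.append(rng.choice(vocab))
        else:
            noisy.append(word)  # word transcribed correctly
        if rng.random() < ins_rate:
            noisy.append(rng.choice(vocab))  # insertion of a spurious word
    return noisy


# Usage: corrupt a clean utterance with assumed error rates and a fixed seed.
clean = "could you book a table for two".split()
noisy = corrupt_transcript(
    clean, sub_rate=0.1, del_rate=0.05, ins_rate=0.05,
    vocab=["the", "uh", "to", "a"], rng=random.Random(0),
)
```

Passing an explicit `random.Random` instance keeps the corruption reproducible, so the same noisy dataset can be regenerated when comparing featurization techniques at a given error rate.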


Updated: 2021-02-15
