Development of Speech Recognition Systems in Emergency Call Centers
Symmetry (IF 2.940), Pub Date: 2021-04-09, DOI: 10.3390/sym13040634
Alakbar Valizada, Natavan Akhundova, Samir Rustamov

In this paper, various acoustic and language modeling methodologies, as well as data labeling methods, for automatic speech recognition of spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because dialogue speech in call centers has a specific context and is recorded in noisy, emotional environments, available speech recognition systems perform poorly on it. Therefore, to recognize dialogue speech accurately, the main modules of speech recognition systems (language models and acoustic training methodologies), as well as symmetric data labeling approaches, were investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were identified using extrinsic and intrinsic evaluation methods. Lastly, our proposed data labeling approaches with spelling correction were compared with common labeling methods and outperformed them by a notable margin. Based on the experimental results, we determined that DNN/HMM for the acoustic model, a trigram with Kneser–Ney discounting for the language model, and applying spelling correction to the training data labels before training are effective configurations for dialogue speech recognition in emergency call centers. This research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which contains call agents' speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Although the speech from the emergency call center is in Azerbaijani, which belongs to the Turkic group of languages, our approaches are not tightly tied to language-specific features. Hence, the proposed approaches are expected to be applicable to other languages of the same group.
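
The abstract names a trigram language model with Kneser–Ney discounting and says language models were compared with intrinsic and extrinsic methods. As a rough illustration only, the Python sketch below implements interpolated Kneser–Ney smoothing for a trigram model with a single absolute discount D = 0.75 (a common textbook default, not a value reported in the paper) and computes perplexity, the usual intrinsic measure. The class name, the <s>, </s> and <unk> symbols, the uniform floor at the unigram level and the toy sentences are assumptions made for this sketch; the abstract does not state which toolkit or settings the authors used, and real systems typically build such models with an LM toolkit such as SRILM or KenLM.

```python
import math
from collections import Counter


class KneserNeyTrigramLM:
    """Hypothetical sketch: interpolated Kneser-Ney trigram LM, one discount d (0 < d < 1)."""

    def __init__(self, sentences, discount=0.75):
        self.d = discount
        self.tri = Counter()  # c(u v w)
        for sent in sentences:
            toks = ["<s>", "<s>"] + sent + ["</s>"]
            for i in range(2, len(toks)):
                self.tri[tuple(toks[i - 2:i + 1])] += 1

        self.ctx_count = Counter()  # c(u v .): tokens observed after context (u, v)
        self.ctx_types = Counter()  # N1+(u v .): distinct words observed after (u, v)
        self.cont_bi = Counter()    # N1+(. v w): distinct u observed before (v, w)
        for (u, v, w), c in self.tri.items():
            self.ctx_count[(u, v)] += c
            self.ctx_types[(u, v)] += 1
            self.cont_bi[(v, w)] += 1

        self.mid_total = Counter()  # N1+(. v .) = sum over w of N1+(. v w)
        self.mid_types = Counter()  # number of distinct w with N1+(. v w) > 0
        for (v, _), n in self.cont_bi.items():
            self.mid_total[v] += n
            self.mid_types[v] += 1

        self.cont_uni = Counter(w for (_, w) in self.cont_bi)  # N1+(. w)
        self.uni_total = len(self.cont_bi)                      # N1+(. .)
        self.vocab = {t for gram in self.tri for t in gram} | {"<unk>"}

    def _p_uni(self, w):
        # Continuation unigram, interpolated with a uniform distribution over the
        # vocabulary so that every vocabulary word (including <unk>) gets some mass.
        backoff = self.d * len(self.cont_uni) / self.uni_total
        return max(self.cont_uni[w] - self.d, 0) / self.uni_total + backoff / len(self.vocab)

    def _p_bi(self, v, w):
        total = self.mid_total[v]
        if total == 0:  # v never seen as a middle word: use the unigram level only
            return self._p_uni(w)
        backoff = self.d * self.mid_types[v] / total
        return max(self.cont_bi[(v, w)] - self.d, 0) / total + backoff * self._p_uni(w)

    def prob(self, u, v, w):
        w = w if w in self.vocab else "<unk>"
        total = self.ctx_count[(u, v)]
        if total == 0:  # unseen trigram context: back off to the bigram level
            return self._p_bi(v, w)
        backoff = self.d * self.ctx_types[(u, v)] / total
        return max(self.tri[(u, v, w)] - self.d, 0) / total + backoff * self._p_bi(v, w)

    def perplexity(self, sentences):
        """Intrinsic evaluation: exp of the mean negative log-probability per token."""
        log_sum, n = 0.0, 0
        for sent in sentences:
            toks = ["<s>", "<s>"] + sent + ["</s>"]
            for i in range(2, len(toks)):
                log_sum += math.log(self.prob(toks[i - 2], toks[i - 1], toks[i]))
                n += 1
        return math.exp(-log_sum / n)


# Toy, made-up training sentences (not from the paper's datasets).
train = [
    ["there", "is", "a", "fire"],
    ["there", "is", "a", "fire", "at", "the", "market"],
    ["send", "an", "ambulance"],
]
lm = KneserNeyTrigramLM(train)
print(lm.perplexity([["there", "is", "a", "fire", "at", "home"]]))
```

Lower perplexity on held-out transcripts indicates a better fit; in this sketch an unseen word such as "home" still receives nonzero probability through the <unk> entry and the uniform floor.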
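
Extrinsic evaluation of a language model, and the comparison of GMM/HMM with DNN/HMM acoustic models, is usually reported as word error rate (WER) on held-out speech; the abstract does not name the metric, so treating it as WER is an assumption here. The minimal Python sketch below computes corpus-level WER from word-level Levenshtein distances; the function names and example sentence pairs are hypothetical.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion of r
                           cur[j - 1] + 1,           # insertion of h
                           prev[j - 1] + (r != h)))  # substitution (free if words match)
        prev = cur
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) string pairs: total edits / total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return edits / words


pairs = [
    ("there is a fire at the market", "there is fire at the market"),  # 1 deletion
    ("send an ambulance", "send the ambulance now"),                   # 1 substitution, 1 insertion
]
print(corpus_wer(pairs))  # 3 edits / 10 reference words -> 0.3
```

Summing edits and reference words over the whole test set, rather than averaging per-utterance WER, keeps short utterances from dominating the score.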
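
The abstract does not describe how the proposed spelling correction works. Purely as a hypothetical illustration of correcting transcription labels before acoustic-model training, the Python sketch below replaces each out-of-lexicon token with the closest word from a reference lexicon when it is within one character edit, and otherwise leaves it unchanged. The lexicon, the max_dist threshold and the function names are assumptions. The input is assumed to be already lowercased: Python's str.lower() maps "I" to "i" rather than to the dotless "ı" used in Azerbaijani, so Turkic casing would need separate handling.

```python
def char_edit_distance(a, b):
    """Character-level Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct_transcript(transcript, lexicon, max_dist=1):
    """Hypothetical helper: map out-of-lexicon tokens to the nearest lexicon word within max_dist."""
    corrected = []
    for token in transcript.split():
        if token in lexicon:
            corrected.append(token)
            continue
        # Nearest lexicon entry; ties are broken alphabetically so the output is deterministic.
        best = min(lexicon, key=lambda w: (char_edit_distance(token, w), w))
        if char_edit_distance(token, best) <= max_dist:
            corrected.append(best)
        else:
            corrected.append(token)  # too far from every entry: keep the token unchanged
    return " ".join(corrected)


LEXICON = {"there", "is", "a", "fire", "at", "the", "market"}
print(correct_transcript("there is a firre at the markett", LEXICON))
# -> there is a fire at the market
```

Plain edit distance ignores how common each candidate word is; a corrector that also weights candidates by corpus frequency would resolve ties more sensibly than alphabetical order.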

Updated: 2021-04-09