A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering, Multimedia Systems

Multimedia Systems (IF 3.5), Pub Date: 2021-07-19, DOI: 10.1007/s00530-021-00822-5
Faria Nazir, Muhammad Nadeem Majeed, Mustansar Ali Ghazanfar, Muazzam Maqsood

Demand for language learning is growing because people increasingly need to communicate across regions for business, study, and other purposes. Learners make many pronunciation mistakes owing to unfamiliarity with the new language and differences in accent. In this paper, we analyze speech mistakes using deep feature-based clustering. We propose two novel methods for speech analysis: one for phonemic errors (confusing phonemes) and one for prosodic errors (partially changed pronunciation variations of phones). Accurate and efficient language learning requires correcting both phonemic and prosodic errors. In the first method, we combine deep CNN features with a clustering algorithm to detect phonemic errors, and we classify the phonemes using K-nearest neighbor, Naïve Bayes, and support vector machine (SVM) classifiers. Experiments on the six most frequently mispronounced confusing phoneme pairs in Arabic achieve an accuracy of 94%. In the second method, we propose an unsupervised phone variation model (PVM) to detect prosodic errors. In PVM, each phone is extended to represent the different types of pronunciation variation of that phone at different proficiency levels. We use an Arabic dataset of 28 individual phones for speech analysis and provide feedback based on the variation of each phone, achieving an accuracy of 97%.
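
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of clustering feature vectors and then classifying them with a K-nearest-neighbor vote. Several things here are assumptions and not taken from the paper: synthetic Gaussian vectors stand in for deep CNN features, k-means is used as the clustering algorithm (the abstract does not name one), and all function names (`kmeans`, `knn_predict`, `synth`) and the labels "A" and "B" are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def nearest(centroids, p):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    # Farthest-first initialisation keeps this sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        assign = [nearest(centroids, p) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = mean(members)
    return centroids, [nearest(centroids, p) for p in points]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda t: dist(t[0], query))[:k]
    votes = {}
    for _, label in ranked:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# ASSUMPTION: synthetic 4-D vectors replace real CNN embeddings of audio.
rng = random.Random(0)

def synth(center, n, spread=0.3):
    """Draw n noisy vectors around a center point."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]

phone_a = synth([0.0] * 4, 30)   # hypothetical phoneme "A"
phone_b = synth([5.0] * 4, 30)   # hypothetical phoneme "B"
features = phone_a + phone_b
labels = ["A"] * 30 + ["B"] * 30

# Unsupervised step: group the feature vectors into two clusters.
centroids, assign = kmeans(features, 2)

# Supervised step: label a new utterance's feature vector by its neighbors.
prediction = knn_predict(features, labels, [0.2, -0.1, 0.1, 0.0])
```

In a real system the synthetic vectors would be replaced by features extracted from learner audio, and the two toy clusters are far better separated than genuinely confusable phonemes are likely to be.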

Updated: 2021-07-19
