Articulatory-feature-based methods for performance improvement of Multilingual Phone Recognition Systems using Indian languages
Sādhanā (IF 1.6), Pub Date: 2020-07-30, DOI: 10.1007/s12046-020-01428-9
K E Manjunath, Dinesh Babu Jayagopi, K Sreenivasa Rao, V Ramasubramanian

In this work, the performance of a Multilingual Phone Recognition System (Multi-PRS) is improved using articulatory features (AFs). Four Indian languages (Kannada, Telugu, Bengali and Odia) are used to develop the Multi-PRS, with transcriptions derived using the International Phonetic Alphabet (IPA). Multi-PRSs are trained using hidden Markov models (HMMs) and state-of-the-art deep neural networks (DNNs). AFs for five AF groups (place, manner, roundness, frontness and height) are predicted from Mel-frequency cepstral coefficients (MFCCs) using DNNs. Oracle AFs, derived from the ground-truth IPA transcriptions, are used to establish the best performance achievable with predicted AFs, and the performance of predicted and oracle AFs is compared. In addition to AFs, phone posteriors are explored to further boost the performance of the Multi-PRS. Multi-task learning is explored to improve the prediction accuracy of AFs and thereby reduce the phone error rates (PERs) of the Multi-PRSs. AFs are fused using two approaches: (i) lattice rescoring and (ii) using AFs as tandem features. We show that fusing oracle AFs with MFCCs yields a remarkably low target PER of 10.4%, a 24.7% absolute reduction compared with the baseline Multi-PRS using MFCCs alone. The best-performing system using predicted AFs achieves a 3.2% absolute (9.1% relative) reduction in PER compared with the baseline Multi-PRS. The best performance is obtained using the tandem approach to fuse various AFs and phone posteriors.
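The PER figures in the abstract can be checked with a little arithmetic. The sketch below assumes the standard definition of PER: the Levenshtein (edit) distance between the reference and hypothesized phone sequences, divided by the reference length. The function names `phone_error_rate` and `reductions` are hypothetical and not taken from the paper. The baseline PER of about 35.1% is not stated directly in the abstract; it is inferred here by adding the 24.7% absolute reduction to the 10.4% oracle-AF PER.

```python
from typing import Sequence, Tuple


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Hypothetical helper: PER = (substitutions + deletions + insertions) / len(reference),
    computed as the Levenshtein distance between two phone sequences."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one phone")
    # dp[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i reference phones
    for j in range(m + 1):
        dp[0][j] = j  # insert all j hypothesis phones
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return dp[n][m] / n


def reductions(baseline_per: float, new_per: float) -> Tuple[float, float]:
    """Hypothetical helper: return (absolute reduction in percentage points,
    relative reduction in percent) of new_per with respect to baseline_per."""
    absolute = baseline_per - new_per
    relative = 100.0 * absolute / baseline_per
    return absolute, relative


# Assumption: baseline PER inferred as oracle-AF PER (10.4%) + absolute reduction (24.7%).
baseline = 10.4 + 24.7
best_predicted = baseline - 3.2  # best system using predicted AFs
abs_red, rel_red = reductions(baseline, best_predicted)
print(f"baseline PER ≈ {baseline:.1f}%, relative reduction ≈ {rel_red:.1f}%")
# prints: baseline PER ≈ 35.1%, relative reduction ≈ 9.1%
```

With the inferred baseline of about 35.1%, a 3.2-point absolute drop corresponds to roughly 9.1% relative, consistent with the figures reported in the abstract. As a toy example, `phone_error_rate(["k", "a", "n", "a"], ["k", "a", "m", "a", "a"])` counts one substitution and one insertion over four reference phones, giving 0.5.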

Updated: 2020-07-30