Cascade convolutional neural network‐long short‐term memory recurrent neural networks for automatic tonal and nontonal preclassification‐based Indian language identification
Expert Systems (IF 3.3), Pub Date: 2020-03-02, DOI: 10.1111/exsy.12544
Chuya China Bhanja, Mohammad A. Laskar, Rabul H. Laskar

This work presents an automatic tonal/nontonal preclassification-based Indian language identification (LID) system. Languages are first classified into tonal and nontonal categories, and individual languages are then identified within the respective categories. The work proposes pitch chroma and formant features for this task and investigates how Mel-frequency cepstral coefficients (MFCCs) complement them. It further explores block processing (BP)-, pitch synchronous analysis (PSA)-, and glottal closure region (GCR)-based approaches for feature extraction, using syllables as the basic units. A cascade convolutional neural network (CNN)-long short-term memory (LSTM) model operating on syllable-level features has been developed. The National Institute of Technology Silchar language database (NITS-LD) and the OGI Multilingual Telephone Speech Corpus (OGI-MLTS) have been used for experimental validation. The proposed system, based on the score combination of cascade CNN-LSTM models trained on chroma features (extracted with the BP method) and on the first two formants and MFCCs (both extracted with the GCR method), reports the highest accuracies. In the preclassification stage, the observed accuracies on NITS-LD are 91%, 87.3%, and 85.1% for 30 s, 10 s, and 3 s test data, respectively; on the OGI-MLTS database, the corresponding accuracies are 86.7%, 83.1%, and 80.6%. These amount to absolute improvements of 11.6%, 12.3%, and 13.9% on NITS-LD and 12.5%, 11.9%, and 12.6% on OGI-MLTS over the baseline system. The proposed preclassification-based LID system shows improvements of 7.3%, 6.4%, and 7.4% on NITS-LD and 6.1%, 6.7%, and 7.2% on OGI-MLTS over the baseline system for the three respective test-data conditions.
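As an illustration of the kind of cascade CNN-LSTM classifier described above, the following is a minimal sketch, not the authors' implementation: it assumes PyTorch, a hypothetical 39-dimensional syllable-level feature sequence, and arbitrary layer sizes; the CascadeCNNLSTM class name and the feat_dim parameter are assumptions made for illustration only.

import torch
import torch.nn as nn

class CascadeCNNLSTM(nn.Module):
    def __init__(self, feat_dim=39, num_classes=2):
        super().__init__()
        # CNN front end: learns local spectral patterns within the syllable.
        self.cnn = nn.Sequential(
            nn.Conv1d(feat_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # LSTM back end: models the temporal structure of the feature sequence.
        self.lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim); Conv1d expects (batch, channels, time).
        z = self.cnn(x.transpose(1, 2))
        _, (h, _) = self.lstm(z.transpose(1, 2))
        return self.fc(h[-1])  # class logits (tonal vs. nontonal, or language labels)

# Score combination across feature streams (chroma, formants, MFCCs) could be done by
# running one such model per stream and, for example, summing their softmax scores
# before taking the argmax; the exact fusion rule used in the paper is not shown here.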

Updated: 2020-03-02