Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems
Cognitive Computation (IF 5.4), Pub Date: 2021-07-16, DOI: 10.1007/s12559-021-09914-w
Musatafa Abbas Abbood Albadr¹, Sabrina Tiun¹, Masri Ayob¹, Manal Mohammed¹, Fahad Taha AL-Dhief²

Spoken language identification (LID) is the process of determining and classifying the natural language in given speech content or a dataset. To perform LID, the data must first be processed to extract useful features. The mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques in LID, and the MFCC features serve as inputs to the classification stage. This study investigates reducing the MFCC feature dimension, because a large data size increases computational time and resource use (i.e., memory space) and slows identification. It also evaluates data reduction techniques that retain the most important feature parameters. The data reduction is based on standard deviation (STD) calculation and principal component analysis (PCA). Both the MFCC-based features and the dimension-reduced features obtained with STD and PCA are then used as inputs to an optimized extreme learning machine (ELM) classifier called the optimized genetic algorithm-ELM (OGA-ELM). The evaluation uses several sets of data samples reduced to the same number of principal components (119). Results are reported on two datasets: the first covers eight languages, and the second is a subset of the National Institute of Standards and Technology Language Recognition Evaluation 2009 (NIST LRE 2009) dataset. Performance is assessed with accuracy, recall, precision, F-measure, G-mean, and identification time. The best LID performance is obtained when the STD- and PCA-based MFCC features with 119 dimensions are classified with OGA-ELM. The experimental results show that the proposed MFCC method achieves 99.38% accuracy on the first dataset and accuracies of up to 97.60%, 96.80%, and 91.20% on the second dataset for test durations of 30, 10, and 3 s, respectively. The proposed MFCC method also has the fastest computational time in all experiments, requiring only a few seconds to identify languages. Using a data reduction technique can substantially speed up computation, overcome resource limitations, and improve LID performance.
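The abstract does not say exactly how the STD calculation and PCA are combined, so the following is only a minimal NumPy sketch under one assumed reading. Each utterance is assumed to be already represented by a fixed-length MFCC vector (how that vector is built is not stated; the 390-value vectors here are hypothetical). STD is used to rank feature columns and keep the most variable ones, and PCA then projects the survivors onto 119 principal components, the dimension reported above. The names `std_filter`, `fit_reducer`, and `apply_reducer`, and the `keep=200` cut-off, are illustrative and not taken from the paper.

```python
import numpy as np


def std_filter(X: np.ndarray, keep: int) -> np.ndarray:
    """Indices of the `keep` columns of X with the largest standard deviation.

    Assumption: STD is used as a ranking criterion to discard near-constant
    feature parameters before PCA.
    """
    stds = X.std(axis=0)
    return np.argsort(stds)[::-1][:keep]  # most variable columns first


class PCA:
    """Minimal PCA fitted via SVD of the mean-centered training matrix."""

    def __init__(self, n_components: int):
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> "PCA":
        if self.n_components > min(X.shape):
            raise ValueError("n_components cannot exceed min(n_samples, n_features)")
        self.mean_ = X.mean(axis=0)
        # Rows of vt are principal directions, ordered by decreasing singular value
        _, s, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        var = s**2
        self.explained_ratio_ = var[: self.n_components] / var.sum()
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Project centered data onto the retained principal directions
        return (X - self.mean_) @ self.components_.T


def fit_reducer(X_train: np.ndarray, keep: int, n_components: int):
    """Fit the STD filter and PCA on training data only."""
    cols = std_filter(X_train, keep)
    return cols, PCA(n_components).fit(X_train[:, cols])


def apply_reducer(X: np.ndarray, cols: np.ndarray, pca: PCA) -> np.ndarray:
    """Apply the fitted column selection and PCA projection to new data."""
    return pca.transform(X[:, cols])


# Usage with random stand-in data (hypothetical 13 coefficients x 30 frames = 390 values)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 390))
X_test = rng.normal(size=(50, 390))
cols, pca = fit_reducer(X_train, keep=200, n_components=119)
Z_train = apply_reducer(X_train, cols, pca)
Z_test = apply_reducer(X_test, cols, pca)
print(Z_train.shape, Z_test.shape)  # (300, 119) (50, 119)
```

Both steps are fitted on the training split only and then reused for test utterances, so no test information leaks into the reduction. The classifier then sees 119 values per utterance instead of the full vector, which is where savings in memory and identification time would come from.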

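For the classification and evaluation stage, the sketch below shows a plain extreme learning machine together with the assessment measures named in the abstract. It is not OGA-ELM: the genetic-algorithm optimization of the hidden layer is omitted, and the hidden weights are simply drawn at random, with output weights solved by the Moore-Penrose pseudoinverse as in a standard ELM. The metric definitions (macro averaging, F-measure as the harmonic mean of macro precision and recall, G-mean as the geometric mean of per-class recalls) are assumptions, because the abstract gives no formulas. The data are synthetic Gaussian clusters standing in for 119-dimensional reduced features of eight languages; `ELM` and `lid_metrics` are hypothetical names.

```python
import numpy as np


class ELM:
    """Basic extreme learning machine: random hidden layer, least-squares output.

    Not OGA-ELM: the genetic-algorithm optimization of the hidden weights is
    omitted, and the weights are simply drawn at random.
    """

    def __init__(self, n_hidden: int, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation of a fixed random affine projection
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights solved in one step with the Moore-Penrose pseudoinverse
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # Pick the class whose output node has the largest response
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]


def lid_metrics(y_true: np.ndarray, y_pred: np.ndarray, classes: np.ndarray) -> dict:
    """Accuracy plus macro-averaged precision, recall, F-measure, and G-mean.

    Assumed definitions: F-measure is the harmonic mean of macro precision and
    macro recall; G-mean is the geometric mean of the per-class recalls.
    """
    prec, rec = [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        prec.append(tp / max(np.sum(y_pred == c), 1))
        rec.append(tp / max(np.sum(y_true == c), 1))
    p, r = float(np.mean(prec)), float(np.mean(rec))
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": p,
        "recall": r,
        "f_measure": 2 * p * r / (p + r) if p + r > 0 else 0.0,
        "g_mean": float(np.prod(rec) ** (1.0 / len(rec))),
    }


# Usage: synthetic 119-dimensional clusters standing in for eight languages
rng = np.random.default_rng(1)
n_lang, dim = 8, 119
centers = rng.normal(scale=3.0, size=(n_lang, dim))
y_train = np.repeat(np.arange(n_lang), 80)
y_test = np.repeat(np.arange(n_lang), 10)
X_train = centers[y_train] + rng.normal(size=(y_train.size, dim))
X_test = centers[y_test] + rng.normal(size=(y_test.size, dim))
model = ELM(n_hidden=200).fit(X_train, y_train)
print(lid_metrics(y_test, model.predict(X_test), model.classes_))
```

Because training reduces to one pseudoinverse rather than iterative gradient descent, an ELM is cheap to fit; the genetic search in OGA-ELM adds cost at training time in exchange for better-chosen hidden weights.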
Updated: 2021-07-18