Abstract
Spoken language identification (LID) is the process of determining which natural language is spoken in given speech content. Performing LID requires processing the data to extract useful features, and the mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques for this purpose. The MFCC features serve as inputs to the classification stage. This study investigates reducing the MFCC feature dimension, because a large data size increases computational time and resource usage (i.e., memory space) and slows identification. Data reduction techniques that retain the most important feature parameters are evaluated, based on standard deviation (STD) calculation and principal component analysis (PCA). The MFCC features and the reduced-dimension features obtained from STD and PCA are then used as inputs to an optimized extreme learning machine (ELM) classifier called the optimized genetic algorithm-ELM (OGA-ELM). Several sets of data samples with a single principal-component dimensionality (i.e., 119) are used for the evaluation. The results are generated using two different datasets. The first dataset is derived from eight separate languages, whereas the second is part of the National Institute of Standards and Technology Language Recognition Evaluation 2009 dataset. Performance is assessed with several measures, namely, accuracy, recall, precision, F-measure, G-mean, and identification time. The best LID performance is observed when the MFCC features based on STD and PCA, with 119 feature dimensions, are used with OGA-ELM as the classifier. The experimental results show that the proposed MFCC method achieves 99.38% accuracy on the first dataset. On the second dataset, it achieves accuracies of up to 97.60%, 96.80%, and 91.20% for utterance durations of 30, 10, and 3 s, respectively. The proposed MFCC method exhibits the fastest computational time in all experiments, requiring only a few seconds to identify languages. Using a data reduction technique can substantially reduce computational time, overcome resource limitations, and improve LID performance.
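To make the data reduction step concrete, the following is a minimal sketch of STD-based feature selection followed by PCA down to 119 dimensions. The abstract does not say how the STD criterion is applied or how the two techniques are combined, so several points here are assumptions made for illustration: features with the largest standard deviation are kept, PCA is applied after the STD step, and the function names (`select_by_std`, `pca_reduce`), the synthetic input, and the intermediate size of 300 are hypothetical. Only the final dimension of 119 comes from the text.

```python
import numpy as np


def select_by_std(features, keep):
    """Keep the `keep` columns with the largest standard deviation.

    Assumption: high-variance features are treated as the informative ones;
    the abstract does not state the actual selection rule.
    """
    std = features.std(axis=0)
    # Indices of the top-`keep` columns, restored to their original order.
    idx = np.sort(np.argsort(std)[::-1][:keep])
    return features[:, idx], idx


def pca_reduce(features, n_components):
    """Project mean-centered features onto the leading principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Synthetic stand-in for utterance-level MFCC vectors (200 samples, 600 dims).
mfcc = rng.normal(size=(200, 600)) * rng.uniform(0.1, 2.0, size=600)

selected, kept_idx = select_by_std(mfcc, keep=300)  # 300 is an arbitrary choice
reduced = pca_reduce(selected, n_components=119)    # 119 as in the abstract
print(reduced.shape)  # (200, 119)
```

The reduced matrix would then be passed to the classifier in place of the full MFCC matrix, which is where the savings in memory and identification time would come from.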
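The assessment measures named above can be illustrated with a short sketch for a multi-class LID setting. The abstract does not give the exact formulas, so this assumes one-vs-rest counts per language, macro-averaging across languages, and G-mean defined as the square root of recall times specificity. The function name `lid_metrics` and the toy labels are hypothetical.

```python
from math import sqrt


def lid_metrics(y_true, y_pred):
    """Macro-averaged precision, recall, F-measure, and G-mean, plus accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    rows = []
    for c in labels:
        # One-vs-rest confusion counts for language c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        g_mean = sqrt(recall * specificity)  # assumed definition
        rows.append((precision, recall, f_measure, g_mean))
    names = ("precision", "recall", "f_measure", "g_mean")
    result = {name: sum(r[i] for r in rows) / len(labels)
              for i, name in enumerate(names)}
    result["accuracy"] = sum(t == p for t, p in pairs) / n
    return result


# Toy example with three languages; one "fr" utterance is misidentified as "de".
y_true = ["en", "en", "fr", "fr", "de", "de"]
y_pred = ["en", "en", "fr", "de", "de", "de"]
print(lid_metrics(y_true, y_pred))
```

Identification time, the remaining measure, would be obtained by timing the classifier's prediction step rather than from the labels.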
Funding
This study was funded by the Universiti Kebangsaan Malaysia (Research code: GUP-2020-063).
Ethics declarations
Ethical Approval
This article does not contain any study involving human or animal test subjects.
Conflicts of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Albadr, M.A.A., Tiun, S., Ayob, M. et al. Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems. Cogn Comput 13, 1136–1153 (2021). https://doi.org/10.1007/s12559-021-09914-w
DOI: https://doi.org/10.1007/s12559-021-09914-w