Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems

Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Ayob, Masri; Mohammed, Manal; AL-Dhief, Fahad Taha

doi:10.1007/s12559-021-09914-w

Published: 16 July 2021

Cognitive Computation, Volume 13, pages 1136–1153 (2021)

Abstract

Spoken language identification (LID) is the process of determining which natural language is spoken in a given speech sample. To perform LID, the speech data must first be processed to extract useful features, and the mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques for this purpose. The MFCC features serve as inputs to the classification stage. This study investigates reducing the dimension of the MFCC features, because a large data size increases computational time and resource use (e.g., memory space) and slows identification. It also evaluates data reduction techniques that retain the most important feature parameters. The data reduction is based on the standard deviation (STD) and principal component analysis (PCA). The MFCC features, reduced in dimension using STD and PCA, are then used as inputs to an optimized extreme learning machine (ELM) classifier, the optimized genetic algorithm-ELM (OGA-ELM). The evaluation uses several sets of data samples with a single principal-component dimension of 119. Results are reported on two datasets: the first covers eight languages, and the second is part of the National Institute of Standards and Technology Language Recognition Evaluation 2009 (NIST LRE 2009) dataset. Performance is assessed with accuracy, recall, precision, F-measure, G-mean, and identification time. The best LID performance is obtained when the MFCC features reduced with STD and PCA to 119 dimensions are classified with OGA-ELM. On the first dataset, the proposed method achieves 99.38% accuracy; on the second dataset, it achieves accuracies of up to 97.60%, 96.80%, and 91.20% for utterance durations of 30, 10, and 3 s, respectively. The proposed method also has the fastest computational time in all experiments, requiring only a few seconds to identify languages. Using a data reduction technique can substantially reduce computational time, ease resource limitations, and improve LID performance.
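
The reduction-plus-classification pipeline summarized above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say along which axis the STD is taken, so `std_reduce` assumes one STD per frame across the cepstral coefficients; this yields equal-length vectors only when utterances have the same number of frames. PCA is computed from an SVD of the centered data. The classifier is a basic ELM with random input weights and pseudoinverse output weights, used here in place of OGA-ELM, which additionally tunes the hidden-layer parameters with a genetic algorithm (not shown). All function and class names are hypothetical, and the demo uses random synthetic matrices rather than real MFCCs.

```python
import numpy as np


def std_reduce(mfcc):
    """Collapse an (n_frames, n_coeffs) MFCC matrix into one value per frame.

    ASSUMPTION (hypothetical helper): the STD is taken across the cepstral
    coefficients of each frame, so utterances with the same number of frames
    map to vectors of the same length.
    """
    return mfcc.std(axis=1)


def pca_fit(X, k):
    """Return the mean and the top-k principal axes of X (n_samples, n_features)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def pca_transform(X, mean, axes):
    """Project centered samples onto the retained principal axes."""
    return (X - mean) @ axes.T


class ELM:
    """Basic ELM: random sigmoid hidden layer, least-squares output weights.

    This is NOT OGA-ELM; there, a genetic algorithm would tune W and b.
    """

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes = np.unique(y)
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        T = (y[:, None] == self.classes[None, :]).astype(float)  # one-hot targets
        # Output weights: Moore-Penrose pseudoinverse solution of H @ beta = T.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return self.classes[np.argmax(self._hidden(X) @ self.beta, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for MFCC matrices (300 frames x 13 coefficients) of two
    # classes that differ only in spread; this is random data, not speech.
    scales = [1.0] * 40 + [2.0] * 40
    X_raw = np.array([std_reduce(rng.normal(scale=s, size=(300, 13))) for s in scales])
    y = np.array([0] * 40 + [1] * 40)
    idx = rng.permutation(len(y))
    train, test = idx[:60], idx[60:]
    # Fit PCA on the training split only, then reuse its axes for the test split.
    mean, axes = pca_fit(X_raw[train], k=20)
    model = ELM(n_hidden=50).fit(pca_transform(X_raw[train], mean, axes), y[train])
    pred = model.predict(pca_transform(X_raw[test], mean, axes))
    print("test accuracy:", (pred == y[test]).mean())
```

With real data, `k` would be the retained dimension (119 in the study), and with this SVD-based PCA it cannot exceed the smaller of the number of training utterances and the feature length.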

Similar content being viewed by others

Feature dimensionality reduction: a review

Article Open access 21 January 2022

Weikuan Jia, Meili Sun, … Sujuan Hou

Feature selection techniques for machine learning: a survey of more than two decades of research

Article 01 December 2023

Dipti Theng & Kishor K. Bhoyar

A review of unsupervised feature selection methods

Article 29 January 2019

Saúl Solorio-Fernández, J. Ariel Carrasco-Ochoa & José Fco. Martínez-Trinidad

Funding

This study was funded by the Universiti Kebangsaan Malaysia (Research code: GUP-2020–063).

Author information

Authors and Affiliations

Faculty of Information Science and Technology, CAIT, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Musatafa Abbas Abbood Albadr, Sabrina Tiun, Masri Ayob & Manal Mohammed
School of Electrical Engineering, Department of Communication Engineering, Universiti Teknologi Malaysia, UTM Johor Bahru, Johor, Malaysia
Fahad Taha AL-Dhief

Corresponding author

Correspondence to Musatafa Abbas Abbood Albadr.

Ethics declarations

Ethical Approval

This article does not contain any study involving human or animal test subjects.

Conflicts of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Albadr, M.A.A., Tiun, S., Ayob, M. et al. Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems. Cogn Comput 13, 1136–1153 (2021). https://doi.org/10.1007/s12559-021-09914-w

Received: 18 July 2019
Accepted: 07 July 2021
Published: 16 July 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s12559-021-09914-w
