Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems

Published in Cognitive Computation.

Abstract

Spoken language identification (LID) is the task of determining and classifying the natural language spoken in a given audio sample. To perform LID, the data must first be processed to extract useful features. The mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques in LID, and the resulting MFCC features serve as inputs to the classification stage. In this study, reducing the MFCC feature dimension is investigated, because a large feature size increases computational time, consumes resources (i.e., memory space), and slows identification. The use of data reduction techniques that retain the most important feature parameters is also evaluated. The data reduction investigated is based on standard deviation (STD) calculation and principal component analysis (PCA). The MFCC features, with their dimensionality reduced according to the STD and PCA results, are then used as inputs to an optimized extreme learning machine (ELM) classifier called the optimized genetic algorithm-ELM (OGA-ELM). Several sets of data samples, each reduced to 119 principal components, are used for the evaluation. Results are generated on two datasets: the first is derived from eight separate languages, whereas the second is part of the National Institute of Standards and Technology Language Recognition Evaluation 2009 dataset. To evaluate the performance of the proposed method, this study uses several assessment measures, namely, accuracy, recall, precision, F-measure, G-mean, and identification time. The best LID performance is observed when the MFCC features based on STD and PCA, with 119 feature dimensions, are used with OGA-ELM as the classifier. The experimental results show that the proposed MFCC method achieves 99.38% accuracy on the first dataset, and accuracies of up to 97.60%, 96.80%, and 91.20% on the second dataset with utterance durations of 30, 10, and 3 s, respectively. The proposed MFCC method also exhibits the fastest computational time in all experiments, requiring only a few seconds to identify a language. Using a data reduction technique can thus substantially speed up computation, overcome resource limitations, and improve LID performance.
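The abstract does not spell out the reduction steps, but the two techniques it names are standard. As an illustrative sketch (not the authors' implementation), assuming MFCC features have already been extracted as a frames-by-coefficients matrix per utterance, an STD-based summary and a PCA projection to 119 components might look like this (function names and shapes are assumptions for illustration):

```python
import numpy as np

def std_summarize(mfcc):
    """Collapse a (frames, coefficients) MFCC matrix into a single vector of
    per-coefficient standard deviations -- one common STD-based reduction."""
    return mfcc.std(axis=0)

def pca_reduce(X, n_components=119):
    """Project feature vectors X (samples, features) onto the top principal
    components, computed via SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy example: 200 utterances with 500-dimensional features, reduced to 119.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 500))
X_red = pca_reduce(X, n_components=119)
print(X_red.shape)  # (200, 119)
```

The reduced matrix `X_red` would then be the input to the classifier; in the paper's pipeline that classifier is the OGA-ELM, whose internals are not sketched here.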
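The assessment measures named in the abstract (accuracy, recall, precision, F-measure, and G-mean) can all be computed from a confusion matrix. A minimal sketch for the binary case follows; it assumes G-mean is the geometric mean of sensitivity and specificity, a common definition, since the paper's exact formulas are not reproduced here (macro-averaging over languages extends this to the multiclass setting):

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, F-measure, and G-mean from the four
    cells of a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)                    # sensitivity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    g_mean = math.sqrt(recall * specificity)   # geometric mean of sens. and spec.
    return accuracy, recall, precision, f_measure, g_mean

# Toy counts: 90 true positives, 10 false positives, 5 false negatives, 95 true negatives.
acc, rec, prec, f1, gm = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"acc={acc:.3f} recall={rec:.3f} precision={prec:.3f} F={f1:.3f} G-mean={gm:.3f}")
```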





Funding

This study was funded by the Universiti Kebangsaan Malaysia (Research code: GUP-2020-063).

Author information

Corresponding author

Correspondence to Musatafa Abbas Abbood Albadr.

Ethics declarations

Ethical Approval

This article does not contain any study involving human or animal test subjects.

Conflicts of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Albadr, M.A.A., Tiun, S., Ayob, M. et al. Mel-Frequency Cepstral Coefficient Features Based on Standard Deviation and Principal Component Analysis for Language Identification Systems. Cogn Comput 13, 1136–1153 (2021). https://doi.org/10.1007/s12559-021-09914-w

