Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms

Chittaragi, Nagaratna B.; Koolagudi, Shashidhar G.

doi:10.1007/s10579-019-09481-5

Original Paper
Published: 21 November 2019

Volume 54, pages 553–585, (2020)

Language Resources and Evaluation

Abstract

This paper proposes an automatic dialect identification (ADI) system for the Kannada language based on spectral and prosodic features. A new dialect dataset is collected from native speakers of Kannada, a Dravidian language. The dataset covers five distinct Kannada dialects representing five geographical regions of Karnataka state, and the significance of spectral and prosodic variations across these dialects is investigated. Mel-frequency cepstral coefficients (MFCCs), spectral flux, and entropy represent the spectral features, while pitch and energy represent the prosodic parameters. These raw feature vectors are further processed statistically to obtain new derived feature vectors. Two classification techniques are employed: a single-classifier multi-class support vector machine (SVM) and a multiple-classifier ensemble SVM (ESVM). The features are evaluated on the newly collected Kannada speech corpus, with five Kannada dialects, and on the internationally known standard Intonation Variation in English (IViE) dataset, with nine British English dialects. Experimental results demonstrate that the derived feature vectors perform better than the raw feature vectors, and that the ESVM technique performs better than a single SVM. Used individually, spectral and prosodic features yield dialect recognition performance of 83.12% and 44.52%, respectively. The complementary nature of the two feature types is then evaluated by combining both feature vectors, which raises dialect recognition performance to about 86.25%. This indicates that spectral and prosodic features carry complementary dialect-specific evidence. The experiments on the standard IViE corpus show a higher recognition rate of 91.38% using ESVM. The proposed ADI systems with derived features outperform state-of-the-art i-vector based systems on both datasets.
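The pipeline described in the abstract (frame-level raw features, statistically derived utterance-level vectors, and an ensemble decision) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which statistics are used, so mean, standard deviation, minimum, and maximum are placeholders; the flux and entropy definitions are common textbook forms, not necessarily the ones used in the paper; and `majority_vote` is a hypothetical stand-in for how the ESVM might combine the labels of its base SVMs.

```python
import math
import statistics
from collections import Counter


def spectral_flux(prev_mag, cur_mag):
    """Squared difference between sum-normalised magnitude spectra of two adjacent frames."""
    prev_sum = sum(prev_mag) or 1.0
    cur_sum = sum(cur_mag) or 1.0
    return sum((c / cur_sum - p / prev_sum) ** 2 for p, c in zip(prev_mag, cur_mag))


def spectral_entropy(mag):
    """Shannon entropy (in bits) of a magnitude spectrum treated as a probability distribution."""
    total = sum(mag)
    if total == 0:
        return 0.0
    probs = [m / total for m in mag if m > 0]
    return -sum(p * math.log2(p) for p in probs)


def derive_features(frames):
    """Collapse frame-level vectors (one list per frame) into one utterance-level vector.

    Assumed statistics per raw feature dimension: mean, population standard
    deviation, minimum, maximum. The paper may use a different set.
    """
    derived = []
    for column in zip(*frames):
        derived.extend([
            statistics.fmean(column),
            statistics.pstdev(column),
            min(column),
            max(column),
        ])
    return derived


def majority_vote(predictions):
    """Return the label predicted most often by the base classifiers (assumed combination rule)."""
    return Counter(predictions).most_common(1)[0][0]


# Two frames with two raw features each (for example, pitch and energy).
frames = [[1.0, 10.0], [3.0, 10.0]]
print(derive_features(frames))

# Hypothetical dialect labels from three base classifiers for one utterance.
print(majority_vote(["north", "coastal", "north"]))
```

Each raw feature dimension contributes four derived values here, so an utterance of any length maps to a fixed-size vector, which is what a standard SVM requires as input.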

Notes

In this thesis, the words classification, identification, and recognition are used interchangeably, conveying a similar meaning with a standard machine learning goal.
A UBM (universal background model) is a large GMM trained to represent the speaker-independent distribution of features.

Author information

Authors and Affiliations

Dept. of Computer Science and Engg., National Institute of Technology Karnataka, Surathkal, India
Nagaratna B. Chittaragi & Shashidhar G. Koolagudi
Dept. of Information Science and Engg., Siddaganga Institute of Technology, Tumkur, Karnataka, India
Nagaratna B. Chittaragi

Corresponding author

Correspondence to Nagaratna B. Chittaragi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chittaragi, N.B., Koolagudi, S.G. Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms. Lang Resources & Evaluation 54, 553–585 (2020). https://doi.org/10.1007/s10579-019-09481-5

Issue Date: June 2020
DOI: https://doi.org/10.1007/s10579-019-09481-5
