Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms

  • Original Paper
Language Resources and Evaluation

Abstract

In this paper, an automatic dialect identification (ADI) system for the Kannada language is proposed, based on spectral and prosodic features. A new dialect dataset is collected from native speakers of Kannada (a Dravidian language). The dataset covers five distinct Kannada dialects representing five geographical regions of Karnataka state. The significance of spectral and prosodic variations across these five dialects is investigated. Mel-frequency cepstral coefficients (MFCCs), spectral flux, and entropy are used as representative spectral features, while pitch and energy are extracted as representative prosodic features. These raw feature vectors are then converted into new derived feature vectors through statistical processing. For dialect classification, a single-classifier multi-class support vector machine (SVM) and a multiple-classifier ensemble SVM (ESVM) are employed. The features are evaluated on the newly collected Kannada speech corpus with five dialects and on the internationally known, standard Intonational Variation in English (IViE) dataset with nine British English dialects. Experimental results demonstrate that the derived feature vectors perform better than the raw feature vectors, and that the ESVM technique outperforms a single SVM. Used individually, spectral and prosodic features yield dialect recognition performance of 83.12% and 44.52%, respectively. The complementary nature of the two feature types is then evaluated by combining both feature vectors, which raises dialect recognition performance to about 86.25%. This indicates that spectral and prosodic features carry complementary dialect-specific evidence. Experiments on the standard IViE corpus show a higher recognition rate of 91.38% using ESVM. The proposed ADI systems with derived features perform better than state-of-the-art i-vector feature based systems on both datasets.
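The step from raw to derived feature vectors, and the combination of spectral and prosodic evidence, can be illustrated with a minimal sketch. The abstract does not say which statistics are used, so the choice of mean, standard deviation, minimum, and maximum here is an assumption, as are the function names `derive_features` and `fuse` and the toy frame values. The sketch only shows how a variable number of frame-level rows can be collapsed into one fixed-length vector per utterance and how two such vectors can be concatenated before classification.

```python
from statistics import mean, pstdev


def derive_features(frames):
    """Collapse frame-level feature rows into one fixed-length vector.

    frames: list of equal-length lists, one row per analysis frame.
    Returns [mean, std, min, max] for each feature dimension, concatenated.
    The four statistics are an assumed choice, not taken from the paper.
    """
    if not frames:
        raise ValueError("need at least one frame")
    derived = []
    for column in zip(*frames):  # one column per feature dimension
        derived.extend([mean(column), pstdev(column), min(column), max(column)])
    return derived


def fuse(spectral_frames, prosodic_frames):
    """Concatenate derived spectral and prosodic vectors (feature-level fusion)."""
    return derive_features(spectral_frames) + derive_features(prosodic_frames)


# Toy utterance: 3 frames of 2 spectral values, 2 frames of 1 prosodic value.
spectral = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
prosodic = [[100.0], [120.0]]
vector = fuse(spectral, prosodic)
print(len(vector))  # 2 dimensions * 4 stats + 1 dimension * 4 stats = 12
```

Because the derived vector has the same length regardless of utterance duration, it can be passed directly to a fixed-input classifier such as an SVM or an ensemble of SVMs.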

Notes

  1. In this work, the words classification, identification, and recognition are used interchangeably; all refer to the same standard machine learning goal.

  2. A universal background model (UBM) is a large Gaussian mixture model (GMM) trained to represent the speaker-independent distribution of features.


Author information

Corresponding author

Correspondence to Nagaratna B. Chittaragi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chittaragi, N.B., Koolagudi, S.G. Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms. Lang Resources & Evaluation 54, 553–585 (2020). https://doi.org/10.1007/s10579-019-09481-5

  • DOI: https://doi.org/10.1007/s10579-019-09481-5
