Abstract
This paper proposes an automatic dialect identification (ADI) system for the Kannada language based on spectral and prosodic features. A new dialect dataset is collected from native speakers of Kannada, a Dravidian language. The dataset covers five distinct Kannada dialects representing five geographical regions of Karnataka state, and the significance of spectral and prosodic variations across these dialects is investigated. Mel-frequency cepstral coefficients (MFCCs), spectral flux, and entropy serve as spectral features, while pitch and energy are extracted as prosodic features. These raw feature vectors are further processed statistically to obtain derived feature vectors. Two classification approaches are employed: a single multi-class support vector machine (SVM) and a multiple-classifier ensemble SVM (ESVM). The features are evaluated on the newly collected Kannada speech corpus with five dialects and on the internationally known standard Intonation Variation in English (IViE) dataset with nine British English dialects. Experimental results demonstrate that the derived feature vectors perform better than the raw feature vectors, and that the ESVM technique performs better than a single SVM. Individually, spectral and prosodic features yield dialect recognition performance of 83.12% and 44.52%, respectively. The complementary nature of the two feature types is then evaluated by combining both feature vectors, which increases dialect recognition performance to about 86.25%. This indicates that spectral and prosodic features carry complementary dialect-specific evidence. Experiments on the standard IViE corpus show a higher recognition rate of 91.38% using ESVM. The proposed ADI systems with derived features perform better than state-of-the-art i-vector feature based systems on both datasets.
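The processing chain described in the abstract (frame-level raw features, statistically derived utterance-level features, fusion of spectral and prosodic vectors, and a multi-classifier decision) can be illustrated with a minimal sketch. The abstract does not list the statistics used or the ESVM combination rule, so the choices below (mean, standard deviation, minimum, and maximum per dimension; concatenation for fusion; majority voting across ensemble members) are assumptions. The function names, toy values, and dialect labels are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def derive_features(frames):
    """Collapse frame-level feature vectors into one utterance-level vector.

    `frames` is a list of equal-length lists, one per frame. For each
    feature dimension we keep mean, population std, min, and max.
    This set of statistics is an assumption, not the paper's exact set.
    """
    derived = []
    for column in zip(*frames):  # iterate over feature dimensions
        derived.extend([mean(column), pstdev(column), min(column), max(column)])
    return derived


def fuse(spectral_frames, prosodic_frames):
    """Concatenate derived spectral and prosodic vectors (feature-level fusion)."""
    return derive_features(spectral_frames) + derive_features(prosodic_frames)


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy frame-level values, made up for illustration (not from the corpus).
spectral = [[1.0, 4.0], [3.0, 8.0]]  # two frames, two MFCC-like dimensions
prosodic = [[100.0], [120.0]]        # two frames, one pitch-like dimension

utterance_vector = fuse(spectral, prosodic)

# Hypothetical outputs of three ensemble members for one utterance.
label = majority_vote(["dialect_A", "dialect_B", "dialect_A"])
```

Here each utterance yields a fixed-length vector regardless of its duration, which is what allows a standard SVM to be trained on it; in practice each ensemble member would be a trained classifier rather than a hard-coded label.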
Notes
In this work, the terms classification, identification, and recognition are used interchangeably; all refer to the same standard machine learning goal.
A universal background model (UBM) is a large Gaussian mixture model (GMM) trained to represent the speaker-independent distribution of features.
Cite this article
Chittaragi, N.B., Koolagudi, S.G. Automatic dialect identification system for Kannada language using single and ensemble SVM algorithms. Lang Resources & Evaluation 54, 553–585 (2020). https://doi.org/10.1007/s10579-019-09481-5
DOI: https://doi.org/10.1007/s10579-019-09481-5