Dialect Identification using Chroma-Spectral Shape Features with Ensemble Technique, Computer Speech & Language



Computer Speech & Language (IF 4.3), Pub Date: 2021-04-23, DOI: 10.1016/j.csl.2021.101230
Nagaratna B. Chittaragi, Shashidhar G. Koolagudi

The present work proposes a text-independent dialect identification system. Dialects of a language generally exhibit distinct pronunciation styles, each associated with a specific geographical region. In this paper, chroma features, which are commonly used in music-related systems, are employed to identify dialects. In addition, eight significant spectral shape features are computed from short-term spectra and combined with the chroma features; the combined set is named chroma-spectral shape features. Chroma features aggregate spectral information and attempt to capture the variations in timbre, correlated melody, rhythm, and intonation patterns that are prominent among the dialects of a few languages. The effectiveness of the proposed features and approach is evaluated on five prominent Kannada dialects spoken in Karnataka, India, and on the internationally known standard Intonation Variation in English (IViE) dataset, which covers nine British English dialects. Two discriminative models are employed for classification: a single-classifier Support Vector Machine (SVM) and an ensemble of support vector machines (ESVM). The proposed features outperform state-of-the-art i-vector features on both datasets. The highest recognition rates, 95.6% and 97.52%, are achieved on the Kannada and IViE dialect datasets, respectively, using ESVM. The proposed features also remain robust on short (limited-data) audio clips, even in noisy conditions.
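To make the feature construction described in the abstract more concrete, the following is a minimal sketch of how a 12-bin chroma vector and a few spectral shape descriptors might be computed for one short-term frame and concatenated. It is not the authors' implementation. The abstract does not name the eight spectral shape features, so the three used here (centroid, spread, and roll-off) are assumptions chosen for illustration, as are all function names, the 440 Hz reference pitch, the 85% roll-off fraction, and the naive DFT. The SVM and ESVM classification stage is not shown.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def chroma_vector(mags, sample_rate, n_fft, ref_hz=440.0):
    """Fold spectral energy into 12 pitch classes (index 0 is the class of ref_hz)."""
    bins = [0.0] * 12
    for k, m in enumerate(mags):
        if k == 0:
            continue  # skip DC: it has no defined pitch
        freq = k * sample_rate / n_fft
        pitch_class = round(12 * math.log2(freq / ref_hz)) % 12
        bins[pitch_class] += m * m
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins


def spectral_shape(mags, sample_rate, n_fft, rolloff_fraction=0.85):
    """Return [centroid, spread, roll-off] in Hz (illustrative subset only)."""
    freqs = [k * sample_rate / n_fft for k in range(len(mags))]
    total = sum(mags)
    if total == 0:
        return [0.0, 0.0, 0.0]
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    spread = math.sqrt(
        sum(((f - centroid) ** 2) * m for f, m in zip(freqs, mags)) / total
    )
    rolloff = freqs[-1]
    cumulative = 0.0
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= rolloff_fraction * total:
            rolloff = f
            break
    return [centroid, spread, rolloff]


def chroma_spectral_shape(frame, sample_rate):
    """Concatenate chroma and spectral shape descriptors for one frame."""
    mags = magnitude_spectrum(frame)
    n_fft = len(frame)
    return chroma_vector(mags, sample_rate, n_fft) + spectral_shape(
        mags, sample_rate, n_fft
    )


# Demo input: a 25 ms frame of a pure 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440.0 * t / sample_rate) for t in range(200)]
features = chroma_spectral_shape(frame, sample_rate)
print(len(features), [round(v, 3) for v in features])
```

In a full pipeline, per-frame vectors like `features` would typically be summarized over an utterance before being passed to a classifier; how the paper does this is not stated in the abstract.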


Updated: 2021-04-30
