Development of a regional voice dataset and speaker classification based on machine learning
Journal of Big Data (IF 8.6), Pub Date: 2021-03-02, DOI: 10.1186/s40537-021-00435-9
Muhammad Ismail , Shahzad Memon , Lachhman Das Dhomeja , Shahid Munir Shah , Dostdar Hussain , Sabit Rahim , Imran Ali

At present, voice biometrics are commonly used to identify and authenticate users by their voice. Mobile banking, access to personal devices, and logging into social networks are common examples of services that authenticate users through voice biometrics. In Pakistan, voice-based services are very common in the banking and mobile/cellular sectors; however, these services do not use voice features to recognize customers, so the risk that they are used under a false identity is high. A voice-based recognition system is therefore needed to minimize this risk. In this paper, we developed regional voice datasets for voice biometrics by collecting voice data in different local accents of Pakistan. Although voice biometrics are needed globally, especially where voice-based services are common, this paper uses Pakistan as a case study to show how to build a regional voice dataset for voice biometrics. To build the dataset, voice samples were recorded from 180 male and female speakers in two languages, English and Urdu, covering five regional accents. Mel Frequency Cepstral Coefficient (MFCC) features were extracted from the collected voice samples to train Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF) and K-nearest neighbor (KNN) classifiers. The results indicate that ANN outperformed SVM, RF and KNN, achieving recognition accuracies of 88.53% and 86.58% on the two datasets, respectively.
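The abstract names the pipeline (MFCC features feeding SVM, ANN, RF and KNN classifiers) but not its parameters. The sketch below, in Python with NumPy, illustrates the MFCC step and the simplest of the four classifiers, KNN; SVM, ANN and RF are not shown. Every concrete choice here is an assumption, not taken from the paper: 16 kHz mono audio, 25 ms Hamming frames with a 10 ms hop, a 512-point FFT, 26 mel filters, 13 coefficients, mean-pooling over frames to get one vector per recording, and k = 3. The synthetic tones in the demo are hypothetical stand-ins for speakers, not the paper's data.

```python
import numpy as np

# Assumed parameters (not stated in the abstract): 25 ms frames, 10 ms hop,
# 512-point FFT, 26 mel filters, 13 cepstral coefficients.


def hz_to_mel(hz):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising slope
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii_ortho(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, sr, n_mfcc=13, n_mels=26, n_fft=512, frame_ms=25.0, hop_ms=10.0):
    """Frame-level MFCCs of a mono signal, shape (n_frames, n_mfcc)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    if signal.size < frame_len or frame_len > n_fft:
        raise ValueError("signal shorter than one frame, or frame longer than n_fft")
    # Pre-emphasis boosts high frequencies relative to low ones
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (emphasized.size - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum -> mel band energies -> log -> DCT
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_mels, n_fft, sr).T
    return dct_ii_ortho(np.log(np.maximum(energies, 1e-10)), n_mfcc)


def utterance_features(signal, sr):
    """One vector per recording: mean of the frame-level MFCCs (an assumed pooling)."""
    return mfcc(signal, sr).mean(axis=0)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(np.asarray(train_x, dtype=float) - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr  # one second per synthetic "recording"
    # Hypothetical stand-ins for speakers: pure tones plus a little noise
    speakers = {"spk_a": 150.0, "spk_b": 700.0, "spk_c": 2500.0}

    def synth(f0):
        return np.sin(2 * np.pi * f0 * t) + 0.01 * rng.standard_normal(t.size)

    train_x, train_y = [], []
    for name, f0 in speakers.items():
        for _ in range(3):
            train_x.append(utterance_features(synth(f0), sr))
            train_y.append(name)
    for name, f0 in speakers.items():
        print(name, "->", knn_predict(train_x, train_y, utterance_features(synth(f0), sr)))
```

Running the file prints each synthetic speaker next to its predicted label. Distance-based classifiers such as KNN are usually applied to standardized features, and tested MFCC implementations such as `librosa.feature.mfcc` exist; the hand-written version here only makes each step of the pipeline explicit.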



Updated: 2021-03-02