Spoken Language Identification Using Deep Learning
Computational Intelligence and Neuroscience, Pub Date: 2021-09-21, DOI: 10.1155/2021/5123671
Gundeep Singh 1, Sahil Sharma 1, Vijay Kumar 2, Manjit Kaur 3, Mohammed Baz 4, Mehedi Masud 5

Spoken language identification (SLID) is the process of detecting the language of an audio clip from an unknown speaker, regardless of the speaker's gender, age, or manner of speaking. The central challenge is to find features that distinguish between languages clearly and efficiently. The model converts audio files into spectrogram images and applies a convolutional neural network (CNN) to extract the main features needed for classification. The main objective is to identify the language from among English, French, Spanish, German, Estonian, Tamil, Mandarin, Turkish, Chinese, Arabic, Hindi, Indonesian, Portuguese, Japanese, Latin, Dutch, Pushto, Romanian, Korean, Russian, Swedish, Thai, and Urdu. Experiments were conducted on audio files from the Kaggle dataset named spoken language identification. The audio files consist of utterances, each with a fixed duration of 10 seconds. The whole dataset is split into training and test sets. Preliminary results give an overall accuracy of 98%. Extensive and accurate testing shows an overall accuracy of 88%.
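
The audio-to-spectrogram step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the 16 kHz sample rate, the 512-sample Hann window, the 256-sample hop, and the helper names `log_spectrogram` and `to_image` are all assumptions chosen for illustration, and a synthetic 440 Hz tone stands in for a real 10-second utterance. The CNN classifier itself is not shown.

```python
import numpy as np


def log_spectrogram(signal, frame_len=512, hop=256):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression; the small constant avoids log(0).
    return np.log(magnitude + 1e-10).T


def to_image(spec):
    """Scale a spectrogram to 8-bit grayscale pixel values (0 to 255)."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-12)
    return np.round(scaled * 255).astype(np.uint8)


# Synthetic stand-in for a 10-second utterance (assumed 16 kHz sampling).
SAMPLE_RATE = 16000
t = np.arange(10 * SAMPLE_RATE) / SAMPLE_RATE
clip = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(clip)
image = to_image(spec)
print(image.shape, image.dtype)
```

Under these assumptions each 10-second clip becomes a fixed-size 2-D array, which is what lets an image-style CNN take it as input; a real pipeline would typically read the audio from file and might use a mel-scaled spectrogram instead.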

Updated: 2021-09-22