Taylor‐AMS features and deep convolutional neural network for converting nonaudible murmur to normal speech
Computational Intelligence (IF 2.8), Pub Date: 2020-02-14, DOI: 10.1111/coin.12281
T. Rajesh Kumar 1 , G. R Suresh 2 , S. Kanaga Subaraja 3 , C. Karthikeyan 1

Communication is effective only when the speech signal arrives with its salient characteristics intact. This has motivated researchers to develop automatic systems that recognize speech from murmurs. Some traditional automatic recognition systems are unsuited to silent environments, which creates a need for an effective recognition system. In addition, traditional automatic recognition methods, such as neural networks, perform poorly in the presence of murmurs. This article therefore proposes a method for automatic whisper recognition using a Deep Convolutional Neural Network (DCNN). The DCNN is trained with the proposed Stochastic‐Whale Optimization Algorithm (Stochastic‐WOA), which integrates the Stochastic Gradient Descent algorithm with WOA. The classifier takes as input features extracted from the preprocessed input speech signal: pitch chroma, spectral centroid, spectral skewness, and the Taylor‐Amplitude Modulation Spectrogram (Taylor‐AMS), which is obtained by combining the Taylor series with Amplitude Modulation Spectrogram (AMS) features. Experiments on a real database show that the proposed method achieves a maximal accuracy of 0.9723, a minimal False Positive Rate of 0.0257, and a maximal True Positive Rate of 0.9981.
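
The abstract lists spectral centroid and spectral skewness among the classifier inputs but does not give their formulas. As a minimal sketch, assuming the commonly used definitions (the centroid as the magnitude-weighted mean frequency, the skewness as the standardized third central moment of the spectrum), the two features could be computed per frame as shown here. The function names, the Hann window, the frame length, and the synthetic test tone are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def spectral_moments(freqs, magnitude):
    """Return (centroid, skewness) of a magnitude spectrum.

    Assumed definition: the normalized magnitudes are treated as a
    probability distribution over frequency.
    """
    freqs = np.asarray(freqs, dtype=float)
    magnitude = np.asarray(magnitude, dtype=float)
    total = magnitude.sum()
    if total == 0.0:
        return 0.0, 0.0  # silent frame: no spectral shape to describe
    weights = magnitude / total
    centroid = float(np.sum(freqs * weights))
    deviation = freqs - centroid
    spread = float(np.sqrt(np.sum(deviation ** 2 * weights)))
    if spread == 0.0:
        return centroid, 0.0  # all energy sits in a single bin
    skewness = float(np.sum(deviation ** 3 * weights) / spread ** 3)
    return centroid, skewness


def frame_spectral_features(frame, sample_rate):
    """Window one frame, take its FFT magnitude, and return the moments."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))  # Hann window is an assumption
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return spectral_moments(freqs, magnitude)


# Illustrative input: a synthetic 1 kHz tone, not data from the paper.
sample_rate = 16000
t = np.arange(1024) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)
tone_centroid, tone_skewness = frame_spectral_features(tone, sample_rate)
```

Under these definitions, a spectrum whose energy is concentrated at low frequencies with a tail toward high frequencies has positive skewness, and a spectrum symmetric about its centroid has zero skewness.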

Updated: 2020-02-14