Enhancement of speech dynamics for voice activity detection using DNN, EURASIP Journal on Audio, Speech, and Music Processing

EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2018-09-12, DOI: 10.1186/s13636-018-0135-7
Suci Dwijayanti, Kei Yamamori, Masato Miyoshi

Voice activity detection (VAD) is an important preprocessing step for various speech applications; it identifies speech and non-speech periods in input signals. In this paper, we propose a deep neural network (DNN)-based VAD method for detecting such periods in noisy signals using speech dynamics, that is, the time-varying behavior of speech signals, which may be expressed as the first- and second-order derivatives of mel cepstra, also known as the delta and delta-delta features. Rather than using these derivatives directly, the proposed method highlights the dynamics through speech period candidates, which are calculated using heuristic rules for the patterns of the first and second derivatives of the input signals. These candidates, together with the log power spectra, are input into the DNN to obtain VAD decisions. Experiments compare the proposed method with a DNN-based method that uses only log power spectra, on speech signals corrupted with five types of noise (white, babble, factory, car, and pink) at signal-to-noise ratios (SNRs) of 10, 5, 0, and −5 dB. The experimental results show that the proposed method is superior under all the considered noise conditions, indicating that adding the speech period candidates improves on the log power spectra alone.
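As a rough illustration of two ingredients the abstract mentions, the sketch below computes delta features from a sequence of per-frame feature vectors and mixes noise into speech at a target SNR. It is not the authors' implementation: the regression-style delta formula, the window `width`, the edge handling, and the function names `delta` and `mix_at_snr` are all assumptions chosen for illustration. The heuristic rules that produce the speech period candidates are not described in the abstract, so they are not sketched here.

```python
import math


def delta(frames, width=2):
    """Regression-based delta of a feature sequence.

    frames: list of per-frame feature vectors (lists of floats).
    width:  frames on each side used in the regression (assumed value).
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech power / noise power equals snr_db, then add.

    speech and noise are equal-length lists of samples; power is the mean
    square over the whole signal (an assumption; other definitions exist).
    """
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]
```

Applying `delta` twice, as in `delta(delta(frames))`, gives a delta-delta feature under the same assumptions, and calling `mix_at_snr` with `snr_db` set to 10, 5, 0, and -5 would reproduce the SNR levels listed in the abstract for a given speech and noise pair.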

Updated: 2018-09-12
