A neural network approach for speech activity detection for Apollo corpus
Computer Speech & Language (IF 3.1), Pub Date: 2020-07-30, DOI: 10.1016/j.csl.2020.101137
Vishala Pannala, B. Yegnanarayana

This paper describes a new method for speech activity detection (SAD) based on the recently proposed single frequency filtering (SFF) analysis of speech signals and a neural network model. The SFF analysis gives the instantaneous spectrum of the speech signal at each sampling instant. The frequency resolution of the spectrum is determined by the number of frequencies used in the SFF analysis, which in turn depends on the frequency spacing. With a frequency spacing of 10 Hz and a sampling frequency of 8 kHz, a 401-dimensional spectrum covering 0–4 kHz (one component per 10 Hz step, both endpoints included) is obtained at each sampling instant. This spectrum is used as the feature vector to train an artificial neural network (ANN) model to discriminate between (noisy) speech and nonspeech (mostly noise). For a given test utterance, the output of the trained ANN model gives a speech/nonspeech decision at every sampling instant. These decisions are then post-processed to produce the SAD output. The system-generated SAD is evaluated on the SAD task of the Apollo corpus in terms of the detection cost function (DCF). The DCF values of the proposed system on the development and evaluation datasets are 3.1% and 4.6%, respectively, whereas the DCF values of the reported baseline system are 8.6% and 11.7%, respectively.
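
The feature extraction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SFF equations, so the code below assumes the commonly described formulation: the signal is frequency-shifted so that the frequency of interest lands at half the sampling frequency, then passed through a single-pole filter with its pole at z = -r, and the magnitude of the output is taken as the envelope. The function names, the pole radius `r = 0.99`, and the test tone are illustrative assumptions, not details from the paper.

```python
import numpy as np


def sff_frequencies(fs_hz=8000, spacing_hz=10):
    """Analysis frequencies from 0 Hz to fs/2 inclusive, in steps of spacing_hz."""
    count = int(fs_hz // (2 * spacing_hz)) + 1  # 4000 / 10 + 1 = 401 for the abstract's setup
    return spacing_hz * np.arange(count, dtype=float)


def sff_envelopes(x, fs_hz=8000, spacing_hz=10, r=0.99):
    """Return (freqs, env) where env[n, k] is the envelope at sample n, frequency k.

    Assumed formulation (not stated in the abstract):
      x_k[n] = x[n] * exp(j * (pi - w_k) * n),  w_k = 2*pi*f_k/fs
      y_k[n] = -r * y_k[n-1] + x_k[n]           (single pole at z = -r)
      env[n, k] = |y_k[n]|
    """
    x = np.asarray(x, dtype=float)
    freqs = sff_frequencies(fs_hz, spacing_hz)
    shift = np.pi - 2.0 * np.pi * freqs / fs_hz  # per-frequency shift in rad/sample
    y = np.zeros(freqs.size, dtype=complex)      # filter state, one value per frequency
    env = np.empty((x.size, freqs.size))
    for n, sample in enumerate(x):
        # Update all frequency channels at once for this sampling instant.
        y = -r * y + sample * np.exp(1j * shift * n)
        env[n] = np.abs(y)
    return freqs, env


# Illustrative input: 0.1 s of a 1 kHz tone sampled at 8 kHz (hypothetical test signal).
fs = 8000
t = np.arange(800) / fs
tone = np.cos(2.0 * np.pi * 1000.0 * t)

freqs, env = sff_envelopes(tone, fs_hz=fs)
print(env.shape)                    # one 401-dimensional vector per sample
print(freqs[np.argmax(env[-1])])    # frequency with the largest envelope at the last sample
```

Each row of `env` is one per-sample feature vector of the kind the abstract describes as input to the ANN. The sample-by-sample loop keeps the recursion explicit; for long recordings a vectorized filter routine would be a more practical choice.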



Updated: 2020-08-06