Nonlinear waveform distortion: Assessment and detection of clipping on speech data and systems

Speech Communication (IF 2.4), Pub Date: 2021-08-12, DOI: 10.1016/j.specom.2021.07.007
John H. L. Hansen, Allen Stauffer, Wei Xia

Speech, speaker, and language systems have traditionally relied on carefully collected speech material for training acoustic models. An enormous amount of freely accessible audio content now exists. A major challenge, however, is that such data is not professionally recorded and may therefore contain a wide variety of background noise, nonlinear distortions, or other unknown environmental or technology-based contamination or mismatch. There is a crucial need for automatic analysis that screens such unknown datasets before acoustic model training, or that screens input audio for purity prior to classification. In this study, we propose a waveform-based clipping detection algorithm for naturalistic audio streams and examine the impact of clipping at different severities on speech quality measurements and automatic speaker recognition systems. We use the TIMIT and NIST SRE08 corpora as case studies. The results show, as expected, that clipping introduces a nonlinear distortion into clean speech data, which reduces both speech quality and speaker recognition performance. We also investigate how much clipping can be present while still sustaining effective speech system performance. The proposed detection system, which will be released, could contribute to massive new audio collections for speech and language technology development, e.g. Google Audioset (Gemmeke et al., 2017) and CRSS-UTDallas Apollo Fearless-Steps (Yu et al., 2014), the latter comprising 19,000 h of naturalistic audio from NASA Apollo missions.
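
The abstract does not spell out how the proposed detector works, so the following is only a minimal illustrative sketch of the general idea of waveform-level clipping, not the authors' algorithm. It assumes hard clipping (samples limited to a fixed amplitude) and uses a naive heuristic, the fraction of samples sitting at the waveform's extreme amplitude, as a rough severity indicator. The function names `hard_clip` and `clipped_fraction`, the 440 Hz test tone, and the thresholds are all hypothetical choices made for illustration.

```python
import math


def hard_clip(samples, threshold):
    """Limit every sample to the range [-threshold, +threshold]."""
    return [max(-threshold, min(threshold, s)) for s in samples]


def clipped_fraction(samples, tolerance=1e-9):
    """Return the fraction of samples at the waveform's extreme amplitude.

    Naive heuristic (an assumption for illustration, not the paper's method):
    a clipped waveform has many samples piled up at its peak value, whereas
    an unclipped one reaches its peak only occasionally.
    """
    if not samples:
        return 0.0
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 0.0
    at_peak = sum(1 for s in samples if abs(s) >= peak - tolerance)
    return at_peak / len(samples)


# One second of a 440 Hz sine sampled at 16 kHz stands in for clean audio.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Two clipping severities: a lower threshold means more severe clipping.
mild = hard_clip(clean, 0.9)
severe = hard_clip(clean, 0.3)

for name, signal in [("clean", clean), ("mild", mild), ("severe", severe)]:
    print(name, round(clipped_fraction(signal), 3))
```

On this synthetic tone the indicator grows as the clipping threshold drops. Real speech would need a more robust approach (for example, one that tolerates noise and soft saturation), which this sketch does not attempt.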

Updated: 2021-09-17
