Temporal Auditory Coding Features for Causal Speech Enhancement, Electronics



Electronics (IF 2.6) Pub Date: 2020-10-16, DOI: 10.3390/electronics9101698
Iordanis Thoidis , Lazaros Vrysis , Dimitrios Markou , George Papanikolaou

Perceptually motivated audio signal processing and feature extraction have played a key role in the study of high-level semantic processes and in the development of emerging systems and applications, such as mobile telecommunication and hearing aids. In the era of deep learning, neural-network-based speech enhancement methods have achieved great success, mostly operating on log-power spectra. Although these approaches eliminate the need for exhaustive feature extraction and selection, it remains unclear whether they target the sound characteristics that matter for speech perception. In this study, we propose a novel set of auditory-motivated features for single-channel speech enhancement that fuses temporal envelope and temporal fine structure information in the context of vocoder-like processing. A causal gated recurrent unit (GRU) neural network is employed to recover the low-frequency amplitude modulations of speech. Experimental results indicate that the proposed system achieves considerable gains in objective intelligibility and quality metrics for both normal-hearing and hearing-impaired listeners. The proposed auditory-motivated feature set achieved better objective intelligibility than the conventional log-magnitude spectrogram features, whereas mixed results were observed for simulated listeners with hearing loss. Finally, we demonstrate that the proposed analysis/synthesis framework reconstructs speech signals with satisfactory accuracy.
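To make the envelope/fine-structure idea concrete, below is a minimal offline sketch of vocoder-like analysis/synthesis. It assumes a small set of hypothetical FFT brick-wall bands and the standard Hilbert-transform definitions of temporal envelope (|z|) and temporal fine structure (cos of the analytic phase). The abstract does not specify the authors' filterbank, band edges, or causal implementation, so the function names and parameters here (`env_tfs_decompose`, `edges`, etc.) are illustrative assumptions, not the paper's method. Note that the FFT band-pass is zero-phase and therefore non-causal.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array via the FFT.

    Same construction as scipy.signal.hilbert: keep DC (and Nyquist for
    even lengths), double positive frequencies, zero negative ones.
    """
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def bandpass_fft(x, fs, lo, hi):
    """Brick-wall band-pass keeping FFT bins in [lo, hi) Hz.

    Zero-phase and therefore NON-causal: it needs the whole signal.
    A causal system would use, e.g., an IIR or gammatone filterbank instead.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def env_tfs_decompose(x, fs, edges):
    """Split x into bands; return per-band Hilbert envelope and TFS.

    edges: ascending band edges in Hz (hypothetical, not the paper's).
    Returns two arrays of shape (n_bands, len(x)).
    """
    envs, tfss = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        z = analytic_signal(bandpass_fft(x, fs, lo, hi))
        envs.append(np.abs(z))            # temporal envelope
        tfss.append(np.cos(np.angle(z)))  # temporal fine structure
    return np.array(envs), np.array(tfss)


def vocoder_resynthesize(envs, tfss):
    """Vocoder-like synthesis: modulate each band's TFS by its envelope and sum."""
    return np.sum(envs * tfss, axis=0)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs                    # 1 s of audio
    x = np.cos(2 * np.pi * 300 * t) + 0.5 * np.cos(2 * np.pi * 1200 * t)
    edges = [100, 500, 1000, 2000, 4000]      # 4 illustrative bands
    envs, tfss = env_tfs_decompose(x, fs, edges)
    y = vocoder_resynthesize(envs, tfss)
    print(envs.shape, np.max(np.abs(y - x)))
```

Because envelope times cos(phase) equals the real part of each band's analytic signal, unmodified features reconstruct the band-limited input up to floating-point error. In an enhancement setting, a model such as the causal GRU described above would instead estimate clean-speech envelopes from noisy input before resynthesis; that model is not part of this sketch.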


Updated: 2020-10-17
