Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech
arXiv - CS - Sound. Pub Date: 2021-07-15. DOI: arxiv-2107.07503. Authors: Christian J. Steinmetz, Vamsi Krishna Ithapu, Paul Calamia
Deep learning approaches have emerged that aim to transform an audio signal
so that it sounds as if it were recorded in the same room as a reference
recording, with applications both in audio post-production and augmented
reality. In this work, we propose FiNS, a Filtered Noise Shaping network that
directly estimates the time domain room impulse response (RIR) from reverberant
speech. Our domain-inspired architecture features a time domain encoder and a
filtered noise shaping decoder that models the RIR as a summation of decaying
filtered noise signals, along with direct sound and early reflection
components. Previous methods for acoustic matching utilize either large models
to transform audio to match the target room or predict parameters for
algorithmic reverberators. Instead, blind estimation of the RIR enables
efficient and realistic transformation with a single convolution. An evaluation
demonstrates that our model not only synthesizes RIRs that match parameters of the
target room, such as the $T_{60}$ and the direct-to-reverberant ratio (DRR), but
also reproduces the perceptual characteristics of the target room more accurately
than deep learning baselines, as shown in a listening test.
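
The signal model named in the abstract (an RIR built from decaying filtered noise plus a direct-sound component, then applied with a single convolution) can be illustrated with a minimal NumPy sketch. This is not the FiNS network or the authors' code: the function names, band edges, per-band decay times, and gains below are all hypothetical values chosen for illustration, the filters are crude FFT masks, and early reflections are omitted. In the paper's setting such parameters would be estimated by the network from reverberant speech rather than fixed by hand.

```python
import numpy as np


def band_noise(rng, n, sr, lo_hz, hi_hz):
    """White Gaussian noise restricted to [lo_hz, hi_hz) with an FFT mask."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spectrum[(freqs < lo_hz) | (freqs >= hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=n)


def synth_rir(sr=16000, dur_s=0.5,
              bands=((0, 500), (500, 2000), (2000, 8000)),  # assumed band edges (Hz)
              t60_s=(0.6, 0.4, 0.25),                       # assumed per-band decay times
              gains=(0.3, 0.2, 0.1),                        # assumed per-band gains
              seed=0):
    """Toy RIR: direct-sound impulse plus a sum of decaying filtered noise bands."""
    n = int(sr * dur_s)
    t = np.arange(n) / sr
    rng = np.random.default_rng(seed)
    rir = np.zeros(n)
    for (lo, hi), t60, gain in zip(bands, t60_s, gains):
        # Amplitude envelope that has dropped by 60 dB at t = t60.
        envelope = gain * 10.0 ** (-3.0 * t / t60)
        rir += envelope * band_noise(rng, n, sr, lo, hi)
    rir[0] += 1.0  # direct sound modelled as a unit impulse at t = 0
    return rir


def apply_rir(dry, rir):
    """Acoustic matching as a single (full) linear convolution."""
    return np.convolve(dry, rir)


if __name__ == "__main__":
    rir = synth_rir()
    dry = np.random.default_rng(1).standard_normal(16000)  # stand-in for dry speech
    wet = apply_rir(dry, rir)
    print(rir.shape, wet.shape)
```

Once an RIR is available, whether hand-built as here or predicted by a model, transforming new audio costs only one convolution, which is the efficiency argument the abstract makes against large audio-to-audio models.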
Updated: 2021-07-16