Audio inpainting with generative adversarial network

arXiv - CS - Machine Learning, Pub Date: 2020-03-13, DOI: arxiv-2003.07704
P. P. Ebner and A. Eltelt

We study the ability of a Wasserstein Generative Adversarial Network (WGAN) to generate missing audio content that is, in context, statistically similar to the sound and the neighboring borders. We address the challenge of inpainting long-range audio gaps (500 ms) using WGAN models. We improved the quality of the inpainted part with a newly proposed WGAN architecture that, unlike the classical WGAN model, uses both short-range and long-range neighboring borders. Performance was compared on two different instruments (piano and guitar) and on virtuoso pianists together with a string orchestra. The objective difference grade (ODG) was used to evaluate the performance of both architectures. The proposed model outperforms the classical WGAN model and improves the reconstruction of high-frequency content. Further, we obtained better results for instruments whose frequency spectrum lies mainly in the lower range, where small noises are less annoying to the human ear and the inpainting part is more perceptible. Finally, we show that for audio datasets in which a particular instrument is accompanied by other instruments, better test results are reached if we train the network only on this particular instrument, neglecting the other instruments.
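
To make the setup in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of how a 500 ms gap and its short-range and long-range neighboring borders could be cut from a waveform, together with the standard Wasserstein critic loss. Only the 500 ms gap length comes from the abstract. The function names `split_gap_context` and `critic_loss`, the 16 kHz sample rate, and the border lengths (500 ms and 2000 ms) are assumptions chosen for illustration.

```python
import numpy as np


def split_gap_context(signal, sample_rate, gap_start_s,
                      gap_ms=500.0, short_ms=500.0, long_ms=2000.0):
    """Cut a gap out of a 1-D signal and return the borders on both sides.

    gap_ms follows the abstract (500 ms); short_ms and long_ms are
    illustrative assumptions, not values taken from the paper.
    """
    gap_len = int(round(sample_rate * gap_ms / 1000.0))
    short_len = int(round(sample_rate * short_ms / 1000.0))
    long_len = int(round(sample_rate * long_ms / 1000.0))
    if long_len < short_len:
        raise ValueError("long-range border must not be shorter than short-range")

    start = int(round(gap_start_s * sample_rate))
    end = start + gap_len
    if start - long_len < 0 or end + long_len > len(signal):
        raise ValueError("not enough context around the gap")

    return {
        "gap": signal[start:end].copy(),              # target the generator must fill
        "short_before": signal[start - short_len:start],
        "short_after": signal[end:end + short_len],
        "long_before": signal[start - long_len:start],
        "long_after": signal[end:end + long_len],
    }


def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss to minimize: E[D(fake)] - E[D(real)]."""
    return float(np.mean(fake_scores) - np.mean(real_scores))


if __name__ == "__main__":
    sr = 16000                                        # assumed sample rate
    t = np.arange(6 * sr) / sr
    audio = np.sin(2 * np.pi * 440.0 * t)             # 6 s test tone
    parts = split_gap_context(audio, sr, gap_start_s=3.0)
    for name, chunk in parts.items():
        print(name, len(chunk))
```

In this sketch a conditional generator would receive the border segments as input and be trained to produce the `gap` segment, while the critic scores real and generated gaps. How the paper actually feeds the two border ranges into the network is not described in the abstract.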


Updated: 2020-03-18
