Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation

arXiv - CS - Sound, Pub Date: 2021-06-10, DOI: arxiv-2106.05529
Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani

We address the determined audio source separation problem in the time-frequency domain. Independent deeply learned matrix analysis (IDLMA) assumes that the inter-frequency correlation of each source spectrum is zero, which is inappropriate for modeling nonstationary signals such as music. To account for the correlation between frequencies, independent positive semidefinite tensor analysis has been proposed. This unsupervised (blind) method, however, severely restricts the structure of the frequency covariance matrices (FCMs) to reduce the number of model parameters. As an extension of these conventional approaches, we propose a supervised method that models FCMs using deep neural networks (DNNs). Because it is difficult to infer FCMs directly with DNNs, we also propose a new FCM model represented as a convex combination of a diagonal FCM and a rank-1 FCM. This model is flexible enough not only to account for inter-frequency correlation but also to capture the dynamics of the time-varying FCMs of nonstationary signals. We infer the proposed FCMs using two DNNs: one for power spectrum estimation and one for time-domain signal estimation. An experiment on separating music signals shows that the proposed method achieves higher separation performance than IDLMA.
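
The convex-combination FCM mentioned in the abstract can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation. The helper name `convex_fcm`, the mixing weight `alpha`, and the choice to build the diagonal term from a power spectrum and the rank-1 term from the outer product of a complex spectrum are all assumptions made for illustration; the abstract does not give the exact parametrization.

```python
import numpy as np


def convex_fcm(power, spectrum, alpha):
    """Hypothetical helper: (1 - alpha) * diag(power) + alpha * s s^H.

    power    : nonnegative power spectrum for one time frame, shape (F,)
    spectrum : complex spectrum for the same frame, shape (F,)
    alpha    : mixing weight in [0, 1] (assumed name, not from the paper)
    """
    power = np.asarray(power, dtype=float)
    spectrum = np.asarray(spectrum, dtype=complex)
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    # Diagonal FCM: no correlation between frequency bins.
    diagonal_part = np.diag(power).astype(complex)
    # Rank-1 FCM: outer product s s^H, which couples all frequency bins.
    rank1_part = np.outer(spectrum, spectrum.conj())
    return (1.0 - alpha) * diagonal_part + alpha * rank1_part


# Toy frame with 4 frequency bins (random data, for illustration only).
rng = np.random.default_rng(0)
n_freq = 4
spectrum = rng.standard_normal(n_freq) + 1j * rng.standard_normal(n_freq)
power = np.abs(spectrum) ** 2

fcm = convex_fcm(power, spectrum, alpha=0.5)

# A convex combination of two Hermitian positive semidefinite matrices
# is itself Hermitian positive semidefinite.
print("Hermitian:", np.allclose(fcm, fcm.conj().T))
print("PSD:", np.linalg.eigvalsh(fcm).min() > -1e-10)
```

In this sketch, `alpha = 0` gives a purely diagonal FCM with no inter-frequency correlation, while larger `alpha` adds off-diagonal terms. Evaluating the helper once per time frame gives a time-varying sequence of FCMs.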

Updated: 2021-06-11
