Towards Musically Meaningful Explanations Using Source Separation

arXiv - CS - Sound, Pub Date: 2020-09-04, DOI: arxiv-2009.02051
Verena Haunschmid, Ethan Manilow, Gerhard Widmer

Deep neural networks (DNNs) are successfully applied in a wide variety of music information retrieval (MIR) tasks. Such models are usually considered "black boxes", meaning that their predictions are not interpretable. Prior work on explainable models in MIR has generally used image processing tools to produce explanations for DNN predictions, but these explanations are not necessarily musically meaningful, nor can they be listened to (which, arguably, is important in music). We propose audioLIME, a method based on Local Interpretable Model-agnostic Explanations (LIME), extended by a musical definition of locality. LIME learns locally linear models on perturbations of the example that we want to explain. Instead of extracting components of the spectrogram using image segmentation as part of the LIME pipeline, we propose using source separation. The perturbations are created by switching sources on and off, which makes our explanations listenable. We first validate audioLIME on a classifier that was deliberately trained to confuse the true target with a spurious signal, and show that this confusion can easily be detected using our method. We then show that audioLIME passes a sanity check that many available explanation methods fail. Finally, we demonstrate the general applicability of our (model-agnostic) method on a third-party music tagger.
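The idea of perturbing an example by switching sources on and off, then fitting a locally linear model, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `explain_with_sources`, the exponential proximity kernel, and the toy sinusoidal "sources" are all assumptions made for illustration. A real pipeline would obtain the sources from a source separation model, and LIME itself fits a weighted ridge regression rather than the plain weighted least squares used here.

```python
import numpy as np

def explain_with_sources(sources, predict_fn, n_samples=200,
                         kernel_width=0.5, seed=0):
    """Return one linear weight per source (hypothetical helper).

    sources:    array of shape (n_sources, n_audio_samples), the separated stems
    predict_fn: callable mapping a mixture waveform to a scalar model score
    """
    rng = np.random.default_rng(seed)
    n_sources = sources.shape[0]

    # Binary perturbation masks: 1 keeps a source, 0 mutes it.
    masks = rng.integers(0, 2, size=(n_samples, n_sources))
    masks[0] = 1  # always include the unperturbed mixture

    # Each perturbed example is the sum of the sources left switched on,
    # so every perturbation is itself a listenable waveform.
    mixtures = masks @ sources
    scores = np.array([predict_fn(m) for m in mixtures])

    # Proximity weights (assumed kernel): mixtures with fewer muted
    # sources are closer to the original and count more.
    distance = 1.0 - masks.mean(axis=1)
    proximity = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept column.
    design = np.hstack([np.ones((n_samples, 1)), masks])
    root_w = np.sqrt(proximity)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None],
                               scores * root_w, rcond=None)
    return coef[1:]  # drop the intercept, keep per-source weights


# Toy usage: three orthogonal sinusoids stand in for separated sources,
# and the "model" only responds to the second one.
t = np.arange(1000) / 1000.0
toy_sources = np.stack([np.sin(2 * np.pi * f * t) for f in (3, 5, 7)])
target = toy_sources[1]

def toy_model(waveform):
    return float(waveform @ target / (target @ target))

weights = explain_with_sources(toy_sources, toy_model)
print(weights)
```

In this toy setting the largest weight falls on the source the model actually depends on, which mirrors how a listenable explanation would point to the stem that drives a prediction.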

Updated: 2020-09-07
