Self-Supervised Learning of Audio Representations From Permutations With Differentiable Ranking
IEEE Signal Processing Letters ( IF 3.9 ) Pub Date : 2021-03-19 , DOI: 10.1109/lsp.2021.3067635
Andrew N. Carr , Quentin Berthet , Mathieu Blondel , Olivier Teboul , Neil Zeghidour

Self-supervised pre-training using so-called “pretext” tasks has recently shown impressive performance across a wide range of modalities. In this work, we advance self-supervised learning from permutations, by pre-training a model to reorder shuffled parts of the spectrogram of an audio signal, to improve downstream classification performance. We make two main contributions. First, we overcome the main challenges of integrating permutation inversions into an end-to-end training scheme, using recent advances in differentiable ranking. This was heretofore sidestepped by casting the reordering task as classification, fundamentally reducing the space of permutations that can be exploited. Our experiments validate that learning from all possible permutations improves the quality of the pre-trained representations over using a limited, fixed set. Second, we show that inverting permutations is a meaningful pretext task for learning audio representations in an unsupervised fashion. In particular, we improve instrument classification and pitch estimation of musical notes by reordering spectrogram patches in the time-frequency space.
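To make the pretext task concrete, here is a minimal, self-contained sketch of learning from permutations with a differentiable ranking surrogate. The paper uses the fast differentiable sorting and ranking operators of Blondel et al.; as a simple stand-in for illustration, this sketch builds a smooth rank from pairwise sigmoid comparisons. The idea is the same: a model emits one score per shuffled spectrogram patch, the soft ranks of those scores are differentiable in the scores, and they are regressed onto the patches' true positions, so any of the n! permutations can serve as a training target. All function names here (`soft_rank`, `permutation_loss`) are illustrative, not from the paper's code.

```python
import math

def soft_rank(scores, tau=0.1):
    """Smooth surrogate for the ascending rank of each score.

    rank_i ~= 1 + sum_{j != i} sigmoid((s_i - s_j) / tau)
    counts (softly) how many scores lie below s_i, so it is
    differentiable in the scores; tau -> 0 recovers hard ranks.
    """
    n = len(scores)
    ranks = []
    for i in range(n):
        r = 1.0
        for j in range(n):
            if j != i:
                r += 1.0 / (1.0 + math.exp(-(scores[i] - scores[j]) / tau))
        ranks.append(r)
    return ranks

def permutation_loss(scores, true_positions, tau=0.1):
    """Pretext loss: the model scores each shuffled patch; the soft
    ranks of the scores should match the patches' original positions
    (0-indexed), covering all permutations rather than a fixed subset."""
    ranks = soft_rank(scores, tau)
    return sum((r - (p + 1)) ** 2
               for r, p in zip(ranks, true_positions)) / len(scores)
```

For example, if three patches were shuffled so that their true positions are `[2, 0, 1]`, scores that increase with true position (e.g. `[3.0, 1.0, 2.0]`) drive the loss toward zero, while mis-ordered scores are penalized; in training, gradients of this loss flow back through the scores into the encoder.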

Updated: 2021-04-23