MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training,arXiv - CS - Information Retrieval

MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
arXiv - CS - Information Retrieval. Pub Date: 2021-06-10, DOI: arxiv-2106.05630
Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu

Symbolic music understanding, which refers to understanding music from symbolic data (e.g., MIDI format, as opposed to audio), covers many music applications such as genre classification, emotion classification, and music piece matching. While good music representations benefit these applications, the lack of training data hinders representation learning. Inspired by the success of pre-trained models in natural language processing, we develop MusicBERT, a large-scale pre-trained model for music understanding. To this end, we construct a large-scale symbolic music corpus that contains more than 1 million songs. Since symbolic music contains more structural information (e.g., bar, position) and more diverse information (e.g., tempo, instrument, and pitch) than natural language text, simply applying pre-training techniques from NLP to symbolic music brings only marginal gains. Therefore, we design several mechanisms, including OctupleMIDI encoding and a bar-level masking strategy, to enhance pre-training with symbolic music data. Experiments demonstrate the advantages of MusicBERT on four music understanding tasks: melody completion, accompaniment suggestion, genre classification, and style classification. Ablation studies also verify the effectiveness of the OctupleMIDI encoding and the bar-level masking strategy in MusicBERT.
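
The abstract names two mechanisms, OctupleMIDI encoding and bar-level masking, without describing them. The following is a minimal Python sketch of how the two ideas could fit together. It assumes each note is stored as one 8-field token and that masking is applied to all values of one field within one bar at a time. The field names, the `MASK` value, the `bar_level_mask` function, and the 15% default rate are illustrative assumptions, not details taken from the abstract, and the BERT-style random-replacement step is omitted.

```python
import random
from typing import NamedTuple


class Note(NamedTuple):
    """One note as an 8-field token (field names are assumptions)."""
    time_sig: int
    tempo: int
    bar: int
    position: int
    instrument: int
    pitch: int
    duration: int
    velocity: int


MASK = -1  # hypothetical id standing in for a [MASK] symbol


def bar_level_mask(notes, mask_prob=0.15, rng=None):
    """Mask whole (bar, field) units instead of individual values.

    Every note in a selected bar loses the same field, so the model
    cannot simply copy that field from a neighbouring note in the bar.
    Returns the masked notes and the set of (bar, field) units chosen.
    """
    rng = rng or random.Random()
    bars = sorted({n.bar for n in notes})
    # Decide independently for each (bar, field) pair whether to mask it.
    units = {
        (b, f)
        for b in bars
        for f in Note._fields
        if rng.random() < mask_prob
    }
    masked = [
        Note(*[
            MASK if (n.bar, f) in units else v
            for f, v in zip(Note._fields, n)
        ])
        for n in notes
    ]
    return masked, units
```

Calling `bar_level_mask(notes, rng=random.Random(0))` on a list of `Note` tuples returns the masked copy together with the chosen `(bar, field)` units, which a training loop could use as prediction targets. Masking per bar rather than per token is the design choice being illustrated: adjacent notes in a bar often share tempo, instrument, or time signature, so token-level masking would leave easy shortcuts.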

Updated: 2021-06-11
