Lyrics segmentation via bimodal text–audio representation, Natural Language Engineering


Natural Language Engineering (IF 2.3), Pub Date: 2021-05-05, DOI: 10.1017/s1351324921000024
Michael Fell₁, Yaroslav Nechaev₂, Gabriel Meseguer-Brocal₃, Elena Cabrio₄, Fabien Gandon₄, Geoffroy Peeters₅


Song lyrics contain repeated patterns that have been shown to facilitate automated lyrics segmentation, whose final goal is to detect the building blocks (e.g., chorus, verse) of a song text. Our contribution in this article is twofold. First, we introduce a convolutional neural network (CNN)-based model that learns to segment lyrics based on their repetitive text structure. We experiment with novel features that reveal different kinds of repetition in the lyrics, for instance based on phonetic and syntactic properties. Second, using a novel corpus in which the song text is synchronized to the audio of the song, we show that the text and audio modalities capture complementary structure of the lyrics and that combining them benefits lyrics segmentation performance. For purely text-based lyrics segmentation on a dataset of 103k lyrics, we achieve an F-score of 67.4%, improving on the state of the art (59.2% F-score). On the synchronized text–audio dataset of 4.8k songs, we show that the additional audio features improve segmentation performance to 75.3% F-score, significantly outperforming the purely text-based approaches.
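
To make the idea of "repetitive text structure" concrete, the following is a minimal sketch of one way repetition between lyric lines could be exposed: a line-by-line self-similarity matrix. The abstract does not say how the model represents repetition, so the matrix representation, the function names, the toy lyrics, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, not the authors' features.

```python
from difflib import SequenceMatcher


def line_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two lyric lines.

    Assumption: difflib's ratio stands in for whatever string,
    phonetic, or syntactic similarity a real system would use.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def self_similarity_matrix(lines: list[str]) -> list[list[float]]:
    """Entry [i][j] holds the similarity between line i and line j."""
    return [[line_similarity(a, b) for b in lines] for a in lines]


# Hypothetical toy lyrics, invented for this example only.
lyrics = [
    "walking down the road",
    "thinking of the sea",
    "sing it loud tonight",
    "sing it loud tonight",
]

ssm = self_similarity_matrix(lyrics)

# Print the matrix; repeated lines show up as off-diagonal values of 1.00.
for row in ssm:
    print(" ".join(f"{value:.2f}" for value in row))
```

In a matrix like this, blocks of high off-diagonal similarity mark repeated lines, which is the kind of pattern a convolutional model could plausibly learn to associate with segment borders.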


Updated: 2021-05-05
