MULTIMODAL ANALYSIS: Informed content estimation and audio source separation


arXiv - CS - Sound. Pub Date: 2021-04-27, DOI: arxiv-2104.13276
Gabriel Meseguer-Brocal

This dissertation studies multimodal learning in the context of musical signals, focusing throughout on the interaction between audio signals and text information. Among the many music-related text sources that could be used (e.g. reviews, metadata, or social network feedback), we concentrate on lyrics. The singing voice connects the audio signal and the text directly: it combines melody and lyrics, so that a linguistic dimension complements the abstraction of musical instruments. Our study focuses on the interaction between audio and lyrics for two tasks: source separation and informed content estimation.


Updated: 2021-04-29
