


TensorFlow Audio Models in Essentia
arXiv - CS - Sound. Pub Date: 2020-03-16, DOI: arxiv-2003.07393
Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess their generalization capabilities in a cross-collection evaluation that uses both external tag datasets and manual annotations tailored to the taxonomies of our models.
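The cross-collection evaluation mentioned in the abstract can be pictured with a small sketch. It assumes that each track in an external collection carries one tag, that some of those tags can be mapped onto the classes of a model, and that accuracy is the measure of interest. None of this is taken from the paper: the mapping `TAG_TO_CLASS`, the function `cross_collection_accuracy`, and all data are hypothetical, and no Essentia or TensorFlow call is involved.

```python
from collections import defaultdict

# Hypothetical mapping from an external collection's tags to a model's classes.
# Tags with no counterpart in the model's taxonomy are left out on purpose.
TAG_TO_CLASS = {
    "rock": "rock",
    "hard rock": "rock",
    "electronica": "electronic",
    "techno": "electronic",
}


def cross_collection_accuracy(predictions, external_tags, tag_to_class):
    """Return (overall accuracy, per-class accuracy) on an external collection.

    predictions:   dict of track id -> class predicted by the model
    external_tags: dict of track id -> tag from the external collection
    Tracks whose tag cannot be mapped to a model class are skipped.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for track_id, tag in external_tags.items():
        target = tag_to_class.get(tag)
        if target is None or track_id not in predictions:
            continue  # outside the model's taxonomy, or no prediction available
        total[target] += 1
        if predictions[track_id] == target:
            correct[target] += 1
    evaluated = sum(total.values())
    if evaluated == 0:
        return 0.0, {}
    per_class = {cls: correct[cls] / total[cls] for cls in total}
    return sum(correct.values()) / evaluated, per_class


# Made-up predictions and tags, purely for illustration.
predictions = {"t1": "rock", "t2": "electronic", "t3": "rock", "t4": "rock"}
external_tags = {"t1": "hard rock", "t2": "techno", "t3": "electronica", "t4": "jazz"}

overall, per_class = cross_collection_accuracy(predictions, external_tags, TAG_TO_CLASS)
print(overall, per_class)
```

Skipping unmapped tags keeps the score restricted to the part of the external collection that the model's taxonomy can express, which is one plausible reading of annotations "tailored to the taxonomies" of the models.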


Updated: 2020-03-18
