

Decoupling music notation to improve end-to-end Optical Music Recognition
Pattern Recognition Letters (IF 3.9), Pub Date: 2022-04-26, DOI: 10.1016/j.patrec.2022.04.032
María Alfaro-Contreras, Antonio Ríos-Vila, Jose J. Valero-Mas, José M. Iñesta, Jorge Calvo-Zaragoza


Inspired by the Text Recognition field, end-to-end schemes based on Convolutional Recurrent Neural Networks (CRNN) trained with the Connectionist Temporal Classification (CTC) loss function are considered one of the current state-of-the-art techniques for staff-level Optical Music Recognition (OMR). Unlike text symbols, music-notation elements may be defined as a combination of (i) a shape primitive located in (ii) a certain position in a staff. However, this double nature is generally neglected in the learning process, as each combination is treated as a single token. In this work, we study whether exploiting such particularity of music notation actually benefits the recognition performance and, if so, which approach is the most appropriate. For that, we thoroughly review existing specific approaches that explore this premise and propose different combinations of them. Furthermore, considering the limitations observed in such approaches, a novel decoding strategy specifically designed for OMR is proposed. The results obtained with four different corpora of historical manuscripts show the relevance of leveraging this double nature of music notation since it outperforms the standard approaches where it is ignored. In addition, the proposed decoding leads to significant reductions in the error rates with respect to the other cases.
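The double nature described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the "shape:position" token format, the symbol names, and the helper functions `decouple` and `recombine` are all assumptions introduced here for illustration. The sketch only shows how a sequence of combined tokens can be split into a shape sequence and a position sequence, and how the vocabularies differ between the two encodings.

```python
from typing import List, Tuple

# Hypothetical staff-level transcription. The "shape:position" token format
# and the symbol names are assumptions made for this illustration only.
SEQUENCE: List[str] = [
    "clef.G:L2",
    "note.quarter:L2",
    "note.quarter:S2",
    "note.half:L2",
    "rest.quarter:L3",
]


def decouple(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Split each combined token into its shape and its staff position."""
    shapes: List[str] = []
    positions: List[str] = []
    for token in tokens:
        shape, position = token.split(":")
        shapes.append(shape)
        positions.append(position)
    return shapes, positions


def recombine(shapes: List[str], positions: List[str]) -> List[str]:
    """Rebuild combined tokens from aligned shape and position sequences."""
    if len(shapes) != len(positions):
        raise ValueError("shape and position sequences must be aligned")
    return [f"{s}:{p}" for s, p in zip(shapes, positions)]


shapes, positions = decouple(SEQUENCE)

# Standard encoding: one vocabulary entry per (shape, position) combination.
combined_vocab = sorted(set(SEQUENCE))
# Decoupled encoding: one vocabulary per component.
shape_vocab = sorted(set(shapes))
position_vocab = sorted(set(positions))

print(len(combined_vocab), len(shape_vocab), len(position_vocab))
```

In this toy setup, `recombine` only works when the two sequences have the same length, so any scheme that predicts shapes and positions separately would need some way to keep them aligned before merging them back into music-notation symbols.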


Updated: 2022-04-26
