A new fingerprint definition for effective song recognition
Pattern Recognition Letters (IF 5.1), Pub Date: 2022-06-17, DOI: 10.1016/j.patrec.2022.06.009
Salvatore Serrano, Murtadha Arif Bin Sahbudin, Chakib Chaouch, Marco Scarpa

Music and song recognition is of wide interest to researchers and companies because of its intrinsic challenges and its potential economic returns. Although the basic algorithms for song recognition are simple in principle, it is quite difficult to obtain an efficient and robust approach that can identify songs on the fly. This is borne out by the fact that very few companies in the world have their core business in this field, even though the potential market is very large. In this paper, we propose a new approach for generating fingerprints from song excerpts, which is the first step in implementing a complete song recognition algorithm. Fingerprint generation is based on Welch's method for spectral density estimation, a Mel filter bank, and an exponential adaptive threshold curve in the frequency domain that has not been used before. Although the underlying techniques are not new, to the best of our knowledge they have not been used together for fingerprint generation. Our main purpose is to show that the proposed fingerprint generation approach achieves very high accuracy in recognizing song excerpts and their position within the song, and that it is robust to typical alterations of the audio signal. Specifically, the fingerprints we generate are highly insensitive to noise and to lossy audio compression; moreover, we believe that with a small modification the method could also generate pitch-insensitive fingerprints. Experiments on a large database of songs show that the recognition accuracy obtained with our fingerprints is better than that of the landmark-based approach (used by the well-known Shazam application). This is not a negligible result, because even small improvements mean a very large number of additional recognitions, with higher profit prospects in industrial applications.
To focus on the fingerprint structure and its generation algorithm, we do not discuss any specific search algorithm, which is left for further work, and we use only a linear search in our experiments; in this way, we believe the quality of the fingerprint itself is better demonstrated.
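
As a rough illustration of the pipeline the abstract describes, the following Python sketch chains a Welch power spectral density estimate, a Mel filter bank, and an exponentially decaying threshold into a binary fingerprint, then matches an excerpt with a linear search. It is a sketch under stated assumptions, not the authors' implementation: the function names, the frame and segment sizes, the number of Mel bands, the form of the threshold (the frame's mean band energy scaled by exp(-decay * band index)), and the Hamming-distance matching are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np


def welch_psd(frame, seg_len=256, overlap=128):
    """Welch estimate: average the periodograms of overlapping Hann-windowed segments."""
    window = np.hanning(seg_len)
    step = seg_len - overlap
    starts = range(0, len(frame) - seg_len + 1, step)
    periodograms = [np.abs(np.fft.rfft(frame[s:s + seg_len] * window)) ** 2
                    for s in starts]
    return np.mean(periodograms, axis=0)  # seg_len // 2 + 1 frequency bins


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, centre, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def fingerprint(signal, sample_rate, frame_len=2048, hop=1024,
                n_filters=16, decay=0.1):
    """Return a boolean array of shape (n_frames, n_filters).

    ASSUMPTION: the threshold below is a hypothetical stand-in for the
    paper's exponential adaptive threshold curve, whose exact form is not
    given in the abstract.
    """
    bank = mel_filter_bank(n_filters, 256 // 2 + 1, sample_rate)
    curve = np.exp(-decay * np.arange(n_filters))
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        energies = bank @ welch_psd(signal[start:start + frame_len])
        threshold = energies.mean() * curve  # adapts to each frame's level
        rows.append(energies > threshold)
    return np.array(rows, dtype=bool)


def linear_search(query, database):
    """Slide the query over every reference fingerprint; keep the smallest
    Hamming distance. Returns (song_id, frame_offset, distance)."""
    best = (None, -1, np.inf)
    for song_id, ref in database.items():
        for offset in range(len(ref) - len(query) + 1):
            dist = np.count_nonzero(ref[offset:offset + len(query)] != query)
            if dist < best[2]:
                best = (song_id, offset, dist)
    return best
```

In this sketch the frame offset returned by `linear_search` plays the role of the excerpt's position inside the song; a real system would replace the exhaustive scan with an indexed search, which the paper leaves to further work.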




Updated: 2022-06-17