Sparse pursuit and dictionary learning for blind source separation in polyphonic music recordings
EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7). Pub Date: 2021-01-28. DOI: 10.1186/s13636-020-00190-4
Sören Schulze , Emily J. King

We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that matches the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram of the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then exploit the pitch-invariant properties of that representation to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters that characterize the musical instruments are not known beforehand, we train a dictionary containing them, using a modified version of Adam. Applying the algorithm to various audio samples, we find that it produces high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly represents inharmonicity, the presence of inharmonicity can still impede the performance of the sparse pursuit algorithm. In general, due to its pitch-invariance, our method is especially suitable for spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that a dictionary constructed for one recording can be applied to a different recording with similar instruments without additional training.
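The first processing step described above, converting an STFT spectrogram to a log-frequency representation, can be illustrated with a minimal sketch. Note this is a plain interpolation stand-in for the paper's model-based conversion: the function names and parameters (`stft_magnitude`, `to_log_frequency`, `bins_per_octave`, `fmin`) are illustrative choices, not from the paper, and the resolution claim in the abstract refers to the authors' sparse-pursuit reconstruction, not to this naive remapping.

```python
import numpy as np

def stft_magnitude(x, n_fft=2048, hop=512):
    """Magnitude STFT with a Hann window; returns (freq_bins, time_frames)."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frames.append(np.abs(np.fft.rfft(x[start:start + n_fft] * window)))
    return np.array(frames).T

def to_log_frequency(S, sr, bins_per_octave=48, fmin=55.0):
    """Resample linear-frequency rows of S onto a log-spaced frequency axis.

    On a log axis, transposing a note shifts its harmonic pattern rather
    than stretching it, which is the pitch-invariance the method relies on.
    """
    lin_freqs = np.linspace(0, sr / 2, S.shape[0])
    n_octaves = np.log2((sr / 2) / fmin)
    log_freqs = fmin * 2.0 ** (np.arange(int(n_octaves * bins_per_octave))
                               / bins_per_octave)
    out = np.empty((len(log_freqs), S.shape[1]))
    for t in range(S.shape[1]):
        out[:, t] = np.interp(log_freqs, lin_freqs, S[:, t])
    return out

# Tiny demo: one second of a 440 Hz tone at a 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)
S = stft_magnitude(x)        # linear-frequency spectrogram
L = to_log_frequency(S, sr)  # log-frequency spectrogram
```

The spectral peak of the demo tone lands in the STFT bin nearest 440 Hz, and the log-frequency version preserves the number of time frames while re-gridding the frequency axis.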

Updated: 2021-01-28