A Novel Singing Voice Separation Method Based on a Learnable Decomposition Technique, Circuits, Systems, and Signal Processing

A Novel Singing Voice Separation Method Based on a Learnable Decomposition Technique
Circuits, Systems, and Signal Processing (IF 1.8), Pub Date: 2020-01-08, DOI: 10.1007/s00034-019-01338-0
Samira Mavaddati

This paper presents a new monaural singing voice separation algorithm. Singing voice separation supplies important information to many applications, including voice recognition, data retrieval, and singer identification. The proposed approach uses a sparse and low-rank decomposition model applied to the spectrogram of the singing voice signal, in which the vocal and non-vocal parts are modeled as the sparse and low-rank components, respectively. An alternating optimization algorithm decomposes the singing voice frames using sparse representation over vocal and non-vocal dictionaries. A novel voice activity detector, based on the energy of the sparse coefficients, is also presented; it is used in the training step to learn atoms related to the non-vocal data. In the test phase, the learned non-vocal atoms of the instrumental music part are updated according to the non-vocal components captured from the test signal using a domain adaptation technique. The proposed dictionary learning process incorporates two coherence measures, atom–data coherence and mutual coherence, to provide a learning procedure with low reconstruction error along with proper separation in the test step. Simulation results using different measures show that the proposed method performs significantly better than earlier methods in this context and traditional procedures.
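
Two of the quantities named in the abstract can be illustrated with a small sketch. The code below is not the paper's implementation: it uses the standard textbook definition of mutual coherence (the largest absolute normalized inner product between two distinct atoms) and a simple fixed-threshold rule on the energy of per-frame sparse coefficients. The function names, the toy dictionary, the threshold value, and the assumption that high coefficient energy marks a vocal frame are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def mutual_coherence(atoms):
    """Largest absolute normalized inner product between two distinct atoms.

    `atoms` is a list of equal-length vectors (one list per dictionary atom).
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    best = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            dot = sum(a * b for a, b in zip(atoms[i], atoms[j]))
            best = max(best, abs(dot) / (norm(atoms[i]) * norm(atoms[j])))
    return best


def vocal_frames(coeffs, threshold):
    """Flag a frame as vocal when its sparse-coefficient energy exceeds `threshold`.

    Assumption: `coeffs` holds one coefficient vector per frame, and the
    threshold is chosen by hand; the paper's actual decision rule may differ.
    """
    return [sum(c * c for c in frame) > threshold for frame in coeffs]


# Toy dictionary with three 2-dimensional atoms (hypothetical values).
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mu = mutual_coherence(atoms)

# Toy sparse coefficients for three frames (hypothetical values).
flags = vocal_frames([[0.0, 0.1], [0.9, 0.0], [0.0, 0.0]], threshold=0.5)
```

A low mutual coherence means the atoms are less redundant with one another. Under the assumption above, frames flagged `False` would be the candidates for learning non-vocal atoms.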

Updated: 2020-01-08
