Learning long-term filter banks for audio source separation and audio scene classification
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2018-05-30, DOI: 10.1186/s13636-018-0127-7
Teng Zhang, Ji Wu

Abstract: Filter banks on the short-time Fourier transform (STFT) spectrogram have long been studied for analyzing and processing audio. The frame shift in the STFT procedure determines the temporal resolution. However, many discriminative audio applications need long-term time and frequency correlations. This work uses filter banks motivated by Toeplitz matrices to extract long-term time and frequency information. The paper investigates the mechanism of long-term filter banks and the corresponding spectrogram reconstruction method. The time duration and shape of the filter banks are designed and then learned using neural networks. We test the approach on two tasks. Compared with traditional frequency filter banks, the spectrogram reconstruction error in the audio source separation task is reduced by a relative 6.7%, and the classification error in the audio scene classification task is reduced by a relative 6.5%. The experiments also show that the time duration of the long-term filter banks is much larger in the classification task than in the reconstruction task.
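
As a rough illustration of the Toeplitz idea mentioned in the abstract, the sketch below shows that multiplying a spectrogram by a banded Toeplitz matrix is the same as filtering each frequency bin along the time axis. This is not the authors' parameterization: the helper name `toeplitz_time_filter`, the fixed Hann-shaped filter, and the toy spectrogram are all assumptions made for the example, whereas the paper learns the duration and shape of its filters with neural networks.

```python
import numpy as np

def toeplitz_time_filter(h, num_frames):
    """Build a (num_frames x num_frames) Toeplitz matrix for a centered filter h.

    Hypothetical helper: entry (t, s) holds h[t - s + len(h) // 2],
    or 0 when that index falls outside the filter.
    """
    h = np.asarray(h, dtype=float)
    center = len(h) // 2
    t = np.arange(num_frames)
    offset = t[:, None] - t[None, :] + center   # index into h for each (t, s)
    valid = (offset >= 0) & (offset < len(h))
    A = np.zeros((num_frames, num_frames))
    A[valid] = h[offset[valid]]
    return A

# Toy magnitude spectrogram (assumed data): 4 frequency bins, 20 frames.
rng = np.random.default_rng(0)
spec = rng.random((4, 20))

# Fixed 7-frame smoothing filter; a learned filter would replace this.
h = np.hanning(7)
h /= h.sum()

A = toeplitz_time_filter(h, spec.shape[1])
smoothed = spec @ A.T   # each row (frequency bin) is filtered along time
```

A longer filter `h` widens the band of nonzero diagonals in `A`, which is one way to picture how such a filter can cover more frames than a single STFT frame shift.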

Updated: 2018-05-30