Discriminative frequency filter banks learning with neural networks
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2019-01-03, DOI: 10.1186/s13636-018-0144-6, Teng Zhang, Ji Wu
Filter banks applied to spectra play an important role in many audio applications. Traditionally, the filters are distributed linearly on a perceptual frequency scale such as the Mel scale, and they are usually placed so that they overlap one another, which makes the output smoother. However, the parameters of such fixed filters usually come from psychoacoustic experiments and are chosen empirically. To make filter banks discriminative, the authors use a neural network structure to adaptively learn the center frequency, bandwidth, gain, and shape of the filters when the filter banks serve as a feature extractor. The paper investigates several different constraints on discriminative frequency filter banks, as well as the dual spectrum reconstruction problem. Experiments on audio source separation and audio scene classification tasks show that the proposed filter banks outperform traditional fixed-parameter triangular or Gaussian filters on the Mel scale. Classification error is reduced by 13.9% and 4.6% (relative) on the LITIS ROUEN and DCASE2016 datasets, respectively.
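The following pure-Python sketch illustrates the contrast described above. It is not the authors' implementation. It assumes the HTK-style Mel formula m = 2595 log10(1 + f/700), and all function names are hypothetical. `mel_triangular_bank` builds the traditional fixed-parameter bank, with filter edges evenly spaced on the Mel scale so that adjacent triangles overlap. `gaussian_band_energy` treats a single Gaussian filter's center, bandwidth, and gain as plain numbers. It also returns the analytic gradient of the band energy with respect to the center, which shows why such parameters can be trained by backpropagation. The paper additionally learns the filter shape, and this sketch does not model that.

```python
from __future__ import annotations

import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Mel scale (an assumption; other Mel formulas exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_triangular_bank(n_filters: int, n_fft: int, sample_rate: float,
                        f_min: float = 0.0, f_max: float | None = None) -> list[list[float]]:
    """Fixed-parameter bank: filter edges evenly spaced on the Mel scale,
    so neighbouring triangles overlap."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    # n_filters + 2 edge points, evenly spaced in Mel, mapped back to Hz
    edges = [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]
    bank = []
    for j in range(n_filters):
        left, center, right = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_bank(bank: list[list[float]], power_spectrum: list[float]) -> list[float]:
    """Filter-bank energies: one weighted sum of spectrum bins per filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


def gaussian_band_energy(center_hz: float, bandwidth_hz: float, gain: float,
                         bin_hz: list[float], power_spectrum: list[float]) -> tuple[float, float]:
    """Energy of one Gaussian filter whose center, bandwidth (std. dev.) and
    gain are free parameters, plus d(energy)/d(center) computed analytically.
    A trainable bank (hypothetical here) would update such parameters by
    gradient descent on a task loss."""
    energy = 0.0
    d_center = 0.0
    for f, p in zip(bin_hz, power_spectrum):
        z = (f - center_hz) / bandwidth_hz
        w = gain * math.exp(-0.5 * z * z)
        energy += w * p
        # d/dc of exp(-0.5 * ((f - c) / b)^2) is exp(...) * (f - c) / b^2
        d_center += w * p * (f - center_hz) / bandwidth_hz ** 2
    return energy, d_center


if __name__ == "__main__":
    sr, n_fft = 16000, 512
    bins = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    spectrum = [1.0 / (1 + k) for k in range(len(bins))]  # toy power spectrum
    mel_energies = apply_bank(mel_triangular_bank(10, n_fft, sr), spectrum)
    energy, grad = gaussian_band_energy(1000.0, 100.0, 1.0, bins, spectrum)
    print(len(mel_energies), energy, grad)
```

In the fixed bank, every center and bandwidth follows from `n_filters` and the Mel formula. In the Gaussian filter, the same quantities are independent parameters with well-defined gradients. The gradient is what allows them to be fitted to a discriminative objective.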
Updated: 2019-01-03