Multi-view representation for sound event recognition, Signal, Image and Video Processing


Multi-view representation for sound event recognition
Signal, Image and Video Processing (IF 2.3), Pub Date: 2021-01-23, DOI: 10.1007/s11760-020-01851-9
S. Chandrakala, Venkatraman M, Shreyas N, Jayalakshmi S L

The sound event recognition (SER) task is gaining importance in emerging applications such as machine audition, audio surveillance, and environmental audio scene recognition. Recognizing sound events under noisy conditions in real-time surveillance applications is difficult. In this paper, we focus on learning patterns using multiple forms (views) of the given sound events. We propose two variants of a Multi-View Representation (MVR)-based approach for the SER task. The first variant combines auditory image-based features with cepstral features of the sound signal. The second variant combines statistical features extracted from the auditory images with cepstral features of the sound signal. In addition to these variants, Constant Q-transform and Variable Q-transform image-based features are explored as other potentially effective forms of multi-view representation. A discriminative model-based classifier is then used to recognize these representations as environmental sound events. The proposed MVR approaches are evaluated on three benchmark sound event datasets, namely ESC-50, DCASE2016 Task 2, and DCASE2018 Task 2. The recognition accuracy of the proposed MVR approach is significantly better than that of other approaches proposed in the recent literature.
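To make the idea of a multi-view representation concrete, the following is a minimal sketch of the second variant's general shape: statistics taken from a time-frequency image are concatenated with cepstral features of the same signal. It is not the authors' implementation. The abstract does not specify the auditory image, the statistics, or the cepstral pipeline, so everything here is an assumption: a plain log-power spectrogram stands in for the auditory image, per-band mean and standard deviation stand in for the statistical features, and a DCT of the log spectrum (without a mel filterbank) stands in for the cepstral features. All function names and parameter values are hypothetical.

```python
import numpy as np


def log_spectrogram(x, frame_len=256, hop=128):
    """Log-power spectrogram; an assumed stand-in for the paper's auditory image."""
    n_frames = 1 + (len(x) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)  # shape: (n_frames, frame_len // 2 + 1)


def image_statistics(log_spec, n_bands=8):
    """View 1 (assumed): mean and standard deviation of each frequency band."""
    bands = np.array_split(log_spec, n_bands, axis=1)
    means = [band.mean() for band in bands]
    stds = [band.std() for band in bands]
    return np.array(means + stds)


def cepstral_features(log_spec, n_coeffs=13):
    """View 2 (assumed): DCT-II of the log spectrum, averaged over frames."""
    n_bins = log_spec.shape[1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_bins)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins))
    cepstra = log_spec @ dct_basis.T  # shape: (n_frames, n_coeffs)
    return cepstra.mean(axis=0)


def multi_view_features(x):
    """Concatenate both views into a single feature vector."""
    log_spec = log_spectrogram(x)
    return np.concatenate([image_statistics(log_spec), cepstral_features(log_spec)])


# Synthetic one-second clip at an assumed 8 kHz sampling rate: a tone plus noise.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clip = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)

features = multi_view_features(clip)
print(features.shape)
```

The resulting vector has 2 × 8 band statistics plus 13 cepstral coefficients. In a full pipeline, one such vector per clip would be passed to a discriminative classifier; which classifier the authors use is not stated in the abstract.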


Updated: 2021-01-24
