A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
arXiv - CS - Sound, Pub Date: 2021-01-08, DOI: arxiv-2101.02919
Qing Wang, Jun Du, Hua-Xin Wu, Jia Pan, Feng Ma, Chin-Hui Lee

In this paper, we propose a novel four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection (SELD). First, we explore two spatial augmentation techniques, audio channel swapping (ACS) and multi-channel simulation (MCS), to deal with data sparsity in SELD. ACS and MCS augment the limited training data by expanding its direction-of-arrival (DOA) representations, so that acoustic models trained on the augmented data are robust to variations in the location of acoustic sources. Next, time-domain mixing (TDM) and time-frequency masking (TFM) are investigated to deal with overlapping sound events and to increase data diversity. Finally, ACS, MCS, TDM and TFM are combined in a step-by-step manner to form an effective four-stage data augmentation scheme. Tested on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 data sets, the proposed augmentation approach greatly improves system performance, and our submitted system ranked first in the SELD task of the DCASE 2020 Challenge. Furthermore, we employ a ResNet-Conformer architecture to model both global and local context dependencies of an audio sequence, which yields further gains over the architectures used in the DCASE 2020 SELD evaluations.
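
To make three of the four augmentation ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify the exact transforms, so several things here are assumptions: the input is taken to be first-order Ambisonics (FOA) with channel order W, Y, Z, X and the standard plane-wave encoding; only two example ACS transforms are shown; TDM is reduced to a plain sum of two clips; and TFM zeroes one random time block and one random frequency band across all channels. All function names (`encode_foa`, `acs_reflect_azimuth`, `acs_swap_xy`, `time_domain_mix`, `time_frequency_mask`) are hypothetical. MCS is left out because it depends on details the abstract does not give.

```python
import numpy as np


def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as 4-channel FOA (assumed order: W, Y, Z, X)."""
    gains = np.array([
        1.0,                                   # W: omnidirectional
        np.sin(azimuth) * np.cos(elevation),   # Y
        np.sin(elevation),                     # Z
        np.cos(azimuth) * np.cos(elevation),   # X
    ])
    return gains[:, None] * signal[None, :]


def acs_reflect_azimuth(foa, azimuth, elevation):
    """ACS example: negate Y, which maps azimuth to -azimuth."""
    out = foa.copy()
    out[1] *= -1.0
    return out, -azimuth, elevation


def acs_swap_xy(foa, azimuth, elevation):
    """ACS example: swap Y and X, which maps azimuth to pi/2 - azimuth."""
    out = foa.copy()
    out[[1, 3]] = out[[3, 1]]
    return out, np.pi / 2 - azimuth, elevation


def time_domain_mix(clip_a, labels_a, clip_b, labels_b):
    """TDM sketch: sum two equal-length clips and keep both label sets."""
    return clip_a + clip_b, list(labels_a) + list(labels_b)


def time_frequency_mask(features, max_time_width, max_freq_width, rng):
    """TFM sketch: zero one random time block and one random frequency band.

    `features` has shape (channels, frames, bins); the same mask is applied
    to every channel (an assumption, so inter-channel cues stay aligned).
    """
    out = features.copy()
    _, n_frames, n_bins = out.shape
    t_width = rng.integers(0, max_time_width + 1)
    f_width = rng.integers(0, max_freq_width + 1)
    t_start = rng.integers(0, n_frames - t_width + 1)
    f_start = rng.integers(0, n_bins - f_width + 1)
    out[:, t_start:t_start + t_width, :] = 0.0
    out[:, :, f_start:f_start + f_width] = 0.0
    return out
```

The key point of the ACS functions is that the audio and the DOA label are transformed together: each channel operation is paired with the azimuth change it implies under the assumed encoding, so the augmented clip remains consistent with its new label.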

Updated: 2021-01-11