Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2022-07-27, DOI: 10.1109/lsp.2022.3194419
Karn N. Watcharasupat 1 , Kenneth Ooi 2 , Bhan Lam 2 , Trevor Wong 2 , Zhen-Ting Ong 2 , Woon-Seng Gan 2

The selection of maskers and playback gain levels in an in-situ soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not be representative of the target population, or by listening tests, which can be time- and labor-intensive. Furthermore, the resulting static choices of masker and gain often cannot adapt to dynamic real-world soundscapes. In this work, we utilized a deep learning model to perform joint selection of the optimal masker and its gain level for a given soundscape. The proposed model was designed with highly modular building blocks, allowing for an optimized inference process that can quickly search through a large number of masker-gain combinations. In addition, we introduced the use of feature-domain soundscape augmentation conditioned on the digital gain level, eliminating the computationally expensive waveform-domain mixing process during inference, as well as the tedious gain adjustment process required for new maskers. The proposed system was evaluated on a large-scale dataset of subjective responses to augmented soundscapes with 442 participants, with the best model achieving a mean squared error of 0.122±0.005 on the pleasantness score, validating the ability of the model to predict the combined effect of the masker and its gain level on the perceptual pleasantness level. The proposed system thus allows in-situ or mixed-reality soundscape augmentation to be performed autonomously with near real-time latency while continuously accounting for changes in acoustic environments.
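The joint search over masker-gain combinations described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model or inference pipeline: `select_masker_and_gain`, `toy_predictor`, the masker names, and the gain values in dB are all hypothetical, and the sketch assumes that "optimal" means the candidate with the highest predicted pleasantness. The predictor is passed in as a plain callable standing in for a trained network.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def select_masker_and_gain(
    predict_pleasantness: Callable[[Sequence[float], str, float], float],
    soundscape_features: Sequence[float],
    maskers: Sequence[str],
    gains_db: Sequence[float],
) -> Tuple[str, float, float]:
    """Return the (masker, gain, score) triple with the highest predicted score.

    Assumption: the predictor maps (soundscape features, masker, gain) to a
    scalar pleasantness estimate, so no waveform mixing happens here.
    """
    best = None
    # Exhaustively score every masker-gain combination.
    for masker, gain in product(maskers, gains_db):
        score = predict_pleasantness(soundscape_features, masker, gain)
        # Strict comparison keeps the first candidate on ties.
        if best is None or score > best[2]:
            best = (masker, gain, score)
    if best is None:
        raise ValueError("need at least one masker and one gain level")
    return best


def toy_predictor(features: Sequence[float], masker: str, gain_db: float) -> float:
    """Hypothetical stand-in for a trained model; values are made up."""
    base = {"bird": 0.6, "water": 0.5, "traffic": 0.1}[masker]
    # Penalize gains far from an arbitrary preferred level of -6 dB.
    return base - 0.01 * abs(gain_db - (-6.0))


if __name__ == "__main__":
    choice = select_masker_and_gain(
        toy_predictor,
        soundscape_features=[0.0, 0.0],
        maskers=["bird", "water", "traffic"],
        gains_db=[-12.0, -6.0, 0.0],
    )
    print(choice)
```

Because the predictor is queried once per candidate, the cost of this loop grows with the number of maskers times the number of gain levels; the abstract's point about modular building blocks and feature-domain augmentation concerns making each of those queries cheap.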

Updated: 2024-08-26