Deep perceptual embeddings for unlabelled animal sound events, The Journal of the Acoustical Society of America



Deep perceptual embeddings for unlabelled animal sound events
The Journal of the Acoustical Society of America (IF 2.1), Pub Date: 2021-07-01, DOI: 10.1121/10.0005475
Veronica Morfi₁, Robert F Lachlan₂, Dan Stowell₁


Evaluating sound similarity is a fundamental building block in acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity are based on heuristics or simplified linear models, and are thus limited. Deep learning embeddings, often using triplet networks, have been useful in many fields. However, such networks are usually trained using large class-labelled datasets. Such labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilising proxies of perceptual similarity judgements. Ultimately, our target is to create a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. In order to deal with the challenging nature of triplet loss training with the lack of class-labelled data, we utilise multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning compared to learning a neural embedding from a model trained on MDS alone. Using computational proxies of similarity judgements, we demonstrate the feasibility of the method to develop perceptual models for a wide range of data based on behavioural judgements, helping us understand how animals perceive sounds.
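To make the triplet objective mentioned in the abstract concrete, the following is a minimal sketch of a standard triplet margin loss in NumPy. It is not the authors' implementation: the function name `triplet_margin_loss`, the use of squared Euclidean distance, and the margin value of 0.2 are all assumptions chosen for illustration. In each triplet, the anchor sound is assumed to be judged more similar to the positive than to the negative.

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss over a batch of embedding triplets.

    Each argument is array-like with shape (batch, dim). The margin is an
    illustrative assumption, not a setting taken from the paper.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)

    # Squared Euclidean distance from each anchor to its positive and negative.
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)

    # A triplet contributes zero loss once its negative is at least `margin`
    # farther from the anchor than its positive.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# One well-ordered triplet (no loss) and one violated triplet (positive loss).
satisfied = triplet_margin_loss([[0.0, 0.0]], [[0.0, 0.0]], [[1.0, 0.0]])
violated = triplet_margin_loss([[0.0, 0.0]], [[1.0, 0.0]], [[0.0, 0.0]])
```

Training on such a loss pulls embeddings of sounds judged similar closer together than those judged dissimilar, which is why it needs only relative similarity judgements rather than class labels.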


Updated: 2021-07-01
