Audio object classification using distributed beliefs and attention. IEEE/ACM Transactions on Audio, Speech, and Language Processing

IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1). Pub Date: 2020-01-15. DOI: 10.1109/taslp.2020.2966867
Ashwin Bellur, Mounya Elhilali

One of the unique characteristics of human hearing is its ability to recognize acoustic objects even in the presence of severe noise and distortions. In this work, we explore two mechanisms underlying this ability: 1) redundant mapping of acoustic waveforms along distributed latent representations and 2) adaptive feedback based on prior knowledge to selectively attend to targets of interest. We propose a bio-mimetic account of acoustic object classification by developing a novel distributed deep belief network validated for the task of robust acoustic object classification using the UrbanSound database. The proposed distributed belief network (DBN) encompasses an array of independent sub-networks trained generatively to capture different abstractions of natural sounds. A supervised classifier then performs a readout of this distributed mapping. The overall architecture not only matches the state-of-the-art system for acoustic object classification but also leads to significant improvement over the baseline in mismatched noisy conditions (31.4% relative improvement in 0 dB conditions). Furthermore, we incorporate mechanisms of attentional feedback that allow the DBN to deploy local memories of sound targets estimated at multiple views to bias network activation when attending to a particular object. This adaptive feedback results in further improvement of object classification in unseen noise conditions (relative improvement of 54% over the baseline in 0 dB conditions).
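The data flow described in the abstract (several independent sub-networks, a concatenated distributed code, a readout, and a feedback signal that biases activations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it comes from the paper: the sub-networks are fixed random sigmoid layers rather than generatively trained networks, the readout is a nearest-template rule rather than a trained supervised classifier, the attentional bias is a simple per-unit multiplicative gain, and all function names are hypothetical.

```python
import math
import random


def make_subnetwork(n_in, n_hidden, seed):
    """Build one stand-in sub-network: a fixed random sigmoid layer.

    Assumption: a real system would train each sub-network generatively;
    random weights are used here only to show the data flow.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)]
               for _ in range(n_hidden)]

    def encode(x, gain=None):
        # gain: optional per-unit multiplier, a hypothetical attention bias
        acts = []
        for j, row in enumerate(weights):
            pre = sum(w * xi for w, xi in zip(row, x))
            h = 1.0 / (1.0 + math.exp(-pre))
            if gain is not None:
                h *= gain[j]
            acts.append(h)
        return acts

    return encode


def distributed_code(subnets, x, gains=None):
    """Concatenate the activations of all sub-networks into one vector."""
    code = []
    for k, net in enumerate(subnets):
        code.extend(net(x, None if gains is None else gains[k]))
    return code


def nearest_template(code, templates):
    """Stand-in readout: return the label of the closest stored template."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(templates, key=lambda label: sq_dist(code, templates[label]))


# Three independent sub-networks, each mapping 4 inputs to 3 hidden units.
subnets = [make_subnetwork(4, 3, seed) for seed in (0, 1, 2)]
x = [0.2, 0.5, 0.1, 0.9]

code = distributed_code(subnets, x)                 # unbiased mapping
gains = [[0.5, 0.5, 0.5] for _ in subnets]          # hypothetical feedback
biased = distributed_code(subnets, x, gains)        # biased mapping
label = nearest_template(code, {"target": code, "other": [0.0] * len(code)})
```

The sketch keeps each sub-network independent so that the combined code is redundant across views, and it applies the feedback at the level of hidden activations, which is one simple way to read "bias network activation".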


Updated: 2020-01-15
