Relevance-based quantization of scattering features for unsupervised mining of environmental audio, EURASIP Journal on Audio, Speech, and Music Processing

EURASIP Journal on Audio, Speech, and Music Processing (IF 1.7), Pub Date: 2018-09-29, DOI: 10.1186/s13636-018-0138-4
Vincent Lostanlen, Grégoire Lafay, Joakim Andén, Mathieu Lagrange

The emerging field of computational acoustic monitoring aims to retrieve high-level information from acoustic scenes recorded by networks of sensors. These networks gather large amounts of data that require analysis. To decide which parts to inspect further, we need tools that automatically mine the data, identifying recurring patterns and isolated events. This requires a similarity measure for acoustic scenes that does not impose strong assumptions on the data. The state of the art in audio similarity measurement is the “bag-of-frames” approach, which models a recording using summary statistics of short-term audio descriptors, such as mel-frequency cepstral coefficients (MFCCs). Such models successfully characterize static scenes with little variability in auditory content, but cannot accurately capture scenes with a few salient events superimposed over a static background. To overcome this issue, we propose a two-scale representation which describes a recording using clusters of scattering coefficients. The scattering coefficients capture short-scale structure, while the cluster model captures longer time scales, allowing for more accurate characterization of sparse events. Evaluation within the acoustic scene similarity framework demonstrates the value of the proposed approach.
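
The contrast drawn in the abstract between bag-of-frames statistics and a cluster-based description can be illustrated with a minimal sketch. This is not the authors' implementation: random vectors stand in for scattering coefficients, plain k-means with a farthest-point initialization stands in for the paper's quantization step, and all names (`bag_of_frames`, `kmeans`, `scene`) as well as the frame counts and dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def bag_of_frames(features):
    """Summarize a (n_frames, n_dims) array by per-dimension mean and std."""
    return np.concatenate([features.mean(axis=0), features.std(axis=0)])


def farthest_point_init(features, k):
    """Deterministic seeding: start at frame 0, then add the farthest frame."""
    centroids = [features[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(features[dists.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(features, k, n_iter=20):
    """Plain Lloyd's k-means (an assumption, not the paper's method)."""
    centroids = farthest_point_init(features, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


# Synthetic "scene": a static background plus a few salient event frames.
rng = np.random.default_rng(0)
background = rng.normal(0.0, 0.1, size=(295, 8))  # placeholder features
events = rng.normal(3.0, 0.1, size=(5, 8))        # sparse, distinct frames
scene = np.vstack([background, events])

summary = bag_of_frames(scene)
centroids, labels = kmeans(scene, k=2)
weights = np.bincount(labels, minlength=2) / len(scene)

print("bag-of-frames mean:", summary[:8].round(2))
print("cluster centroids (first dim):", centroids[:, 0].round(2))
print("cluster weights:", weights.round(3))
```

In this toy setting, the five event frames shift the global mean only slightly, whereas the cluster description keeps them as a separate centroid with a small weight. That is the intuition behind describing a recording by clusters rather than by global statistics; how the clusters are formed and compared in the paper itself is not specified in the abstract.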

Updated: 2018-09-29
