Semantic feature extraction based on subspace learning with temporal constraints for acoustic event recognition
Digital Signal Processing (IF 2.9), Pub Date: 2020-12-21, DOI: 10.1016/j.dsp.2020.102947
Qiuying Shi, Jiqing Han

In acoustic event recognition (AER), extracting semantic features is important. Two crucial aspects of semantic features, the essential content and the temporal structure, strongly affect how humans, and even computers, understand an acoustic event. In this paper, we first divide each acoustic event sample into short segments. Then, to take both aspects into account jointly, we propose two semantic feature extraction methods based on learning a low-dimensional subspace. The first method, named subspace learning with temporal constraints (SLOC), is designed not only to preserve the essential content through a low-rank approximation scheme but also to capture the temporal structure between every two chronologically ordered segments. This temporal structure is encoded by forcing the projection coefficients associated with the different elements of the subspace basis to increase separately. The second method, named non-negative sparse SLOC (NSSLOC), is obtained by introducing two constraints into the basis of SLOC. Specifically, a non-negative constraint is designed to better guarantee the low-rank approximation, and a row-wise sparse constraint is employed to perform reasonable feature selection when the projection coefficients are calculated. Moreover, we propose two optimization algorithms for our methods. For each acoustic event sample, the subspace basis learned by either method is adopted as its semantic features, which are then used for classification. Finally, the proposed methods are evaluated on the AudioEvent and ESC-50 databases. The experimental results indicate that our methods outperform or are competitive with related state-of-the-art methods.
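The abstract does not state the SLOC objective or its optimizer, so the following is only a rough sketch under our own assumptions, not the authors' formulation. It assumes each sample arrives as frame-level features (for example log-mel frames) that are averaged into non-overlapping segments, and it minimizes a hypothetical objective: the reconstruction error ||X - X U U^T||_F^2 / n plus `lam` times the mean hinge loss max(0, margin + u_c^T x_i - u_c^T x_j) over every pair of segments i < j and every basis column u_c. The hinge term pushes each basis column's projection coefficients to increase over time, which is one possible reading of "increase separately". Clipping at zero (`nonneg`) and a row-wise L2,1 shrinkage (`beta`) are stand-ins for the NSSLOC constraints. The function names, parameters and default values are all hypothetical.

```python
import numpy as np


def split_into_segments(frames: np.ndarray, seg_len: int) -> np.ndarray:
    """Average frame-level features (T, d) over non-overlapping windows of
    seg_len frames, giving chronologically ordered segments (n_seg, d).
    Trailing frames that do not fill a whole segment are dropped."""
    n_seg = frames.shape[0] // seg_len
    if n_seg < 2:
        raise ValueError("need at least two segments to encode temporal order")
    trimmed = frames[: n_seg * seg_len]
    return trimmed.reshape(n_seg, seg_len, frames.shape[1]).mean(axis=1)


def learn_subspace(X: np.ndarray, k: int, lam: float = 1.0, margin: float = 0.1,
                   lr: float = 0.05, n_iter: int = 500,
                   nonneg: bool = False, beta: float = 0.0) -> np.ndarray:
    """Proximal gradient descent on a hypothetical SLOC-like objective.

    X: (n, d) segments in chronological order; returns a basis U of shape (d, k).
    Objective: ||X - X U U^T||_F^2 / n
               + lam * mean over pairs i < j and columns c of
                 max(0, margin + x_i . u_c - x_j . u_c).
    nonneg=True clips U at zero; beta > 0 applies row-wise L2,1 shrinkage.
    Both are stand-ins for the NSSLOC constraints, not the paper's solver."""
    n, d = X.shape
    if not 1 <= k <= min(n, d):
        raise ValueError("k must be between 1 and min(n, d)")
    X = X / (np.linalg.norm(X, 2) / np.sqrt(n))  # rescale so that lr is stable
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:k].T.copy()                           # PCA-style initialisation
    if nonneg:
        U = np.abs(U)
    i, j = np.triu_indices(n, k=1)                # every pair with i earlier than j
    for _ in range(n_iter):
        # Gradient of the low-rank reconstruction term.
        R = X - X @ U @ U.T
        grad = -2.0 * (X.T @ R @ U + R.T @ X @ U) / n
        # Temporal hinge: each basis column's coefficient should increase with time.
        C = X @ U                                 # projection coefficients, (n, k)
        active = (margin + C[i] - C[j] > 0).astype(float)   # (pairs, k)
        weights = np.zeros((n, k))
        np.add.at(weights, i, active)             # +x_i for each violated pair
        np.add.at(weights, j, -active)            # -x_j for each violated pair
        grad += lam * (X.T @ weights) / len(i)
        U = U - lr * grad
        if nonneg:
            U = np.maximum(U, 0.0)                # projection onto U >= 0
        if beta > 0:
            # Proximal step for beta * sum of row norms: shrinks whole rows,
            # i.e. whole input feature dimensions, toward zero.
            norms = np.linalg.norm(U, axis=1, keepdims=True)
            U = U * np.maximum(0.0, 1.0 - lr * beta / np.maximum(norms, 1e-12))
    return U


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "sample": 200 frames of 16-dim features with a rising trend over time.
    frames = rng.standard_normal((200, 16)) + np.linspace(0.0, 2.0, 200)[:, None]
    X = split_into_segments(frames, seg_len=20)   # 10 segments
    U = learn_subspace(X, k=3, nonneg=True, beta=0.01)
    feature = U.ravel()                           # fixed-length descriptor
    print(feature.shape)                          # (48,)
```

Flattening the learned basis is just one simple way to obtain a fixed-length descriptor per sample. Because the initial basis comes from an SVD, column order (and, for the unconstrained variant, possibly sign) may differ across samples, so some alignment step could be needed before the vectors are passed to a classifier.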

Updated: 2021-01-06