Diverse Features Fusion Network for video-based action recognition
Journal of Visual Communication and Image Representation (IF 2.6), Pub Date: 2021-04-16, DOI: 10.1016/j.jvcir.2021.103121
Haoyang Deng, Jun Kong, Min Jiang, Tianshan Liu

The two-stream convolutional network has proved to be a milestone in the study of video-based action recognition. Many recent works directly modify the internal structure of the two-stream convolutional network and feed its top-level features into a 2D/3D convolutional fusion module or a simpler one. However, these fusion methods cannot fully exploit the features, and fusing only top-level features discards rich, vital details. To tackle these issues, a novel network called the Diverse Features Fusion Network (DFFN) is proposed. The fusion stream of DFFN contains two types of purpose-built modules, the diverse compact bilinear fusion (DCBF) module and the channel-spatial attention (CSA) module, which distill and refine diverse compact spatiotemporal features. The DCBF modules use the diverse compact bilinear algorithm to fuse features extracted from multiple layers of the base network, which are called diverse features in this paper. The CSA module then leverages channel attention and multi-size spatial attention to boost key information and suppress noise in the fused features. We evaluate our three-stream network DFFN on three challenging public video action benchmarks: UCF101, HMDB51 and Something-Something V1. Experimental results indicate that our method achieves state-of-the-art performance.
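For readers unfamiliar with compact bilinear fusion, the following is a minimal sketch of the generic compact bilinear pooling idea (Count Sketch projections combined by FFT), on which a module such as DCBF could plausibly build. It is not the authors' DCBF implementation: the function names `make_hashes`, `count_sketch` and `compact_bilinear`, the output dimension `d`, and the use of flat feature vectors are all assumptions made for illustration.

```python
import numpy as np

def make_hashes(n, d, rng):
    """Draw a random bucket index in [0, d) and a random sign for each of n inputs."""
    h = rng.integers(0, d, size=n)
    s = rng.choice([-1.0, 1.0], size=n)
    return h, s

def count_sketch(x, h, s, d):
    """Project x into d dimensions: each entry is added, with its sign, to its bucket."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)  # unbuffered add, so colliding buckets accumulate
    return out

def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """Approximate the outer product of x and y in d dimensions.

    The circular convolution of the two Count Sketches equals the Count Sketch
    of the outer product, and is computed here as a product in the FFT domain.
    """
    px = count_sketch(x, hx, sx, d)
    py = count_sketch(y, hy, sy, d)
    return np.real(np.fft.ifft(np.fft.fft(px) * np.fft.fft(py)))

# Hypothetical usage: fuse a 512-d "spatial" and a 512-d "temporal" feature vector.
rng = np.random.default_rng(0)
spatial, temporal = rng.normal(size=512), rng.normal(size=512)
hx, sx = make_hashes(512, 1024, rng)
hy, sy = make_hashes(512, 1024, rng)
fused = compact_bilinear(spatial, temporal, hx, sx, hy, sy, 1024)
```

The appeal of this construction is that the fused vector has `d` entries rather than the 512 × 512 entries of a full bilinear (outer-product) feature, while still capturing pairwise interactions between the two inputs.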




Updated: 2021-04-21