MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recognition
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-01-19, DOI: arxiv-2001.06769. Kaiyu Shan, Yongtao Wang, Zhuoying Wang, Tingting Liang, Zhi Tang, Ying Chen, and Yangyan Li
To efficiently extract spatiotemporal features of video for action
recognition, most state-of-the-art methods integrate 1D temporal convolution
into a conventional 2D CNN backbone. However, they all use 1D temporal
convolutions with a fixed kernel size (i.e., 3) in the network building block
and thus have suboptimal temporal modeling capability for handling both
long-term and short-term actions. To address this problem, we first investigate
the impact of different kernel sizes on the 1D temporal convolutional filters. Then, we
propose a simple yet efficient operation called Mixed Temporal Convolution
(MixTConv), which consists of multiple depthwise 1D convolutional filters with
different kernel sizes. By plugging MixTConv into the conventional 2D CNN
backbone ResNet-50, we further propose an efficient and effective network
architecture named MSTNet for action recognition, and achieve state-of-the-art
results on multiple benchmarks.
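The abstract describes MixTConv only at a high level: several depthwise 1D temporal filters with different kernel sizes. The NumPy sketch below illustrates that idea under assumptions of our own, not details taken from the paper: the input layout is (N, T, C, H, W), channels are split evenly into one group per kernel size, the kernel sizes are (1, 3, 5, 7), padding is zero "same" padding, and weights are random. The names `depthwise_temporal_conv`, `init_mixtconv_weights` and `mix_tconv` are hypothetical. This is not the authors' implementation, and it does not show how MixTConv is placed inside ResNet-50 to form MSTNet.

```python
import numpy as np


def depthwise_temporal_conv(x, weights):
    """Depthwise 1D convolution along the time axis (cross-correlation, as in CNNs).

    x:       array of shape (N, T, C, H, W); this layout is an assumption.
    weights: array of shape (C, k) with k odd, one temporal kernel per channel.
    Zero "same" padding keeps the output at T frames.
    """
    t = x.shape[1]
    k = weights.shape[1]
    pad = k // 2
    # Pad only the time axis with zeros.
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0), (0, 0), (0, 0)))
    out = np.zeros(x.shape, dtype=np.result_type(x, weights))
    for j in range(k):
        # Tap j reads the clip shifted by (j - pad) frames; each channel has its own weight.
        out += xp[:, j:j + t] * weights[:, j][None, None, :, None, None]
    return out


def init_mixtconv_weights(channels, kernel_sizes=(1, 3, 5, 7), seed=0):
    """Hypothetical helper: split channels evenly into one group per kernel size
    (an assumed scheme) and draw a random depthwise kernel for each channel."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(np.arange(channels), len(kernel_sizes))
    return [(idx, rng.standard_normal((len(idx), k)))
            for idx, k in zip(groups, kernel_sizes)]


def mix_tconv(x, params):
    """Apply a different-size depthwise temporal kernel to each channel group,
    then concatenate the groups back along the channel axis."""
    outs = [depthwise_temporal_conv(x[:, :, idx], w) for idx, w in params]
    return np.concatenate(outs, axis=2)


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((2, 8, 16, 7, 7))  # N, T, C, H, W
    y = mix_tconv(x, init_mixtconv_weights(channels=16))
    print(y.shape)  # (2, 8, 16, 7, 7)
```

Every group keeps all T frames, so in this sketch the mixed operation has the same input and output shape as a single fixed-size temporal convolution. Within one layer, a group with k = 7 sees 7 frames, while the fixed-size baseline sees 3.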
Updated: 2020-01-28