AdapNet: Adaptability Decomposing Encoder–Decoder Network for Weakly Supervised Action Recognition and Localization
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4). Pub Date: 2020-01-23. DOI: 10.1109/tnnls.2019.2962815
Xiao-Yu Zhang, Changsheng Li, Haichao Shi, Xiaobin Zhu, Peng Li, Jing Dong

The point process is a solid framework for modeling sequential data, such as videos, by exploiting the underlying relevance among events. As a challenging problem in high-level video understanding, weakly supervised action recognition and localization in untrimmed videos have attracted intensive research attention. Knowledge transfer that leverages publicly available trimmed videos as external guidance is a promising way to compensate for coarse-grained video-level annotation and improve generalization performance. However, unconstrained knowledge transfer may introduce irrelevant noise and jeopardize the learning model. This article proposes a novel adaptability decomposing encoder–decoder network that transfers reliable knowledge between trimmed and untrimmed videos for action recognition and localization via bidirectional point process modeling, given only video-level annotations. By decomposing the original features into domain-adaptable and domain-specific components according to their adaptability, trimmed–untrimmed knowledge transfer can be safely confined within a more coherent subspace. An encoder–decoder-based structure is carefully designed and jointly optimized to facilitate effective action classification and temporal localization. Extensive experiments are conducted on two benchmark data sets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method.
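The core decomposition idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; the dimensions, linear projections, and function names below are all hypothetical, chosen only to show how an encoder might split a snippet feature into a domain-adaptable part (shared by trimmed and untrimmed videos, and thus safe for knowledge transfer) and a domain-specific part, with a decoder reconstructing the original feature from the two components:

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, adapt_dim, spec_dim = 8, 3, 5  # hypothetical sizes

# Hypothetical linear encoders projecting a snippet feature into the
# domain-adaptable and domain-specific subspaces.
W_adapt = rng.standard_normal((adapt_dim, feat_dim))
W_spec = rng.standard_normal((spec_dim, feat_dim))
# Decoder maps the concatenated components back to the feature space.
W_dec = rng.standard_normal((feat_dim, adapt_dim + spec_dim))

def decompose(x):
    """Split a feature into domain-adaptable and domain-specific parts."""
    return W_adapt @ x, W_spec @ x

def reconstruct(z_adapt, z_spec):
    """Decoder: rebuild a feature estimate from its two components."""
    return W_dec @ np.concatenate([z_adapt, z_spec])

x = rng.standard_normal(feat_dim)          # one snippet feature
z_a, z_s = decompose(x)
x_hat = reconstruct(z_a, z_s)
# Training would jointly minimize a reconstruction loss like this while
# confining trimmed-untrimmed transfer to the adaptable component z_a.
recon_error = float(np.sum((x - x_hat) ** 2))
```

In the paper's setting, the encoder and decoder are learned jointly with the classification and localization objectives, so the split between the two subspaces is driven by adaptability rather than fixed random projections as in this toy example.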

Updated: 2020-01-23