SODA: Weakly Supervised Temporal Action Localization Based on Astute Background Response and Self-Distillation Learning
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-05-31, DOI: 10.1007/s11263-021-01473-9
Tao Zhao, Junwei Han, Le Yang, Binglu Wang, Dingwen Zhang

Weakly supervised temporal action localization is a practical yet challenging task. Despite considerable effort in recent years, existing methods still struggle with three challenges: over-localization, joint-localization, and under-localization. Based on our investigation, the first two arise from an insufficient ability to suppress the background response, while the third stems from a failure to discover all action frames. To better address these challenges, we first propose the astute background response strategy. By enforcing the classification target of the background category to be zero, this strategy creates a conductive effect between video-level classification and frame-level classification, which guides the action categories to suppress their responses at background frames and helps address the over-localization and joint-localization challenges. To alleviate the under-localization challenge, we introduce the self-distillation learning strategy. It simultaneously learns one master network and multiple auxiliary networks, where the auxiliary networks help the master network discover complete action frames. Experimental results on three benchmarks demonstrate that the proposed method performs favorably against previous methods and is effective in tackling all three challenges.
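
The idea of fixing the background category's classification target to zero can be illustrated with a small sketch. The abstract does not specify how frame-level scores are pooled or which loss is used, so everything below is an assumption made for illustration: top-k mean pooling, a sigmoid binary cross-entropy loss, placing the background class at the last index, and all function names are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def video_level_logits(frame_logits, k):
    """Pool frame-level logits (T frames x (C + 1) classes) into one
    video-level logit per class by averaging the k largest frame values.
    Top-k mean pooling is an assumption, not the paper's stated choice."""
    num_classes = len(frame_logits[0])
    pooled = []
    for c in range(num_classes):
        column = sorted((frame[c] for frame in frame_logits), reverse=True)
        top = column[:k]
        pooled.append(sum(top) / len(top))
    return pooled


def build_target(action_labels, num_actions):
    """Multi-hot target over C action classes plus one background class.
    The background entry (last index, by assumption) is always 0."""
    target = [0.0] * (num_actions + 1)
    for c in action_labels:
        target[c] = 1.0
    return target


def bce_loss(logits, target):
    """Mean binary cross-entropy over all classes, background included."""
    eps = 1e-7
    total = 0.0
    for z, y in zip(logits, target):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(logits)


# Toy video: 4 frames, 2 action classes, 1 background class (last column).
frames = [
    [2.0, -1.0, -2.0],
    [1.5, -1.5, -1.0],
    [-1.0, -2.0, 1.0],
    [-2.0, -1.0, 2.0],
]
target = build_target(action_labels=[0], num_actions=2)
loss = bce_loss(video_level_logits(frames, k=2), target)
```

In this sketch the background column is trained toward zero even though background frames are present in the toy video, so a high background response is penalized in the same way as a false action class.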




Updated: 2021-05-31