
SODA: Weakly Supervised Temporal Action Localization Based on Astute Background Response and Self-Distillation Learning

International Journal of Computer Vision

Abstract

Weakly supervised temporal action localization is a practical yet challenging task. Despite great efforts in recent years, existing methods remain limited in handling three challenges: over-localization, joint-localization, and under-localization. Our investigation shows that the first two challenges arise from an insufficient ability to suppress background responses, while the third stems from a failure to discover all action frames. To better address these challenges, we first propose the astute background response strategy. By enforcing the video-level classification target of the background category to be zero, this strategy establishes a conductive effect between video-level and frame-level classification, guiding each action category to astutely suppress its responses at background frames and thereby alleviating over-localization and joint-localization. To alleviate under-localization, we introduce a self-distillation learning strategy that simultaneously trains one master network and multiple auxiliary networks, where the auxiliary networks help the master network discover complete action frames. Experimental results on three benchmarks demonstrate the favorable performance of the proposed method against previous counterparts and its efficacy in tackling the three challenges.
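To make the two strategies concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation. It assumes a frame-level class activation sequence `cas` of shape (B, T, C+1) whose last channel is the background category, temporal top-k mean pooling as the video-level aggregator, and a mean-squared-error consistency term between the master and auxiliary networks; all of these names and design choices are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def astute_background_loss(cas, labels):
    """Hedged sketch of the astute background response strategy.

    cas:    (B, T, C+1) frame-level class activation sequence;
            the last channel is assumed to be the background category.
    labels: (B, C) multi-hot video-level action labels.

    Forcing the video-level target of the background category to zero
    lets the video-level loss conduct down to the frame level, pushing
    action categories to suppress responses at background frames.
    """
    # Video-level aggregation via temporal top-k mean pooling
    # (a common choice in weakly supervised localization).
    k = max(1, cas.shape[1] // 8)
    topk_scores, _ = torch.topk(cas, k, dim=1)
    video_scores = topk_scores.mean(dim=1)            # (B, C+1)

    # Targets: ground-truth actions are positive; the background
    # category's target is explicitly set to zero.
    bg_target = torch.zeros(labels.shape[0], 1, device=labels.device)
    targets = torch.cat([labels, bg_target], dim=1)   # (B, C+1)
    targets = targets / targets.sum(dim=1, keepdim=True).clamp(min=1e-8)

    log_probs = F.log_softmax(video_scores, dim=1)
    return -(targets * log_probs).sum(dim=1).mean()

def self_distillation_loss(master_cas, aux_cas_list):
    """Hedged sketch of self-distillation learning: the ensemble of
    auxiliary networks' activation sequences acts as a soft teacher
    that encourages the master network to cover complete action frames.
    """
    teacher = torch.stack(aux_cas_list, dim=0).mean(dim=0).detach()
    return F.mse_loss(master_cas, teacher)
```

In a training loop one would sum the two terms, e.g. `loss = astute_background_loss(cas, labels) + lam * self_distillation_loss(cas, aux_cas)` with a weighting hyperparameter `lam`; the actual aggregation, loss formulations, and network topology in SODA may differ.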




Author information


Corresponding authors

Correspondence to Junwei Han or Dingwen Zhang.

Additional information

Communicated by Dong Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the National Natural Science Foundation of China under Grants 61876140 and U1801265, the Key-Area Research and Development Program of Guangdong Province (2019B010110001), and the Research Funds for Interdisciplinary Subject, NWPU.

About this article

Cite this article

Zhao, T., Han, J., Yang, L. et al. SODA: Weakly Supervised Temporal Action Localization Based on Astute Background Response and Self-Distillation Learning. Int J Comput Vis 129, 2474–2498 (2021). https://doi.org/10.1007/s11263-021-01473-9
