Structural Knowledge Distillation for Efficient Skeleton-Based Action Recognition
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2021-02-09, DOI: 10.1109/tip.2021.3056895
Cunling Bian, Wei Feng, Liang Wan, Song Wang

Skeleton data have been extensively used for action recognition since they can robustly accommodate dynamic circumstances and complex backgrounds. To guarantee action-recognition performance, we prefer to use advanced and time-consuming algorithms to get more accurate and complete skeletons from the scene. However, this may not be acceptable in time- and resource-stringent applications. In this paper, we explore the feasibility of using low-quality skeletons, which can be quickly and easily estimated from the scene, for action recognition. While the use of low-quality skeletons will surely lead to degraded action-recognition accuracy, we propose a structural knowledge distillation scheme to minimize this accuracy degradation and improve the recognition model's robustness to uncontrollable skeleton corruptions. More specifically, a teacher that observes high-quality skeletons obtained from a scene is used to help train a student that only sees low-quality skeletons generated from the same scene. At inference time, only the student network is deployed for processing low-quality skeletons. In the proposed network, a graph matching loss is used to distill the graph structural knowledge at an intermediate representation level. We also propose a new gradient revision strategy to seek a balance between mimicking the teacher model and directly improving the student model's accuracy. Experiments are conducted on the Kinetics400, NTU RGB+D and Penn action recognition datasets, and the comparison results demonstrate the effectiveness of our scheme.
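The teacher-student setup described in the abstract can be illustrated with a small sketch. The abstract does not give the form of the graph matching loss or of the gradient revision strategy, so everything below is an assumption made for illustration: `affinity`, `structure_loss` and `student_objective` are hypothetical names, the structural term is a generic comparison of joint-to-joint cosine-similarity matrices standing in for the paper's loss, and a fixed `weight` stands in for the gradient revision step, which is not reproduced here.

```python
import math


def affinity(features):
    """Cosine-similarity matrix between per-joint feature vectors."""
    # A zero vector gets norm 1.0 so the division below stays defined.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
         for fj, nj in zip(features, norms)]
        for fi, ni in zip(features, norms)
    ]


def structure_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher affinity matrices.

    Assumption: a generic stand-in for a structural distillation term,
    not the graph matching loss defined in the paper.
    """
    a_s = affinity(student_feats)
    a_t = affinity(teacher_feats)
    n = len(a_s)
    total = sum((a_s[i][j] - a_t[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def student_objective(student_logits, label, student_feats, teacher_feats,
                      weight=0.5):
    """Classification loss plus a weighted structural term.

    Assumption: the fixed `weight` replaces the paper's gradient revision
    strategy, whose details are not stated in the abstract.
    """
    return (cross_entropy(student_logits, label)
            + weight * structure_loss(student_feats, teacher_feats))
```

In this sketch the teacher features would come from a network fed high-quality skeletons and the student features from a network fed low-quality skeletons of the same scene; only the student would be kept at inference time, as the abstract states.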

Updated: 2021-02-19