Towards Interpretable and Robust Hand Detection via Pixel-wise Prediction
Pattern Recognition (IF 7.5), Pub Date: 2020-09-01, DOI: 10.1016/j.patcog.2020.107202
Dan Liu, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu

Abstract The lack of interpretability in existing CNN-based hand detection methods makes it difficult to understand the rationale behind their predictions. In this paper, we propose a novel neural network model that introduces interpretability into hand detection for the first time. The main improvements are: (1) Hands are detected at the pixel level, so the model can show which pixels underlie each decision, improving its transparency. (2) An explainable Highlight Feature Fusion block highlights distinctive features across multiple layers and learns discriminative ones for robust performance. (3) A transparent representation, the rotation map, learns rotation features in place of complex, non-transparent rotation and derotation layers. (4) Auxiliary supervision accelerates training, saving more than 10 hours in our experiments. Experimental results on the VIVA and Oxford hand detection and tracking datasets show that our method achieves accuracy competitive with state-of-the-art methods at higher speed. Models and code are available at https://isrc.iscas.ac.cn/gitlab/research/pr2020-phdn.
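Since the abstract only names the mechanisms, the following is a minimal PyTorch sketch of how pixel-wise prediction, the rotation map, feature fusion, and auxiliary supervision could fit together. Every module name, channel size, and loss weight here is an illustrative assumption, not the authors' implementation; their actual models and code are in the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighlightFeatureFusion(nn.Module):
    """One plausible reading of the Highlight Feature Fusion block: upsample
    a deep feature map to the shallow map's resolution, sum them, and gate
    the result with learned per-channel weights so distinctive channels are
    highlighted. The real block may differ; see the paper and repository."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        fused = shallow + deep
        return fused * self.gate(fused)


class PixelwiseHandHead(nn.Module):
    """Predicts, for every pixel, a hand confidence score plus a rotation
    map encoded as a unit (sin, cos) vector, so orientation is learned as a
    transparent per-pixel target instead of via rotation/derotation layers."""

    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.rotation = nn.Conv2d(in_channels, 2, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        score_map = torch.sigmoid(self.score(feats))        # [B, 1, H, W]
        rot_map = F.normalize(self.rotation(feats), dim=1)  # [B, 2, H, W]
        return score_map, rot_map


def detection_loss(score_map, rot_map, gt_mask, gt_rot,
                   aux_score_map=None, aux_weight=0.4):
    """Pixel-wise BCE on the score map, L1 on the rotation map restricted to
    hand pixels, and an optional auxiliary BCE on an intermediate prediction
    (assumed already sigmoid-activated), mirroring the abstract's claim that
    auxiliary supervision speeds up training."""
    loss = F.binary_cross_entropy(score_map, gt_mask)
    hand = gt_mask.expand_as(rot_map)  # only supervise rotation inside hands
    loss = loss + (F.l1_loss(rot_map, gt_rot, reduction="none") * hand).mean()
    if aux_score_map is not None:
        aux = F.interpolate(aux_score_map, size=gt_mask.shape[-2:],
                            mode="bilinear", align_corners=False)
        loss = loss + aux_weight * F.binary_cross_entropy(aux, gt_mask)
    return loss
```

In this reading, shallow and deep backbone features would pass through HighlightFeatureFusion before the head, and gt_rot would store the (sin, cos) encoding of each annotated hand's orientation at its pixels (zeros elsewhere, since the mask restricts the rotation loss to hand pixels).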

Last updated: 2020-09-01