SHetConv: target keypoint detection based on heterogeneous convolution neural networks, Multimedia Systems


Multimedia Systems (IF 3.9), Pub Date: 2021-01-27, DOI: 10.1007/s00530-020-00729-7
Xiaojie Yin, Ning He, Xiaoxiao Liu, Ke Lu

Keypoint detection is an important research topic in target recognition and classification. This paper studies the detection of keypoints in images of Amur tigers and proposes a target keypoint detection method based on heterogeneous convolution neural networks. Because monitoring devices have limited storage capacity and the task demands high accuracy, we propose a heterogeneous convolution called SHetConv, which is composed of group convolution and standard convolution. We use two kinds of SHetConv: one to reduce the computational cost, measured as the number of floating-point operations (FLOPs), and one to increase the receptive field. To further improve the effectiveness of the model, we propose a feature fusion module that makes full use of the semantic and spatial information in images. We evaluate the algorithm on the Tiger Pose Keypoint, CIFAR-10 and MPII datasets. The experimental results show that our method achieves better accuracy, recall and \(F_1\)-score than other state-of-the-art keypoint detection methods. Moreover, the number of parameters and FLOPs is substantially reduced. Specifically, the parameter count and FLOPs of our (scaled network + fusion module + shet2) model are 0.14 and 0.143 times those of the large HRNet-W48 model, respectively, and its \(F_1\)-score is 0.3% higher.
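The abstract does not say how SHetConv arranges its group and standard convolutions, so the following Python sketch does not reproduce the authors' layer. It only illustrates, under textbook cost formulas, why replacing a standard convolution with a group convolution lowers the parameter and operation counts. The function names and the layer sizes (64 channels, 3×3 kernel, 56×56 output, 4 groups) are hypothetical values chosen for illustration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution, ignoring bias.

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, so the weight count shrinks by that factor.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def conv_macs(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply-accumulate operations for one forward pass.

    Every weight is applied once per output spatial position.
    """
    return conv_params(c_in, c_out, k, groups) * h_out * w_out


# Hypothetical layer sizes, not taken from the paper.
standard = conv_params(64, 64, 3)            # standard convolution
grouped = conv_params(64, 64, 3, groups=4)   # group convolution, 4 groups

print(standard, grouped, grouped / standard)
```

For these assumed sizes the grouped layer uses one quarter of the weights and operations of the standard layer. A heterogeneous design that mixes both kinds of convolution would fall somewhere between the two costs, depending on the mix.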


Updated: 2021-01-28
