Pattern Recognition Letters (IF 3.255), Pub Date: 2021-02-12, DOI: 10.1016/j.patrec.2021.02.003. Cui-jin Li; Zhong Qu; Sheng-ye Wang; Ling Liu
Improving detection accuracy and speed is a prerequisite for multi-object recognition in complex traffic environments. Although object detection based on deep neural networks has made significant advances, detecting small and occluded objects remains a challenge. We address this challenge through multiscale fusion. We introduce a cross-layer fusion multi-object detection and recognition algorithm based on Faster R-CNN, in which the five-layer structure of VGG16 (Visual Geometry Group) is used to obtain more feature information. We implement this idea by laterally embedding a 1×1 convolution kernel, max pooling, and deconvolution, in conjunction with a weighted balanced multi-class cross-entropy loss function and Soft-NMS to control the imbalance between difficult and easy samples. Considering the actual conditions of a complex traffic environment, we manually label a mixed dataset. Experimental results on the Cityscapes and KITTI datasets show that the proposed model achieves better results than current mainstream object detection models.
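Two of the components named in the abstract, Soft-NMS and a class-weighted cross-entropy loss, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the Gaussian decay form of Soft-NMS, the values of `sigma` and `score_thresh`, the per-class weighting scheme, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay scores of overlapping boxes instead of deleting them.

    Returns a list of (original_index, rescored_value) in selection order.
    sigma and score_thresh are illustrative defaults (assumptions).
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best_pos = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best_pos)
        kept.append((best_idx, best_score))
        # Decay every other score by a Gaussian of its overlap with the selected box.
        decayed = []
        for idx, score in remaining:
            overlap = iou(boxes[best_idx], boxes[idx])
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_thresh:
                decayed.append((idx, new_score))
        remaining = decayed
    return kept


def weighted_cross_entropy(probs, target, class_weights):
    """Class-weighted multi-class cross entropy for a single sample.

    probs: predicted class probabilities; target: true class index;
    class_weights: hypothetical per-class weights (larger for rare or hard classes).
    """
    return -class_weights[target] * math.log(probs[target])


# Two heavily overlapping boxes and one distant box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
loss = weighted_cross_entropy([0.5, 0.5], target=0, class_weights=[2.0, 1.0])
```

In this example the second box overlaps the top-scoring one, so its score is reduced and it is ranked after the distant box rather than being discarded outright. That is the behavior that makes Soft-NMS attractive for occluded objects, where hard suppression can remove a true detection.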
Title: Cross-layer fusion multi-object detection and recognition method based on an improved Faster R-CNN model in complex traffic environments