Occlusion-aware spatial attention transformer for occluded object recognition
Pattern Recognition Letters (IF 5.1) Pub Date: 2022-05-10, DOI: 10.1016/j.patrec.2022.05.006
Jiseong Heo, Yooseung Wang, Jihun Park

Object classification under partial occlusion has been challenging for deep convolutional neural networks due to their innate locality in feature extraction. We propose an Occlusion-aware Spatial Attention Transformer (OSAT) architecture based on the Vision Transformer (ViT), CutMix augmentation, and an Occlusion Mask Predictor (OMP) to address the occlusion problem. ViT relies mainly on the self-attention mechanism, which enables the model to capture spatially distant information. In addition, we combine CutMix augmentation with ViT to generate occluded training images. The OMP serves both as a multi-task learning objective and as a source of spatial attention over non-occluded regions. Our proposed OSAT achieves state-of-the-art performance on occluded vehicle classification datasets derived from PASCAL3D+ and MS-COCO. Moreover, additional experiments show that the OMP outperforms the previous approach to occluder localization both quantitatively and qualitatively. According to our ablation studies, ViT is effective at analyzing occluded objects, and our CutMix augmentation and OMP lead to further improvements.
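The CutMix-style occluded-image augmentation the abstract describes can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the function name, the Beta-distribution parameter `alpha`, and the exact box-sampling scheme are assumptions. Note that the same patch mask that mixes the images could also serve as a training target for an occlusion-mask predictor such as the OMP.

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, alpha=1.0, rng=None):
    """Hypothetical CutMix sketch: paste a random patch of img_b into img_a.

    Returns the mixed image, the area-weighted mixed label, and the binary
    patch mask (1 = "occluded" pixels coming from img_b).
    """
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)            # mixing ratio for img_a
    cut_ratio = np.sqrt(1.0 - lam)          # patch side length ratio
    ch, cw = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y1, y2 = max(cy - ch // 2, 0), min(cy + ch // 2, h)
    x1, x2 = max(cx - cw // 2, 0), min(cx + cw // 2, w)

    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    mask = np.zeros((h, w), dtype=np.float32)
    mask[y1:y2, x1:x2] = 1.0

    # Re-derive the mixing weight from the actual (clipped) patch area.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_label = lam_adj * label_a + (1.0 - lam_adj) * label_b
    return mixed, mixed_label, mask
```

In a multi-task setup along the lines of the paper, the classifier would be trained on `mixed_label` while the mask head regresses `mask`, tying occluder localization to the same augmented samples.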




Updated: 2022-05-13