Data augmentation for deep visual recognition using superpixel based pairwise image fusion
Information Fusion (IF 14.7) Pub Date: 2024-02-16, DOI: 10.1016/j.inffus.2024.102308
D. Sun, F. Dornaika

Data augmentation is an important paradigm for boosting the generalization capability of deep learning in image classification tasks. Image augmentation using cut-and-paste strategies has shown strong performance improvements for deep learning. However, these existing methods often overlook the image’s discriminative local context and rely on ad hoc square or rectangular local regions, leading to the loss of complete semantic object parts. In this work, we attempt to overcome these limitations and propose an efficient, superpixel-wise, local-context-aware image fusion approach for data augmentation. Our approach requires only a single forward pass, using superpixel-attention-based label fusion with low computational complexity. The model is trained using a combination of a global classification loss on the fused (augmented) image, a superpixel-wise weighted local classification loss, and a superpixel-based weighted contrastive learning loss. The last two losses are based on the superpixel-aware attentive embeddings. Thus, the resulting deep encoder can learn both local and global features of the images while capturing object-part local context and information. Experiments on diverse benchmark image datasets indicate that the proposed method outperforms many region-based augmentation methods for visual recognition. We have demonstrated its effectiveness not only on CNN models but also on transformer models. The code is accessible at .
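To make the augmentation mechanics concrete, the following is a minimal sketch of superpixel-based pairwise image fusion, assuming SLIC superpixels and an area-proportional soft-label mix in place of the paper's attention-based label fusion. This is an illustrative reconstruction, not the authors' released code; the function and parameter names (`superpixel_fuse`, `mix_prob`) are hypothetical.

```python
# Sketch of superpixel-wise pairwise image fusion for data augmentation.
# Assumption: SLIC superpixels on the base image and an area-weighted
# soft label, standing in for the paper's attention-based label fusion.
import numpy as np
from skimage.segmentation import slic

def superpixel_fuse(img_a, img_b, label_a, label_b, num_classes,
                    n_segments=100, mix_prob=0.5, rng=None):
    """Fuse two images superpixel-wise and mix their labels.

    img_a, img_b : float arrays of shape (H, W, 3) in [0, 1]
    label_a, label_b : integer class indices
    Returns the fused image and a soft label vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Segment the base image into perceptually coherent superpixels,
    # so pasted regions follow object-part boundaries rather than boxes.
    segments = slic(img_a, n_segments=n_segments, start_label=0)
    fused = img_a.copy()
    pasted_area = 0
    for sp in np.unique(segments):
        if rng.random() < mix_prob:
            mask = segments == sp
            # Paste the corresponding superpixel region from the second image.
            fused[mask] = img_b[mask]
            pasted_area += int(mask.sum())
    # Area-proportional soft label (simplified stand-in for the
    # superpixel-attention-based label fusion described in the paper).
    lam = pasted_area / segments.size
    soft_label = np.zeros(num_classes, dtype=np.float32)
    soft_label[label_a] += 1.0 - lam
    soft_label[label_b] += lam
    return fused, soft_label
```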
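Similarly, a hedged sketch of the three-part training objective described above: a global classification loss on the fused image, a superpixel-wise weighted local classification loss, and a superpixel-weighted contrastive term. The exact weighting scheme and attentive embeddings are not specified in the abstract, so the tensors `local_logits`, `sp_weights`, `sp_embeddings`, and `sp_labels` are assumed placeholders, and the contrastive term is a simplified supervised-contrastive variant.

```python
# Sketch of the combined loss under stated assumptions; all per-superpixel
# tensor names are hypothetical placeholders for model outputs.
import torch
import torch.nn.functional as F

def combined_loss(global_logits, soft_label,
                  local_logits, sp_weights, sp_embeddings, sp_labels,
                  alpha=1.0, beta=1.0, temperature=0.1):
    """global_logits: (B, C);  soft_label: (B, C)
    local_logits:  (B, S, C) per-superpixel class predictions
    sp_weights:    (B, S) attention weights, summing to 1 over S
    sp_embeddings: (B, S, D) superpixel-aware attentive embeddings
    sp_labels:     (B, S) long tensor, source-image class per superpixel
    """
    # Global classification loss on the fused (augmented) image.
    loss_global = -(soft_label * F.log_softmax(global_logits, dim=1)).sum(1).mean()

    # Superpixel-wise weighted local classification loss.
    log_p = F.log_softmax(local_logits, dim=2)                  # (B, S, C)
    nll = -log_p.gather(2, sp_labels.unsqueeze(2)).squeeze(2)   # (B, S)
    loss_local = (sp_weights * nll).sum(1).mean()

    # Superpixel-based weighted contrastive loss (simplified SupCon-style):
    # pull together embeddings of superpixels sharing a source label.
    z = F.normalize(sp_embeddings.flatten(0, 1), dim=1)         # (B*S, D)
    w = sp_weights.flatten()                                    # (B*S,)
    y = sp_labels.flatten()                                     # (B*S,)
    sim = z @ z.t() / temperature
    mask_pos = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    mask_pos.fill_diagonal_(0)
    # Exclude self-similarity from the normalizer via a large negative bias.
    self_bias = torch.eye(len(y), device=sim.device) * -1e9
    log_prob = sim - torch.logsumexp(sim + self_bias, dim=1, keepdim=True)
    denom = mask_pos.sum(1).clamp(min=1)
    loss_con = -(w * (mask_pos * log_prob).sum(1) / denom).mean()

    return loss_global + alpha * loss_local + beta * loss_con
```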
