Weakly Supervised Semantic Segmentation With Consistency-Constrained Multiclass Attention for Remote Sensing Scenes
IEEE Transactions on Geoscience and Remote Sensing ( IF 8.2 ) Pub Date : 2024-04-23 , DOI: 10.1109/tgrs.2024.3392737
Junjie Zhang 1 , Qiming Zhang 1 , Yongshun Gong 2 , Jian Zhang 3 , Liang Chen 4 , Dan Zeng 1
Obtaining image-level class labels for remote sensing (RS) images is a relatively straightforward process, sparking significant interest in weakly supervised semantic segmentation (WSSS). However, RS images present challenges beyond those encountered in generic WSSS, including complex backgrounds, densely distributed small objects, and considerable scale variations. To address these issues, we introduce a consistency-constrained multiclass attention model, denoted CocoaNet. Specifically, CocoaNet captures both semantic correlation and class distinctiveness through a global-local adaptive attention mechanism, which integrates self-attention to model global correlation, complemented by a local perception branch that intensifies focus on local regions. The resulting class-specific attention weights and patch-level pairwise affinity weights are employed to optimize the initial class activation maps (CAMs). This mechanism proves highly effective in mitigating interclass interference and handling densely clustered small objects. Moreover, we impose a consistency constraint to rectify activation inaccuracies: by using a Siamese structure for the mutual supervision of features extracted from images at different scales, we address the substantial scale variations in RS scenes. Simultaneously, a class contrast loss is adopted to enhance the discriminativeness of class-specific features. Departing from conventional CAM optimization, which is complex and time-consuming, we harness prior knowledge from the generic Segment Anything Model (SAM) to design a joint optimization strategy (JOS) that refines target boundaries and further promotes discriminative visual features. We validate the effectiveness of our approach on three benchmark datasets covering multiclass RS scenarios, and the experimental results demonstrate that our model yields promising advancements over state-of-the-art methods.
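The abstract's two core ingredients, class activation maps and the cross-scale consistency constraint, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature shapes, the nearest-neighbour upsampling, and the mean-squared disagreement loss are all simplifying assumptions standing in for the paper's Siamese mutual supervision.

```python
import numpy as np

def class_activation_maps(features, fc_weights):
    """Project backbone features onto classifier weights to obtain CAMs.

    features:   (C, H, W) feature maps
    fc_weights: (K, C) classification-head weights for K classes
    returns:    (K, H, W) activation maps, normalized to [0, 1] per class
    """
    cams = np.einsum('kc,chw->khw', fc_weights, features)
    cams = np.maximum(cams, 0.0)                      # ReLU: keep positive evidence
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    peak = np.where(peak > 0, peak, 1.0)              # guard against division by zero
    return cams / peak[:, None, None]

def upsample2x(maps):
    """Nearest-neighbour 2x upsampling, (K, h, w) -> (K, 2h, 2w)."""
    return maps.repeat(2, axis=1).repeat(2, axis=2)

def scale_consistency_loss(cam_full, cam_half):
    """Mean-squared disagreement between full-resolution CAMs and the
    upsampled half-resolution CAMs -- a stand-in for the mutual-supervision
    signal between the two branches of the Siamese structure."""
    return float(np.mean((cam_full - upsample2x(cam_half)) ** 2))

# Toy usage: the same image seen at two scales should yield consistent CAMs.
rng = np.random.default_rng(0)
feats = rng.random((8, 4, 4))                         # (C=8, H=4, W=4)
w = rng.random((3, 8))                                # K=3 classes
cam_full = class_activation_maps(feats, w)
cam_half = class_activation_maps(feats[:, ::2, ::2], w)
loss = scale_consistency_loss(cam_full, cam_half)     # penalizes disagreement
```

Minimizing such a loss pushes the network toward scale-invariant activations; the paper additionally refines these maps with attention weights, a class contrast loss, and SAM-derived boundaries, none of which are reproduced here.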

Updated: 2024-04-23