EMTCAL: Efficient Multiscale Transformer and Cross-Level Attention Learning for Remote Sensing Scene Classification
IEEE Transactions on Geoscience and Remote Sensing (IF 8.2) | Pub Date: 2022-07-28 | DOI: 10.1109/tgrs.2022.3194505
Xu Tang, Mingteng Li, Jingjing Ma, Xiangrong Zhang, Fang Liu, Licheng Jiao

In recent years, convolutional neural network (CNN)-based methods have been widely used for remote sensing (RS) scene classification and have achieved excellent results. However, CNNs are not good at exploring contextual information, which is essential for fully understanding RS scenes. The transformer, a newer model skilled at mining the latent contextual information in RS scenes, has attracted researchers' attention as a way to address this problem. Nevertheless, because the contents of RS scenes are diverse in type and varied in scale, the performance of the original transformer on RS scene classification falls short of expectations. In addition, its self-attention mechanism incurs a high time cost, which hinders its practicality in the RS community. To overcome these limitations, we propose a new model, efficient multiscale transformer and cross-level attention learning (EMTCAL), for RS scene classification. EMTCAL combines the advantages of the CNN and the transformer to fully mine the information within RS scenes. First, it uses a multilayer feature extraction module (MFEM) to acquire global visual features and multilevel convolutional features from RS scenes. Second, a contextual information extraction module (CIEM) is proposed to capture rich contextual information from the multilevel features. Within CIEM, taking the characteristics of RS scenes and the computational complexity into account, we propose an efficient multiscale transformer (EMST). EMST can mine the abundant knowledge at various scales hidden in RS scenes and model their inherent relations at a small time cost. Third, a cross-level attention module (CLAM) is developed to aggregate the multilevel features and explore their correlations. Finally, a class score fusion module (CSFM) is designed to integrate the contributions of the global and aggregated multilevel features into discriminative scene representations. Extensive experiments on three public RS scene datasets demonstrate that EMTCAL achieves superior classification performance and outperforms many state-of-the-art methods. Our source code is available at https://github.com/TangXu-Group/Remote-Sensing-Images-Classification/tree/main/EMTCAL.
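
The abstract describes a four-stage pipeline (MFEM -> CIEM/EMST -> CLAM -> CSFM). As a rough illustration of how these modules could compose, the following is a minimal PyTorch sketch; every module internal here (the stand-in backbone, the per-level pooling into tokens, the plain transformer encoder in place of EMST, and logit averaging for the score fusion) is an assumption made for illustration, not the authors' method. The official implementation is in the linked repository.

import torch
import torch.nn as nn

class EMTCALSketch(nn.Module):
    """Illustrative sketch of the EMTCAL pipeline from the abstract.

    All internals are assumptions; the released code lives at
    https://github.com/TangXu-Group/Remote-Sensing-Images-Classification/tree/main/EMTCAL
    """

    def __init__(self, num_classes: int, dim: int = 256):
        super().__init__()
        # MFEM stand-in: three strided conv stages producing multilevel features.
        self.level1 = nn.Sequential(nn.Conv2d(3, dim, 3, 2, 1), nn.ReLU())
        self.level2 = nn.Sequential(nn.Conv2d(dim, dim, 3, 2, 1), nn.ReLU())
        self.level3 = nn.Sequential(nn.Conv2d(dim, dim, 3, 2, 1), nn.ReLU())
        # CIEM/EMST stand-in: a generic transformer encoder over level tokens
        # (the paper's efficient multiscale self-attention is not reproduced here).
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.emst = nn.TransformerEncoder(enc_layer, num_layers=2)
        # CLAM stand-in: attention across the level tokens to aggregate them.
        self.clam = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        # CSFM stand-in: two heads whose class scores are fused.
        self.head_global = nn.Linear(dim, num_classes)
        self.head_multi = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.level1(x)
        f2 = self.level2(f1)
        f3 = self.level3(f2)
        # One token per level via global average pooling (illustrative choice).
        tokens = torch.stack([f.mean(dim=(2, 3)) for f in (f1, f2, f3)], dim=1)
        ctx = self.emst(tokens)            # contextualized multilevel tokens
        g = ctx[:, -1]                     # deepest token as the "global" feature
        agg, _ = self.clam(ctx, ctx, ctx)  # cross-level aggregation
        m = agg.mean(dim=1)
        # Fuse the two heads' class scores by averaging their logits.
        return (self.head_global(g) + self.head_multi(m)) / 2

A quick shape check (input size and class count are hypothetical):

logits = EMTCALSketch(num_classes=45)(torch.randn(2, 3, 224, 224))  # -> [2, 45]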

Updated: 2022-07-28