MASSFormer: Memory-Augmented Spectral-Spatial Transformer for Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing (IF 8.2). Pub Date: 2024-04-22. DOI: 10.1109/tgrs.2024.3392264
Le Sun 1 , Hang Zhang 2 , Yuhui Zheng 2 , Zebin Wu 3 , Zhonglin Ye 4 , Haixing Zhao 5

In recent years, convolutional neural networks (CNNs) have achieved remarkable success in hyperspectral image (HSI) classification tasks, primarily due to their outstanding spatial feature extraction capabilities. However, CNNs struggle to capture the diagnostic spectral information inherent in HSI. In contrast, vision transformers (ViTs) are well suited to handling spectral sequence information and excel at capturing long-range correlations between pixels and bands. Nevertheless, owing to information loss during propagation, some existing transformer-based classification methods fail to achieve sufficient spectral-spatial information mixing. To mitigate these limitations, we propose a memory-augmented spectral-spatial transformer (MASSFormer) for HSI classification. Specifically, MASSFormer incorporates two effective modules: the memory tokenizer (MT) and the memory-augmented transformer encoder (MATE). The former transforms spectral-spatial features into memory tokens that store prior knowledge. The latter extends the traditional multihead self-attention (MHSA) operation by incorporating these memory tokens, enabling ample information blending while alleviating potential depth decay in the model and consequently improving classification performance. Extensive experiments conducted on four benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods. The source code is available at https://github.com/hz63/MASSFormer for the sake of reproducibility.
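To make the memory-augmentation idea concrete, the sketch below shows one common way to fold learnable memory tokens into self-attention: the memory tokens are concatenated onto the key/value sequence, so each query can attend both to the current spectral-spatial tokens and to stored prior knowledge. This is a minimal single-head NumPy illustration under that assumption; the shapes, function names, and weight setup are illustrative and are not the authors' MATE implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_augmented_attention(x, memory, wq, wk, wv):
    """Single-head attention where keys/values cover [tokens; memory].

    x:      (n, d) spectral-spatial tokens for the current patch
    memory: (m, d) learnable memory tokens (prior knowledge)
    wq/wk/wv: (d, d) projection matrices
    Returns (n, d): each output token blends input and memory content.
    """
    kv_in = np.concatenate([x, memory], axis=0)     # (n + m, d)
    q = x @ wq                                      # queries from inputs only
    k = kv_in @ wk                                  # keys over inputs + memory
    v = kv_in @ wv                                  # values over inputs + memory
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (n, n + m)
    return softmax(scores) @ v                      # (n, d)

# Toy example: 4 tokens of dimension 8, augmented with 2 memory tokens.
rng = np.random.default_rng(0)
d, n, m = 8, 4, 2
x = rng.standard_normal((n, d))
memory = rng.standard_normal((m, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
out = memory_augmented_attention(x, memory, wq, wk, wv)
print(out.shape)  # (4, 8): one output per input token
```

Because the memory tokens appear only on the key/value side, the output sequence length is unchanged, which lets such a block drop into a standard transformer encoder without altering the rest of the pipeline.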

Updated: 2024-04-22