Uncertainty-Aware Multiview Deep Learning for Internet of Things Applications
IEEE Transactions on Industrial Informatics (IF 11.7), Pub Date: 2022-09-29, DOI: 10.1109/tii.2022.3206343
Cai Xu, Wei Zhao, Jinglong Zhao, Ziyu Guan, Xiangyu Song, Jianxin Li

Existing deep learning fusion methods concentrate mainly on convolutional neural networks (CNNs), and few attempts have been made with transformers. Meanwhile, the convolutional operation is a content-independent interaction between the image and the convolution kernel, which may lose important context and further limit fusion performance. Toward this end, we present a simple and strong fusion baseline for infrared and visible images, namely a residual Swin Transformer fusion network, termed SwinFuse. Our SwinFuse consists of three parts: global feature extraction, a fusion layer, and feature reconstruction. In particular, we build a fully attentional feature-encoding backbone to model long-range dependencies; it is a pure transformer network with stronger representation ability than CNNs. Moreover, we design a novel feature fusion strategy based on the L1-norm of sequence matrices, measuring the corresponding activity levels along the row and column vector dimensions, which retains competitive infrared brightness and distinct visible details. Finally, we evaluate SwinFuse against nine state-of-the-art traditional and deep learning methods on three different datasets through subjective observations and objective comparisons, and the experimental results show that the proposed SwinFuse achieves strong fusion performance, strong generalization ability, and competitive computational efficiency. The code will be available at https://github.com/Zhishe-Wang/SwinFuse.
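To make the L1-norm fusion rule concrete, the following is a minimal NumPy sketch of activity-based fusion of two token-sequence feature matrices. It is an illustration, not the authors' released implementation: the function names (l1_activity, fuse_features), the particular way row- and column-vector activities are combined, and the per-token softmax weighting are assumptions made for this example.

import numpy as np

def l1_activity(feats):
    # feats: (num_tokens, channels) sequence matrix from one source image
    abs_f = np.abs(feats)
    row_act = abs_f.mean(axis=1)            # L1-based activity of each row (token) vector
    col_w = abs_f.mean(axis=0)              # L1-based activity of each column (channel) vector
    col_act = (abs_f * col_w).mean(axis=1)  # fold channel activity back onto each token
    return row_act + col_act

def fuse_features(feats_ir, feats_vis):
    # per-token softmax weights from the two activity maps (numerically stable form)
    a_ir, a_vis = l1_activity(feats_ir), l1_activity(feats_vis)
    m = np.maximum(a_ir, a_vis)
    e_ir, e_vis = np.exp(a_ir - m), np.exp(a_vis - m)
    w_ir = e_ir / (e_ir + e_vis)
    return w_ir[:, None] * feats_ir + (1.0 - w_ir)[:, None] * feats_vis

# toy usage: 64 tokens with 96-dimensional features from each modality
fused = fuse_features(np.random.randn(64, 96), np.random.randn(64, 96))
print(fused.shape)  # (64, 96)

The per-token softmax keeps the fused sequence a convex combination of the two sources, so tokens with higher infrared activity contribute more brightness while visible detail is preserved elsewhere.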

Last updated: 2024-08-28