TransCD: scene change detection via transformer-based architecture, Optics Express


Optics Express (IF 3.2) Pub Date: 2021-11-30, DOI: 10.1364/oe.440720
Zhixue Wang, Yu Zhang, Lin Luo, Nan Wang


Scene change detection (SCD) is the task of identifying changes of interest between bi-temporal images acquired at different times. A key challenge in SCD is identifying changes of interest while suppressing noisy changes induced by camera motion or environmental variation, such as viewpoint, dynamic changes, and outdoor conditions. These noisy changes cause corresponding pixel pairs to exhibit spatial differences (position relations) and temporal differences (intensity relations). Because of their limited local receptive fields, traditional models based on convolutional neural networks (CNNs) have difficulty establishing long-range relations for semantic changes. To address these challenges, we explore the potential of transformers for SCD and propose a transformer-based SCD architecture (TransCD). Based on the intuition that an SCD model should be able to model both interesting and noisy changes, we incorporate a siamese vision transformer (SViT) into a feature-difference SCD framework. Our motivation is that SViT can establish global semantic relations and model long-range context, which makes it more robust to noisy changes. In addition, unlike pure CNN-based models with high computational complexity, the proposed model is more efficient and has fewer parameters. Extensive experiments on the CDNet-2014 dataset demonstrate that the proposed TransCD (SViT-E1-D1-32) outperforms state-of-the-art SCD models, achieving an F1 score of 0.9361, an improvement of 7.31%.
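The abstract does not describe TransCD's internals beyond two points: a siamese vision transformer placed inside a feature-difference framework, and evaluation by F1 score. The Python sketch below illustrates that general pattern under simplifying assumptions. It is not the authors' implementation.

The assumptions are as follows:

- Each image is given directly as a short list of patch-token vectors.
- A single shared encoder stands in for SViT. It is one self-attention head with identity query/key/value projections, plus a residual connection.
- The per-token change score is an L1 feature distance.
- A fixed threshold turns the scores into a change mask, which is then scored with F1.

All function names and the threshold value are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Assumption: identity query/key/value projections (no learned weights).
    Every output token is a weighted average over ALL input tokens,
    so each position receives global context regardless of distance.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def encode(tokens: List[Vector]) -> List[Vector]:
    """Hypothetical encoder: self-attention plus a residual connection."""
    attended = self_attention(tokens)
    return [[t + a for t, a in zip(tok, att)] for tok, att in zip(tokens, attended)]


def change_scores(tokens_t0: List[Vector], tokens_t1: List[Vector]) -> Vector:
    """Feature-difference step: encode both images with the SAME encoder
    (siamese weight sharing), then take the per-token L1 distance."""
    f0, f1 = encode(tokens_t0), encode(tokens_t1)
    return [sum(abs(a - b) for a, b in zip(u, v)) for u, v in zip(f0, f1)]


def change_mask(scores: Vector, threshold: float) -> List[int]:
    """Binarize change scores: 1 = changed, 0 = unchanged."""
    return [1 if s > threshold else 0 for s in scores]


def f1_score(pred: List[int], truth: List[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    # Convention chosen here: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    # Toy example: four 2-D "patch tokens"; only patch 2 changes between t0 and t1.
    t0 = [[1.0, 0.0] for _ in range(4)]
    t1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    scores = change_scores(t0, t1)
    pred = change_mask(scores, threshold=1.0)
    print([round(s, 2) for s in scores])  # [0.28, 0.28, 2.81, 0.28]
    print(pred, f1_score(pred, [0, 0, 1, 0]))  # [0, 0, 1, 0] 1.0
```

In the toy example only patch 2 changes, yet the unchanged patches also receive a small score (about 0.28, versus about 2.81 for the changed patch). This happens because self-attention mixes information from every token into every output. In this sketch, the fixed threshold is what separates the change of interest from that spread.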


Updated: 2021-12-06
