Change Detection Meets Visual Question Answering, IEEE Transactions on Geoscience and Remote Sensing



IEEE Transactions on Geoscience and Remote Sensing (IF 8.2), Pub Date: 2022-09-23, DOI: 10.1109/tgrs.2022.3203314
Zhenghang Yuan, Lichao Mou, Zhitong Xiong, Xiao Xiang Zhu

The Earth’s surface is continually changing, and identifying these changes plays an important role in urban planning and sustainability. Although change detection techniques have been developed successfully for many years, they are still limited to experts and facilitators in related fields. To give every user flexible access to change information and help them better understand land-cover changes, we introduce a novel task: change detection-based visual question answering (CDVQA) on multitemporal aerial images. In particular, multitemporal images can be queried to obtain high-level change-based information according to the content changes between two input images. We first build a CDVQA dataset of multitemporal image–question–answer triplets using an automatic question–answer generation method. We then devise a baseline CDVQA framework with four parts: multitemporal feature encoding, multitemporal fusion, multimodal fusion, and answer prediction. In addition, we introduce a change enhancing module into multitemporal feature encoding, aiming to incorporate more change-related information. Finally, we study the effects of different backbones and multitemporal fusion strategies on the performance of the CDVQA task. The experimental results provide useful insights for developing better CDVQA models, which are important for future research on this task. The dataset will be available at https://github.com/YZHJessica/CDVQA.
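
To make the data flow through the four parts named in the abstract easier to picture, here is a minimal toy sketch in plain Python. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the linear-plus-ReLU image encoder, the "subtract" and "concat" fusion strategies, the bag-of-words question encoder, the element-wise product for multimodal fusion, the four-word answer vocabulary, and all function names. The weights are random and untrained, so the predicted answer carries no meaning, and the change enhancing module is not modeled.

```python
import random

# Hypothetical answer vocabulary, chosen only for this illustration.
ANSWERS = ["yes", "no", "buildings", "vegetation"]


def random_matrix(rows, cols, rng):
    """Untrained stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode_image(pixels, weights):
    """Part 1, multitemporal feature encoding (toy: linear map + ReLU)."""
    return [max(0.0, v) for v in linear(weights, pixels)]


def fuse_multitemporal(feat_t1, feat_t2, strategy="subtract"):
    """Part 2, multitemporal fusion. Both strategies are assumptions."""
    if strategy == "subtract":
        return [b - a for a, b in zip(feat_t1, feat_t2)]
    if strategy == "concat":
        return feat_t1 + feat_t2
    raise ValueError(f"unknown fusion strategy: {strategy}")


def encode_question(question, dim):
    """Toy bag-of-words question encoder with a deterministic bucket hash."""
    vec = [0.0] * dim
    for token in question.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    return vec


def fuse_multimodal(visual, text):
    """Part 3, multimodal fusion (assumed: element-wise product)."""
    return [v * t for v, t in zip(visual, text)]


def predict_answer(fused, classifier):
    """Part 4, answer prediction: highest-scoring entry of ANSWERS."""
    scores = linear(classifier, fused)
    return ANSWERS[scores.index(max(scores))]


def answer_question(img_t1, img_t2, question, strategy="subtract",
                    feat_dim=8, seed=0):
    """Run the four parts in order on two flattened toy images."""
    rng = random.Random(seed)
    # The same encoder weights are applied to both acquisition dates.
    encoder = random_matrix(feat_dim, len(img_t1), rng)
    feat_t1 = encode_image(img_t1, encoder)
    feat_t2 = encode_image(img_t2, encoder)
    visual = fuse_multitemporal(feat_t1, feat_t2, strategy)
    text = encode_question(question, len(visual))
    fused = fuse_multimodal(visual, text)
    classifier = random_matrix(len(ANSWERS), len(fused), rng)
    return predict_answer(fused, classifier)


before = [0.1, 0.4, 0.3, 0.9]  # flattened toy "image" at the first date
after = [0.8, 0.4, 0.2, 0.1]   # the same location at the second date
print(answer_question(before, after, "Have the buildings changed?"))
```

Swapping `strategy="subtract"` for `strategy="concat"` changes only the second part, which mirrors how the effect of different multitemporal fusion strategies could be compared while the rest of the pipeline stays fixed.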


Updated: 2022-09-23
