Semantic segmentation from remote sensor data and the exploitation of latent learning for classification of auxiliary tasks
Computer Vision and Image Understanding (IF 4.5), Pub Date: 2021-07-20, DOI: 10.1016/j.cviu.2021.103251
Bodhiswatta Chatterjee, Charalambos Poullis

In this paper we address three different aspects of semantic segmentation from remote sensor data using deep neural networks. Firstly, we focus on the semantic segmentation of buildings from remote sensor data and propose ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, the proposed network combines the localization accuracy and use of context of the U-Net network architecture, the compact internal representations and reduced feature redundancy of the Dense blocks, and the dynamic channel-wise feature re-weighting of the Squeeze-and-Excitation (SE) blocks. The proposed network has been tested on the INRIA and AIRS benchmark datasets and is shown to outperform other state-of-the-art methods. Secondly, as building classification is typically the first step of the reconstruction process, we investigate the relationship between classification accuracy and reconstruction accuracy. Due to the lack of (1) scene depth information and (2) ground truth (blueprints) for large urban areas, evaluating the 3D reconstructions is not possible. Thus, we use boundary localization as a proxy for reconstruction accuracy and perform the evaluation in 2D. A comparative quantitative analysis of the reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present the results, which show a consistent and considerable reduction in the reconstruction accuracy. Finally, we present the simple yet compelling concept of latent learning and the implications it carries within the context of deep learning. We posit that a network trained on a primary task (i.e., building classification) is unintentionally learning about auxiliary tasks (e.g., the classification of road, tree, etc.) which are complementary to the primary task.
Although embedded in a trained network, this latent knowledge relating to the auxiliary tasks is never externalized or immediately expressed; instead, only knowledge relating to the primary task is ever output by the network. We experimentally prove this occurrence of incidental learning on the pre-trained ICT-Net and show how sub-classification of the negative label is possible without further training or fine-tuning. We present the results of our experiments and explain how knowledge about auxiliary and complementary tasks – for which the network was never trained – can be retrieved and utilized for further classification. We extensively tested the proposed technique on the ISPRS benchmark dataset, which contains multi-label ground truth, and report an average classification accuracy (F1 score) of 54.29% (SD=17.03) for roads, 10.15% (SD=2.54) for cars, 24.11% (SD=5.25) for trees, 42.74% (SD=6.62) for low vegetation, and 18.30% (SD=16.08) for clutter. The source code and supplemental material are publicly available at http://www.theICTlab.org/lp/2020ICTNet/.
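
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a per-class F1 score and its mean and standard deviation across several images might be computed. It is not the authors' evaluation code. The function names `f1_score_binary` and `mean_and_sd` are hypothetical, and the abstract does not state whether the SD is taken over image tiles or whether it is a population or sample standard deviation; the sketch assumes per-image scores and a population SD.

```python
import numpy as np


def f1_score_binary(pred, truth):
    """F1 score for a single class, given two masks of equal shape.

    Nonzero entries mark pixels assigned to the class. Hypothetical helper,
    not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # class predicted and present
    fp = np.sum(pred & ~truth)   # class predicted but absent
    fn = np.sum(~pred & truth)   # class present but missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0 when the class is absent from both masks.
    return 0.0 if denom == 0 else float(2 * tp / denom)


def mean_and_sd(scores):
    """Mean and population standard deviation of per-image scores.

    Assumption: population SD (ddof=0); the abstract does not say which
    variant was used.
    """
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    # Toy 2x2 masks for one class on two made-up images (not real data).
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    truths = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
    per_image = [f1_score_binary(p, t) for p, t in zip(preds, truths)]
    mean, sd = mean_and_sd(per_image)
    print(f"F1 per image: {per_image}, mean={mean:.4f}, SD={sd:.4f}")
```

Scoring each class as its own binary problem matches the way the abstract reports one figure per class (roads, cars, trees, low vegetation, clutter) rather than a single multi-class average.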




Updated: 2021-07-24