An annotated image database of building facades categorized into land uses for object detection using deep learning
Machine Vision and Applications (IF 3.3), Pub Date: 2022-08-27, DOI: 10.1007/s00138-022-01335-5
Frederico Damasceno Bortoloti, Jonivane Tavares, Thomas Walter Rauber, Patrick Marques Ciarelli, Rayane Cardozo Gama Botelho

This article presents a machine learning approach to automatic land use categorization based on a convolutional neural network architecture. It is intended to support the detection and classification of building facades so that each building can be associated with its land use. Replacing the time-consuming manual acquisition of images in the field, and the subsequent interpretation of the data, with computer-aided techniques facilitates the creation of useful maps for urban planning. A specific future objective of this study is to monitor the commercial evolution of the city of Vila Velha, Brazil. The initial step is object detection based on a deep network architecture called Faster R-CNN. The model is trained on street-level photographs of buildings with the desired land uses, drawn from a database of annotated images of building facades. The images are extracted from Google Street View scenes. Furthermore, to save manual annotation time, a semi-supervised dual pipeline method is proposed that uses a predictor model pre-trained on the Places365 database to learn from unannotated images. Several backbones were connected to the Faster R-CNN architecture for comparison. The experimental results with the VGG backbone show an improvement over published works, with an average accuracy of 86.49%.
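
The abstract does not spell out how the semi-supervised pipeline turns unannotated images into training data. One common way to do this kind of thing is confidence-thresholded pseudo-labeling, sketched below in plain Python. This is an illustration under stated assumptions, not the authors' method: the function `pseudo_label`, the stand-in `toy_predictor`, the image identifiers, the class names, and the 0.9 threshold are all hypothetical, and the scores are made up.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a predictor maps an image identifier to per-class scores.
Predictor = Callable[[str], Dict[str, float]]


def pseudo_label(
    unlabeled: List[str],
    predict: Predictor,
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split unlabeled images into confidently pseudo-labeled and rejected ones."""
    accepted: List[Tuple[str, str]] = []
    rejected: List[str] = []
    for image_id in unlabeled:
        scores = predict(image_id)
        # Take the highest-scoring class as the candidate label.
        label, confidence = max(scores.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            accepted.append((image_id, label))
        else:
            rejected.append(image_id)
    return accepted, rejected


def toy_predictor(image_id: str) -> Dict[str, float]:
    """Stand-in for a pre-trained scene classifier; the scores are invented."""
    table = {
        "img_001": {"commercial": 0.95, "residential": 0.05},
        "img_002": {"commercial": 0.55, "residential": 0.45},
        "img_003": {"commercial": 0.08, "residential": 0.92},
    }
    return table[image_id]


if __name__ == "__main__":
    accepted, rejected = pseudo_label(
        ["img_001", "img_002", "img_003"], toy_predictor
    )
    print("pseudo-labeled:", accepted)
    print("left for manual annotation:", rejected)
```

In a setup like this, the accepted pairs would be added to the training set and the rejected images would still need manual annotation. The threshold trades annotation savings against label noise.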




Updated: 2022-08-28