UAVid: A semantic segmentation dataset for UAV imagery
ISPRS Journal of Photogrammetry and Remote Sensing (IF 12.7), Pub Date: 2020-05-30, DOI: 10.1016/j.isprsjprs.2020.05.009
Ye Lyu, George Vosselman, Gui-Song Xia, Alper Yilmaz, Michael Ying Yang

Semantic segmentation has recently been one of the leading research interests in computer vision. It serves as a perception foundation for many fields, such as robotics and autonomous driving. The rapid development of semantic segmentation owes much to large-scale datasets, especially for deep learning methods. Several semantic segmentation datasets already exist for comparing methods in complex urban scenes, such as the Cityscapes and CamVid datasets, where side views of objects are captured by a camera mounted on a moving car. Semantic labeling datasets also exist for airborne and satellite images, where nadir views of objects are captured. However, only a few datasets capture urban scenes from an oblique Unmanned Aerial Vehicle (UAV) perspective, where both the top view and the side view of objects can be observed, providing more information for object recognition. In this paper, we introduce the UAVid dataset, a new high-resolution UAV semantic segmentation dataset that complements existing ones and brings new challenges, including large scale variation, moving object recognition and temporal consistency preservation. Our UAV dataset consists of 30 video sequences capturing high-resolution images in oblique views. In total, 300 images have been densely labeled with 8 classes for the semantic labeling task. We provide several deep learning baseline methods with pre-training, among which the proposed Multi-Scale-Dilation net performs best via multi-scale feature extraction, reaching a mean intersection-over-union (IoU) score of around 50%. We also explore the influence of spatial-temporal regularization for sequence data using feature space optimization (FSO) and a 3D conditional random field (CRF). The UAVid website and the labeling tool have been published online (https://uavid.nl/).
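
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a mean IoU score is commonly computed from per-pixel labels. It is not the authors' evaluation code: the function names, the integer class ids 0 to 7, and the choice to skip classes that appear in neither the ground truth nor the prediction are all assumptions made for illustration.

```python
from typing import List, Sequence

# The abstract states 8 labeled classes; encoding them as ids 0..7 is an assumption.
NUM_CLASSES = 8


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int = NUM_CLASSES) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Classes absent from both ground truth and prediction are skipped
    (an assumption; other conventions exist).
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp     # predicted c, true class is another
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with flattened pixel labels (hypothetical data).
    truth = [0, 0, 1, 1]
    pred = [0, 1, 1, 1]
    print(mean_iou(confusion_matrix(truth, pred)))
```

In practice the label sequences would be the flattened ground-truth and predicted label maps accumulated over all evaluation images, so that one confusion matrix summarizes the whole test set.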




Updated: 2020-05-30