Self-supervised monocular depth estimation from oblique UAV videos
ISPRS Journal of Photogrammetry and Remote Sensing (IF 10.6). Pub Date: 2021-04-13. DOI: 10.1016/j.isprsjprs.2021.03.024
Logambal Madhuanand, Francesco Nex, Michael Ying Yang

Unmanned Aerial Vehicles (UAVs) have become an essential photogrammetric measurement platform as they are affordable, easily accessible and versatile. Aerial images captured from UAVs have applications in small and large scale texture mapping, 3D modelling, object detection tasks, Digital Terrain Model (DTM) and Digital Surface Model (DSM) generation, etc. Photogrammetric techniques are routinely used for 3D reconstruction from UAV images, where multiple images of the same scene are acquired. Developments in computer vision and deep learning techniques have made Single Image Depth Estimation (SIDE) a field of intense research. Using SIDE techniques on UAV images can overcome the need for multiple images for 3D reconstruction. This paper aims to estimate depth from a single UAV aerial image using deep learning. We follow a self-supervised learning approach, Self-Supervised Monocular Depth Estimation (SMDE), which does not need ground-truth depth or any extra information other than images to learn to estimate depth. Monocular video frames are used to train the deep learning model, which learns depth and pose information jointly through two different networks, one each for depth and pose. The predicted depth and pose are used to reconstruct one image from the viewpoint of another image, utilising the temporal information from videos. We propose a novel architecture with two 2D Convolutional Neural Network (CNN) encoders and a 3D CNN decoder for extracting information from consecutive temporal frames. A contrastive loss term is introduced to improve the quality of image generation. Our experiments are carried out on the public UAVid video dataset. The experimental results demonstrate that our model outperforms the state-of-the-art methods in estimating depth.
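The self-supervised signal described in the abstract uses the predicted depth and relative pose to reconstruct the target frame from a neighbouring video frame. The sketch below (PyTorch) is a minimal, generic illustration of that view-synthesis step as it is commonly formulated in SMDE pipelines, not the authors' implementation; the camera intrinsics K, the source-from-target pose convention, and all function names are assumptions.

```python
# Minimal sketch of the self-supervised view-synthesis step: warp a source video
# frame into the target view using predicted depth and relative pose, then compare
# it photometrically with the real target frame. Generic formulation, not the
# paper's exact code; K, pose convention and names are assumptions.
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift every target pixel to a 3D point using its predicted depth."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)
    rays = K_inv[:, :3, :3] @ pix                       # camera rays, (B, 3, H*W)
    return rays * depth.reshape(b, 1, -1)               # 3D points, (B, 3, H*W)


def reproject(points, K, T_src_from_tgt):
    """Transform 3D points into the source view and project to pixel coordinates."""
    b, _, n = points.shape
    homo = torch.cat([points, torch.ones(b, 1, n, dtype=points.dtype,
                                         device=points.device)], dim=1)
    cam = (T_src_from_tgt @ homo)[:, :3, :]             # points in source frame
    pix = K[:, :3, :3] @ cam
    return pix[:, :2, :] / pix[:, 2:3, :].clamp(min=1e-6)


def synthesize_target(src_img, depth, K, K_inv, T_src_from_tgt):
    """Differentiably warp the source image into the target view (depth + pose)."""
    b, _, h, w = src_img.shape
    pix = reproject(backproject(depth, K_inv), K, T_src_from_tgt).reshape(b, 2, h, w)
    # normalise pixel coordinates to [-1, 1] for grid_sample
    grid = torch.stack([2.0 * pix[:, 0] / (w - 1) - 1.0,
                        2.0 * pix[:, 1] / (h - 1) - 1.0], dim=-1)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)


def photometric_loss(tgt_img, tgt_img_synth):
    """L1 difference between the real and the reconstructed target frame."""
    return (tgt_img - tgt_img_synth).abs().mean()
```

Minimising the photometric loss between the real and the reconstructed target frame provides the training signal for both the depth and the pose network without any ground-truth depth; the contributions named in the abstract (twin 2D CNN encoders with a 3D CNN decoder, and the contrastive loss term for image generation quality) build on this reconstruction objective.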



Updated: 2021-04-14