Applied Soft Computing (IF 5.472), Pub Date: 2020-10-17, DOI: 10.1016/j.asoc.2020.106804. Songnan Chen; Mengxia Tang; Jiangming Kan
Monocular depth prediction is a challenging problem in three-dimensional (3D) perception. Its goal is to recover the geometric features of a 3D scene from a single two-dimensional (2D) image. Deep learning methods for monocular depth prediction currently achieve good results, but they treat the task as a supervised deep regression problem. A significant weakness of these methods is that training requires large amounts of ground-truth depth measurements collected in real-world scenes. In this paper, we design a novel convolutional neural network (CNN) with an encoder-decoder structure that estimates a depth map from a monocular RGB image. Building on the basic principles of binocular stereo vision, we train the network from scratch on rectified stereo pairs in an unsupervised manner, without any depth data. We also explore a new upsampling strategy to increase the output resolution and introduce a new dynamic optimization strategy to improve both training speed and prediction accuracy. Extensive experiments on the publicly available KITTI and Cityscapes datasets show that our approach is more accurate than competing methods. The results also indicate that our CNN model can be used for depth completion from LiDAR data.
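The abstract does not spell out how rectified stereo pairs replace depth labels. A common way this kind of supervision works is sketched below, under explicit assumptions: the network is assumed to output a per-pixel disparity map for the left image, and training is assumed to minimize the photometric error between the real left image and a left image reconstructed by horizontally sampling the right image at `x - d`. Depth then follows from the stereo relation `Z = f * B / d` (focal length `f` in pixels, baseline `B` in metres). All function names here are hypothetical, the images are grayscale NumPy arrays, and the sketch omits everything the abstract leaves unspecified (network architecture, the proposed upsampling and dynamic optimization strategies, and any extra loss terms). It illustrates the general stereo self-supervision principle, not the authors' implementation.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to depth (metres) for a rectified pair: Z = f * B / d.

    eps guards against division by zero where the disparity is 0.
    """
    return focal_px * baseline_m / np.maximum(disparity, eps)

def reconstruct_left(right, disp_left):
    """Hypothetical helper: rebuild the left view from the right view.

    For a rectified pair, a pixel at column x in the left image appears at
    column x - d in the right image, so each left pixel is sampled from the
    right image at x - d(x, y). np.interp performs linear interpolation along
    each row and clamps samples that fall outside the image to the border value.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)
    out = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        out[y] = np.interp(xs - disp_left[y], xs, right[y])
    return out

def photometric_l1(left, left_rec):
    """Mean absolute photometric error: the unsupervised training signal (no depth labels)."""
    return float(np.mean(np.abs(left - left_rec)))

if __name__ == "__main__":
    # Synthetic rectified pair: the right image is a horizontal ramp, and the
    # left image is the same scene shifted by a constant disparity of 2 pixels.
    xs = np.arange(8, dtype=np.float64)
    right = np.tile(xs, (3, 1))
    left = np.tile(np.maximum(xs - 2, 0), (3, 1))

    good = reconstruct_left(right, np.full((3, 8), 2.0))  # correct disparity
    bad = reconstruct_left(right, np.zeros((3, 8)))       # wrong disparity
    print(photometric_l1(left, good), photometric_l1(left, bad))

    # Illustrative camera values (not taken from the paper).
    print(disparity_to_depth(np.array([10.0]), focal_px=720.0, baseline_m=0.54))
```

The correct disparity reproduces the left image exactly and drives the photometric loss to zero, while the wrong disparity leaves a residual error. This gap is what lets a network learn disparity, and hence depth, from stereo pairs alone. A trainable version would replace the row-wise `np.interp` with a differentiable bilinear sampler inside a deep learning framework.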