Efficient unsupervised monocular depth estimation using attention guided generative adversarial network
Journal of Real-Time Image Processing (IF 2.9), Pub Date: 2021-03-22, DOI: 10.1007/s11554-021-01092-0
Sumanta Bhattacharyya , Ju Shen , Stephen Welch , Chen Chen

Deep-learning-based approaches to depth estimation are rapidly advancing, delivering better performance than traditional computer vision approaches across many domains. However, for many critical applications, cutting-edge deep-learning-based approaches incur too much computational overhead to be operationally feasible. This is especially true for depth-estimation methods that leverage adversarial learning, such as Generative Adversarial Networks (GANs). In this paper, we propose a computationally efficient GAN for unsupervised monocular depth estimation that uses factorized convolutions and an attention mechanism. Specifically, we leverage the Extremely Efficient Spatial Pyramid of Depth-wise Dilated Separable Convolutions (EESP) module of ESPNetv2 inside the network, leading to total reductions of \(22.8\%\), \(35.37\%\), and \(31.5\%\) in the number of model parameters, FLOPs, and inference time, respectively, compared to the previous unsupervised GAN approach. Finally, we propose a context-aware attention architecture to generate detail-oriented depth images. We demonstrate the superior performance of our proposed model on two benchmark datasets, KITTI and Cityscapes. Additional qualitative examples are provided at the end of the paper (Fig. 8).
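To make the "factorized convolutions" the abstract refers to concrete, the sketch below shows an EESP-style block: a point-wise channel reduction, parallel depth-wise convolutions with growing dilation rates, hierarchical feature fusion of the branch outputs, and a point-wise expansion with a residual connection. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation; the class name `EESPLikeBlock`, the number of branches, and the dilation schedule are hypothetical choices for illustration only.

```python
import torch
import torch.nn as nn


class EESPLikeBlock(nn.Module):
    """Illustrative EESP-style block (assumption: not the paper's exact module).

    Point-wise reduction -> parallel depth-wise dilated convolutions ->
    hierarchical feature fusion -> point-wise expansion -> residual add.
    """

    def __init__(self, channels: int, branches: int = 4):
        super().__init__()
        reduced = channels // branches
        # 1x1 point-wise convolution that shrinks the channel count.
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1, bias=False)
        # Parallel depth-wise 3x3 convolutions with dilation 1, 2, 4, 8, ...
        self.branches = nn.ModuleList(
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=2 ** k,
                      dilation=2 ** k, groups=reduced, bias=False)
            for k in range(branches)
        )
        # 1x1 point-wise convolution that expands back to the input width.
        self.expand = nn.Conv2d(reduced * branches, channels,
                                kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.PReLU(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        r = self.reduce(x)
        outs = [branch(r) for branch in self.branches]
        # Hierarchical feature fusion: each branch accumulates the previous
        # sum, which suppresses gridding artifacts from large dilations.
        for k in range(1, len(outs)):
            outs[k] = outs[k] + outs[k - 1]
        merged = self.expand(torch.cat(outs, dim=1))
        return self.act(self.bn(merged + x))  # residual connection


if __name__ == "__main__":
    # Toy usage on a KITTI-like feature map (shapes are illustrative).
    block = EESPLikeBlock(channels=64)
    y = block(torch.randn(1, 64, 128, 416))
    print(y.shape)  # torch.Size([1, 64, 128, 416])
```

The point of this factorization is that every 3x3 convolution runs depth-wise on a reduced channel count, so parameters and FLOPs scale with the number of branches rather than with the square of the channel width; this is the kind of saving that would be consistent with the parameter, FLOP, and inference-time reductions the abstract reports.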



Updated: 2021-03-22