Attention-based context aggregation network for monocular depth estimation
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2021-01-03, DOI: 10.1007/s13042-020-01251-y
Yuru Chen, Haitao Zhao, Zhengwei Hu, Jingchao Peng

Depth estimation is a classical computer vision task that plays a crucial role in understanding 3D scene geometry. Recently, algorithms that combine the multi-scale features extracted by a dilated-convolution-based block (atrous spatial pyramid pooling, ASPP) have achieved significant improvements in depth estimation. However, the discretized and predefined dilation kernels cannot capture the continuous context information that differs across diverse scenes, and they easily introduce grid artifacts. This paper proposes a novel algorithm for depth estimation, called the attention-based context aggregation network (ACAN). A supervised self-attention model is designed and used to adaptively learn the task-specific similarities between different pixels in order to model the continuous context information. Moreover, a soft ordinal inference is proposed to transform the predicted probabilities into continuous depth values, which reduces the discretization error (about a 1% decrease in RMSE). ACAN achieves state-of-the-art performance on public monocular depth-estimation benchmark datasets. The source code of ACAN can be found at https://github.com/miraiaroha/ACAN.
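
The abstract does not give the formula behind the soft ordinal inference, so the following is only a minimal sketch of the general idea of turning per-pixel probabilities into continuous depth values. It assumes the network outputs a probability distribution over a small set of discretized depth bins and compares picking the most probable bin with taking a probability-weighted average. The function names, the bin centers, and the expectation rule are all illustrative assumptions and may differ from what ACAN actually does.

```python
import numpy as np

def hard_inference(probs, bin_centers):
    """Return the center of the most probable depth bin for each pixel."""
    return bin_centers[np.argmax(probs, axis=-1)]

def soft_inference(probs, bin_centers):
    """Return the probability-weighted average of bin centers for each pixel."""
    return probs @ bin_centers

# Hypothetical setup (not from the paper): 4 depth bins, centers in meters.
bin_centers = np.array([1.0, 2.0, 4.0, 8.0])

# One pixel whose probability mass is spread over neighboring bins.
probs = np.array([[0.1, 0.6, 0.3, 0.0]])

hard = hard_inference(probs, bin_centers)  # snaps to a single bin center
soft = soft_inference(probs, bin_centers)  # lies between bin centers
```

In this toy case the hard rule can only return one of the four bin centers, while the soft rule can return any value between them, which is the kind of discretization error the abstract says the soft inference reduces.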




Updated: 2021-01-03