FADE: Feature Aggregation for Depth Estimation With Multi-View Stereo, IEEE Transactions on Image Processing



IEEE Transactions on Image Processing (IF 10.8), Pub Date: 2020-05-22, DOI: 10.1109/tip.2020.2991883
Hsiao-Chien Yang, Po-Heng Chen, Kuan-Wen Chen, Chen-Yi Lee, Yong-Sheng Chen

Both structural and contextual information are essential and widely used in image analysis. However, current multi-view stereo (MVS) approaches usually use a single common pre-trained model as a pixel descriptor to extract features, which mixes structural and contextual information together and thus increases the difficulty of matching correspondences. In this paper, we propose FADE (feature aggregation for depth estimation), which treats spatial and context information separately and focuses on aggregating features for efficient learning of the MVS problem. Spatial information includes image details such as edges and corners, whereas context information comprises object features such as shapes and traits. To aggregate these multi-level features, we use an attention mechanism to select important features for matching. We then build a plane sweep volume by using a homography backward warping method to generate match candidates. Furthermore, we propose a novel cost volume regularization network that aims to minimize the noise in the matching candidates. Finally, we take advantage of a 3D stacked hourglass network and regression to produce high-quality depth maps. With these well-aggregated features, FADE can efficiently perform dense depth reconstruction, achieving state-of-the-art accuracy while requiring the fewest model parameters.
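The plane sweep step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes fronto-parallel planes at depth d in the reference camera frame, a pose convention X_src = R X_ref + t, and nearest-neighbour sampling instead of the differentiable bilinear sampling a learned pipeline would likely use. The function names `plane_homography`, `backward_warp`, and `plane_sweep_volume` are hypothetical. Under these assumptions, the plane-induced homography is H(d) = K_src (R + t nᵀ / d) K_ref⁻¹ with n = (0, 0, 1)ᵀ.

```python
import numpy as np


def plane_homography(K_ref, K_src, R, t, depth):
    """Homography taking reference pixels to source pixels for the
    fronto-parallel plane z = depth in the reference camera frame.

    Assumed convention (not from the paper): X_src = R @ X_ref + t.
    """
    n = np.array([0.0, 0.0, 1.0])  # plane normal in the reference frame
    return K_src @ (R + np.outer(t, n) / depth) @ np.linalg.inv(K_ref)


def backward_warp(src_feat, H):
    """For every reference pixel, look up the source feature at the
    location H assigns to it (nearest neighbour). Pixels that fall
    outside the source view, or behind the camera, are set to 0."""
    h, w = src_feat.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N
    proj = H @ pix
    z = proj[2]
    in_front = z > 1e-8
    safe_z = np.where(in_front, z, 1.0)  # avoid division by zero
    u = np.rint(proj[0] / safe_z).astype(int)
    v = np.rint(proj[1] / safe_z).astype(int)
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out = np.zeros((h * w,) + src_feat.shape[2:], dtype=src_feat.dtype)
    out[valid] = src_feat[v[valid], u[valid]]
    return out.reshape(src_feat.shape)


def plane_sweep_volume(src_feat, K_ref, K_src, R, t, depths):
    """Stack the warped source features over all depth hypotheses,
    giving an array of shape (num_depths, H, W, C)."""
    return np.stack([
        backward_warp(src_feat, plane_homography(K_ref, K_src, R, t, d))
        for d in depths
    ])
```

Each slice of the resulting volume holds the source features as they would appear in the reference view if the scene lay at that depth, so comparing a slice with the reference features yields one matching candidate per pixel and depth hypothesis.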


Updated: 2020-07-03
