Attention-guided aggregation stereo matching network
Image and Vision Computing (IF 4.2), Pub Date: 2020-12-10, DOI: 10.1016/j.imavis.2020.104088
Yaru Zhang, Yaqian Li, Chao Wu, Bin Liu

Existing deep-learning-based stereo matching networks lack multi-level, multi-module attention to and integration of feature information. We therefore propose an attention-guided aggregation stereo matching network that encodes and integrates information multiple times. Specifically, we design a residual network based on a 2D channel attention block that adaptively calibrates the weight response, improving the robustness of the feature representation. We also construct a 3D stacked hourglass structure based on a 3D channel attention block that calibrates the weight response of the 4D cost volume along the channel dimension, further enhancing the network's guidance and aggregation capabilities. In addition, we introduce a 4D guided cost volume, which pre-groups the extracted image features and uses the similarity measures within each group to guide the concatenation features, enabling interactive learning of the cost volume. Experimental results on the Scene Flow and KITTI benchmark datasets show that the proposed network significantly improves disparity prediction accuracy with only a small increase in computation time.
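
The pre-grouping step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes the per-group similarity is the mean of channel-wise products between left and right features (group-wise correlation, as commonly used in stereo matching), and it works on a single scanline for brevity. The function name `groupwise_correlation` and its parameters are hypothetical, and the way the paper uses these similarities to guide the concatenation features is not reproduced here.

```python
def groupwise_correlation(left, right, num_groups, max_disp):
    """Per-group similarity between left and right features on one scanline.

    left, right: lists of C channels, each a list of W floats.
    Returns a nested list volume[d][g][x] for d in [0, max_disp).
    Assumption: similarity = mean of channel-wise products within a group.
    """
    channels = len(left)
    width = len(left[0])
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = channels // num_groups

    volume = []
    for d in range(max_disp):
        plane = []
        for g in range(num_groups):
            start = g * per_group
            row = []
            for x in range(width):
                if x - d < 0:
                    # No candidate pixel in the right image at this disparity.
                    row.append(0.0)
                    continue
                total = sum(
                    left[c][x] * right[c][x - d]
                    for c in range(start, start + per_group)
                )
                row.append(total / per_group)
            plane.append(row)
        volume.append(plane)
    return volume
```

Calling `groupwise_correlation(left, right, num_groups=2, max_disp=2)` on two 4-channel scanlines returns one 2-group similarity plane per candidate disparity. In a full network the same computation would run over every image row and yield a 4D volume.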



Updated: 2020-12-23