Feature-Guided Spatial Attention Upsampling for Real-Time Stereo Matching Network, IEEE Multimedia

Feature-Guided Spatial Attention Upsampling for Real-Time Stereo Matching Network
IEEE Multimedia (IF 2.3), Pub Date: 2020-10-14, DOI: 10.1109/mmul.2020.3030027
Yun Xie, Shaowu Zheng, Weihua Li

In this article, we propose an end-to-end real-time stereo matching network (RTSMNet). RTSMNet consists of three modules. The global and local feature extraction (GLFE) module captures hierarchical context information and generates a coarse cost volume. The initial disparity estimation module is a compact three-dimensional convolution architecture that rapidly produces a low-resolution (LR) disparity map. The feature-guided spatial attention upsampling module takes the LR disparity map and the shared features from the GLFE module as guidance. It first estimates residual disparity values, then uses an attention mechanism to generate context-aware adaptive kernels for each upsampled pixel. The adaptive kernels assign higher attention weights to reliable areas, which can significantly reduce blurred edges and recover thin structures. The proposed networks achieve 66 to 175 fps on a 2080Ti and 11 to 42 fps on edge computing devices, with accuracy competitive with state-of-the-art methods on multiple benchmarks.
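The upsampling idea in the abstract, one adaptive kernel per upsampled pixel, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 3x3 low-resolution neighbourhood, softmax normalization of the kernel weights, and hand-supplied kernel scores; in RTSMNet the kernels would come from the attention mechanism guided by GLFE features. The names `softmax`, `adaptive_upsample`, and `kernel_logits` are hypothetical, and plain Python lists are used for readability rather than speed.

```python
import math


def softmax(logits):
    """Normalize a list of scores into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_upsample(lr_disp, kernel_logits, scale):
    """Upsample an h x w disparity map by `scale` using per-pixel 3x3 kernels.

    kernel_logits[Y][X] holds 9 unnormalized scores for output pixel (Y, X),
    one per cell of the 3x3 low-resolution neighbourhood around its parent.
    (Assumed layout for illustration; the paper's kernel shape may differ.)
    """
    h, w = len(lr_disp), len(lr_disp[0])
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for Y in range(h * scale):
        for X in range(w * scale):
            py, px = Y // scale, X // scale  # parent low-resolution pixel
            weights = softmax(kernel_logits[Y][X])
            acc = 0.0
            k = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny = min(max(py + dy, 0), h - 1)  # clamp at the borders
                    nx = min(max(px + dx, 0), w - 1)
                    # Disparity is measured in pixels, so it scales with resolution.
                    acc += weights[k] * lr_disp[ny][nx] * scale
                    k += 1
            out[Y][X] = acc
    return out


# Usage: kernels that put nearly all weight on the centre cell keep the
# disparity edge sharp instead of averaging across it.
lr = [[1.0, 1.0], [1.0, 5.0]]
center_only = [0.0] * 4 + [50.0] + [0.0] * 4
logits = [[center_only for _ in range(4)] for _ in range(4)]
hr = adaptive_upsample(lr, logits, 2)
```

Because each output value is a convex combination of neighbouring low-resolution disparities, a kernel that concentrates its weight on reliable neighbours avoids mixing foreground and background disparities, which is the intuition behind the reduced edge blurring the abstract describes.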

Updated: 2020-10-14
