Multi-Stage Feature Pyramid Stereo Network-Based Disparity Estimation Approach for Two to Three-Dimensional Video Conversion
IEEE Transactions on Circuits and Systems for Video Technology ( IF 8.4 ) Pub Date : 2020-08-04 , DOI: 10.1109/tcsvt.2020.3014053
Baiyu Pan , Liming Zhang , Hanzi Wang

Disparity estimation is a popular topic in computer vision and has drawn increasing attention in recent years. In this article, we propose a new multi-stage network for the purpose of two to three-dimensional video conversion that contains two training stages: an initial disparity estimation as the first training stage and depth-image-based rendering (DIBR) as an extra component to form the second training stage. In the first training stage, we propose a revised end-to-end feature pyramid stereo network, in which the original non-pyramid structure is replaced by a bottom-up convolutional neural network pyramid for disparity regression. It utilizes the spatial information by concatenating different scale features to boost the performance on boundary consistency. Mirror connections between feature extraction and disparity regression on the corresponding layers are also added to improve the quality of the results. In the second stage, we propose an improved disocclusion filling technique in the DIBR branch and connect the non-neural-network method to the disparity estimation network. This two-stage training strategy can work effectively to generate the improved disparity estimation for two to three-dimensional video conversion. Extensive experiments are conducted and some selected state-of-the-art algorithms are compared with our proposed approach on the popular KITTI2015 and Scene Flow datasets. The results demonstrate that our estimated disparity map can generate high quality 3D images.
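
As a rough illustration of the DIBR step mentioned in the abstract, the sketch below forward-warps a left view into a virtual right view using a per-pixel disparity map, then fills the resulting disoccluded holes from the nearest valid pixel on the same row. It is a generic baseline written under stated assumptions, not the authors' improved disocclusion filling technique. The function names `warp_view` and `fill_holes`, the grayscale list-of-rows image layout, and the convention that a left pixel at column x lands at column x - d are all assumptions made for this example.

```python
def warp_view(image, disparity):
    """Forward-warp a left view into a virtual right view.

    image, disparity: lists of rows with equal shape; disparities are
    non-negative and in pixels. Assumed convention: x_right = x_left - d.
    Target pixels that receive no source pixel are left as None (holes).
    """
    h, w = len(image), len(image[0])
    warped = [[None] * w for _ in range(h)]
    best = [[-1.0] * w for _ in range(h)]  # largest disparity written so far
    for y in range(h):
        for x in range(w):
            d = disparity[y][x]
            xt = int(round(x - d))
            # On collisions keep the larger disparity (the nearer surface).
            if 0 <= xt < w and d > best[y][xt]:
                warped[y][xt] = image[y][x]
                best[y][xt] = d
    return warped


def fill_holes(warped):
    """Fill holes row by row with the nearest valid pixel.

    Under the assumed convention, holes open on the right side of
    foreground objects, so the pixel to the right of a hole is usually
    background. Pass 1 copies from the right; pass 2 copies from the
    left to handle holes touching the right image border.
    """
    out = [row[:] for row in warped]
    for row in out:
        w = len(row)
        for x in range(w - 2, -1, -1):
            if row[x] is None:
                row[x] = row[x + 1]  # may still be None at the right border
        for x in range(1, w):
            if row[x] is None:
                row[x] = row[x - 1]
    return out


left = [[10, 20, 30, 40, 50, 60]]
disp = [[0, 0, 2, 2, 0, 0]]   # columns 2-3 are a nearer object
right = fill_holes(warp_view(left, disp))
print(right)  # [[30, 40, 50, 50, 50, 60]]
```

In this toy row the foreground values 30 and 40 shift two pixels to the left, and the two-pixel gap they leave behind is filled with the background value 50. A learned or edge-aware filling method would replace `fill_holes` in a real pipeline.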

Updated: 2020-08-04