Object Reconstruction Based on Attentive Recurrent Network from Single and Multiple Images
Neural Processing Letters (IF 2.6), Pub Date: 2021-01-05, DOI: 10.1007/s11063-020-10399-1
Zishu Gao, En Li, Zhe Wang, Guodong Yang, Jiwu Lu, Bo Ouyang, Dawei Xu, Zize Liang

In the field of robotics, the application of traditional 3D reconstruction methods such as structure-from-motion and simultaneous localization and mapping is typically limited by illumination conditions, surface textures, and wide-baseline viewpoints. To address this problem, many researchers have applied learning-based methods built on convolutional neural network architectures. However, using convolutional neural networks alone, without additional measures, is computationally intensive, and the results are unsatisfactory. In this study, to obtain the most informative images for reconstruction, we introduce a residual block into the 2D encoder for improved feature extraction and propose an attentive latent unit that selects the most informative image to feed into the network rather than choosing one at random. The recurrent visual attentive network is incorporated into the auto-encoder network using reinforcement learning. It pays more attention to useful images, and the agent quickly predicts the 3D volume. The model is evaluated on both single-view and multi-view reconstruction. Experimental results show that the recurrent visual attentive network improves prediction performance beyond that of alternative methods and that the model generalizes well.




Updated: 2021-01-06