Recurrent 3D attentional networks for end-to-end active object recognition
Computational Visual Media (IF 17.3) Pub Date: 2019-04-08, DOI: 10.1007/s41095-019-0135-2
Min Liu, Yifei Shi, Lintao Zheng, Kai Xu, Hui Huang, Dinesh Manocha

Active vision is inherently attention-driven: an agent actively selects views to attend to in order to rapidly perform a vision task while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks on single RGB images, we address multi-view, depth-based active object recognition with an attention mechanism, realized as an end-to-end recurrent 3D attentional network. The architecture uses a recurrent neural network to store and update the internal representation. Trained on 3D shape datasets, our model iteratively attends to the best views of a target object in order to recognize it. To realize 3D view selection, we derive a 3D spatial transformer network. It is differentiable, allowing training with backpropagation and thus achieving much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method, with depth input alone, achieves state-of-the-art next-best-view performance in terms of both time taken and recognition accuracy.
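The recurrent attend-and-update loop described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustrative skeleton, not the authors' actual architecture: the layer sizes, the toy renderer, and the simple pose-regression head (standing in for the paper's 3D spatial transformer) are all assumptions. What it does show is the key property the abstract emphasizes: because view selection is differentiable, gradients from the recognition loss flow back through every glimpse, so the whole loop trains with plain backpropagation rather than reinforcement learning.

```python
import torch
import torch.nn as nn


class RecurrentAttentionalRecognizer(nn.Module):
    """Hypothetical sketch of a recurrent 3D attentional loop.

    All dimensions and module names are illustrative assumptions,
    not the architecture from the paper.
    """

    def __init__(self, feat_dim=128, hidden_dim=256, num_classes=40):
        super().__init__()
        self.hidden_dim = hidden_dim
        # Encode one depth view (flattened 32x32 depth image) into a feature vector.
        self.encoder = nn.Sequential(nn.Linear(32 * 32, feat_dim), nn.ReLU())
        # Recurrent cell stores and updates the internal scene representation.
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        # Classifier over the aggregated representation.
        self.classifier = nn.Linear(hidden_dim, num_classes)
        # Differentiable view-selection head: regresses the next camera pose
        # (azimuth, elevation). In the paper this role is played by a
        # 3D spatial transformer; here a linear head keeps the sketch simple.
        self.view_head = nn.Linear(hidden_dim, 2)

    def forward(self, render_fn, num_glimpses=3, batch_size=4):
        h = torch.zeros(batch_size, self.hidden_dim)
        view = torch.zeros(batch_size, 2)  # initial (azimuth, elevation)
        for _ in range(num_glimpses):
            depth = render_fn(view)               # (B, 1024) depth image at `view`
            h = self.rnn(self.encoder(depth), h)  # update internal representation
            view = self.view_head(h)              # propose the next-best view
        return self.classifier(h)


def toy_render(view):
    # Differentiable stand-in for a depth renderer, so that gradients
    # can flow from the loss back into the view-selection head.
    return torch.tanh(view.sum(dim=1, keepdim=True)).expand(-1, 32 * 32)


model = RecurrentAttentionalRecognizer()
logits = model(toy_render)          # shape: (4, 40)
logits.sum().backward()             # gradients reach the view head end to end
```

Because `toy_render` is differentiable, `model.view_head.weight.grad` is populated after `backward()`, which is exactly the property that lets the method avoid the high-variance policy-gradient training used by earlier attention models.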

Updated: 2019-04-08