3D Pose Estimation Based on Reinforce Learning for 2D Image-Based 3D Model Retrieval
IEEE Transactions on Multimedia (IF 8.4), Pub Date: 2020-04-30, DOI: 10.1109/tmm.2020.2991532
Wei-Zhi Nie , Wen-Wu Jia , Wen-Hui Li , An-An Liu , Si-Cheng Zhao

In this paper, we propose a novel characteristic view selection model (CVSM) to address the problem of 2D image-based 3D object retrieval. This work makes two key contributions: 1) we propose a novel reinforcement learning model to estimate the 3D pose from a 2D image; and 2) we render the model at the estimated pose to generate a view from a representative angle for retrieval applications. First, we define state, policy, action, and reward functions to train an agent within the reinforcement learning framework; the agent effectively reduces the computational cost of characteristic view selection and directly obtains the 3D model pose. Second, to address the cross-domain problem of computing similarity between a virtual 3D model view and a real query image, we project both into the skeleton domain, where skeleton information effectively bridges the gap between the image and the 3D model view for cross-media retrieval. To demonstrate the performance of our pose estimation, we compare our approach with several classic 3D pose estimation methods on the popular Pascal3D dataset. To demonstrate its performance in model retrieval, we collect a new dataset of paired 2D images and 3D objects, where the 3D objects come from the ModelNet40 dataset and the 2D images come from the ImageNet dataset, and we also evaluate our method on the SHREC 2018 and SHREC 2019 databases. The experimental results demonstrate the superiority of our method.
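
The abstract names the ingredients of the reinforcement-learning formulation (state, policy, action, reward) without detailing them. The sketch below is a deliberately minimal, self-contained illustration of that kind of setup: a tabular Q-learning agent whose actions rotate a model's azimuth one bin at a time, and whose only positive reward is a stand-in for the skeleton-based similarity between the rendered view and the query image, collected when the agent stops and commits to a view. Everything here (the one-dimensional pose, the discretization, skeleton_similarity, the hyperparameters) is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

# Toy stand-ins for the components described in the abstract (all names and
# numbers are our own simplifications, not the paper's method):
#   - the 3D pose is reduced to a single azimuth angle, discretized into bins;
#   - skeleton_similarity() fakes the skeleton-domain match between the view
#     rendered at a pose and the query image, peaking at the hidden true pose;
#   - the agent rotates the model left/right, or stops and commits to the
#     current view as the characteristic view used for retrieval.
N_BINS = 36                    # azimuth in 10-degree bins
TRUE_POSE_BIN = 13             # hidden ground-truth pose of the query image
ACTIONS = (-1, +1, 0)          # rotate left, rotate right, stop
STEP_COST = 0.01               # small penalty for every extra rotation

def skeleton_similarity(pose_bin: int) -> float:
    """Fake reward: high when the rendered view's pose matches the query's."""
    d = min(abs(pose_bin - TRUE_POSE_BIN), N_BINS - abs(pose_bin - TRUE_POSE_BIN))
    return float(np.exp(-0.5 * (d / 2.0) ** 2))

def step(pose_bin: int, action: int):
    """Environment transition: rotate the model, or stop and score the view."""
    if action == 0:
        return pose_bin, skeleton_similarity(pose_bin), True
    return (pose_bin + action) % N_BINS, -STEP_COST, False

# Tabular Q-learning. This toy environment is deterministic, so a learning
# rate of 1.0 makes every update an exact Bellman backup.
rng = np.random.default_rng(0)
Q = np.zeros((N_BINS, len(ACTIONS)))
gamma, eps = 0.95, 0.3

for _ in range(5000):
    s = int(rng.integers(N_BINS))          # each episode starts at a random pose
    for _ in range(N_BINS):                # cap the episode length
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, ACTIONS[a])
        Q[s, a] = r if done else r + gamma * Q[s2].max()
        s = s2
        if done:
            break

# Greedy rollout: the trained agent should rotate toward the true pose and stop.
s = 0
for _ in range(N_BINS):
    a = ACTIONS[int(Q[s].argmax())]
    if a == 0:
        break
    s = (s + a) % N_BINS
print(f"selected characteristic view bin: {s}, true pose bin: {TRUE_POSE_BIN}")
```

The stop action is one way to read the abstract's claim about reducing the computational cost of characteristic view selection: instead of rendering and scoring every candidate view, the agent learns to walk to a good pose and terminate, so a characteristic view is obtained after only a handful of rendering steps.
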

Updated: 2020-04-30