MMFN
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.1). Pub Date: 2020-12-17. DOI: 10.1145/3410439
Weizhi Nie, Qi Liang, Yixin Wang, Xing Wei, Yuting Su

In recent years, research into 3D shape recognition in the fields of multimedia and computer vision has attracted wide attention. With the rapid development of deep learning, various deep models have achieved state-of-the-art performance based on different representations. A 3D model can be represented in many modalities, such as point cloud, multiview, and panorama view. Deep learning models based on these modalities emphasize different aspects of the shape, and all of them have achieved high performance for 3D shape recognition. However, these methods ignore the multimodal information available when the same 3D model is represented by several modalities; a better descriptor can therefore be obtained by guiding the training to consider these multiple representations jointly. In this article, we propose MMFN, a novel multimodal fusion network for 3D shape recognition that exploits correlations between the different modalities to generate a more robust fused descriptor. In particular, we design two novel loss functions that help the model learn correlation information during training. The first is a correlation loss, which focuses on the correlations among descriptors generated from different network structures; it reduces training time and improves the robustness of the fused descriptor. The second is an instance loss, which preserves the independence of each modality and uses feature differentiation to guide model learning during training. More specifically, we use a weighted fusion method that applies statistical weighting to obtain robust descriptors, maximizing the advantages of the information carried by the different modalities. We evaluated the proposed method on the ModelNet40 and ShapeNetCore55 datasets for 3D shape classification and retrieval tasks. The experimental results and comparisons with state-of-the-art methods demonstrate the superiority of our approach.
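The abstract describes three ingredients: a correlation loss pulling per-modality descriptors of the same model together, an instance loss keeping each modality independently discriminative, and a weighted fusion of the modality descriptors. The paper's exact formulations are not given here, so the sketch below is a minimal NumPy illustration under assumed forms (mean pairwise squared L2 distance for the correlation loss, per-modality softmax cross-entropy for the instance loss, and a normalized weighted sum for fusion); all names and weight values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64-d descriptors for one 3D model from three modalities.
desc = {m: rng.normal(size=64) for m in ("point_cloud", "multiview", "panorama")}

def correlation_loss(descs):
    """Assumed form: mean pairwise squared L2 distance between the
    descriptors of the same model from different modalities, so that
    minimizing it pulls the modality descriptors together."""
    vecs = list(descs.values())
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return float(np.mean([np.sum((vecs[i] - vecs[j]) ** 2) for i, j in pairs]))

def instance_loss(logits, label):
    """Assumed form: softmax cross-entropy on one modality's classifier
    logits, keeping that modality independently discriminative."""
    z = logits - logits.max()                  # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

def weighted_fusion(descs, weights):
    """Normalized weighted sum of modality descriptors; the weights stand
    in for the statistically derived weights mentioned in the abstract."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * v for wi, v in zip(w, descs.values()))

fused = weighted_fusion(desc, [0.5, 0.3, 0.2])  # fused 64-d descriptor
```

With identical per-modality descriptors the correlation loss is zero, which matches its role of rewarding cross-modal agreement; the instance loss stays per-modality so that fusion does not collapse the modalities into indistinguishable features.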

Updated: 2020-12-17