Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
arXiv - CS - Artificial Intelligence. Pub Date: 2021-07-29, DOI: arXiv:2107.13782
Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

Multimodal deep learning systems, which employ multiple modalities such as text, image, audio, and video, show better performance than single-modality (i.e., unimodal) systems. Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning. The current state of multimodal machine learning assumes that all modalities are present, aligned, and noiseless during training and testing. In real-world tasks, however, one or more modalities are typically missing, noisy, lacking annotated data, or unreliably labeled, and may be scarce at training time, at test time, or both. These challenges are addressed by a learning paradigm called multimodal co-learning: the modeling of a resource-poor modality is aided by exploiting knowledge from a resource-rich modality through transfer of knowledge between modalities, including their representations and predictive models. Because co-learning is an emerging area, there are no dedicated reviews explicitly focusing on the full set of challenges it addresses. To that end, in this work we provide a comprehensive survey of multimodal co-learning, an area that has not yet been explored in its entirety. We review implementations that overcome one or more co-learning challenges without explicitly framing them as such. We present a comprehensive taxonomy of multimodal co-learning organized by the challenges addressed and their associated implementations. We review the various techniques employed, including the latest ones, along with representative applications and datasets. Our final goal is to discuss open challenges and perspectives, together with important ideas and directions for future work, which we hope will benefit the entire research community focusing on this exciting domain.
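The core idea of co-learning — a resource-rich modality aiding a resource-poor one — can be illustrated with a toy sketch. This is an illustrative assumption, not an implementation from the survey: a "teacher" classifier trained on the rich, well-labeled modality produces soft predictions, which then supervise a "student" model that only sees the noisy, label-scarce modality (a cross-modal distillation pattern). All names and the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired dataset: both modalities are views of one latent factor z.
# The "rich" modality observes z cleanly; the "poor" modality is a noisy view.
n = 200
z = rng.normal(size=(n, 1))
rich = np.hstack([z, rng.normal(size=(n, 1))])     # clean signal + distractor
poor = z + rng.normal(scale=2.0, size=(n, 1))      # heavily noised view of z
y = (z[:, 0] > 0).astype(float)                    # ground-truth labels

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_logreg(X, targets, lr=0.5, steps=500):
    """Gradient descent on cross-entropy; `targets` may be soft labels."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - targets
        w -= lr * X.T @ grad / len(X)
        b -= lr * grad.mean()
    return w, b

# 1) Teacher: trained on the resource-rich modality with hard labels.
wt, bt = train_logreg(rich, y)
soft = sigmoid(rich @ wt + bt)   # teacher's soft predictions on paired data

# 2) Student: sees only the resource-poor modality and, instead of
#    (possibly scarce or unreliable) hard labels, fits the teacher's
#    soft targets -- knowledge transferred across modalities.
ws, bs = train_logreg(poor, soft)

acc = ((sigmoid(poor @ ws + bs) > 0.5) == y).mean()
print(f"student accuracy from distilled cross-modal supervision: {acc:.2f}")
```

The design point is that the student never touches the hard labels: all of its supervision flows through the teacher's predictions, which is what lets the pattern cope with missing or unreliable annotations on the poor modality.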

Updated: 2021-07-30