Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
arXiv - CS - Artificial Intelligence Pub Date : 2021-07-29 , DOI: arxiv-2107.13782 Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
Multimodal deep learning systems, which employ multiple modalities such as text, image, audio, and video, show better performance than unimodal systems. Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning. The current state of multimodal machine learning assumes that all modalities are present, aligned, and noiseless at both training and testing time. In real-world tasks, however, one or more modalities are often missing, noisy, lacking in annotated data, or affected by unreliable labels, and may be scarce at training time, at testing time, or both. This challenge is addressed by a learning paradigm called multimodal co-learning: the modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality through the transfer of knowledge between modalities, including their representations and predictive models. Because co-learning is an emerging area, there are no dedicated reviews explicitly focusing on the full range of challenges it addresses. To that end, in this work we provide a comprehensive survey of the emerging area of multimodal co-learning, which has not yet been explored in its entirety. We review implementations that overcome one or more co-learning challenges without explicitly framing them as such. We present a comprehensive taxonomy of multimodal co-learning based on the challenges addressed and their associated implementations. The various techniques employed, including the latest ones, are reviewed along with representative applications and datasets. Our final goal is to discuss open challenges and perspectives, along with important ideas and directions for future work that we hope will benefit the entire research community focusing on this exciting domain.
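The cross-modal knowledge transfer the abstract describes can be illustrated with a minimal, hypothetical sketch: a "teacher" model trained on a resource-rich modality (say, images) produces soft class probabilities, and a "student" model on a resource-poor modality (say, audio) is trained to match them. This is one generic co-learning technique in the survey's scope, not the authors' specific method; all data, shapes, and names below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax over logits."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

n_samples, n_audio_feats, n_classes = 64, 10, 3

# Paired data: features from the resource-poor modality (audio), plus soft
# labels produced by a teacher trained on the resource-rich modality (images).
# Both are synthetic here.
X_audio = rng.normal(size=(n_samples, n_audio_feats))
teacher_probs = softmax(rng.normal(size=(n_samples, n_classes)))

# Linear student trained with cross-entropy against the teacher's soft labels
# (a distillation loss), via plain full-batch gradient descent.
W = np.zeros((n_audio_feats, n_classes))
for _ in range(200):
    student_probs = softmax(X_audio @ W)
    grad = X_audio.T @ (student_probs - teacher_probs) / n_samples
    W -= 0.5 * grad

# After training, the student's predictions should be closer to the teacher's
# than the uniform distribution it started from (mean KL divergence).
final = softmax(X_audio @ W)
kl = np.mean(np.sum(teacher_probs * np.log(teacher_probs / final), axis=1))
print(kl)
```

In practice the student would be a deep network and the loss a temperature-scaled KL term, but the core idea is the same: supervision for the resource-poor modality comes from the resource-rich modality's predictions rather than from scarce ground-truth labels.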
Updated: 2021-07-30