MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
arXiv - CS - Multimedia. Pub Date: 2021-07-15, DOI: arxiv-2107.07502
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has had limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in other research areas improves state-of-the-art performance on 9 of the 15 datasets. MultiBench therefore presents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.
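To make the standardized-pipeline and modality-robustness ideas above concrete, below is a small hypothetical sketch in plain PyTorch. It is not MultiBench's actual API; the names (LateFusion, drop_modality) and the zeroing convention for a missing modality are assumptions introduced purely for illustration. The sketch builds a minimal late-fusion model and measures its accuracy once with all modalities present and once with one modality dropped, which is the general shape of the missing-modality robustness tests the abstract describes.

    # Hypothetical sketch (not MultiBench's actual API): a minimal late-fusion
    # multimodal model evaluated with and without a "missing modality"
    # perturbation. The model is untrained here, so the printed accuracies are
    # near chance; the point is the evaluation mechanics, not the numbers.
    import torch
    import torch.nn as nn

    class LateFusion(nn.Module):
        """Encode each modality separately, concatenate features, classify."""
        def __init__(self, dims, hidden=32, num_classes=2):
            super().__init__()
            self.encoders = nn.ModuleList(nn.Linear(d, hidden) for d in dims)
            self.head = nn.Linear(hidden * len(dims), num_classes)

        def forward(self, modalities):
            feats = [enc(x).relu() for enc, x in zip(self.encoders, modalities)]
            return self.head(torch.cat(feats, dim=-1))

    def drop_modality(modalities, idx):
        """Simulate a missing modality by zeroing it out (one simple convention)."""
        return [torch.zeros_like(x) if i == idx else x
                for i, x in enumerate(modalities)]

    def accuracy(model, modalities, labels):
        with torch.no_grad():
            return (model(modalities).argmax(-1) == labels).float().mean().item()

    # Toy data: two modalities (e.g. text and audio features) for 64 samples.
    torch.manual_seed(0)
    mods = [torch.randn(64, 16), torch.randn(64, 8)]
    labels = torch.randint(0, 2, (64,))

    model = LateFusion(dims=[16, 8])
    print("clean accuracy:     ", accuracy(model, mods, labels))
    print("modality 1 dropped: ", accuracy(model, drop_modality(mods, 1), labels))

Zeroing is only one way to model a missing modality; a benchmark-style robustness evaluation would sweep several perturbation types and intensities (noise, dropout, missingness) per modality and report performance as a function of imperfection level.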

Updated: 2021-07-16