What is Multimodality?,arXiv - CS - General Literature

What is Multimodality?
arXiv - CS - General Literature, Pub Date: 2021-03-10, DOI: arxiv-2103.06304
Letitia Parcalabescu, Nils Trost, Anette Frank

Recent years have seen rapid developments in multimodal machine learning, which combines, e.g., vision, text, or speech. In this position paper, we explain how the field uses outdated definitions of multimodality that prove unfit for the machine learning era. We propose a new task-relative definition of (multi)modality in the context of multimodal machine learning that focuses on the representations and information relevant to a given machine learning task. With this new definition, we aim to provide a missing foundation for multimodal research, an important component of language grounding and a crucial milestone towards natural language understanding (NLU).

Updated: 2021-03-12
