Perceiver IO: A General Architecture for Structured Inputs & Outputs,arXiv - CS - Computer Vision and Pattern Recognition

Perceiver IO: A General Architecture for Structured Inputs & Outputs
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-07-30, DOI: arxiv-2107.14795
Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira

The recently proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of inputs, it can only produce very simple outputs such as class scores. Perceiver IO overcomes this limitation without sacrificing the original's appealing properties by learning to flexibly query the model's latent space to produce outputs of arbitrary size and semantics. Perceiver IO still decouples model depth from data size and still scales linearly with data size, but now with respect to both input and output sizes. The full Perceiver IO model achieves strong results on tasks with highly structured output spaces, such as natural language and visual understanding, StarCraft II, and multi-task and multi-modal domains. As highlights, Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark without the need for input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation.
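The encode, process, and decode pattern described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head attention, random matrices in place of learned weights, and it omits layer normalization and MLP blocks. The names `attend` and `perceiver_io_sketch`, and all parameter names, are hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values, w_q, w_k, w_v):
    """Single-head attention: each query row reads from keys_values."""
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # shape: (num_queries, num_kv)
    return softmax(scores) @ v


def perceiver_io_sketch(inputs, output_queries, num_latents=8, dim=16,
                        depth=2, seed=0):
    """Hypothetical sketch: inputs (M, C), output_queries (O, Dq) -> (O, dim)."""
    rng = np.random.default_rng(seed)

    def proj(d_in):
        # Random projection standing in for a learned weight matrix (assumption).
        return rng.normal(scale=d_in ** -0.5, size=(d_in, dim))

    c = inputs.shape[1]
    dq = output_queries.shape[1]

    # The latent array would be learned in a real model; random here.
    latents = rng.normal(size=(num_latents, dim))

    # Encode: latents query the inputs. Score matrix is (N, M): linear in M.
    latents = attend(latents, inputs, proj(dim), proj(c), proj(c))

    # Process: self-attention over latents only. Score matrix is (N, N),
    # so the cost of each layer does not depend on input or output size.
    for _ in range(depth):
        latents = latents + attend(latents, latents,
                                   proj(dim), proj(dim), proj(dim))

    # Decode: output queries read the latents. Score matrix is (O, N):
    # linear in O.
    return attend(output_queries, latents, proj(dq), proj(dim), proj(dim))


inputs = np.ones((50, 3))          # M = 50 input elements, 3 channels
output_queries = np.ones((7, 4))   # O = 7 output queries, 4 features each
out = perceiver_io_sketch(inputs, output_queries)
print(out.shape)  # (7, 16)
```

In this sketch the number of output rows is set entirely by the number of query rows, which is how the output size is decoupled from both the input size and the latent size. Only the first and last attention steps touch arrays of size M or O.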

Updated: 2021-08-02
