Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
arXiv - CS - Symbolic Computation. Pub Date: 2020-06-20, DOI: arxiv-2006.11524
Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida

Visual reasoning tasks such as visual question answering (VQA) require an interplay of visual perception with reasoning about the question semantics grounded in perception. However, recent advances in this area are still primarily driven by perception improvements (e.g., scene graph generation) rather than reasoning. Neuro-symbolic models such as Neural Module Networks bring the benefits of compositional reasoning to VQA, but they are still entangled with visual representation learning, and thus neural reasoning is hard to improve and assess on its own. To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception. To this end, we introduce a differentiable first-order logic formalism for VQA that explicitly decouples question answering from visual perception. On the challenging GQA dataset, this framework is used to perform in-depth, disentangled comparisons between well-known VQA models, leading to informative insights regarding the participating models as well as the task.
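
The differentiable first-order logic idea can be pictured concretely: if perception supplies a soft truth value in [0, 1] for each predicate over the N detected objects, then logical connectives and quantifiers become smooth tensor operations, so gradients flow from the answer back through the reasoning layer. The sketch below is an illustrative relaxation under those assumptions (product t-norm conjunction, noisy-OR existential quantifier), not the authors' actual implementation; the predicate scores are made up for the example.

```python
# Minimal sketch of differentiable first-order logic over scene objects.
# Assumption: a perception module yields a soft truth value in [0, 1]
# for each predicate applied to each of the N detected objects.
import torch

def soft_and(a, b):
    # Product t-norm: fuzzy conjunction of two truth-value vectors.
    return a * b

def soft_not(a):
    # Standard fuzzy negation.
    return 1.0 - a

def soft_exists(a, dim=-1):
    # Soft existential quantifier via noisy-OR over per-object truth
    # values: differentiable, unlike a hard max.
    return 1.0 - torch.prod(1.0 - a, dim=dim)

# Example query: Exists x. red(x) AND sphere(x), over N = 4 objects,
# with hypothetical perception scores for illustration.
red    = torch.tensor([0.9, 0.1, 0.8, 0.2])   # P(object i is red)
sphere = torch.tensor([0.1, 0.9, 0.7, 0.3])   # P(object i is a sphere)

answer = soft_exists(soft_and(red, sphere))
print(answer)  # ~0.66: a soft "yes, some object is a red sphere"
```

Because every step is differentiable, imperfect perception scores can in principle be calibrated top-down from the answer loss, which is the role the abstract assigns to the proposed calibration technique.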

Updated: 2020-08-27