Interpretable Visual Reasoning via Induced Symbolic Space
arXiv - CS - Artificial Intelligence. Pub Date: 2020-11-23, DOI: arXiv:2011.11603
Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi

We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images, and achieve an interpretable model by working in the induced symbolic concept space. To this end, we first design a new framework named the object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features. We then propose a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words. Finally, we achieve a higher level of interpretability by applying OCCAM to objects represented in the induced symbolic concept space. Experiments on the CLEVR dataset demonstrate that: 1) OCCAM achieves a new state of the art without human-annotated functional programs; 2) the induced concepts are both accurate and sufficient, as OCCAM achieves on-par performance whether objects are represented by visual features or in the induced symbolic concept space.
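The abstract does not spell out how attention patterns are turned into concepts, but the core idea can be illustrated in a few lines. The following is a minimal, hypothetical sketch, not the authors' OCCAM code: the feature dimensions, the dot-product softmax attention, and the similarity-threshold grouping are all assumptions made for illustration; the paper's actual induction procedure may differ.

import numpy as np

rng = np.random.default_rng(0)

# Toy inputs: 5 detected objects with d-dim visual features and
# 7 question-word embeddings in the same d-dim space (both assumed).
d = 16
object_feats = rng.normal(size=(5, d))   # object-level visual features
word_embeds = rng.normal(size=(7, d))    # question-word embeddings

def attention(objects, words):
    """Soft attention of each object over the question words."""
    scores = objects @ words.T / np.sqrt(objects.shape[1])  # (5, 7)
    scores -= scores.max(axis=1, keepdims=True)             # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

attn = attention(object_feats, word_embeds)  # (num_objects, num_words)

# Illustrative concept induction: objects whose attention patterns over
# the question words are similar are grouped under one induced concept.
# A cosine-similarity threshold stands in here for whatever clustering
# the paper actually uses.
def induce_concepts(patterns, threshold=0.9):
    norm = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
    sim = norm @ norm.T
    concepts = [-1] * len(patterns)
    next_id = 0
    for i in range(len(patterns)):
        if concepts[i] == -1:
            concepts[i] = next_id
            for j in range(i + 1, len(patterns)):
                if concepts[j] == -1 and sim[i, j] >= threshold:
                    concepts[j] = next_id
            next_id += 1
    return concepts

print(induce_concepts(attn))  # e.g. [0, 1, 2, 3, 4] if no objects match

Each object then carries a discrete concept label rather than a raw feature vector, which is what makes downstream reasoning in the induced symbolic space inspectable.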

Updated: 2020-11-25