HySTER: A Hybrid Spatio-Temporal Event Reasoner
arXiv - CS - Logic in Computer Science, Pub Date: 2021-01-17, DOI: arxiv-2101.06644
Theophile Sautory, Nuri Cingillioglu, Alessandra Russo

The task of Video Question Answering (VideoQA) consists of answering natural language questions about a video, and serves as a proxy for evaluating how well a model understands sequences of scenes. Most VideoQA methods to date are end-to-end deep learning architectures, which struggle with complex temporal and causal reasoning and offer limited transparency into their reasoning steps. We present HySTER, a Hybrid Spatio-Temporal Event Reasoner that reasons over physical events in videos. Our model combines the strength of deep learning methods at extracting information from video frames with the reasoning capabilities and explainability of symbolic artificial intelligence, within an answer set programming framework. We define a method based on general temporal, causal and physics rules that can be transferred across tasks. We apply our model to the CLEVRER dataset and demonstrate state-of-the-art question answering accuracy. This work lays the foundations for incorporating inductive logic programming into the field of VideoQA.

Updated: 2021-01-19