Knowledge Guided Learning: Towards Open Domain Egocentric Action Recognition with Zero Supervision,arXiv - CS - Artificial Intelligence

arXiv - CS - Artificial Intelligence. Pub Date: 2020-09-16, DOI: arxiv-2009.07470
Sathyanarayanan N. Aakur, Sanjoy Kundu, Nikhil Gunti

Advances in deep learning have enabled the development of models that exhibit a remarkable ability to recognize and even localize actions in videos. However, these models tend to make errors when faced with scenes or examples beyond their initial training environment. Hence, they fail to adapt to new domains without significant retraining on large amounts of annotated data. Current algorithms are trained in an inductive learning environment, where they use data-driven models to learn associations between input observations and a fixed set of known classes. In this paper, we propose to overcome these limitations by moving to an open-world setting and decoupling the ideas of recognition and reasoning. Building upon the compositional representation offered by Grenander's Pattern Theory formalism, we show that attention and commonsense knowledge can be used to enable the self-supervised discovery of novel actions in egocentric videos in an open-world setting. This is a considerably more difficult task than zero-shot learning and (un)supervised domain adaptation, where target domain data (both labeled and unlabeled) are available during training. We show that our approach can be used to infer and learn novel classes for open-vocabulary classification in egocentric videos, as well as for novel object detection, with zero supervision. Extensive experiments show that it performs competitively with fully supervised baselines on publicly available datasets under open-world conditions. To the best of our knowledge, this is one of the first works to address the problem of open-world action recognition in egocentric videos with zero human supervision.

Updated: 2020-09-17
