Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100
International Journal of Computer Vision ( IF 19.5 ) Pub Date : 2021-10-20 , DOI: 10.1007/s11263-021-01531-2
Dima Damen 1 , Hazel Doughty 1, 2 , Evangelos Kazakos 1 , Jian Ma 1 , Davide Moltisanti 1 , Jonathan Munro 1 , Toby Perrett 1 , Will Price 1 , Michael Wray 1 , Giovanni Maria Farinella 3 , Antonino Furnari 3
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version (Damen et al., "Scaling Egocentric Vision", ECCV 2018), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (+128% more action segments). This collection enables new challenges such as action detection and evaluating the "test of time"—i.e. whether models trained on data collected in 2018 can generalise to new footage collected two years later. The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition. For each challenge, we define the task, and provide baselines and evaluation metrics.
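The density figures quoted in the abstract follow directly from the stated totals. A minimal sketch (the variable names are illustrative, not from the paper) that reproduces the arithmetic:

```python
# Derive annotation density from the figures stated in the abstract.
hours = 100            # total footage in EPIC-KITCHENS-100
actions = 90_000       # annotated action segments

# Actions per minute in the new collection.
density_new = actions / (hours * 60)
print(density_new)  # → 15.0

# The abstract reports 54% more actions per minute than the 2018 version,
# so the earlier density was roughly density_new / 1.54.
density_old = density_new / 1.54
print(round(density_old, 1))  # → 9.7
```

This is only back-of-the-envelope consistency checking of the reported statistics, not a method from the paper itself.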



Updated: 2021-10-21