A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2018-12-21, DOI: 10.1109/tpami.2018.2884469
Hilde Kuehne, Alexander Richard, Juergen Gall

Action recognition has developed rapidly as a research field over the last decade. With the growing demand for large-scale data, however, hand-annotating training data is becoming increasingly impractical. One way to avoid frame-level human annotation is to use action order information to learn the respective action classes. In this context, we propose a hierarchical approach to weakly supervised learning of human actions from ordered action labels that structures recognition in a coarse-to-fine manner. Given a set of videos and an ordered list of the occurring actions, the task is to infer the start and end frames of the related action classes within each video and to train the respective action classifiers without hand-labeled frame boundaries. We address this problem by combining a framewise RNN model with a coarse probabilistic inference. This combination allows for the temporal alignment of long sequences and thus for iterative training of both elements. While this system alone already generates good results, we show that performance can be further improved by adapting the number of subactions to the characteristics of the different action classes and by introducing a regularizing length prior. The proposed system is evaluated on two benchmark datasets, the Breakfast and the Hollywood Extended datasets, showing competitive performance on various weak learning tasks such as temporal action segmentation and action alignment.
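The alignment step described in the abstract, assigning frames to an ordered list of actions, can be illustrated with a small dynamic program. The following is a minimal sketch, not the authors' RNN-HMM system: it assumes framewise log-probabilities are already available from some classifier, treats each listed action as a single state, and omits the subaction modeling and length prior mentioned above. The names `align_transcript`, `log_probs`, and `transcript` are hypothetical.

```python
import numpy as np

def align_transcript(log_probs, transcript):
    """Monotonic alignment of an ordered action list to video frames.

    log_probs:  (T, C) array of framewise class log-probabilities
                (assumed to come from some framewise model).
    transcript: ordered list of class indices; each entry must
                cover at least one frame.
    Returns a list of T class indices that follows the transcript order.
    """
    T = log_probs.shape[0]
    N = len(transcript)
    if N == 0 or N > T:
        raise ValueError("transcript needs between 1 and T entries")

    # emit[t, n]: score of frame t under the n-th transcript entry
    emit = log_probs[:, transcript]
    score = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)  # 1 = entered segment n at frame t
    score[0, 0] = emit[0, 0]            # the first frame starts the first action

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            move = score[t - 1, n - 1] if n > 0 else -np.inf
            if move > stay:
                score[t, n] = move + emit[t, n]
                back[t, n] = 1
            else:
                score[t, n] = stay + emit[t, n]

    # Backtrack from the last frame, which must belong to the last action.
    labels = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[n]
        n -= int(back[t, n])
    return labels
```

In an iterative weakly supervised scheme of the kind the abstract outlines, the frame labels returned by such an alignment would serve as training targets for the framewise model, whose updated outputs are then realigned.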

Updated: 2024-08-22