A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation.
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2018-12-21, DOI: 10.1109/tpami.2018.2884469
Hilde Kuehne, Alexander Richard, Juergen Gall

Action recognition has become a rapidly developing research field within the last decade. But with the increasing demand for large-scale data, hand-annotating the training data becomes more and more impractical. One way to avoid frame-based human annotation is to use action order information to learn the respective action classes. In this context, we propose a hierarchical approach to address the problem of weakly supervised learning of human actions from ordered action labels by structuring recognition in a coarse-to-fine manner. Given a set of videos and an ordered list of the occurring actions, the task is to infer the start and end frames of the related action classes within each video and to train the respective action classifiers without any need for hand-labeled frame boundaries. We address this problem by combining a framewise RNN model with a coarse probabilistic inference. This combination allows for the temporal alignment of long sequences and thus for an iterative training of both elements. While this system alone already generates good results, we show that the performance can be further improved by adapting the number of subactions to the characteristics of the different action classes as well as by introducing a regularizing length prior. The proposed system is evaluated on two benchmark datasets, the Breakfast and the Hollywood Extended dataset, showing competitive performance on various weak learning tasks such as temporal action segmentation and action alignment.
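
The alignment task described in the abstract, inferring start and end frames from an ordered list of actions, can be illustrated with a monotonic Viterbi-style forced alignment. The sketch below is a simplified illustration and not the authors' implementation: it assumes framewise log-probabilities are already available (for example from a framewise RNN), models each action as a single state without subactions, and omits the length prior. The function name `align_transcript` and its parameters are hypothetical.

```python
import numpy as np


def align_transcript(log_probs, transcript):
    """Assign each frame to one entry of an ordered action list.

    log_probs:  (T, C) array of framewise log-probabilities (assumed input,
                e.g. the output of some framewise classifier).
    transcript: ordered list of class indices that occur in the video.
    Returns a list of T class indices that follows the transcript order,
    with every transcript entry covering at least one frame.
    """
    T = log_probs.shape[0]
    N = len(transcript)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per transcript entry")

    # score[t, n]: best log-score of frames 0..t with frame t in segment n.
    score = np.full((T, N), -np.inf)
    # back[t, n]: 1 if frame t is the first frame of segment n, else 0.
    back = np.zeros((T, N), dtype=int)
    score[0, 0] = log_probs[0, transcript[0]]

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            advance = score[t - 1, n - 1] if n > 0 else -np.inf
            if advance > stay:
                best, back[t, n] = advance, 1
            else:
                best, back[t, n] = stay, 0
            score[t, n] = best + log_probs[t, transcript[n]]

    # Backtrace from the last frame, which must lie in the last segment.
    labels = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = transcript[n]
        if back[t, n] == 1:
            n -= 1
    return labels
```

In an iterative weakly supervised setup, frame labels obtained this way could serve as pseudo ground truth for retraining the framewise model, after which the alignment is recomputed. This loop is only a schematic reading of the "iterative training of both elements" mentioned above.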

Updated: 2020-03-06