Discriminative Dictionary Design for Action Classification in Still Images and Videos
Cognitive Computation (IF 4.3), Pub Date: 2021-03-03, DOI: 10.1007/s12559-021-09851-8
Abhinaba Roy, Biplab Banerjee, Amir Hussain, Soujanya Poria

In this paper, we address the problem of action recognition from still images and videos. Traditional local features such as SIFT and STIP invariably pose two potential problems: 1) they are not evenly distributed across the different entities of a given category, and 2) many such features are not exclusive to the visual concept the entities represent. To generate a dictionary that accounts for these issues, we propose a novel discriminative method for identifying robust, category-specific local features that better maximize class separability. Specifically, we pose the selection of potent local descriptors as a filtering-based feature selection problem, which ranks the local features per category using a novel measure of distinctiveness. The underlying visual entities are then represented with the learned dictionary, followed by action classification with a random forest model and label propagation refinement. The framework is validated on action recognition datasets of still images (Stanford-40) and videos (UCF-50). We obtain 51.2% and 66.7% recognition accuracy on Stanford-40 and UCF-50, respectively. Compared to other representative methods from the literature, our approach exhibits superior performance, demonstrating the effectiveness of the adaptive ranking methodology presented in this work.
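The pipeline above can be sketched in a few lines. This is only an illustrative toy, not the authors' method: the distinctiveness score used here (mean distance to other classes' descriptors minus mean distance within the class) and all function names are assumptions standing in for the paper's novel measure, and the final random forest and label propagation stages are omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+

def rank_descriptors(descriptors_by_class, top_k=2):
    """Rank each class's local descriptors by a toy distinctiveness score
    (mean inter-class distance minus mean intra-class distance) and keep
    the top_k most distinctive per class as dictionary atoms."""
    dictionary = {}
    for label, descs in descriptors_by_class.items():
        # Pool descriptors from all other classes for the inter-class term.
        others = [d for lbl, ds in descriptors_by_class.items()
                  if lbl != label for d in ds]
        scored = []
        for d in descs:
            intra = sum(dist(d, e) for e in descs if e is not d) / max(len(descs) - 1, 1)
            inter = sum(dist(d, o) for o in others) / max(len(others), 1)
            scored.append((inter - intra, d))  # higher = more distinctive
        scored.sort(key=lambda t: t[0], reverse=True)
        dictionary[label] = [d for _, d in scored[:top_k]]
    return dictionary

def encode(image_descs, dictionary):
    """Represent an image as a histogram of nearest dictionary atoms
    (a standard bag-of-words encoding over the learned dictionary)."""
    atoms = [a for ds in dictionary.values() for a in ds]
    hist = [0] * len(atoms)
    for d in image_descs:
        nearest = min(range(len(atoms)), key=lambda i: dist(d, atoms[i]))
        hist[nearest] += 1
    return hist
```

The resulting fixed-length histograms are what a downstream classifier such as a random forest would consume; the filtering step matters because atoms that sit close to several classes contribute little to class separability.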




Updated: 2021-03-03