Video quality evaluation toward complicated sport activities for clustering analysis
Future Generation Computer Systems (IF 7.5), Pub Date: 2021-01-19, DOI: 10.1016/j.future.2021.01.018
Wei Yang, Jian Wang, Jinlong Shi

Automatically clustering sophisticated human activities (e.g., dancing, martial arts, and gymnastics) by their quality scores is an indispensable technique in physical training, human–computer interaction, and related areas. Conventionally, many action recognition models are built upon the visual/semantic appearance of human body movements. More recently, following the introduction of Microsoft Kinect, many skeleton-based frameworks for human action understanding have been proposed. In this work, we propose a novel method for clustering complicated human actions by quality, aimed at a contactless operative video reading system (COVRS). Specifically, we first extract the skeleton using the Kinect and feed it into an aggregation deep neural network to obtain a deep feature for each human action skeleton. In COVRS, the human hand gesture is an informative cue. We therefore propose a ranking algorithm to extract the positions of the five fingers, from which a deep hand gesture representation is hierarchically learned. Notably, the acoustic features of many human activities also contribute to quality assessment, so we extract multiple acoustic features from the audio associated with each human activity video. Finally, we employ a probabilistic model to integrate the deep skeleton and hand gesture features with the shallow acoustic features, and use it to cluster the various human activities by their quality in COVRS. Comprehensive experiments have demonstrated the effectiveness and efficiency of our method. In addition, empirical results show that our probabilistic quality model is highly extensible: additional visual/acoustic features can be encoded to suit different applications.




Updated: 2021-02-08