DTW-CNN: time series-based human interaction prediction in videos using CNN-extracted features, The Visual Computer



The Visual Computer (IF 3.0), Pub Date: 2019-07-11, DOI: 10.1007/s00371-019-01722-6
Mahlagha Afrasiabi, Hassan Khotanlou, Muharram Mansoorizadeh

The prediction of interactions in videos has recently become an active topic in computer vision. Its goal is to infer interactions in their early stages. Many approaches have been proposed, but the problem remains challenging. In the present paper, the features are optical flow fields extracted from video frames using convolutional neural networks. Extracted from successive frames, these features form a time series, so the problem is modeled as time series prediction. The interaction type is predicted by matching the time series of the video under test against the time series available in the training set. Dynamic time warping (DTW) provides an optimal match between a pair of time series through a nonlinear mapping between them. Finally, SVM and KNN classifiers with the dynamic time warping distance are used to predict the video label. The results show that the proposed model improves performance on standard interaction recognition datasets, including TVHI, BIT, and UT-Interaction.
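The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each video is already reduced to a sequence of per-frame feature vectors (toy 1-D values here, standing in for the CNN-extracted optical flow features), uses the textbook DTW recurrence with a Euclidean frame-to-frame cost, and shows only the KNN variant. The function names `dtw_distance` and `knn_predict` and the labels are hypothetical.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean cost between frame i-1 of a and frame j-1 of b
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_predict(query, train_series, train_labels, k=1):
    """Label a query series by majority vote among its k DTW-nearest training series."""
    ranked = sorted(
        range(len(train_series)),
        key=lambda i: dtw_distance(query, train_series[i]),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumption): one feature value per frame, two made-up classes.
train_series = [
    [[0.0], [1.0], [2.0], [3.0]],
    [[3.0], [2.0], [1.0], [0.0]],
]
train_labels = ["rising", "falling"]

# A query whose start is stretched in time relative to the "rising" example.
query = [[0.0], [0.0], [1.0], [2.0]]
predicted = knn_predict(query, train_series, train_labels, k=1)
```

Because DTW permits a nonlinear alignment, a time-stretched copy of a series stays close to the original, which is why the query above is matched to the "rising" example even though its frames do not line up one-to-one. An SVM with a DTW-based kernel, which the abstract also mentions, would need additional design choices that the abstract does not specify, so it is not sketched here.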


Last updated: 2019-07-11
