Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
International Journal of Computer Vision (IF 11.6) · Pub Date: 2016-10-04 · DOI: 10.1007/s11263-016-0957-7
Lionel Pigou, Aäron van den Oord, Sander Dieleman, Mieke Van Herreweghe, Joni Dambre

Recent studies have demonstrated the power of recurrent neural networks for machine translation, image captioning and speech recognition. For the task of capturing temporal structure in video, however, numerous research questions remain open. Current research suggests accounting for the temporal aspect of video with a simple temporal feature pooling strategy. We demonstrate that this method is not sufficient for gesture recognition, where temporal information is more discriminative than in general video classification tasks. We explore deep architectures for gesture recognition in video and propose a new end-to-end trainable neural network architecture incorporating temporal convolutions and bidirectional recurrence. Our main contributions are twofold: first, we show that recurrence is crucial for this task; second, we show that adding temporal convolutions leads to significant improvements. We evaluate the different approaches on the Montalbano gesture recognition dataset, where we achieve state-of-the-art results.
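The contrast the abstract draws between temporal pooling and order-aware operations can be illustrated with a toy sketch. This is not the authors' architecture (which builds on learned CNN frame features and LSTM cells). Everything below is a hypothetical simplification. Each frame is reduced to one scalar feature. The difference kernel is hand-set rather than learned. A plain tanh (Elman) RNN stands in for the recurrent layers. The point it shows: mean pooling gives the same result for a gesture and its time-reversed version, while a temporal convolution and a bidirectional RNN do not.

```python
import math
from typing import List, Tuple

# Hypothetical toy setting: each video frame is reduced to a single scalar
# feature (e.g. horizontal hand position). Real models use CNN feature vectors.

def temporal_mean_pool(frames: List[float]) -> float:
    """Temporal feature pooling: average over time, discarding frame order."""
    return sum(frames) / len(frames)

def temporal_conv(frames: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1-D convolution along the time axis (implemented as
    cross-correlation, as deep-learning libraries do)."""
    k = len(kernel)
    return [sum(kernel[j] * frames[t + j] for j in range(k))
            for t in range(len(frames) - k + 1)]

def rnn(frames: List[float], w_in: float, w_rec: float) -> List[float]:
    """Minimal Elman RNN with tanh, standing in for LSTM cells."""
    h, states = 0.0, []
    for x in frames:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional_rnn(frames: List[float],
                      fwd_w: Tuple[float, float],
                      bwd_w: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Run one RNN forward and another backward in time; pair their states per frame."""
    fwd = rnn(frames, *fwd_w)
    bwd = rnn(frames[::-1], *bwd_w)[::-1]  # reverse back to align with frame index
    return list(zip(fwd, bwd))

if __name__ == "__main__":
    swipe_right = [0.0, 1.0, 2.0, 3.0]  # hypothetical: hand moves right over four frames
    swipe_left = swipe_right[::-1]      # same frames in reversed order

    # Pooling cannot tell the two gestures apart: both give 1.5.
    print(temporal_mean_pool(swipe_right), temporal_mean_pool(swipe_left))

    # A difference kernel responds to the direction of motion.
    print(temporal_conv(swipe_right, [-1.0, 1.0]))  # [1.0, 1.0, 1.0]
    print(temporal_conv(swipe_left, [-1.0, 1.0]))   # [-1.0, -1.0, -1.0]

    # Each per-frame output of the bidirectional RNN sees past and future context.
    print(bidirectional_rnn(swipe_right, (0.5, 0.8), (0.5, 0.8)))
```

In a trained network the kernel and recurrent weights would be learned end to end and applied to feature vectors. The sketch only makes the order-sensitivity argument concrete.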

Updated: 2016-10-04