Learnable spatiotemporal feature pyramid for prediction of future optical flow in videos
Machine Vision and Applications (IF 3.3) | Pub Date: 2020-11-17 | DOI: 10.1007/s00138-020-01145-7
Laisha Wadhwa , Snehasis Mukherjee

The success of deep learning-based techniques in solving various computer vision problems has motivated researchers to apply deep learning to predicting the optical flow of a video in the next frame. However, predicting the motion of an object over the next few frames remains an unsolved and less explored problem. Given a sequence of frames, predicting the motion in the next few frames of the video becomes difficult when the displacement of the optical flow vectors across frames is large. Traditional CNNs often fail to learn the dynamics of objects across frames when objects undergo large displacements between consecutive frames. In this paper, we present an efficient CNN based on the concept of a feature pyramid for extracting spatial features from a few consecutive frames. The spatial features extracted from consecutive frames by a modified PWC-Net architecture are fed into a bidirectional LSTM to obtain temporal features. The proposed spatiotemporal feature pyramid is able to capture the abrupt motion of moving objects in video, especially when the displacement of an object across consecutive frames is large. Further, the proposed spatiotemporal pyramidal feature can effectively predict the optical flow over the next few frames, rather than only the next frame. The proposed method of predicting optical flow outperforms the state of the art on challenging datasets such as "MPI Sintel Final Pass," "Monkaa" and "Flying Chairs," where abrupt and large displacements of moving objects across consecutive frames are the main challenge.
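To make the described pipeline concrete, the sketch below illustrates the general idea of the abstract: a pyramid-style CNN encodes spatial features from each frame of a short clip, a bidirectional LSTM aggregates those features over time, and a decoder regresses dense optical flow for the next few frames. This is a minimal illustrative sketch, not the authors' modified PWC-Net; the class names, layer sizes, and the number of predicted frames are assumptions made for the example.

```python
# Illustrative sketch only: pyramid CNN per frame -> bidirectional LSTM over time
# -> dense flow for the next K frames. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class PyramidEncoder(nn.Module):
    """Extracts a coarse pyramid-level spatial feature map from one frame."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.levels = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.1)),
            nn.Sequential(nn.Conv2d(32, 48, 3, stride=2, padding=1), nn.LeakyReLU(0.1)),
            nn.Sequential(nn.Conv2d(48, feat_ch, 3, stride=2, padding=1), nn.LeakyReLU(0.1)),
        ])

    def forward(self, x):
        for level in self.levels:
            x = level(x)          # each pyramid level halves the spatial resolution
        return x                  # coarsest level: (B, feat_ch, H/8, W/8)

class SpatiotemporalFlowPredictor(nn.Module):
    """Pyramid features per frame -> bidirectional LSTM -> flow for the next K frames."""
    def __init__(self, feat_ch=64, hidden=128, k_future=3):
        super().__init__()
        self.k_future = k_future
        self.encoder = PyramidEncoder(feat_ch=feat_ch)
        self.lstm = nn.LSTM(feat_ch, hidden, batch_first=True, bidirectional=True)
        # Decoder upsamples the temporal summary back to K dense 2-channel flow fields.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * hidden, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.ConvTranspose2d(32, 2 * k_future, 4, stride=2, padding=1),
        )

    def forward(self, frames):
        # frames: (B, T, 3, H, W), a short clip of consecutive frames
        b, t, c, h, w = frames.shape
        feats = self.encoder(frames.reshape(b * t, c, h, w))          # (B*T, F, h', w')
        f, hp, wp = feats.shape[1:]
        # Treat every spatial location of the pyramid feature as a sequence over time.
        seq = feats.reshape(b, t, f, hp, wp).permute(0, 3, 4, 1, 2)   # (B, h', w', T, F)
        seq = seq.reshape(b * hp * wp, t, f)
        out, _ = self.lstm(seq)                                       # (B*h'*w', T, 2*hidden)
        summary = out[:, -1]                                          # last time step
        summary = summary.reshape(b, hp, wp, -1).permute(0, 3, 1, 2)  # (B, 2*hidden, h', w')
        flows = self.decoder(summary)                                 # (B, 2*K, H, W)
        return flows.reshape(b, self.k_future, 2, h, w)

if __name__ == "__main__":
    clip = torch.randn(2, 4, 3, 64, 64)      # 2 clips of 4 consecutive frames
    model = SpatiotemporalFlowPredictor()
    print(model(clip).shape)                 # torch.Size([2, 3, 2, 64, 64])
```

In this toy formulation, each predicted frame contributes two flow channels (horizontal and vertical displacement), so the output has shape (batch, K, 2, H, W) for K future frames.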



Updated: 2020-11-17