Video summarization and captioning using dynamic mode decomposition for surveillance, International Journal of Information Technology



Video summarization and captioning using dynamic mode decomposition for surveillance
International Journal of Information Technology, Pub Date: 2021-05-13, DOI: 10.1007/s41870-021-00668-0
Rakesh Radarapu, Akkajosyula Surya Sai Gopal, Madhusudhan NH, Anand Kumar M.

Video surveillance has become a major tool for maintaining security. Reviewing recorded footage to detect motion is tedious, however, because motion typically occurs in only a short portion of the video. Much time is wasted in reviewing the footage, and it is not always possible to find the exact frame at which a transition occurred. There is therefore a need for a summary video that captures any changes or motion. With advances in image processing using OpenCV and in deep learning, video summarization has become practical. Captions are generated for the summarized videos using an encoder–decoder captioning model. Large, well-labeled data sets such as Common Objects in Context (COCO) and Microsoft Video Description (MSVD) make video captioning a feasible task. Since the arrival of long short-term memory (LSTM) networks, encoder–decoder models have been used extensively to generate text from visual features, and attention mechanisms have been widely used on the decoder for video captioning. Keyframes are obtained from very long videos using methods such as dynamic mode decomposition (DMD), an algorithm that originated in fluid dynamics, and OpenCV's absdiff(). We propose these tools for motion detection and video/image captioning for the very long videos that are common in video surveillance.
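The two keyframe-selection ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes grayscale frames stacked in a NumPy array, uses plain NumPy subtraction as a stand-in for OpenCV's absdiff(), and treats the DMD mode whose eigenvalue lies closest to 1 as the static background. The function names (`absdiff_scores`, `dmd_foreground_scores`, `select_keyframes`, `synthetic_clip`) and the mean-plus-one-standard-deviation threshold are hypothetical choices made for illustration only.

```python
import numpy as np


def absdiff_scores(frames):
    """Mean absolute difference between consecutive frames (frame 0 scores 0).

    NumPy stand-in for per-frame differencing such as OpenCV's absdiff().
    """
    f = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(f[1:] - f[:-1]).mean(axis=(1, 2))
    return np.concatenate([[0.0], diffs])


def dmd_foreground_scores(frames, tol=1e-10):
    """Score each frame by its distance from a DMD background estimate."""
    f = np.asarray(frames, dtype=np.float64)
    n_frames = f.shape[0]
    X = f.reshape(n_frames, -1).T              # one column per frame
    X1, X2 = X[:, :-1], X[:, 1:]               # snapshot pairs (x_t, x_{t+1})

    # Truncated SVD of X1, dropping numerically zero singular values.
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]

    # Reduced linear operator and its eigendecomposition (exact DMD).
    B = X2 @ Vh.conj().T / s                   # X2 V S^-1
    A_tilde = U.conj().T @ B
    eigvals, W = np.linalg.eig(A_tilde)
    modes = B @ W

    # Mode amplitudes fitted to the first frame.
    amplitudes = np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0]

    # Assumption: the mode with eigenvalue closest to 1 is the background.
    k = int(np.argmin(np.abs(eigvals - 1.0)))
    t = np.arange(n_frames)
    background = (amplitudes[k] * np.outer(modes[:, k], eigvals[k] ** t)).real

    # Foreground residual, averaged over pixels, gives one score per frame.
    return np.abs(X - background).mean(axis=0)


def select_keyframes(scores, k=1.0):
    """Indices of frames whose score exceeds mean + k * std (illustrative rule)."""
    scores = np.asarray(scores)
    return np.flatnonzero(scores > scores.mean() + k * scores.std()).tolist()


def synthetic_clip(n_frames=20, size=8):
    """Static gradient background with a bright 2x2 blob in frames 10 to 12."""
    background = np.linspace(50.0, 150.0, size * size).reshape(size, size)
    frames = np.repeat(background[None], n_frames, axis=0)
    for t, col in zip((10, 11, 12), (1, 3, 5)):
        frames[t, 3:5, col:col + 2] = 255.0
    return frames
```

On the synthetic clip, `select_keyframes(dmd_foreground_scores(synthetic_clip()))` picks out the frames that contain the moving blob, while `absdiff_scores` also flags the frame in which the blob disappears. Real surveillance footage would need frames read from a video file, and probably a tuned threshold, before either score is useful.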


Last updated: 2021-05-13
