GVSUM: generic video summarization using deep visual features
Multimedia Tools and Applications ( IF 3.6 ) Pub Date : 2021-01-23 , DOI: 10.1007/s11042-020-10460-0
Madhushree Basavarajaiah , Priyanka Sharma

Video summarization is the task of producing a summary of a video's content. This paper proposes a generic video summarization method named GVSUM. The generic summary is generated by choosing a keyframe whenever a major scene change occurs in the video. Every frame of the video is assigned a cluster number based on its visual features, and a keyframe is extracted whenever the cluster number changes from one frame to the next. The visual features are extracted from a pre-trained Convolutional Neural Network (CNN); k-means clustering is then applied to these features, followed by a sequential keyframe generation technique. Optionally, the optimum number of clusters can be chosen before summarizing by applying the Average Silhouette Width method. Mean Opinion Scores (MOS) of the generated summaries show that the GVSUM approach gives satisfactory results for generic video summarization, as it picks up a frame wherever the visual content changes. The quantitative F1 measure also shows promising results.
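The two steps that follow clustering can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-frame cluster labels (and, for the silhouette score, per-frame feature vectors) have already been computed elsewhere, it treats the first frame as a keyframe, and it uses Euclidean distance. The function names `select_keyframes` and `average_silhouette_width` are hypothetical.

```python
from math import dist


def select_keyframes(labels):
    """Return indices of frames whose cluster label differs from the previous frame.

    Assumption: the first frame always opens a segment, so it is kept as a keyframe.
    """
    keyframes = []
    previous = None
    for index, label in enumerate(labels):
        if index == 0 or label != previous:
            keyframes.append(index)
        previous = label
    return keyframes


def average_silhouette_width(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / len(labels)


# Toy example: eight frames whose cluster labels change three times.
frame_labels = [0, 0, 1, 1, 1, 0, 2, 2]
print(select_keyframes(frame_labels))
```

To pick the number of clusters, one would run the clustering for several candidate values of k, score each labeling with `average_silhouette_width`, and keep the k with the highest score before selecting keyframes.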




Updated: 2021-01-24