Graph Attention Networks Adjusted Bi-LSTM for Video Summarization
IEEE Signal Processing Letters ( IF 3.2 ) Pub Date : 2021-03-17 , DOI: 10.1109/lsp.2021.3066349
Rui Zhong 1 , Rui Wang 1 , Yang Zou 1 , Zhiqiang Hong 1 , Min Hu 2

High redundancy among keyframes is a critical issue for prior summarization methods when dealing with user-created videos. To address this issue, we present a Graph Attention Networks (GAT) adjusted Bi-directional Long Short-Term Memory (Bi-LSTM) model for unsupervised video summarization. First, the GAT transforms an image's visual features into higher-level features via a Contextual-Features-based Transformation (CFT) mechanism. Specifically, a novel Salient-Area-Size-based spatial attention model is presented to extract frame-wise visual features, motivated by the observation that humans tend to focus on sizable and moving objects. Second, the higher-level visual features are integrated with semantic features processed by the Bi-LSTM to refine the frame-wise probability of being selected as a keyframe. Extensive experiments demonstrate that our method outperforms state-of-the-art methods.
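The overall pipeline (graph attention over frame features, then sequence context, then frame-wise keyframe probabilities) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the fully connected frame graph, the single attention head, and the cumulative-average stand-in for the Bi-LSTM are all assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(H, W, a):
    """Single-head graph attention over a fully connected frame graph (assumption).
    H: (n, d) frame features; W: (d, d') projection; a: (2*d',) attention vector."""
    Z = H @ W                                   # project to higher-level features, (n, d')
    dp = Z.shape[1]
    # pairwise logits e_ij = LeakyReLU(a^T [z_i || z_j]), computed by splitting a
    src = Z @ a[:dp]                            # contribution of frame i, (n,)
    dst = Z @ a[dp:]                            # contribution of frame j, (n,)
    e = src[:, None] + dst[None, :]             # (n, n)
    e = np.where(e > 0, e, 0.2 * e)             # LeakyReLU, slope 0.2
    alpha = softmax(e, axis=1)                  # attention coefficients, rows sum to 1
    return alpha @ Z                            # contextual (CFT-style) features

def bilstm_like(H):
    """Stand-in for a Bi-LSTM (assumption): forward and backward running means
    give each frame left-to-right and right-to-left context."""
    n = len(H)
    fwd = np.cumsum(H, axis=0) / np.arange(1, n + 1)[:, None]
    bwd = np.cumsum(H[::-1], axis=0)[::-1] / np.arange(n, 0, -1)[:, None]
    return np.concatenate([fwd, bwd], axis=1)   # (n, 2*d')

n, d, dp = 6, 8, 4                              # 6 frames, toy feature sizes
H = rng.normal(size=(n, d))                     # per-frame visual features
W = rng.normal(size=(d, dp))
a = rng.normal(size=(2 * dp,))

ctx = gat_layer(H, W, a)                        # higher-level features from graph attention
sem = bilstm_like(ctx)                          # sequence context over the video
w_out = rng.normal(size=(sem.shape[1],))
probs = 1 / (1 + np.exp(-(sem @ w_out)))        # frame-wise keyframe probabilities
print(probs.shape)
```

Frames with the highest probabilities would then be picked as the summary; in a trained model `W`, `a`, and `w_out` are learned rather than random.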

Updated: 2021-04-23