Describing video scenarios using deep learning techniques, International Journal of Intelligent Systems


Describing video scenarios using deep learning techniques
International Journal of Intelligent Systems (IF 7), Pub Date: 2021-02-25, DOI: 10.1002/int.22387
Yin‐Fu Huang, Li‐Ping Shih, Chia‐Hsin Tsai, Guan‐Ting Shen

Combining computer vision with natural language processing remains a challenging problem. In contrast to previous models, which focus on generating only a single sentence for a video, we consider describing a longer video to be an important application. In this paper, we propose a video scenario description system that takes video genres into account to generate multiple sentences. First, the semantics and genres of the videos are analyzed. Next, the video descriptions are analyzed. Then, relevant semantic features are selected and translated into the corresponding video descriptions through deep learning. In the experiments, we compare the generated video descriptions using four evaluation metrics. The results show that our method is comparable to state‐of‐the‐art methods.


Updated: 2021-04-27
