VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model, ACM Transactions on Intelligent Systems and Technology


VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model
ACM Transactions on Intelligent Systems and Technology (IF 5), Pub Date: 2021-06-08, DOI: 10.1145/3458928
Guodao Sun, Hao Wu, Lin Zhu, Chaoqing Xu, Haoran Liang, Binwei Xu, Ronghua Liang

With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Using machine learning methods combined with well-designed features, we can automatically obtain video summarizations to ease video resource consumption and retrieval issues. However, there is always a gap between the summarization produced by the model and the ones annotated by users. How to help users understand this difference, provide insights for improving the model, and enhance trust in the model remains a challenge in current research. To address these challenges, we propose VSumVis, a visual analysis system developed under a user-centered design methodology that supports multi-feature examination and multi-level exploration. It helps users explore and analyze video content, as well as the intrinsic relationships that exist in our video summarization model. The system contains multiple coordinated views: a video view, a projection view, a detail view, and a sequential frames view. A multi-level analysis process that integrates video events and frames is presented through cluster and node visualizations. Temporal patterns in the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential frames view. Moreover, we propose a rich set of user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence about the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model.


Updated: 2021-06-08
