Key frame extraction based on global motion statistics for team-sport videos
Multimedia Systems ( IF 3.5 ) Pub Date : 2021-05-03 , DOI: 10.1007/s00530-021-00777-7
Yuan Yuan , Zhe Lu , Zhou Yang , Meng Jian , Lifang Wu , Zeyu Li , Xu Liu

Key frame extraction is an important approach to video summarization and can be used to interpret video content quickly. Existing approaches first partition the entire video into clips by shot boundary detection and then extract key frames by frame clustering. However, in most team-sport videos a single clip usually contains many events, and it is difficult to accurately extract key frames for all of them, because different events within a game shot can look similar. Most events in team-sport videos are attack and defense conversions, which are related to global translation. Therefore, by partitioning a shot at a finer granularity based on global motion, it can be split into more clips, from which more event-related key frames can be extracted. In this study, global horizontal motion is introduced to further partition video clips into fine-grained clips. Global motion statistics are then used to extract candidate key frames. Finally, representative key frames are extracted based on spatial–temporal consistency and hierarchical clustering, and redundant frames are removed. A dataset called SportKF is built, which includes 25 videos totaling 197,878 frames over 112 min, with 764 key frames, from four types of sports (basketball, football, American football and field hockey). The experimental results demonstrate that the proposed scheme achieves state-of-the-art performance by introducing global motion statistics.
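
The partitioning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the actual partition rule or which motion statistic is used. The sketch assumes a per-frame global horizontal displacement has already been estimated (for example from optical flow), splits a shot wherever the motion direction reverses, and picks the frame with the largest absolute motion in each clip as a candidate. The function names, the `dead_zone` parameter, and the selection rule are all hypothetical.

```python
from typing import List, Tuple


def partition_by_horizontal_motion(
    motion: List[float], dead_zone: float = 0.5
) -> List[Tuple[int, int]]:
    """Split a shot into clips wherever the global horizontal motion reverses.

    motion[i] is an assumed per-frame global horizontal displacement in pixels
    (positive = rightward, negative = leftward). Values within +/- dead_zone
    carry no clear direction and never trigger a split (hypothetical rule).
    Returns (start, end) frame-index pairs with end exclusive.
    """
    clips: List[Tuple[int, int]] = []
    start = 0
    current_sign = 0  # 0 means no direction has been established yet
    for i, m in enumerate(motion):
        if abs(m) <= dead_zone:
            continue
        sign = 1 if m > 0 else -1
        if current_sign != 0 and sign != current_sign:
            # Direction reversed: close the current clip and open a new one.
            clips.append((start, i))
            start = i
        current_sign = sign
    if motion:
        clips.append((start, len(motion)))
    return clips


def candidate_key_frames(
    motion: List[float], clips: List[Tuple[int, int]]
) -> List[int]:
    """Pick one candidate per clip: the frame with the largest |motion|.

    This selection statistic is an illustrative assumption only.
    """
    return [max(range(s, e), key=lambda i: abs(motion[i])) for s, e in clips]


# Toy motion trace: pan right, pan left, pan right again.
motion = [0.1, 2.0, 3.0, 1.0, 0.2, -1.5, -4.0, -2.0, 0.0, 2.5, 1.0]
clips = partition_by_horizontal_motion(motion)
candidates = candidate_key_frames(motion, clips)
```

On this toy trace the shot is split at the two direction reversals, giving three clips with one candidate frame each. The later stages described in the abstract (spatial–temporal consistency and hierarchical clustering to remove redundant frames) are not covered by this sketch.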




Updated: 2021-05-03