Viewpoint constrained and unconstrained Cricket stroke localization from untrimmed videos
Image and Vision Computing ( IF 4.7 ) Pub Date : 2020-05-29 , DOI: 10.1016/j.imavis.2020.103944
Arpan Gupta , Sakthi Balan Muthiah

In this work, we create two new video datasets for the task of temporal Cricket stroke extraction. The two datasets, namely the Highlights dataset (approx. 117K frames) and the Generic dataset (approx. 1.93M frames), comprise Cricket telecast videos collected from available online sources and down-sampled to 360×640 at 25 FPS. These untrimmed videos have been manually annotated with temporal Cricket strokes under a viewpoint invariance assumption.

We construct two learning-based localization pipelines, one dependent on (Constrained) and one independent of (Unconstrained) our viewpoint labeling assumption. The Unconstrained pipeline fine-tunes a pretrained C3D model with GRU training in disconnected and connected modes, while our Constrained pipeline uses boundary detection with first-frame classification to generate the temporal localizations. Two post-processing steps, filtering and boundary correction, are also discussed; they improve the overall accuracy.
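The abstract does not give the exact post-processing rules, but a minimal sketch of the filtering step might look like the following, assuming predicted strokes are (start, end) frame intervals and that filtering drops implausibly short detections (the threshold and interval convention are illustrative assumptions, not the paper's reported values):

```python
def filter_short_segments(segments, min_len=25):
    """Drop predicted stroke segments shorter than min_len frames.

    segments: list of (start_frame, end_frame) half-open intervals.
    min_len: minimum plausible stroke duration in frames (assumed value;
             e.g. 25 frames = 1 second at the datasets' 25 FPS).
    """
    return [(s, e) for (s, e) in segments if e - s >= min_len]


# Example: a 5-frame spurious detection is removed, a 75-frame stroke is kept.
preds = [(100, 105), (300, 375)]
print(filter_short_segments(preds))  # [(300, 375)]
```

Boundary correction would then adjust the surviving segments' endpoints (e.g. toward detected shot boundaries), but since the paper's exact procedure is not stated here, it is omitted from the sketch.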

A modified evaluation metric, Weighted Mean TIoU, for the single-category temporal localization problem is also presented and compared with evaluations using the standard mAP metric (threshold ≥ 0.5) on the created datasets. The best weighted mean TIoU of our method was 0.9376 and 0.7145 on the Highlights and Generic test partitions, respectively.
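For intuition, temporal IoU (TIoU) between a predicted and a ground-truth interval is the standard intersection-over-union computed on the time axis. The sketch below computes TIoU and a length-weighted mean over ground-truth segments; note that weighting each ground-truth segment's best TIoU by its duration is an assumption about what "weighted" means here, not the paper's exact definition:

```python
def tiou(pred, gt):
    """Temporal IoU between two (start, end) intervals."""
    s1, e1 = pred
    s2, e2 = gt
    inter = max(0.0, min(e1, e2) - max(s1, s2))
    union = max(e1, e2) - min(s1, s2)
    return inter / union if union > 0 else 0.0


def weighted_mean_tiou(preds, gts):
    """Mean of each ground-truth segment's best TIoU, weighted by its
    duration (illustrative weighting scheme, assumed for this sketch)."""
    total = sum(e - s for s, e in gts)
    if total == 0:
        return 0.0
    score = sum(
        (e - s) * max((tiou(p, (s, e)) for p in preds), default=0.0)
        for s, e in gts
    )
    return score / total


# A prediction overlapping half of a ground-truth stroke:
print(tiou((0, 10), (5, 15)))  # 1/3: 5 frames overlap, 15 frames union
print(weighted_mean_tiou([(0, 10)], [(0, 10)]))  # 1.0 for an exact match
```

Unlike mAP at a fixed threshold, this score varies smoothly with localization quality, which is the motivation the abstract gives for using it alongside mAP.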

Moreover, we compare our baseline method with 3D Segment CNNs and Temporal Recurrent Networks (TRNs), which achieve state-of-the-art results on the THUMOS 2014 dataset.


