Semantic video segmentation with dynamic keyframe selection and distortion-aware feature rectification
Image and Vision Computing ( IF 4.7 ) Pub Date : 2021-04-18 , DOI: 10.1016/j.imavis.2021.104184
Mehwish Awan , Jitae Shin

Per-frame segmentation methods have a high computational cost and therefore cannot meet the fast-inference requirements of semantic video segmentation. To reuse extracted features effectively through feature propagation, in this paper we present distortion-aware feature rectification and online keyframe selection for fast and accurate video segmentation. The proposed dynamic keyframe scheduling scheme is driven by the extent of temporal variation and is learned with reinforcement learning: we employ a policy-gradient strategy to learn a policy function that maximizes the expected reward. The policy network has two actions in its action space, key and non-key. State information is derived from the element-wise difference between the current frame and the warped current frame generated from the propagated previous frame. Afterwards, adaptive partial feature rectification with distortion-aware corrections is performed on the warped features of non-key frames. Precise feature propagation is critical for maintaining temporal updates across the video sequence, since it strongly affects both the accuracy and the throughput of the whole video analysis framework. Distorted feature maps are revised by a light-weight feature extractor under the guidance of the distortion map, while correctly propagated features are left unchanged. The deep feature flow approach is adopted for feature propagation. We evaluate our scheme on the Cityscapes and CamVid datasets, using DeepLabv3 as the segmentation network and LiteFlowNet for computing flow fields. Experimental results show that the proposed method significantly outperforms previous state-of-the-art methods in both accuracy and throughput.
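The abstract describes a three-part inference pipeline: RL-based keyframe selection from a difference-frame state, flow-based feature propagation, and distortion-aware partial rectification of non-key-frame features. The sketch below is a minimal, hypothetical PyTorch-style illustration of how these pieces could fit together, not the authors' implementation. All module names (seg_net, flow_net, light_net, policy_net, head), the bilinear backward warp, the thresholded difference used as a distortion map, and the threshold tau are assumptions inferred from the abstract.

```python
import torch
import torch.nn.functional as F


def warp(feat, flow):
    """Backward-warp a feature map (or image) with a dense flow field.

    Generic bilinear grid-sample warp in the spirit of deep feature flow;
    the exact warping used in the paper is not specified in the abstract."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0) + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0   # normalize x to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0   # normalize y to [-1, 1]
    return F.grid_sample(feat, grid, align_corners=True)


def segment_video(frames, seg_net, flow_net, light_net, policy_net, head, tau=0.1):
    """Hypothetical inference loop: run the full backbone only on key frames;
    otherwise propagate cached features with optical flow and partially
    rectify distorted regions with a light-weight extractor."""
    ref_frame, ref_feat, outputs = None, None, []
    for frame in frames:                                  # frame: (1, 3, H, W)
        if ref_feat is None:
            is_key = True                                 # first frame: full pass (assumption)
        else:
            flow = flow_net(ref_frame, frame)             # e.g. LiteFlowNet, (1, 2, H, W)
            warped_frame = warp(ref_frame, flow)          # reference frame propagated to time t
            state = frame - warped_frame                  # element-wise difference frame
            p_key = policy_net(state)                     # assumed Bernoulli key-probability head
            is_key = bool(torch.bernoulli(p_key).item())  # two actions: key / non-key
        if is_key:
            feat = seg_net(frame)                         # full backbone, e.g. DeepLabv3 features
            ref_frame, ref_feat = frame, feat
        else:
            # Propagate cached key-frame features with the flow, rescaled to feature resolution.
            fh, fw = ref_feat.shape[-2:]
            flow_lr = F.interpolate(flow, size=(fh, fw), mode="bilinear",
                                    align_corners=False) * (fw / flow.shape[-1])
            feat = warp(ref_feat, flow_lr)
            # Distortion map (here a thresholded frame difference, purely an assumption)
            # selects locations whose propagated features are recomputed by the
            # light-weight extractor; correctly propagated features are kept.
            distortion = (state.abs().mean(1, keepdim=True) > tau).float()
            distortion = F.interpolate(distortion, size=(fh, fw), mode="nearest")
            feat = distortion * light_net(frame) + (1.0 - distortion) * feat
        outputs.append(head(feat))                        # per-pixel class logits
    return outputs
```

The abstract also states that the policy is trained with a policy-gradient strategy to maximize expected reward. A REINFORCE-style update of the generic form below would fit that description; the reward shaping (e.g. accuracy gain minus the cost of running the full backbone) is illustrative only and not taken from the paper.

```python
def policy_gradient_step(log_prob, reward, optimizer):
    """One REINFORCE update: raise the log-probability of actions that earned
    a high reward; `reward` would trade accuracy against backbone cost."""
    loss = -(log_prob * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```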




Updated: 2021-04-19