Temporally Distributed Networks for Fast Video Semantic Segmentation
arXiv - CS - Multimedia, Pub Date: 2020-04-03, DOI: arxiv-2004.01800
Ping Hu, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Stan Sclaroff and Federico Perazzi

We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks. Leveraging the inherent temporal continuity in videos, we distribute these sub-networks over sequential frames. Therefore, at each time step, we only need to perform a lightweight computation to extract a sub-features group from a single sub-network. The full features used for segmentation are then recomposed by application of a novel attention propagation module that compensates for geometry deformation between frames. A grouped knowledge distillation loss is also introduced to further improve the representation power at both full and sub-feature levels. Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
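
The scheduling idea in the abstract can be illustrated with a small, purely hypothetical sketch. It is not the authors' implementation: the names `make_subnet`, `attend`, and `run_distributed` are invented here, frames are flat lists of numbers, each "sub-network" is a trivial scaling function, the attention step is a toy similarity-weighted average, and averaging the groups is an assumed stand-in for the paper's recomposition. The only point is the control flow: one lightweight sub-network runs per frame in round-robin order, and the full feature is recomposed from the most recent sub-feature groups.

```python
import math
from collections import deque


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_subnet(scale):
    """Hypothetical stand-in for a shallow sub-network: scales every value."""
    return lambda frame: [scale * p for p in frame]


def attend(query, source):
    """Toy attention: each query position takes a softmax-weighted average
    of the source values, weighted by negative squared difference."""
    out = []
    for q in query:
        weights = softmax([-(q - s) ** 2 for s in source])
        out.append(sum(w * s for w, s in zip(weights, source)))
    return out


def run_distributed(frames, subnets):
    """Run one sub-network per frame (round-robin) and recompose features."""
    m = len(subnets)
    window = deque(maxlen=m - 1)  # sub-feature groups from the previous m-1 frames
    schedule, outputs = [], []
    for t, frame in enumerate(frames):
        k = t % m                     # which sub-network handles this frame
        schedule.append(k)
        current = subnets[k](frame)   # the only feature extraction at step t
        # Align older groups to the current frame, then fuse by averaging
        # (the averaging is an assumption made for this sketch).
        groups = [attend(current, past) for past in window] + [current]
        outputs.append([sum(vals) / len(groups) for vals in zip(*groups)])
        window.append(current)
    return schedule, outputs


frames = [[1.0, 2.0] for _ in range(5)]
schedule, outputs = run_distributed(frames, [make_subnet(1.0), make_subnet(2.0)])
print(schedule)  # order in which the two sub-networks were used
```

With two sub-networks, each frame costs a single shallow pass, while the recomposed feature still draws on the outputs of both sub-networks once the window has filled.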

Updated: 2020-04-08
