Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation,arXiv - CS - Computer Vision and Pattern Recognition

arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-07-30, DOI: arxiv-2107.14428
Bowen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen

Semantic segmentation requires a per-pixel prediction for a given image. Typically, the output resolution of a segmentation network is severely reduced by the downsampling operations in the CNN backbone, and most previous methods employ upsampling decoders to recover the spatial resolution. Various decoders have been designed in the literature. Here, we propose a novel decoder, termed the dynamic neural representational decoder (NRD), which is simple yet significantly more efficient. Since each location on the encoder's output corresponds to a local patch of the semantic labels, we represent these local label patches with compact neural networks. This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, which makes the decoder more efficient. Furthermore, these neural representations are dynamically generated, conditioned on the outputs of the encoder network. The desired semantic labels can then be efficiently decoded from the neural representations, yielding high-resolution semantic segmentation predictions. We empirically show that our decoder can outperform the decoder in DeeplabV3+ with only 30% of its computational complexity, and achieves performance competitive with methods that use dilated encoders while requiring only 15% of their computation. Experiments on the Cityscapes, ADE20K, and PASCAL Context datasets demonstrate the effectiveness and efficiency of the proposed method.
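The decoding idea above (one compact network per encoder location, with its weights generated from that location's feature, then evaluated over the label patch the location covers) can be illustrated with a minimal sketch. It assumes the compact network is a two-layer MLP that maps a pixel's local (u, v) coordinate inside its patch to class logits, and that its weights come from a linear "parameter generator" applied to the encoder feature. The names `make_generator`, `generate_params`, `decode_patch`, `nrd_decode`, the sizes `C`, `HIDDEN`, `K`, `S`, and the random generator weights are all hypothetical; the paper's actual network inputs, architecture, and training are not reproduced here.

```python
import random
from typing import List

# Hypothetical sizes, chosen only for illustration.
C = 8        # channels of the encoder feature at each location
HIDDEN = 4   # hidden width of each per-location (dynamic) MLP
K = 3        # number of semantic classes
S = 4        # upsampling factor: each encoder cell covers an S x S label patch

# Parameter count of one compact MLP: (2 -> HIDDEN) then (HIDDEN -> K).
N_PARAMS = 2 * HIDDEN + HIDDEN + HIDDEN * K + K

Vector = List[float]


def make_generator(seed: int = 0) -> List[Vector]:
    """Hypothetical linear parameter generator: an N_PARAMS x C weight matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.5) for _ in range(C)] for _ in range(N_PARAMS)]


def generate_params(generator: List[Vector], feature: Vector) -> Vector:
    """Map one encoder feature vector to the flat parameters of its own MLP."""
    return [sum(w * f for w, f in zip(row, feature)) for row in generator]


def decode_patch(params: Vector) -> List[List[int]]:
    """Evaluate the dynamic MLP at every pixel of an S x S patch; return labels."""
    # Unpack the flat parameter vector into layer weights and biases.
    idx = 0
    w1 = [params[idx + h * 2: idx + h * 2 + 2] for h in range(HIDDEN)]
    idx += 2 * HIDDEN
    b1 = params[idx: idx + HIDDEN]
    idx += HIDDEN
    w2 = [params[idx + k * HIDDEN: idx + (k + 1) * HIDDEN] for k in range(K)]
    idx += HIDDEN * K
    b2 = params[idx: idx + K]

    patch = []
    for a in range(S):
        row = []
        for b in range(S):
            # Local coordinates of the pixel centre, normalised to [-1, 1].
            u = (2 * a + 1) / S - 1
            v = (2 * b + 1) / S - 1
            hidden = [max(0.0, w[0] * u + w[1] * v + bias) for w, bias in zip(w1, b1)]
            logits = [sum(x * y for x, y in zip(w, hidden)) + bias
                      for w, bias in zip(w2, b2)]
            row.append(max(range(K), key=lambda k: logits[k]))  # argmax class
        patch.append(row)
    return patch


def nrd_decode(features: List[List[Vector]], generator: List[Vector]) -> List[List[int]]:
    """Decode an H x W x C feature map into an (H*S) x (W*S) label map."""
    h, w = len(features), len(features[0])
    labels = [[0] * (w * S) for _ in range(h * S)]
    for i in range(h):
        for j in range(w):
            patch = decode_patch(generate_params(generator, features[i][j]))
            for a in range(S):
                for b in range(S):
                    labels[i * S + a][j * S + b] = patch[a][b]
    return labels
```

For example, a 2 x 3 feature map decodes to an 8 x 12 label map when `S = 4`:

```python
rng = random.Random(1)
feats = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(3)] for _ in range(2)]
label_map = nrd_decode(feats, make_generator())
print(len(label_map), len(label_map[0]))  # 8 12
```

Because every patch is produced by a function that varies smoothly with (u, v), neighbouring pixels tend to receive the same label; this is one way to read the abstract's "smoothness prior", though the sketch omits training, any additional decoder inputs, and batching.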

Updated: 2021-08-02