End-to-end Compression Towards Machine Vision: Network Architecture Design and Optimization, arXiv - CS - Multimedia



arXiv - CS - Multimedia, Pub Date: 2021-07-01, DOI: arxiv-2107.00328
Shurun Wang, Zhao Wang, Shiqi Wang, Yan Ye

Visual signal compression has a long research history, and deep learning has recently driven exciting progress in the field. Although existing end-to-end compression algorithms achieve better compression performance, they are still designed for better signal quality under rate-distortion optimization. In this paper, we show that the design and optimization of the network architecture can be further improved for compression towards machine vision. We propose an inverted bottleneck structure for end-to-end compression towards machine vision, which specifically accounts for the efficient representation of semantic information. Moreover, we investigate what optimization can achieve by incorporating analytics accuracy into the optimization process, and we further explore optimality through generalized rate-accuracy optimization applied iteratively. We use object detection as a showcase for end-to-end compression towards machine vision, and extensive experiments show that the proposed scheme achieves significant BD-rate savings in terms of analysis performance. In addition, because it enables signal-level reconstruction, the scheme generalizes well to other machine vision tasks.
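The abstract reports gains as BD-rate savings "in terms of analysis performance", which suggests the usual Bjøntegaard delta-rate calculation with a task accuracy (for example, detection mAP) on the quality axis in place of PSNR. The sketch below illustrates that classic calculation under this assumption: fit a cubic polynomial of log-rate against accuracy for each codec, integrate both fits over the overlapping accuracy range, and convert the average log-rate gap into a percentage. It is not the authors' evaluation code; the function names and the rate/mAP numbers are hypothetical, and each curve needs at least four points with distinct accuracy values.

```python
import math
from typing import List, Sequence


def _fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares cubic fit; returns coefficients in ascending power order."""
    n = 4
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs


def _integrate(coeffs: Sequence[float], lo: float, hi: float) -> float:
    """Definite integral of sum(c_k * x**k) from lo to hi."""
    return sum(c / (k + 1) * (hi ** (k + 1) - lo ** (k + 1)) for k, c in enumerate(coeffs))


def bd_rate(rate_ref, acc_ref, rate_test, acc_test) -> float:
    """Average bitrate change (%) of the test codec vs. the reference at equal accuracy."""
    lo = max(min(acc_ref), min(acc_test))
    hi = min(max(acc_ref), max(acc_test))
    if hi <= lo:
        raise ValueError("accuracy ranges do not overlap")
    mid = (lo + hi) / 2.0  # center the abscissa to keep the normal equations well conditioned
    p_ref = _fit_cubic([x - mid for x in acc_ref], [math.log(r) for r in rate_ref])
    p_test = _fit_cubic([x - mid for x in acc_test], [math.log(r) for r in rate_test])
    int_ref = _integrate(p_ref, lo - mid, hi - mid)
    int_test = _integrate(p_test, lo - mid, hi - mid)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0  # negative means bitrate savings


if __name__ == "__main__":
    # Hypothetical rate (bpp) / detection-mAP points for an anchor codec.
    rate_anchor = [0.1, 0.2, 0.4, 0.8]
    map_anchor = [30.0, 34.0, 37.0, 39.0]
    # A hypothetical codec that reaches the same mAP at 80% of the bitrate.
    rate_new = [0.8 * r for r in rate_anchor]
    print(round(bd_rate(rate_anchor, map_anchor, rate_new, map_anchor), 2))  # -20.0
```

A uniform 20% bitrate reduction at every accuracy point yields a BD-rate of -20%, which is a quick sanity check for the fitting and integration steps. Working in log-rate makes the result a relative (percentage) difference rather than an absolute one.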


Updated: 2021-07-02
