Just Recognizable Distortion for Machine Vision Oriented Image and Video Coding
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-08-13, DOI: 10.1007/s11263-021-01505-4
Qi Zhang 1 , Shanshe Wang 1 , Siwei Ma 1 , Wen Gao 1, 2 , Xinfeng Zhang 3

Machine visual intelligence has advanced rapidly in recent years. Large-scale, high-quality image and video datasets significantly empower learning-based machine vision models, especially deep-learning models. In practice, however, images and videos are usually compressed before analysis whenever transmission or storage is limited, which causes a noticeable loss in vision model performance. In this work, we broadly investigate how image and video coding affects machine vision performance. Based on this investigation, we propose Just Recognizable Distortion (JRD) to represent the maximum distortion caused by data compression that will reduce the machine vision model performance to an unacceptable level. A large-scale JRD-annotated dataset containing over 340,000 images is built for various machine vision tasks, and the factors behind different JRDs are studied. Furthermore, an ensemble-learning-based framework, consisting of multiple binary classifiers to improve prediction accuracy, is established to predict the JRDs for diverse vision tasks under few- and non-reference conditions. Experiments demonstrate that the proposed JRD-guided image and video coding significantly improves both compression and machine vision performance: applying the predicted JRD achieves remarkably better machine vision task accuracy while saving a large number of bits.




Updated: 2021-08-13