Dirty Pixels: Towards End-to-end Image Processing and Perception,ACM Transactions on Graphics

Dirty Pixels: Towards End-to-end Image Processing and Perception
ACM Transactions on Graphics (IF 7.8), Pub Date: 2021-05-06, DOI: 10.1145/3446918
Steven Diamond ₁, Vincent Sitzmann ₂, Frank Julca-Aguilar ₃, Stephen Boyd ₁, Gordon Wetzstein ₁, Felix Heide ₄

Real-world imaging systems acquire measurements that are degraded by noise, optical aberrations, and other imperfections that make image processing for human viewing and higher-level perception tasks challenging. Conventional cameras address this problem by compartmentalizing imaging from high-level task processing. As such, conventional imaging involves processing the RAW sensor measurements in a sequential pipeline of steps, such as demosaicking, denoising, deblurring, tone-mapping, and compression. This pipeline is optimized to obtain a visually pleasing image. High-level processing, however, involves steps such as feature extraction, classification, tracking, and fusion. While this siloed design approach allows for efficient development, it also dictates compartmentalized performance metrics without knowledge of the higher-level task of the camera system. For example, today’s demosaicking and denoising algorithms are designed using perceptual image quality metrics but not with domain-specific tasks such as object detection in mind. We propose an end-to-end differentiable architecture that jointly performs demosaicking, denoising, deblurring, tone-mapping, and classification (see Figure 1). The architecture does not require any intermediate losses based on perceived image quality and learns processing pipelines whose outputs differ from those of existing ISPs optimized for perceptual quality, preserving fine detail at the cost of increased noise and artifacts. We show that state-of-the-art ISPs discard information that is essential in corner cases, such as extremely low-light conditions, where conventional imaging and perception stacks fail. We demonstrate on captured and simulated data that our model substantially improves perception in low light and other challenging conditions, which is imperative for real-world applications such as autonomous driving, robotics, and surveillance.
Finally, we found that the proposed model also achieves state-of-the-art accuracy when optimized for image reconstruction in low-light conditions, validating the architecture itself as a potentially useful drop-in network for reconstruction and analysis tasks beyond the applications demonstrated in this work. Our proposed models, datasets, and calibration data are available at https://github.com/princeton-computational-imaging/DirtyPixels .
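
The central idea of the abstract, training the image-processing stage and the classifier jointly with only a task loss, can be illustrated with a toy sketch. The code below is not the authors' architecture: the RGGB Bayer layout, the Gaussian noise model, the single `tanh` layer standing in for the learned processing stage, the layer sizes, and all function names are assumptions chosen for illustration. It shows only that the classification loss is the sole training signal and that its gradient reaches the processing weights.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an H x W x 3 image onto a single-channel RGGB mosaic (assumed layout)."""
    h, w, _ = rgb.shape
    raw = np.zeros((h, w))
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites
    return raw

def make_dataset(rng, n=40, size=4, noise_std=0.02):
    """Dim reddish (label 0) or bluish (label 1) patches as noisy RAW mosaics."""
    raws, labels = [], []
    for i in range(n):
        label = i % 2
        rgb = np.full((size, size, 3), 0.02)
        rgb[..., 0 if label == 0 else 2] = 0.1
        # Gaussian noise is a simplification of real sensor noise.
        raws.append(bayer_mosaic(rgb) + rng.normal(0.0, noise_std, (size, size)))
        labels.append(label)
    return raws, labels

def loss_and_grads(params, raw, label):
    """Cross-entropy of classifier(processing(raw)) and its gradients."""
    w_isp, w_cls = params
    x = raw.reshape(-1)
    z = np.tanh(w_isp @ x)            # toy stand-in for the learned processing stage
    logits = w_cls @ z
    logits = logits - logits.max()    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    loss = -np.log(probs[label])
    d_logits = probs.copy()
    d_logits[label] -= 1.0            # gradient of softmax cross-entropy
    g_cls = np.outer(d_logits, z)
    d_pre = (w_cls.T @ d_logits) * (1.0 - z ** 2)  # back through tanh
    g_isp = np.outer(d_pre, x)        # task gradient reaching the processing weights
    return loss, g_isp, g_cls

def mean_loss_and_grads(params, raws, labels):
    """Average loss and gradients over the whole dataset."""
    total = 0.0
    g_isp = np.zeros_like(params[0])
    g_cls = np.zeros_like(params[1])
    for raw, label in zip(raws, labels):
        loss, gi, gc = loss_and_grads(params, raw, label)
        total += loss
        g_isp += gi
        g_cls += gc
    n = len(labels)
    return total / n, g_isp / n, g_cls / n

def train(steps=200, lr=0.5, seed=0):
    """Full-batch gradient descent on the classification loss only."""
    rng = np.random.default_rng(seed)
    raws, labels = make_dataset(rng)
    params = [rng.normal(0.0, 0.5, (8, raws[0].size)),
              rng.normal(0.0, 0.5, (2, 8))]
    first, _, _ = mean_loss_and_grads(params, raws, labels)
    for _ in range(steps):
        _, g_isp, g_cls = mean_loss_and_grads(params, raws, labels)
        params[0] -= lr * g_isp
        params[1] -= lr * g_cls
    last, _, _ = mean_loss_and_grads(params, raws, labels)
    return first, last, params

if __name__ == "__main__":
    first, last, _ = train()
    print(f"mean loss before: {first:.3f}, after: {last:.3f}")
```

No image-quality term appears anywhere in the objective, so the processing weights are free to produce intermediate outputs that serve the classifier rather than a human viewer, which is the trade-off the abstract describes.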


Updated: 2021-05-06
