Rate–distortion optimization of multi-exposure image coding for high dynamic range image coding

https://doi.org/10.1016/j.image.2021.116238

Highlights

  • A scheme for efficient multi-exposure image coding is proposed.

  • Modified rate-distortion optimization for multi-exposure image coding is presented.

  • The proposed scheme produces both high-quality HDR and LDR images in the decoder.

Abstract

High dynamic range (HDR) images have many practical applications because they offer an extended dynamic range and a more realistic visual experience. A HDR image is usually stored in floating-point format, so pre-processing is required to make the HDR image compatible with coding standards. A transfer function is also used to achieve better coding efficiency. Typically, HDR images are generated using several low dynamic range (LDR) images with different exposures. Instead of compressing the HDR image after it is generated from images with multiple exposures, this study proposes a technique to compress the multi-exposure images themselves. The HDR image generation, as well as the multi-exposure image fusion, can then be performed in the decoder. The proposed framework encodes the multi-exposure images using MV-HEVC, in which the inter-view redundancy is well exploited once an accurate intensity-mapping function between the multi-exposure images has been established. Because multi-exposure image coding is used to produce a high-quality HDR image, the rate–distortion optimization (RDO) is modified to consider both the reconstruction quality of the current block and its effect on the multi-exposure fused image. The Lagrange multiplier is modified to maintain a balance between the rate and the modified distortion during the RDO process. The experimental results show that, compared to encoding the generated HDR image using the HEVC range extension, the proposed technique achieves significant bitrate savings at equivalent quality in terms of HDR-VDP-2.

Introduction

In recent decades, significant progress has been made in image capture, in terms of both resolution and quality. Ultra HD (High Definition) images, such as 4K (3840×2160), are now common. To allow more realistic visual perception, particularly for home entertainment, a higher resolution and a wider dynamic range are necessary. The dynamic range of a conventional camera or display is much narrower than that of the human eye. The dynamic range of light that the human eye perceives with adaptation spans 14 orders of magnitude, from 10⁻⁶ to 10⁸ cd/m² [1]. Conventional images are usually captured and then saved in a format that uses 8–14 bits for each color channel. A common traditional display supports a brightness between 0.3 and 300 cd/m². This is not wide enough to simulate reality, so high dynamic range (HDR) image/video, which enables an enhanced visual experience by providing a much wider range of luminance [1], [2], has been the subject of many studies. HDR and wide color gamut (WCG) video coding have been a focus for the JCT-VC (Joint Collaborative Team on Video Coding) [3], which established the HEVC (High Efficiency Video Coding) [4] standard.

Several tools in HEVC support HDR video, including the signaling of the HDR video format, the definition of SEI messages used to display HDR content and high bit-depth coding. The scalable extension of HEVC, called SHVC (Scalable High Efficiency Video Coding) [5], includes bit-depth scalability to support backward compatibility with SDR (Standard Dynamic Range) content [6]. MPEG has also addressed HDR and WCG video coding since November 2013, and there have been recent developments in HDR video compression within MPEG.

One study [7] reports results regarding the call for evidence (CfE) related to HDR and WCG coding. Two types of solutions are proposed: a single-layer scheme and a SDR-backward-compatible scheme. Currently, HDR and WCG video coding are also supported by the HEVC range extension. The JVET (Joint Video Exploration Team) [8] has also launched a standardization effort for future video coding and views HDR video coding as an important technology.

A HDR image has a wider dynamic range, so it requires much more data than a LDR image; the data format and coding technique are also different. A number of floating-point formats have been developed to represent HDR images: Radiance RGBE [9], LogLuv TIFF [10] and OpenEXR [11]. Current video and image coding standards can be used to compress HDR images, but state-of-the-art standards such as H.264/AVC [12], HEVC and JPEG only support high bit-depth video and images in integer format, so the HDR image/video must be converted into an integer format before encoding. A perceptual transfer function is a non-linear mapping [13], [14], [15] that mimics the human visual system; subsequent quantization produces the integer version of the HDR image. One image coding standard, JPEG-XT [16], supports floating-point HDR image coding and is backward compatible with the legacy JPEG standard.
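As an illustration of this conversion step, the following Python/NumPy sketch applies the PQ transfer function standardized as SMPTE ST 2084 (one instance of the perceptual transfer functions cited above) followed by uniform quantization; the 10-bit, full-range code words used here are an assumption for illustration, not a setting taken from this paper.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_oetf(luminance):
    """Map absolute luminance (0..10000 cd/m^2) to a normalized PQ signal in [0, 1]."""
    y = np.clip(luminance / 10000.0, 0.0, 1.0)
    return ((C1 + C2 * y**M1) / (1.0 + C3 * y**M1)) ** M2

def quantize(signal, bit_depth=10):
    """Uniformly quantize the normalized signal to integer code words (full range)."""
    return np.round(signal * (2**bit_depth - 1)).astype(np.uint16)

# Example: floating-point HDR luminance samples -> 10-bit integer code words
hdr_samples = np.array([0.005, 1.0, 100.0, 1000.0, 4000.0])
codes = quantize(pq_oetf(hdr_samples))
```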

HDR cameras are not yet widely available and HDR images are usually synthesized using several low dynamic range (LDR) images with different exposures. Many algorithms have been proposed to produce high-quality HDR images using multi-exposure LDR images.

There are two methods of generating HDR images from multi-exposure LDR images. The first identifies the relationship between the luminance intensities and the exposure of several LDR images and constructs a non-linear function, namely the camera response function (CRF) [17], [18], [19]. An irradiance map is then obtained, and this serves as the HDR image. The second method uses fusion-based HDR image synthesis [20], [21], [22], [23], [24]. The characteristics of each LDR image are determined and an appropriate weight is assigned to each pixel in the LDR image. The multi-exposure LDR images are then fused to generate a LDR image with enhanced contrast, particularly in the over-saturated and under-saturated regions. One study [20] proposes a pixel-based fusion technique that uses contrast, saturation and well-exposedness; a sketch of these weight measures is given below. A patch-wise approach has been proposed for the fusion of images with multiple exposures [23]: each patch in each image is decomposed into three components and a fused patch is obtained by computing the representative components, which are then integrated. Other methods generate a HDR image from a single LDR image [25], [26], [27], [28]. One study first estimates pseudo-exposure images [25], and deep-learning-based architectures [26], [27], [28] are reported to achieve acceptable performance.
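The pixel-based weight measures of [20] can be made concrete with a short sketch. The code below is only illustrative: it computes contrast, saturation and well-exposedness maps and fuses the stack with a flat per-pixel weighted average, whereas the original method blends the images over a multi-resolution pyramid to avoid seams; the value sigma=0.2 and the multiplicative combination are common defaults, not values taken from this paper.

```python
import numpy as np

def fusion_weights(img, sigma=0.2, eps=1e-12):
    """Per-pixel weight for one exposure (img: float RGB in [0, 1]),
    following the contrast/saturation/well-exposedness measures of [20]."""
    gray = img.mean(axis=2)
    # Contrast: magnitude of a discrete Laplacian response on the grayscale image.
    contrast = np.abs(np.gradient(np.gradient(gray, axis=0), axis=0)
                      + np.gradient(np.gradient(gray, axis=1), axis=1))
    # Saturation: standard deviation across the RGB channels.
    saturation = img.std(axis=2)
    # Well-exposedness: closeness of every channel to mid-gray (0.5).
    well_exposed = np.prod(np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)), axis=2)
    return contrast * saturation * well_exposed + eps

def naive_fusion(exposures):
    """Flat weighted average of the exposure stack (illustrative only;
    [20] fuses over a multi-resolution pyramid)."""
    weights = np.stack([fusion_weights(im) for im in exposures])
    weights /= weights.sum(axis=0, keepdims=True)
    return np.sum(weights[..., None] * np.stack(exposures), axis=0)
```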

Fig. 1 shows the capture and display of HDR images. HDR images in a floating-point format can be displayed on a HDR monitor. If only a LDR device is available, tone mapping [29], [30], [31] is used to convert the HDR images into LDR images. Generally, the quality of the LDR images is determined by the quality of the HDR image and by the tone mapping technique that is used [32]. Quality assessment of tone-mapped images is presented in one study [33]. For multi-exposure images, there are two possible image outputs: a multi-exposure fused (MEF) LDR image that is displayed on a LDR device and a HDR image that is generated using an HDR imaging technique.
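For reference, the global photographic operator of [29] is one of the simplest tone mapping techniques of this kind. The sketch below is a simplified version: the optional highlight-compression term with an L_white parameter is omitted, and the key value 0.18 is the operator's common default rather than a value used in this paper.

```python
import numpy as np

def reinhard_global(hdr_luma, key=0.18, eps=1e-6):
    """Simplified global photographic tone mapping [29]: scale the scene to a
    target key, then compress luminance into [0, 1) for display on a LDR device."""
    log_mean = np.exp(np.mean(np.log(hdr_luma + eps)))  # log-average luminance of the scene
    scaled = (key / log_mean) * hdr_luma                 # map the scene key to middle gray
    return scaled / (1.0 + scaled)                       # compressive mapping to [0, 1)
```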

This study proposes a technique to encode multi-exposure images using Multi-view HEVC (MV-HEVC) [34], because a HDR image is usually generated using multi-exposure images. The properties of these images with multiple exposures are exploited to ensure good coding performance. The LDR images are captured at different exposures, so good coding performance is achievable if they are encoded using a multi-view coding architecture with accurate inter-view prediction.

This study first addresses the variation in luminance among multi-exposure images. The multi-exposure images are coded to produce a high-quality HDR image, so rate–distortion optimization (RDO) [35] is modified by considering both the reconstruction quality of the current view and its effect on the MEF image. The proposed technique allows the viewer to watch a HDR or LDR image, depending on the display device that is used. In the decoder, the HDR image is reconstructed using the CRF and is displayed on a HDR monitor. Multi-exposure fusion can also be used to generate LDR images with enhanced contrast if only a conventional display is available. The proposed technique makes three main contributions to this field of study:

  • 1.

    A coding scheme for multi-exposure images is proposed that outperforms the conventional method, which encodes the generated HDR image using HEVC.

  • 2.

    Rate–distortion optimization for multi-exposure image coding is presented. The distortion is modified by also considering the distortion in the MEF image, and the corresponding Lagrange multiplier is theoretically derived (a schematic sketch of the resulting cost function is given after this list).

  • 3.

    The proposed scheme produces a high-quality HDR image and a LDR image with enhanced contrast in the decoder.
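The general form of this modified mode decision can be sketched as follows. The exact weighting between the two distortion terms and the derivation of the rescaled Lagrange multiplier are given in Section 3; the mixing weight w and the variable names below are placeholders for illustration only.

```python
def rd_cost(d_view, d_mef, rate, lam, w=0.5):
    """Lagrangian cost with a combined distortion term: the reconstruction
    distortion of the current view and the distortion induced in the
    multi-exposure fused (MEF) image. The weight w and the scaling of lam
    are placeholders; the paper derives them in Section 3."""
    d_modified = (1.0 - w) * d_view + w * d_mef
    return d_modified + lam * rate

def choose_mode(candidates, lam):
    """Pick the coding mode with the smallest modified RD cost.
    Each candidate is a dict with (assumed) keys 'd_view', 'd_mef' and 'rate'."""
    return min(candidates, key=lambda c: rd_cost(c["d_view"], c["d_mef"], c["rate"], lam))
```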

The remainder of this paper is organized as follows. Related works are reviewed in Section 2. The proposed scheme is described in Section 3 and the experimental results are presented in Section 4.

Section snippets

Related works

There are two approaches to compressing a HDR image: single-layer coding, which includes side metadata but does not guarantee backward compatibility for legacy SDR devices, and scalable coding, which supports backward compatibility and allows both a SDR and a HDR image to be output. The related works for both approaches are reviewed in the following.

Proposed framework

The multi-exposure images are not used for display. They are fused into a HDR image or a contrast-enhanced LDR image. Some strategies allow the compression of these images to produce a high-quality HDR image using MV-HEVC [55], [56]. Fig. 2 shows the framework for the proposed scheme. The stack of LDR images with various exposures is encoded using MV-HEVC. Each LDR image is treated as a view image and the intensity mapping function (IMF) between views is computed and used to enhance the
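A common way to build such an intensity mapping function between two differently exposed views is cumulative-histogram matching, stored as a look-up table. The sketch below is one standard construction shown for illustration; the estimation procedure actually used in the proposed framework is described in Section 3 and may differ.

```python
import numpy as np

def intensity_mapping_lut(reference, target, levels=256):
    """Estimate an IMF from the reference view to the target view by
    cumulative-histogram matching and return it as a 256-entry 8-bit LUT."""
    ref_cdf = np.cumsum(np.bincount(reference.ravel(), minlength=levels)) / reference.size
    tgt_cdf = np.cumsum(np.bincount(target.ravel(), minlength=levels)) / target.size
    # For each reference level, pick the first target level whose CDF reaches it.
    lut = np.searchsorted(tgt_cdf, ref_cdf).clip(0, levels - 1)
    return lut.astype(np.uint8)

# Applying the LUT converts the reference view into the intensity range of the
# view being predicted, which strengthens inter-view prediction:
# mapped_reference = intensity_mapping_lut(ref_view, cur_view)[ref_view]
```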

Results and discussion

The proposed coding scheme is implemented in MV-HEVC and the coding order is shown in Fig. 3. The multi-exposure images are originally in RGB format and are converted into YCbCr 4:2:0 format before encoding, using the procedures described in [57] (a sketch of this conversion is given below). When a non-base view is encoded, the intensity of the reference view is converted and the new RDO is used for the luminance component. The IMF is stored as a look-up table (LUT) with 256 elements, each of which contains an 8-bit integer. Two LUTs are
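The conversion step can be sketched as follows; BT.709 coefficients, full-range 8-bit quantization and simple 2×2 chroma averaging are assumed here, whereas the procedures in [57] may prescribe different matrices and downsampling filters.

```python
import numpy as np

def rgb_to_ycbcr420(rgb):
    """Convert a full-range RGB image in [0, 1] (even width/height) to 8-bit YCbCr 4:2:0.
    BT.709 coefficients and 2x2 chroma averaging are assumed for illustration."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    cb = (b - y) / 1.8556 + 0.5
    cr = (r - y) / 1.5748 + 0.5
    # 4:2:0 subsampling: average each 2x2 block of the chroma planes.
    subsample = lambda c: c.reshape(c.shape[0] // 2, 2, c.shape[1] // 2, 2).mean(axis=(1, 3))
    to8 = lambda x: np.round(np.clip(x, 0.0, 1.0) * 255).astype(np.uint8)
    return to8(y), to8(subsample(cb)), to8(subsample(cr))
```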

Conclusions

This study proposes a framework to encode images with multiple exposures. The multi-view image coding scheme encodes each LDR image as a view image. Because a high-quality HDR image must be generated in the decoder, the distortion in the multi-exposure fused image is taken into account when encoding the non-base views. The Lagrange multiplier is modified to maintain a balance between the rate and the modified distortion during the RDO process.

The experimental results show that the proposed scheme performs well in terms

CRediT authorship contribution statement

Jui-Chiu Chiang: Conceptualization, Methodology, Visualization, Investigation, Writing - original draft, Writing - review & editing, Supervision. Wen-Hsien Shih: Software, Data curation, Validation, Formal analysis. Jhih-You Deng: Software, Data curation, Validation, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (62)

  • L. Kerofsky, Y. Ye, Y. He, Recent developments from MPEG in HDR video compression, in: Proc. IEEE International...
  • ...
  • G. Ward, The LogLuv encoding for full gamut, high dynamic range images, J. Graph. Tools (1998)
  • F. Kainz, et al., The OpenEXR image file format, in: SIGGRAPH Technical Sketches (2003)
  • ...
  • S. Miller, et al., Perceptual signal coding for more efficient usages of bit codes, SMPTE Motion Imaging J. (2013)
  • R. Mantiuk, et al., Perception-motivated high dynamic range video encoding, ACM Trans. Graph. (2004)
  • R. Mantiuk, K. Myszkowski, H. Seidel, Lossy compression of high dynamic range images and video, in: Proc. SPIE 6057,...
  • ...
  • E. Debevec, J. Malik, Recovering high dynamic range radiance maps from photographs, in: Proc. SIGGRAPH 97, 1997, pp....
  • M.D. Grossberg, et al., Determine the camera response from images: what is knowable?, IEEE Trans. Pattern Anal. Mach. Intell. (2003)
  • A. Oğuz Akyüz, et al., A reality check for radiometric camera response recovery algorithms, Comput. Graph. (2013)
  • T. Mertens, J. Kautz, F. Van Reeth, Exposure fusion, in: Proc. Pacific Conf. on Computer Graphics and Applications,...
  • R. Shen, et al., Generalized random walks for fusion of multi-exposure images, IEEE Trans. Image Process. (2011)
  • K. Ma, Z. Wang, Multi-exposure image fusion: a patch-wise approach, in: Proc. of IEEE International Conference on Image...
  • M. Song, et al., Probabilistic exposure fusion, IEEE Trans. Image Process. (2012)
  • T.-H. Wang, et al., Pseudo-multiple-exposure-based tone fusion with local region adjustment, IEEE Trans. Multimedia (2015)
  • G. Eilertsen, et al., HDR image reconstruction from a single exposure using deep CNNs, ACM Trans. Graph. (2017)
  • Y. Endo, et al., Deep reverse tone mapping, ACM Trans. Graph. (2017)
  • S. Ning, H. Xu, L. Song, R. Xie, W. Zhang, Learning an inverse tone mapping network with a generative adversarial...
  • E. Reinhard, et al., Photographic tone reproduction for digital images, ACM Trans. Graph. (2002)