Signal Processing

Volume 189, December 2021, 108278

Detail-enhanced image inpainting based on discrete wavelet transforms

https://doi.org/10.1016/j.sigpro.2021.108278

Highlights

  • The contents and textures of an image to be inpainted are separately generated by a two-parallel-branch network.

  • A multi-level fusion module is proposed to improve the network's capability of semantic understanding.

  • A spatially discounted mask is designed to assign different importance to missing pixels according to their positions.

Abstract

Deep-learning-based methods have made great breakthroughs in image inpainting by generating visually plausible contents with reasonable semantic meaning. However, existing deep learning methods still suffer from distorted structures or blurry textures. To mitigate this problem, completing the semantic structure and enhancing the textural details should be considered simultaneously. To this end, we propose a two-parallel-branch completion network, where the first branch fills semantic content in the spatial domain and the second branch generates high-frequency details in the wavelet domain. To reconstruct the inpainted image, the output of the first branch is also decomposed by the discrete wavelet transform, and the resulting low-frequency wavelet subband is used jointly with the output of the second branch. In addition, to improve the network's capability of semantic understanding, a multi-level fusion module (MLFM) is designed in the first branch to enlarge the receptive field. Furthermore, drawing lessons from traditional exemplar-based inpainting methods, we develop a free-form spatially discounted mask (SD-mask) that assigns different importance priorities to the missing pixels based on their positions, enabling our method to handle missing regions of arbitrary shape. Extensive experiments on several public datasets demonstrate that the proposed approach outperforms current state-of-the-art ones. The code is publicly available at https://github.com/media-sec-lab/DWT_Inpainting.

Introduction

Image inpainting [1] is an image processing technique that reconstructs lost or deteriorated parts of an image so as to improve its visual quality. This technology can be used in many applications, such as image editing and old photo restoration. As a kind of imaging inverse problem, image inpainting can be performed with model-based image restoration methods [2], [3]. In the past two decades, great progress has been achieved in image inpainting through various tailored approaches, for example, diffusion-based ones [1], [4], [5], exemplar-based ones [6], [7], [8], and deep-learning-based ones [9], [10], [11], [12], [13], [14], [15]. Different from conventional approaches, which propagate known information or find similar patches within the defective image to fill the missing parts, deep-learning-based approaches learn high-level feature representations from training data and complete the missing regions with reasonable structures and textures. As a consequence, deep inpainting approaches can achieve impressive visual effects. Typically, an encoder-decoder structure [16] based on a Convolutional Neural Network (CNN) and a Generative Adversarial Network (GAN) mechanism [17] work together to perform the deep inpainting task. Specifically, a generation network with an encoder-decoder structure, trained with a well-designed loss function, completes the missing area, while a discriminator provides an adversarial loss to ensure that the inpainted images are visually indistinguishable from pristine images.
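To make this typical setup concrete, the sketch below pairs a toy encoder-decoder generator with a small discriminator in PyTorch; the layer configuration, loss form, and loss weight are illustrative assumptions, not the architecture proposed in this paper.

```python
import torch
import torch.nn as nn

# Toy encoder-decoder generator: input is the masked RGB image plus the
# binary mask (4 channels); output is the completed 3-channel image.
G = nn.Sequential(
    nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.ReLU(),             # encode
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # decode
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
)
# Toy discriminator: scores how "real" the completed image looks.
D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),
)

x, mask = torch.randn(1, 3, 64, 64), torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1                      # square hole (1 = missing)
completed = G(torch.cat([x * (1 - mask), mask], dim=1))

# Reconstruction loss pulls the output toward the ground truth; the
# adversarial term rewards outputs that the discriminator scores as real.
rec_loss = (completed - x).abs().mean()
adv_loss = -D(completed).mean()
g_loss = rec_loss + 0.01 * adv_loss
```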

In general, both reasonable semantic contents and fine details need to be synthesized when performing inpainting. Some methods [9], [10], [11], [12], [13] pursue these two goals in a single network, while others [14], [15] use two serial networks, i.e., a coarse network and a refinement network, to deal with coarse and fine contents, respectively. However, they still suffer from distorted structures and/or blurry textures, implying that a different inpainting architecture may be needed. In this paper, we address the above-mentioned problem by designing a deep inpainting architecture based on a two-parallel-branch completion network, consisting of a content branch and a texture branch. The content branch fills semantic content in the spatial domain, and the texture branch generates high-frequency details in the wavelet domain. Specifically, the content branch, built on a U-net structure, takes an image with missing parts as input and outputs the spatially inpainted image. A multi-level fusion module based on dilated gated convolution [15] is proposed to expand the network's receptive field and thereby improve its capability of semantic understanding. The texture branch takes the high-frequency subbands of the discrete wavelet transform (DWT) as input and processes the high-frequency part of the inpainted image in the wavelet domain. In this way, the first branch can focus on the semantic contents while the second learns better textural details. To synchronize the outputs of the two branches, the low-frequency wavelet subband of the spatially inpainted image from the content branch and the high-frequency wavelet subbands from the texture branch jointly reconstruct the inpainted image through the inverse discrete wavelet transform (IDWT). Furthermore, drawing lessons from traditional exemplar-based inpainting methods, where missing pixels closer to the known areas have higher inpainting priorities, we develop a spatially discounted mask (SD-mask) for missing regions of arbitrary shape, which weights missing pixels in the loss function according to their importance. Hence, the missing pixels near the hole boundaries have a higher impact on the loss, so they are recovered more faithfully and the boundary transitions become less abrupt.
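For concreteness, the following minimal sketch reproduces this DWT-based fusion step with PyWavelets, using random arrays as stand-ins for the two branch outputs; the wavelet choice (Haar) and the single-channel input are our illustrative assumptions.

```python
import numpy as np
import pywt

# Stand-in for the content branch output: a spatially inpainted image
# (one channel; in practice each color channel is handled the same way).
content_out = np.random.rand(256, 256)

# Decompose the content branch output and keep only its low-frequency subband.
LL, _ = pywt.dwt2(content_out, 'haar')

# Stand-ins for the texture branch output: the three high-frequency subbands
# predicted directly in the wavelet domain (128x128 for a 256x256 input).
LH, HL, HH = (np.random.rand(128, 128) for _ in range(3))

# Reconstruct the final inpainted image via the inverse DWT.
inpainted = pywt.idwt2((LL, (LH, HL, HH)), 'haar')
assert inpainted.shape == (256, 256)
```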

The contributions of this paper are summarized as follows.

  • We propose a two-parallel-branch network to complete image structure and fill high-frequency details based on DWT, which can produce reasonable and sharp image contents.

  • We design a multi-level fusion module based on dilated gated convolution to expand the receptive field of the content branch, enabling the network to learn image semantic contents at different scales.

  • We develop a free-form spatially discounted mask that assigns different importance to missing pixels based on their positions, which can be applied to missing areas with arbitrary shapes (see the sketch below).
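To make the SD-mask idea concrete, the sketch below weights each missing pixel by gamma**d, with d the distance to the nearest known pixel, in the spirit of the spatially discounted reward used in earlier deep inpainting work; the paper's exact discounting function may differ, so treat this as an assumption.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def spatially_discounted_mask(hole_mask, gamma=0.99):
    """hole_mask: 2-D array, 1 inside the missing region, 0 where pixels are known."""
    # Distance from every missing pixel to the nearest known pixel.
    dist = distance_transform_edt(hole_mask)
    # Pixels near the hole boundary get weights close to 1;
    # pixels deep inside the hole decay toward 0.
    return np.where(hole_mask > 0, gamma ** dist, 0.0)

# Example with a free-form (here rectangular) hole: boundary pixels receive
# the largest weights and thus dominate the reconstruction loss.
mask = np.zeros((8, 8))
mask[2:6, 2:7] = 1
weights = spatially_discounted_mask(mask)
```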

Section snippets

Related works

The existing inpainting works can be categorized into two types. The first type, developed within the traditional paradigm, uses the known information within the given image to fill lost contents at the pixel/patch level. The second type, exploiting the outstanding learning capability of CNNs, predicts and fills the missing contents at the feature level.

The traditional inpainting works include diffusion-based [1], [4], [5], sparsity-based [3], [18], [19], and exemplar-based approaches [6], [7],

Methodology

As depicted in Fig. 1, the proposed image inpainting framework consists of a completion network G and a discriminator network D, each of which comprises two parts. G consists of a content branch G_con and a texture branch G_txt, while D consists of a global discriminator D_glb and a local discriminator D_loc. In the training phase, G and D are trained jointly, while in the deployment phase, only G is employed for image completion.

Given a damaged image I_m, which can be considered as a
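As a concrete illustration, below is a minimal PyTorch sketch of one dilated gated convolution, the building block on which the multi-level fusion module is based [15]; the level count and the concatenation-based fusion shown here are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class DilatedGatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        # Twin convolutions: one produces features, the other a soft gate.
        self.feature = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)
        self.gate = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)

    def forward(self, x):
        # The sigmoid gate lets the network suppress features from invalid
        # (masked) regions; dilation enlarges the receptive field.
        return torch.tanh(self.feature(x)) * torch.sigmoid(self.gate(x))

# Parallel branches with different dilation rates see the image at different
# scales; fusing them by concatenation is one plausible multi-level design.
feats = torch.randn(1, 64, 64, 64)
multi_scale = [DilatedGatedConv2d(64, 64, d)(feats) for d in (1, 2, 4, 8)]
fused = torch.cat(multi_scale, dim=1)   # shape: 1 x 256 x 64 x 64
```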

Datasets

We evaluated the proposed method on five datasets, i.e., CelebA-HQ [30], Describable Textures Dataset (DTD) [31], Facade [32], Paris Street View (PSV) [9] and Places2 [33]. We divided each of the first three datasets into training, validation, and testing sets in proportions of 70%, 10%, and 20%, respectively. The PSV dataset has already been divided into a training set and a testing set, so we randomly selected 90% of the training data for training and used the remaining 10% for validation. The Places2 dataset has
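A minimal sketch of such a 70%/10%/20% split follows; the authors' exact file lists and random seed are not specified, so the helper below is only illustrative.

```python
import random

def split_dataset(paths, seed=0):
    """Split a list of file paths into train/val/test in 70/10/20 proportions."""
    paths = sorted(paths)               # deterministic base order
    random.Random(seed).shuffle(paths)  # reproducible shuffle
    n = len(paths)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```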

Conclusion

In this paper, we proposed a novel detail-enhanced image inpainting method based on DWT. Specifically, given a damaged image, a content branch is used to fill semantic content in the spatial domain and a texture branch is adopted to generate high-frequency details in the wavelet domain. The outputs of both branches are combined to reconstruct an inpainted image via IDWT. To improve the capability of semantic understanding, a multi-level fusion module is designed to enlarge the receptive field of the

CRediT authorship contribution statement

Bin Li: Conceptualization, Methodology, Formal analysis, Writing – original draft. Bowei Zheng: Software, Investigation, Data curation, Writing – original draft. Haodong Li: Conceptualization, Methodology, Validation, Writing – review & editing. Yanran Li: Resources, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by NSFC under Grants 61802262 and 61872244, Guangdong Basic and Applied Basic Research Foundation under Grant 2019B151502001, and Shenzhen R&D Program under Grants JCYJ20200109105008228 and JCYJ20180305124325555.

References (39)

  • S. Iizuka et al., Globally and locally consistent image completion, ACM Trans. Graph. (2017)

  • G. Liu et al., Image inpainting for irregular holes using partial convolutions, Proceedings of the European Conference on Computer Vision (2018)

  • C. Zheng et al., Pluralistic image completion, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)

  • Y. Zeng et al., Learning pyramid-context encoder network for high-quality image inpainting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)

  • J. Yu et al., Generative image inpainting with contextual attention, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

  • J. Yu et al., Free-form image inpainting with gated convolution, Proceedings of the IEEE International Conference on Computer Vision (2019)

  • G.E. Hinton et al., Autoencoders, minimum description length and Helmholtz free energy, Proceedings of the Conference on Neural Information Processing Systems (1994)

  • I. Goodfellow et al., Generative adversarial networks, Commun. ACM (2020)

  • F. Li et al., A universal variational framework for sparsity-based image inpainting, IEEE Trans. Image Process. (2014)