Signal Processing

Volume 189, December 2021, 108278

Detail-enhanced image inpainting based on discrete wavelet transforms

https://doi.org/10.1016/j.sigpro.2021.108278

Highlights

  • The contents and textures of an image to be inpainted are separately generated by a two-parallel-branch network.

  • A multi-level fusion module is proposed to improve the network's capability of semantic understanding.

  • A spatially discounted mask is designed to assign different importance to missing pixels according to their positions.

Abstract

Deep-learning-based methods have made great breakthroughs in image inpainting by generating visually plausible contents with reasonable semantic meaning. However, existing deep learning methods still suffer from distorted structures or blurry textures. To mitigate this problem, completing the semantic structure and enhancing the textural details should be considered simultaneously. To this end, we propose a two-parallel-branch completion network, where the first branch fills semantic content in the spatial domain and the second branch generates high-frequency details in the wavelet domain. To reconstruct the inpainted image, the output of the first branch is also decomposed by the discrete wavelet transform, and the resulting low-frequency wavelet subband is used jointly with the output of the second branch. In addition, to improve the network's capability of semantic understanding, a multi-level fusion module (MLFM) is designed in the first branch to enlarge the receptive field. Furthermore, drawing lessons from traditional exemplar-based inpainting methods, we develop a free-form spatially discounted mask (SD-mask) that assigns different importance priorities to the missing pixels based on their positions, enabling our method to handle missing regions of arbitrary shape. Extensive experiments on several public datasets demonstrate that the proposed approach outperforms current state-of-the-art ones. The code is publicly available at https://github.com/media-sec-lab/DWT_Inpainting.

Introduction

Image inpainting [1] is an image processing technique that reconstructs lost or deteriorated parts of an image so as to improve its visual quality. This technology can be used in many applications, such as image editing and old photo restoration. As a kind of imaging inverse problem, image inpainting can be performed with model-based image restoration methods [2], [3]. In the past two decades, great progress has been achieved in image inpainting through various tailored approaches, for example, diffusion-based ones [1], [4], [5], exemplar-based ones [6], [7], [8], and deep-learning-based ones [9], [10], [11], [12], [13], [14], [15]. Different from conventional approaches, which propagate known information or find similar patches within the defective image to fill the missing parts, deep-learning-based approaches learn high-level feature representations from training data and complete the missing regions with reasonable structures and textures. As a consequence, deep inpainting approaches can achieve impressive visual effects. Typically, an encoder-decoder structure [16] based on a Convolutional Neural Network (CNN) and a Generative Adversarial Network (GAN) mechanism [17] work together to perform the deep inpainting task. Specifically, a generation network with an encoder-decoder structure, trained with a well-designed loss function, completes the missing area, while a discriminator provides an adversarial loss to ensure that the inpainted images are visually indistinguishable from pristine images.
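To make this typical setup concrete, the sketch below pairs a toy encoder-decoder generator with a small discriminator in PyTorch; the layer configuration, loss form, and loss weight are illustrative assumptions, not the architecture proposed in this paper.

```python
import torch
import torch.nn as nn

# Toy encoder-decoder generator: input is the masked RGB image plus the
# binary mask (4 channels); output is the completed 3-channel image.
G = nn.Sequential(
    nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.ReLU(),             # encode
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # decode
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
)
# Toy discriminator: scores how "real" the completed image looks.
D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),
)

x, mask = torch.randn(1, 3, 64, 64), torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1                      # square hole (1 = missing)
completed = G(torch.cat([x * (1 - mask), mask], dim=1))

# Reconstruction loss pulls the output toward the ground truth; the
# adversarial term rewards outputs that the discriminator scores as real.
rec_loss = (completed - x).abs().mean()
adv_loss = -D(completed).mean()
g_loss = rec_loss + 0.01 * adv_loss
```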

In general, both reasonable semantic contents and fine details need to be synthesized when performing inpainting. Some methods [9], [10], [11], [12], [13] pursue these two goals in a single network, while others [14], [15] use two serial networks, i.e., a coarse network and a refinement network, to deal with coarse and fine contents, respectively. However, they still suffer from distorted structures and/or blurry textures, implying that a different inpainting architecture may be needed. In this paper, we address the above-mentioned problem by designing a deep inpainting architecture based on a two-parallel-branch completion network, consisting of a content branch and a texture branch. The content branch fills semantic content in the spatial domain, and the texture branch generates high-frequency details in the wavelet domain. Specifically, the content branch, built on a U-net structure, takes an image with missing parts as input and outputs the spatially inpainted image. A multi-level fusion module based on dilated gated convolution [15] is proposed to expand the network's receptive field and thereby improve its capability of semantic understanding. The texture branch takes the high-frequency subbands of the discrete wavelet transform (DWT) as input and processes the high-frequency part of the inpainted image in the wavelet domain. In this way, the first branch can focus on the semantic contents while the second learns better textural details. To synchronize the outputs of the two branches, the low-frequency wavelet subband of the spatially inpainted image from the content branch and the high-frequency wavelet subbands from the texture branch jointly reconstruct the inpainted image through the inverse discrete wavelet transform (IDWT). Furthermore, drawing lessons from traditional exemplar-based inpainting methods, where missing pixels closer to the known areas have higher inpainting priorities, we develop a spatially discounted mask (SD-mask) for missing regions of arbitrary shape, which weights missing pixels in the loss function according to their importance. Hence, the missing pixels near the hole boundaries have a higher impact on the loss, so they are recovered more faithfully and the boundary transitions become less abrupt.
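For concreteness, the following minimal sketch reproduces this DWT-based fusion step with PyWavelets, using random arrays as stand-ins for the two branch outputs; the wavelet choice (Haar) and the single-channel input are our illustrative assumptions.

```python
import numpy as np
import pywt

# Stand-in for the content branch output: a spatially inpainted image
# (one channel; in practice each color channel is handled the same way).
content_out = np.random.rand(256, 256)

# Decompose the content branch output and keep only its low-frequency subband.
LL, _ = pywt.dwt2(content_out, 'haar')

# Stand-ins for the texture branch output: the three high-frequency subbands
# predicted directly in the wavelet domain (128x128 for a 256x256 input).
LH, HL, HH = (np.random.rand(128, 128) for _ in range(3))

# Reconstruct the final inpainted image via the inverse DWT.
inpainted = pywt.idwt2((LL, (LH, HL, HH)), 'haar')
assert inpainted.shape == (256, 256)
```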

The contributions of this paper are summarized as follows.

  • We propose a two-parallel-branch network to complete image structure and fill high-frequency details based on DWT, which can produce reasonable and sharp image contents.

  • We design a multi-level fusion module based on dilated gated convolution to expand the receptive field of the content branch, enabling the network to learn image semantic contents at different scales.

  • We develop a free-form spatially discounted mask that assigns different importance to missing pixels based on their positions, which can be applied to missing areas with arbitrary shapes (see the sketch below).
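To make the SD-mask idea concrete, the sketch below weights each missing pixel by gamma**d, with d the distance to the nearest known pixel, in the spirit of the spatially discounted reward used in earlier deep inpainting work; the paper's exact discounting function may differ, so treat this as an assumption.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def spatially_discounted_mask(hole_mask, gamma=0.99):
    """hole_mask: 2-D array, 1 inside the missing region, 0 where pixels are known."""
    # Distance from every missing pixel to the nearest known pixel.
    dist = distance_transform_edt(hole_mask)
    # Pixels near the hole boundary get weights close to 1;
    # pixels deep inside the hole decay toward 0.
    return np.where(hole_mask > 0, gamma ** dist, 0.0)

# Example with a free-form (here rectangular) hole: boundary pixels receive
# the largest weights and thus dominate the reconstruction loss.
mask = np.zeros((8, 8))
mask[2:6, 2:7] = 1
weights = spatially_discounted_mask(mask)
```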

Section snippets

Related works

The existing inpainting works can be categorized into two types. The first type, developed within the traditional paradigm, uses the known information within the given image to fill lost contents at the pixel/patch level. The second type, exploiting the outstanding learning capability of CNNs, predicts and fills the missing contents at the feature level.

The traditional inpainting works include diffusion-based [1], [4], [5], sparsity-based [3], [18], [19], and exemplar-based approaches [6], [7],

Methodology

As depicted in Fig. 1, the proposed image inpainting framework consists of a completion network G and a discriminator network D, each of which comprises two parts. G consists of a content branch G_con and a texture branch G_txt, while D consists of a global discriminator D_glb and a local discriminator D_loc. In the training phase, G and D are trained jointly, while in the deployment phase, only G is employed for image completion.

Given a damaged image I_m, which can be considered as a
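As a concrete illustration, below is a minimal PyTorch sketch of one dilated gated convolution, the building block on which the multi-level fusion module is based [15]; the level count and the concatenation-based fusion shown here are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class DilatedGatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        # Twin convolutions: one produces features, the other a soft gate.
        self.feature = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)
        self.gate = nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation)

    def forward(self, x):
        # The sigmoid gate lets the network suppress features from invalid
        # (masked) regions; dilation enlarges the receptive field.
        return torch.tanh(self.feature(x)) * torch.sigmoid(self.gate(x))

# Parallel branches with different dilation rates see the image at different
# scales; fusing them by concatenation is one plausible multi-level design.
feats = torch.randn(1, 64, 64, 64)
multi_scale = [DilatedGatedConv2d(64, 64, d)(feats) for d in (1, 2, 4, 8)]
fused = torch.cat(multi_scale, dim=1)   # shape: 1 x 256 x 64 x 64
```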

Datasets

We evaluated the proposed method on five datasets, i.e., CelebA-HQ [30], Describable Textures Dataset (DTD) [31], Facade [32], Paris Street View (PSV) [9] and Places2 [33]. We divided each of the first three datasets into training, validation, and testing sets in proportions of 70%, 10%, and 20%, respectively. The PSV dataset has already been divided into a training set and a testing set, so we randomly selected 90% of the training data for training and used the remaining 10% for validation. The Places2 dataset has
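A minimal sketch of such a 70%/10%/20% split follows; the authors' exact file lists and random seed are not specified, so the helper below is only illustrative.

```python
import random

def split_dataset(paths, seed=0):
    """Split a list of file paths into train/val/test in 70/10/20 proportions."""
    paths = sorted(paths)               # deterministic base order
    random.Random(seed).shuffle(paths)  # reproducible shuffle
    n = len(paths)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```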

Conclusion

In this paper, we proposed a novel detail-enhanced image inpainting method based on DWT. Specifically, given a damaged image, a content branch is used to fill semantic content in the spatial domain and a texture branch is adopted to generate high-frequency details in the wavelet domain. The outputs of both branches are combined to reconstruct an inpainted image via IDWT. To improve the capability of semantic understanding, a multi-level fusion module is designed to enlarge the receptive field of the

CRediT authorship contribution statement

Bin Li: Conceptualization, Methodology, Formal analysis, Writing – original draft. Bowei Zheng: Software, Investigation, Data curation, Writing – original draft. Haodong Li: Conceptualization, Methodology, Validation, Writing – review & editing. Yanran Li: Resources, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by NSFC under Grants 61802262 and 61872244, Guangdong Basic and Applied Basic Research Foundation under Grant 2019B151502001, and Shenzhen R&D Program under Grants JCYJ20200109105008228 and JCYJ20180305124325555.

References (39)

  • S. Iizuka et al., Globally and locally consistent image completion, ACM Trans. Graph. (2017)

  • G. Liu et al., Image inpainting for irregular holes using partial convolutions, Proceedings of the European Conference on Computer Vision (2018)

  • C. Zheng et al., Pluralistic image completion, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)

  • Y. Zeng et al., Learning pyramid-context encoder network for high-quality image inpainting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)

  • J. Yu et al., Generative image inpainting with contextual attention, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

  • J. Yu et al., Free-form image inpainting with gated convolution, Proceedings of the IEEE International Conference on Computer Vision (2019)

  • G.E. Hinton et al., Autoencoders, minimum description length and Helmholtz free energy, Proceedings of the Conference on Neural Information Processing Systems (1994)

  • I. Goodfellow et al., Generative adversarial networks, Commun. ACM (2020)

  • F. Li et al., A universal variational framework for sparsity-based image inpainting, IEEE Trans. Image Process. (2014)