
Neural Networks

Volume 132, December 2020, Pages 84-95

Improved dual-scale residual network for image super-resolution

https://doi.org/10.1016/j.neunet.2020.08.008

Abstract

In recent years, convolutional neural networks have been successfully applied to single image super-resolution (SISR) tasks, achieving breakthrough progress in both accuracy and speed. In this work, an improved dual-scale residual network (IDSRN), which achieves promising reconstruction performance without excessive computational cost, is proposed for SISR. The proposed network extracts features through two independent parallel branches: a dual-scale feature extraction branch and a texture attention branch. The improved dual-scale residual block (IDSRB), combined with an active weighted mapping strategy, constitutes the dual-scale feature extraction branch, which aims to capture dual-scale features of the image. In the texture attention branch, an encoder–decoder network employing a symmetric convolution–deconvolution structure acts as a feature selector to enhance high-frequency details. The integration of the two branches achieves the goal of capturing dual-scale features with high-frequency information. Comparative experiments and extensive studies indicate that the proposed IDSRN is competitive with state-of-the-art approaches in terms of accuracy and efficiency.

Introduction

Single image super-resolution (SR), as the term suggests, aims to recover the high-resolution (HR) counterpart of a given low-resolution (LR) image. SR has long been a hot spot in image processing research and has been widely applied in many fields (Wang, Chen, & Hoi, 2019), including medical imaging (Greenspan, 2009, Huang et al., 2017b), security (Lin et al., 2007, Zhang et al., 2010), satellite imagery (Jiang, Wang, Yi, and Jiang, 2018, Jiang, et al., 2018), and so on. In past SR research, plenty of classic methods have been extensively explored, such as image-statistics-based methods (Kim and Kwon, 2010, Xiong et al., 2010), patch-based methods (Freeman et al., 2002, Glasner et al., 2009), and sparse representation methods (Peleg and Elad, 2014, Yang et al., 2010), etc.

With the continuous development of deep learning, deep convolutional neural networks (CNNs) are attracting more and more attention. A few years ago, a super-resolution convolutional neural network (SRCNN) with three convolution layers was proposed by Dong, Loy, He, and Tang (2016) and applied to directly learn the end-to-end mapping between LR and HR images, relying on the universal approximation property of feed-forward neural networks (Cybenko, 1989, Hornik et al., 1989); it achieved better performance than previous methods. Essentially, CNN-based SR models exploit their powerful learning ability to effectively learn the nonlinear mapping from LR image to HR image by training a large number of parameters.

Since SRCNN (Dong, Loy, He, & Tang, 2016) first introduced CNNs into super-resolution reconstruction tasks, deep learning-based SR models have been actively explored. A fast super-resolution convolutional neural network (FSRCNN) (Dong, Loy, & Tang, 2016) and an efficient sub-pixel convolutional neural network (ESPCN) (Shi, et al., 2016), two relatively shallow networks, were further proposed based on SRCNN and proved superior to it in both accuracy and speed. However, these shallow networks still did not meet expectations for reconstruction quality, and research interest shifted to the construction of deeper network models. After the residual network (ResNet) (He, Zhang, Ren, & Sun, 2016) overcame the difficulty of training deeper networks, a very deep convolutional network for SR (VDSR) (Kim, Lee, & Lee, 2016a) and a deeply-recursive convolutional network (DRCN) (Kim, Lee, & Lee, 2016b) opened the door to deep networks in SR. They adopted the idea of residual learning to make new breakthroughs in network depth and reconstruction quality. Subsequently, a variety of deep networks sprang up, ranging from deep Laplacian pyramid networks (LapSRN) (Lai, Huang, Ahuja, & Yang, 2017), to a generative adversarial network (SRGAN) (Ledig, et al., 2017), and then to enhanced deep residual networks (EDSR) (Lim, Son, Kim, Nah, & Lee, 2017). They differ from one another in network structures, loss functions, and learning strategies. However, these models all assume that an LR image is obtained from a high-resolution image through bicubic downsampling, which limits the scalability of a single learned model. To handle multiple and even spatially variant degradations, a super-resolution network for multiple degradations (SRMD) (Zhang, Zuo, & Zhang, 2018c) was proposed.
Unlike traditional generative adversarial models, SinGAN (Shaham, Dekel, & Michaeli, 2019) estimates the distribution of a single image, which allows it to address several visual tasks using only one image. Recently, a new SR method, wide activation for efficient and accurate image super-resolution (WDSR) (Yu, et al., 2018), demonstrated that widening the feature maps before the activation function, at the same parameter complexity, yields superior results. The proposed wider activation block with weight normalization works well even on benchmark datasets at large upscale factors.

On one hand, deepening the network improves its feature learning ability. On the other hand, it also weakens the contribution of shallow features from early layers. It is not advisable to blindly increase the number of network layers without considering the influence of low-level features. To resolve this tension, approaches appeared that exploit multiple identity mappings and concatenation operations, such as the dense convolutional network (DenseNet) (Huang, Liu, Der Maaten, & Weinberger, 2017a), a densely connected network for SR (SRDenseNet) (Tong, Li, Liu, & Gao, 2017), a persistent memory network (MemNet) (Tai, Yang, Liu, & Xu, 2017), and the residual dense network (RDN) (Zhang, Tian, Kong, Zhong, & Fu, 2018b). Instead of simply increasing network depth, another option is to expand network width. Since convolution kernels of different sizes correspond to different receptive fields, the cascaded multi-scale cross network (CMSC) (Hu, Gao, Li, Huang, & Wang, 2018) and the multi-scale residual network (MSRN) (Li, Fang, Mei, & Zhang, 2018) introduced kernels of different sizes to merge feature information on multiple scales. They made full use of complementary multi-scale information by stacking multiple blocks, effectively improving cross-layer information flow.
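As a concrete illustration of why kernel size matters, the toy numpy sketch below convolves the same input with a 3×3 and a 5×5 kernel and stacks the results, mimicking how multi-scale blocks fuse features from different receptive fields. The averaging kernels and the center-crop fusion are illustrative assumptions, not taken from any of the cited models:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D convolution via an explicit sliding window."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image and two kernels of different sizes (different receptive fields).
img = np.arange(64, dtype=float).reshape(8, 8)
k3 = np.full((3, 3), 1 / 9.0)   # 3x3 averaging kernel
k5 = np.full((5, 5), 1 / 25.0)  # 5x5 averaging kernel

f3 = conv2d(img, k3)  # shape (6, 6)
f5 = conv2d(img, k5)  # shape (4, 4)

# Crop to a common spatial size and stack along a channel axis,
# mimicking the concatenation of features from both scales.
f3c = f3[1:-1, 1:-1]                 # center-crop 6x6 -> 4x4
fused = np.stack([f3c, f5], axis=0)  # shape (2, 4, 4)
```

A real multi-scale block would learn the kernel weights and fuse via a 1×1 convolution rather than cropping, but the receptive-field contrast is the same.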

In recent years, deep network models in the SR field have become remarkable in both efficiency and effectiveness. Most models are committed to pushing the peak signal-to-noise ratio (PSNR) to a high value while often ignoring the real high-frequency details of the image itself. For example, the widely used mean squared error (MSE), while pursuing high PSNR, tends to return the mean of the plausible solutions, leading to blurry or over-smoothed images. To address this problem, perceptual loss (Johnson, Alahi, & Feifei, 2016) was proposed to encourage sharper images. Similarly, combinations of adversarial networks with perceptual or texture losses were dedicated to reconstructing high-resolution images with finer details (Ledig, et al., 2017, Sajjadi et al., 2017). However, these approaches tend to introduce undesirable noise alongside the desired textures. Most recently, an image super-resolution technique called SRNTT (Zhang, Wang, Lin, & Qi, 2019), based on neural texture transfer, was proposed. This model adaptively transfers textures from reference images, according to texture similarity, to enrich the details of super-resolved images.
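For reference, PSNR is derived directly from the MSE discussed above. The minimal numpy sketch below (pixel values are illustrative) shows the standard computation for 8-bit images:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image
    and a reconstruction; higher is better."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 128.0)
noisy = ref + 8.0                  # constant error of 8 gray levels
print(round(psnr(ref, noisy), 2))  # 30.07
```

Note that PSNR is a pixel-wise score: a blurry image that averages plausible textures can outscore a sharper one, which is exactly the mismatch that perceptual and texture losses try to correct.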

In this work, a novel network architecture is constructed, consisting of a dual-scale feature extraction branch and a texture attention branch. The dual-scale feature extraction branch integrates multiple improved dual-scale residual blocks to merge dual-scale information at different levels, while the texture attention branch captures high-frequency details through an encoder–decoder model. In particular, an improved dual-scale residual block (IDSRB) using different convolution kernels combined with active weighted mapping (AWM) is developed. The skip connections involved aid network training and the convergence of information flow across levels. It is worth mentioning that the AWM strategy is introduced to effectively improve skip connections by reweighting the different paths. As for the texture attention branch, a novel encoder–decoder model is designed to concentrate on high-frequency details. It plays the role of feature enhancement, selecting the areas rich in detail. This model emphasizes high-frequency details and fine textures, which helps generate sharper images.
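The active weighted mapping idea can be sketched in a few lines: where a plain residual block computes x + F(x), AWM assigns each path a scalar weight that is learned during training. The numpy toy below uses illustrative values for the weights and for the stand-in conv-path output F(x); they are not the paper's learned parameters:

```python
import numpy as np

def awm_residual(x, fx, alpha, beta):
    """Active weighted mapping (sketch): instead of the fixed sum
    x + F(x) of a plain residual block, each path gets a learnable
    scalar weight that biases the flow between the two paths."""
    return alpha * fx + beta * x

x = np.ones(4)                        # identity (shortcut) path
fx = np.array([0.5, -0.5, 1.0, 0.0])  # stand-in for the conv-path output F(x)

plain = x + fx                            # ordinary residual sum: 1.5, 0.5, 2.0, 1.0
weighted = awm_residual(x, fx, 1.5, 0.8)  # biased sum: 1.55, 0.05, 2.3, 0.8
```

In the full IDSRB the weights would be trained jointly with the convolution kernels, letting the network emphasize either the transformed features or the shortcut as needed.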

In summary, we establish a novel network composed of a dual-scale feature extraction branch and a texture attention branch, which strikes a balance between PSNR and high-frequency details. The main contributions of this paper are summarized as follows:

  • A new improved dual-scale residual block, namely IDSRB, is proposed to richly capture features at different scales. The AWM mechanism is introduced into single image super-resolution tasks for the first time, to bias the parallel feature extraction paths. In this way, our IDSRB can improve the information flow across the layers.

  • An encoder–decoder model is developed to attend to the textures and high-frequency details of the image. We improve the conventional U-Net architecture to construct an encoder–decoder network for specialized feature enhancement in single image super-resolution.

  • A novel improved dual-scale residual network combining the dual-scale feature extraction branch with the texture attention branch is presented, which yields a good balance between accuracy and high-frequency details.
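The texture-attention idea in the second contribution can be caricatured with plain numpy: an "encoder" pools the features, a "decoder" restores their size, and a sigmoid turns the result into a mask that emphasizes high-activity regions. The operators here (average pooling, nearest-neighbour upsampling, elementwise gating) are deliberate simplifications of the paper's symmetric convolution–deconvolution pairs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    """2x2 average pooling: the 'encoder' shrinks the feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(x):
    """Nearest-neighbour upsampling: the 'decoder' restores the size."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

features = np.array([[0., 4., 0., 0.],
                     [4., 8., 0., 0.],
                     [0., 0., 0., 0.],
                     [0., 0., 0., 4.]])

mask = sigmoid(decode(encode(features)))  # attention map with values in (0, 1)
enhanced = features * mask                # detail-rich regions are preserved
```

The top-left block, where activity is high, receives a mask value near 1, while flat regions are attenuated toward 0.5; a learned encoder–decoder plays the same selector role with far more capacity.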

The remainder of this paper is organized as follows: Section 2 briefly reviews related work and representative neural network models from four aspects, and describes in detail the key technologies involved in this work. Section 3 introduces the proposed IDSRB and the entire network architecture in detail, including mathematical characterizations. In Section 4, experimental comparisons with state-of-the-art methods and a discussion of the results are conducted. Finally, Section 5 concludes the paper with observations and analysis.

Section snippets

Single image super-resolution

Since SRCNN (Dong, Loy, He, & Tang, 2016) first utilized a CNN to learn the end-to-end mapping from LR space to HR space, CNN-based methods have attracted much attention in SR. Dozens of deep CNN-based models have appeared since ResNet (He et al., 2016) solved the problem of network degradation. To date, a variety of CNN-based models have achieved high levels both in metrics (i.e., PSNR and structural similarity (SSIM)) (Wang, Bovik, Sheikh, & Simoncelli, 2004) and visual…

Improved dual-scale residual block

We introduce the AWM strategy (Jung et al., 2018) into the dual-scale residual block to bias the parallel feature extraction paths (i.e., the convolutional unit and the shortcut connection). The proposed IDSRB is shown in Fig. 4, where lines in different colors indicate different weight values. Above each convolution layer lies the corresponding number of channels.

It can be clearly seen that the front convolution layer has three times as many channels as the next layer in the first two-path phase.
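The effect of such channel expansion on model size is easy to quantify. The sketch below uses illustrative channel counts (64 base channels and the stated 3× expansion; these numbers are assumptions, not read off Fig. 4) to count the weights of an expand-then-project pair of 3×3 convolutions:

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

# Illustrative channel counts: a block that expands 64 input channels
# to 192 (3x) before the activation and then projects back to 64.
base = 64
wide = 3 * base

expand = conv_params(base, wide, 3)   # 64 -> 192: 110592 weights
project = conv_params(wide, base, 3)  # 192 -> 64: 110592 weights
total = expand + project              # 221184 weights for the pair
```

Because the two layers are symmetric, the wide phase costs the same as two ordinary 64-to-64 layers tripled, which is the trade-off the wide-activation design accepts for richer pre-activation features.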

Experimental results and discussion

In this part, experimental comparisons and results analysis are presented. First, we describe the implementation and training details. Then, the contributions of the different components of the network are analyzed. After that, quantitative and visual comparisons of the proposed model with other state-of-the-art methods on five benchmark datasets are conducted. Finally, we evaluate the execution time of the proposed model against some representative methods.

Conclusion

In this work, we propose an improved dual-scale residual network for SR. A dual-scale feature extraction branch and a texture attention branch are combined, responsible for extracting dual-scale features and enhancing high-frequency details, respectively. In particular, the improved dual-scale residual blocks, composed of convolution kernels of different sizes integrated with the active weighted mapping strategy, constitute the dual-scale feature extraction branch. Additionally, the main…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the Natural Science Foundation of Zhejiang Province, China under grant LZ20F030001, and the National Natural Science Foundation of China under grant 61672477.

References (70)

  • Hornik, K., et al. (1989). Multilayer feedforward networks are universal approximators. Neural Networks.
  • Zhang, L., et al. (2010). A super-resolution reconstruction algorithm for surveillance images. Signal Processing.
  • Arbelaez, P., et al. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Bevilacqua, M., Roumy, A., Guillemot, C., & Alberimorel, M. L. (2012). Low-complexity single-image super-resolution...
  • Bulat, A., & Tzimiropoulos, G. (2017). How far are we from solving the 2D & 3D face alignment problem? (and a dataset...
  • Chen, Y., Tai, Y., Liu, X., Shen, C., & Yang, J. (2018). FSRNet: End-to-end learning face super-resolution with facial...
  • Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems.
  • Dong, C., et al. (2016). Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Dong, C., Loy, C. C., & Tang, X. (2016). Accelerating the super-resolution convolutional neural network. In European...
  • Freeman, W. T., et al. (2002). Example-based super-resolution. IEEE Computer Graphics and Applications.
  • Glasner, D., Bagon, S., & Irani, M. (2009). Super-resolution from a single image. In IEEE conference on computer vision...
  • Greenspan, H. (2009). Super-resolution in medical imaging. The Computer Journal.
  • Haris, M., et al. (2019). Deep back-projection networks for single image super-resolution.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conference on...
  • Hu, Y., et al. (2018). Single image super-resolution via cascaded multi-scale cross network.
  • Hu, X., Mu, H., Zhang, X., Wang, Z., Tan, T., & Sun, J. (2019). Meta-SR: A magnification-arbitrary network for...
  • Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In IEEE international conference on computer...
  • Huang, G., Liu, Z., Der Maaten, L. V., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In IEEE...
  • Huang, Y., Shao, L., & Frangi, A. F. (2017). Simultaneous super-resolution and cross-modality synthesis of 3D medical...
  • Huang, J., Singh, A., & Ahuja, N. (2015). Single image super-resolution from transformed self-exemplars. In Proceedings...
  • Isola, P., Zhu, J., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks....
  • Jiang, K., et al. (2018). A progressively enhanced network for video satellite imagery superresolution. IEEE Signal Processing Letters.
  • Jiang, K., Wang, Z., Yi, P., Jiang, J., Xiao, J., & Yao, Y. (2018). Deep distillation recursive network for remote...
  • Jing, Y., Liu, Q., & Zhang, K. (2017). Stacked hourglass...
  • Johnson, J., Alahi, A., & Feifei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In...
  • Jung, H., et al. (2018). Residual convolutional neural network revisited with active weighted mapping.
  • Kim, K. I., et al. (2010). Single-image super-resolution using sparse regression and natural image prior. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Kim, J., Lee, J. K., & Lee, K. M. (2016). Accurate image super-resolution using very deep convolutional networks. In...
  • Kim, J., Lee, J. K., & Lee, K. M. (2016). Deeply-recursive convolutional network for image super-resolution. In IEEE...
  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning...
  • Lai, W. S., Huang, J. B., Ahuja, N., & Yang, M. H. (2017). Deep Laplacian pyramid networks for fast and accurate...
  • Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A. P., Tejani, A., Totz, J., &...
  • Li, J., Fang, F., Mei, K., & Zhang, G. (2018). Multi-scale residual network for image super-resolution. In European...
  • Li, Z., Yang, J., Liu, Z., Yang, X., Jeon, G., & Wu, W. (2019). Feedback network for image super-resolution. In IEEE...
  • Lim, B., Son, S., Kim, H., Nah, S., & Lee, K. M. (2017). Enhanced deep residual networks for single image...