
Neural Networks

Volume 132, December 2020, Pages 84-95

Improved dual-scale residual network for image super-resolution

https://doi.org/10.1016/j.neunet.2020.08.008

Abstract

In recent years, convolutional neural networks have been successfully applied to single image super-resolution (SISR) tasks, achieving breakthrough progress in both accuracy and speed. In this work, an improved dual-scale residual network (IDSRN), which achieves promising reconstruction performance without excessive computational cost, is proposed for SISR. The proposed network extracts features through two independent parallel branches: a dual-scale feature extraction branch and a texture attention branch. The improved dual-scale residual block (IDSRB), combined with an active weighted mapping strategy, constitutes the dual-scale feature extraction branch, which aims to capture dual-scale features of the image. In the texture attention branch, an encoder–decoder network employing a symmetric convolution–deconvolution structure acts as a feature selector to enhance high-frequency details. The integration of the two branches achieves the goal of capturing dual-scale features with high-frequency information. Comparative experiments and extensive studies indicate that the proposed IDSRN is competitive with state-of-the-art approaches in terms of accuracy and efficiency.

Introduction

Single image super-resolution (SR), as the term suggests, aims to recover the high-resolution (HR) counterpart of a given low-resolution (LR) image. SR has long been a hot spot in image processing research and has been widely applied in many fields (Wang, Chen, & Hoi, 2019), including medical imaging (Greenspan, 2009, Huang et al., 2017b), security (Lin et al., 2007, Zhang et al., 2010), satellite imagery (Jiang, Wang, Yi, and Jiang, 2018, Jiang, et al., 2018), and so on. In past SR research, plenty of classic methods have been extensively explored, such as image-statistics-based methods (Kim and Kwon, 2010, Xiong et al., 2010), patch-based methods (Freeman et al., 2002, Glasner et al., 2009), and sparse representation methods (Peleg and Elad, 2014, Yang et al., 2010), etc.

With the continuous development of deep learning, deep convolutional neural networks (CNNs) are attracting more and more attention. A few years ago, a super-resolution convolutional neural network (SRCNN) with three convolution layers was proposed by Dong, Loy, He, and Tang (2016) and applied to directly learn the end-to-end mapping between LR and HR images, relying on the universal approximation property of feed-forward neural networks (Cybenko, 1989, Hornik et al., 1989); it achieved better performance than previous methods. Essentially, CNN-based SR models exploit their powerful learning ability to effectively learn the nonlinear mapping from LR image to HR image by training a large number of parameters.

Since SRCNN (Dong, Loy, He, & Tang, 2016) first introduced CNNs into super-resolution reconstruction tasks, deep learning-based SR models have been actively explored. A fast super-resolution convolutional neural network (FSRCNN) (Dong, Loy, & Tang, 2016) and an efficient sub-pixel convolutional neural network (ESPCN) (Shi, et al., 2016), two relatively shallow networks, were further proposed based on SRCNN and proved superior to it in both accuracy and speed. However, these shallow networks still did not meet expectations for reconstruction quality, and research interest shifted to the construction of deeper network models. After the residual network (ResNet) (He, Zhang, Ren, & Sun, 2016) overcame the difficulty of training deeper networks, a very deep convolutional network for SR (VDSR) (Kim, Lee, & Lee, 2016a) and a deeply-recursive convolutional network (DRCN) (Kim, Lee, & Lee, 2016b) opened the door to deep networks in SR. They adopted the idea of residual learning to make new breakthroughs in network depth and reconstruction quality. Subsequently, a variety of deep networks sprang up, ranging from deep Laplacian pyramid networks (LapSRN) (Lai, Huang, Ahuja, & Yang, 2017), to a generative adversarial network (SRGAN) (Ledig, et al., 2017), and then to enhanced deep residual networks (EDSR) (Lim, Son, Kim, Nah, & Lee, 2017). They differ from one another in network structures, loss functions, and learning strategies. However, these models all assume that an LR image is obtained from a high-resolution image through bicubic downsampling, which limits the scalability of a single learned model. To handle multiple and even spatially variant degradations, a super-resolution network for multiple degradations (SRMD) (Zhang, Zuo, & Zhang, 2018c) was proposed.
Unlike traditional generative adversarial models, SinGAN (Shaham, Dekel, & Michaeli, 2019) estimates the distribution of a single image, which allows it to address several visual tasks using only one image. Recently, a new SR method, wide activation for efficient and accurate image super-resolution (WDSR) (Yu, et al., 2018), demonstrated that widening the feature maps before the activation function, at the same parameter complexity, yields superior results. The proposed wider activation block with weight normalization works well even on benchmark datasets at large upscale factors.

On one hand, deepening the network improves its feature learning ability. On the other hand, it also weakens the contribution of shallow features from early layers. It is not advisable to blindly increase the number of network layers without considering the influence of low-level features. To resolve this tension, approaches appeared that exploit multiple identity mappings and concatenation operations, such as the dense convolutional network (DenseNet) (Huang, Liu, Der Maaten, & Weinberger, 2017a), a densely connected network for SR (SRDenseNet) (Tong, Li, Liu, & Gao, 2017), a persistent memory network (MemNet) (Tai, Yang, Liu, & Xu, 2017), and the residual dense network (RDN) (Zhang, Tian, Kong, Zhong, & Fu, 2018b). Instead of simply increasing network depth, another option is to expand network width. Since convolution kernels of different sizes correspond to different receptive fields, the cascaded multi-scale cross network (CMSC) (Hu, Gao, Li, Huang, & Wang, 2018) and the multi-scale residual network (MSRN) (Li, Fang, Mei, & Zhang, 2018) introduced kernels of different sizes to merge feature information on multiple scales. They made full use of complementary multi-scale information by stacking multiple blocks, effectively improving cross-layer information flow.
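As a concrete illustration of why kernel size matters, the toy numpy sketch below convolves the same input with a 3×3 and a 5×5 kernel and stacks the results, mimicking how multi-scale blocks fuse features from different receptive fields. The averaging kernels and the center-crop fusion are illustrative assumptions, not taken from any of the cited models:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D convolution via an explicit sliding window."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image and two kernels of different sizes (different receptive fields).
img = np.arange(64, dtype=float).reshape(8, 8)
k3 = np.full((3, 3), 1 / 9.0)   # 3x3 averaging kernel
k5 = np.full((5, 5), 1 / 25.0)  # 5x5 averaging kernel

f3 = conv2d(img, k3)  # shape (6, 6)
f5 = conv2d(img, k5)  # shape (4, 4)

# Crop to a common spatial size and stack along a channel axis,
# mimicking the concatenation of features from both scales.
f3c = f3[1:-1, 1:-1]                 # center-crop 6x6 -> 4x4
fused = np.stack([f3c, f5], axis=0)  # shape (2, 4, 4)
```

A real multi-scale block would learn the kernel weights and fuse via a 1×1 convolution rather than cropping, but the receptive-field contrast is the same.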

In recent years, deep network models in the SR field have become remarkable in both efficiency and effectiveness. Most models are committed to pushing the peak signal-to-noise ratio (PSNR) to a high value while often ignoring the real high-frequency details of the image itself. For example, the widely used mean squared error (MSE), while pursuing high PSNR, tends to return the mean of the plausible solutions, leading to blurry or over-smoothed images. To address this problem, perceptual loss (Johnson, Alahi, & Feifei, 2016) was proposed to encourage sharper images. Similarly, combinations of adversarial networks with perceptual or texture losses were dedicated to reconstructing high-resolution images with finer details (Ledig, et al., 2017, Sajjadi et al., 2017). However, these approaches tend to introduce undesirable noise alongside the desired textures. Most recently, an image super-resolution technique called SRNTT (Zhang, Wang, Lin, & Qi, 2019), based on neural texture transfer, was proposed. This model adaptively transfers textures from reference images, according to texture similarity, to enrich the details of super-resolved images.
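For reference, PSNR is derived directly from the MSE discussed above. The minimal numpy sketch below (pixel values are illustrative) shows the standard computation for 8-bit images:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image
    and a reconstruction; higher is better."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 128.0)
noisy = ref + 8.0                  # constant error of 8 gray levels
print(round(psnr(ref, noisy), 2))  # 30.07
```

Note that PSNR is a pixel-wise score: a blurry image that averages plausible textures can outscore a sharper one, which is exactly the mismatch that perceptual and texture losses try to correct.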

In this work, a novel network architecture is constructed, consisting of a dual-scale feature extraction branch and a texture attention branch. The dual-scale feature extraction branch integrates multiple improved dual-scale residual blocks to merge dual-scale information at different levels, while the texture attention branch captures high-frequency details through an encoder–decoder model. In particular, an improved dual-scale residual block (IDSRB) using different convolution kernels combined with active weighted mapping (AWM) is developed. The skip connections involved aid network training and the convergence of information flow across levels. It is worth mentioning that the AWM strategy is introduced to effectively improve skip connections by reweighting the different paths. As for the texture attention branch, a novel encoder–decoder model is designed to concentrate on high-frequency details. It plays the role of feature enhancement, selecting the areas rich in detail. This model emphasizes high-frequency details and fine textures, which helps generate sharper images.
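The active weighted mapping idea can be sketched in a few lines: where a plain residual block computes x + F(x), AWM assigns each path a scalar weight that is learned during training. The numpy toy below uses illustrative values for the weights and for the stand-in conv-path output F(x); they are not the paper's learned parameters:

```python
import numpy as np

def awm_residual(x, fx, alpha, beta):
    """Active weighted mapping (sketch): instead of the fixed sum
    x + F(x) of a plain residual block, each path gets a learnable
    scalar weight that biases the flow between the two paths."""
    return alpha * fx + beta * x

x = np.ones(4)                        # identity (shortcut) path
fx = np.array([0.5, -0.5, 1.0, 0.0])  # stand-in for the conv-path output F(x)

plain = x + fx                            # ordinary residual sum: 1.5, 0.5, 2.0, 1.0
weighted = awm_residual(x, fx, 1.5, 0.8)  # biased sum: 1.55, 0.05, 2.3, 0.8
```

In the full IDSRB the weights would be trained jointly with the convolution kernels, letting the network emphasize either the transformed features or the shortcut as needed.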

In summary, we establish a novel network composed of a dual-scale feature extraction branch and a texture attention branch, which strikes a balance between PSNR and high-frequency details. The main contributions of this paper are summarized as follows:

  • A new improved dual-scale residual block, namely IDSRB, is proposed to richly capture features at different scales. The AWM mechanism is introduced into single image super-resolution tasks for the first time, to bias the parallel feature extraction paths. In this way, our IDSRB can improve the information flow across the layers.

  • An encoder–decoder model is developed to attend to the textures and high-frequency details of the image. We improve the conventional U-Net architecture to construct an encoder–decoder network for specialized feature enhancement in single image super-resolution.

  • A novel improved dual-scale residual network combining the dual-scale feature extraction branch with the texture attention branch is presented, which yields a good balance between accuracy and high-frequency details.
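The texture-attention idea in the second contribution can be caricatured with plain numpy: an "encoder" pools the features, a "decoder" restores their size, and a sigmoid turns the result into a mask that emphasizes high-activity regions. The operators here (average pooling, nearest-neighbour upsampling, elementwise gating) are deliberate simplifications of the paper's symmetric convolution–deconvolution pairs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    """2x2 average pooling: the 'encoder' shrinks the feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(x):
    """Nearest-neighbour upsampling: the 'decoder' restores the size."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

features = np.array([[0., 4., 0., 0.],
                     [4., 8., 0., 0.],
                     [0., 0., 0., 0.],
                     [0., 0., 0., 4.]])

mask = sigmoid(decode(encode(features)))  # attention map with values in (0, 1)
enhanced = features * mask                # detail-rich regions are preserved
```

The top-left block, where activity is high, receives a mask value near 1, while flat regions are attenuated toward 0.5; a learned encoder–decoder plays the same selector role with far more capacity.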

The remainder of this paper is organized as follows: Section 2 briefly reviews related work and representative neural network models from four aspects, and describes in detail the key technologies involved in this work. Section 3 introduces the proposed IDSRB and the entire network architecture in detail, including mathematical characterizations. In Section 4, experimental comparisons with state-of-the-art methods and a discussion of the results are conducted. Finally, Section 5 concludes the paper with observations and analysis.

Section snippets

Single image super-resolution

Since SRCNN (Dong, Loy, He, & Tang, 2016) first utilized a CNN to learn the end-to-end mapping from LR space to HR space, CNN-based methods have attracted much attention in SR. Dozens of deep CNN-based models have appeared since ResNet (He et al., 2016) solved the problem of network degradation. To date, a variety of CNN-based models have achieved high levels both in metrics (i.e., PSNR and structural similarity (SSIM)) (Wang, Bovik, Sheikh, & Simoncelli, 2004) and visual…

Improved dual-scale residual block

We introduce the AWM strategy (Jung et al., 2018) into the dual-scale residual block to bias the parallel feature extraction paths (i.e., the convolutional unit and the shortcut connection). The proposed IDSRB is shown in Fig. 4, where lines in different colors indicate different weight values. Above each convolution layer lies the corresponding number of channels.

It can be clearly seen that the front convolution layer has three times as many channels as the next layer in the first two-path phase.
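The effect of such channel expansion on model size is easy to quantify. The sketch below uses illustrative channel counts (64 base channels and the stated 3× expansion; these numbers are assumptions, not read off Fig. 4) to count the weights of an expand-then-project pair of 3×3 convolutions:

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

# Illustrative channel counts: a block that expands 64 input channels
# to 192 (3x) before the activation and then projects back to 64.
base = 64
wide = 3 * base

expand = conv_params(base, wide, 3)   # 64 -> 192: 110592 weights
project = conv_params(wide, base, 3)  # 192 -> 64: 110592 weights
total = expand + project              # 221184 weights for the pair
```

Because the two layers are symmetric, the wide phase costs the same as two ordinary 64-to-64 layers tripled, which is the trade-off the wide-activation design accepts for richer pre-activation features.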

Experimental results and discussion

In this part, experimental comparisons and results analysis are presented. First, we describe the implementation and training details. Then, the contributions of the different components of the network are analyzed. After that, quantitative and visual comparisons of the proposed model with other state-of-the-art methods on five benchmark datasets are conducted. Finally, we evaluate the execution time of the proposed model against some representative methods.

Conclusion

In this work, we propose an improved dual-scale residual network for SR. A dual-scale feature extraction branch and a texture attention branch are combined, responsible for extracting dual-scale features and enhancing high-frequency details, respectively. In particular, the improved dual-scale residual blocks, composed of convolution kernels of different sizes integrated with the active weighted mapping strategy, constitute the dual-scale feature extraction branch. Additionally, the main…

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the Natural Science Foundation of Zhejiang Province, China under grant LZ20F030001, and the National Natural Science Foundation of China under grant 61672477.

References (70)

  • Hornik, K., et al. (1989). Multilayer feedforward networks are universal approximators. Neural Networks.
  • Zhang, L., et al. (2010). A super-resolution reconstruction algorithm for surveillance images. Signal Processing.
  • Arbelaez, P., et al. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Bevilacqua, M., Roumy, A., Guillemot, C., & Alberimorel, M. L. (2012). Low-complexity single-image super-resolution...
  • Bulat, A., & Tzimiropoulos, G. (2017). How far are we from solving the 2D & 3D face alignment problem? (and a dataset...
  • Chen, Y., Tai, Y., Liu, X., Shen, C., & Yang, J. (2018). FSRNet: End-to-end learning face super-resolution with facial...
  • Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems.
  • Dong, C., et al. (2016). Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Dong, C., Loy, C. C., & Tang, X. (2016). Accelerating the super-resolution convolutional neural network. In European...
  • Freeman, W. T., et al. (2002). Example-based super-resolution. IEEE Computer Graphics and Applications.
  • Glasner, D., Bagon, S., & Irani, M. (2009). Super-resolution from a single image. In IEEE conference on computer vision...
  • Greenspan, H. (2009). Super-resolution in medical imaging. The Computer Journal.
  • Haris, M., et al. (2019). Deep back-projection networks for single image super-resolution.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE conference on...
  • Hu, Y., et al. (2018). Single image super-resolution via cascaded multi-scale cross network.
  • Hu, X., Mu, H., Zhang, X., Wang, Z., Tan, T., & Sun, J. (2019). Meta-SR: A magnification-arbitrary network for...
  • Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. In IEEE international conference on computer...
  • Huang, G., Liu, Z., Der Maaten, L. V., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In IEEE...
  • Huang, Y., Shao, L., & Frangi, A. F. (2017). Simultaneous super-resolution and cross-modality synthesis of 3D medical...
  • Huang, J., Singh, A., & Ahuja, N. (2015). Single image super-resolution from transformed self-exemplars. In Proceedings...
  • Isola, P., Zhu, J., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks....
  • Jiang, K., et al. (2018). A progressively enhanced network for video satellite imagery superresolution. IEEE Signal Processing Letters.
  • Jiang, K., Wang, Z., Yi, P., Jiang, J., Xiao, J., & Yao, Y. (2018). Deep distillation recursive network for remote...
  • Jing, Y., Liu, Q., & Zhang, K. (2017). Stacked hourglass...
  • Johnson, J., Alahi, A., & Feifei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In...
  • Jung, H., et al. (2018). Residual convolutional neural network revisited with active weighted mapping.
  • Kim, K. I., et al. (2010). Single-image super-resolution using sparse regression and natural image prior. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Kim, J., Lee, J. K., & Lee, K. M. (2016). Accurate image super-resolution using very deep convolutional networks. In...
  • Kim, J., Lee, J. K., & Lee, K. M. (2016). Deeply-recursive convolutional network for image super-resolution. In IEEE...
  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In International conference on learning...
  • Lai, W. S., Huang, J. B., Ahuja, N., & Yang, M. H. (2017). Deep Laplacian pyramid networks for fast and accurate...
  • Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A. P., Tejani, A., Totz, J., &...
  • Li, J., Fang, F., Mei, K., & Zhang, G. (2018). Multi-scale residual network for image super-resolution. In European...
  • Li, Z., Yang, J., Liu, Z., Yang, X., Jeon, G., & Wu, W. (2019). Feedback network for image super-resolution. In IEEE...
  • Lim, B., Son, S., Kim, H., Nah, S., & Lee, K. M. (2017). Enhanced deep residual networks for single image...