Abstract

In this paper, a novel digital video resolution enhancement algorithm based on adaptive directional interpolation is proposed, where the directionality of the edge structure and the nonlocal self-similarity prior within the current frame as well as its adjacent frames are both considered. First, we establish the regularization equation that conforms to the prior model of a video frame and then take the classic bicubic interpolation result as the initial estimation to iteratively solve the restoration equation, in which the edge structures and contours in low resolution (LR) input are reconstructed to estimate and refine the desired high resolution (HR) output. Experimental results show that the proposed algorithm can effectively enhance the clarity of a video frame, with satisfying subjective visual quality and PSNR value.

1. Introduction

Videos and images are the main sources of information for humans. According to statistics, more than 80% of the information we receive from the outside world comes from vision. With the development of digital mobile communication and computer technology, novel applications such as distance education, video on demand, telemedicine, and multiperson online video conferencing have appeared, promoting the revolution of productivity and social progress. In the meantime, expectations for the image quality of digital video have risen steadily: clarity has moved from standard definition to high definition (HD) and ultrahigh definition, and the corresponding resolution has moved from 480p to 720p, 1080p, and 2160p (4K). On the one hand, these improvements meet the increasing demand of end users and provide better image quality; on the other hand, while high-resolution video provides more detail, it also burdens the entire production and consumption ecosystem: more expensive capture and storage devices on the image acquisition side, additional computing resources for video editing on the media creation side, and more data transmission pressure on the communication network side. All of these have become important factors that restrict further improvement of video clarity and quality. A common way to solve this problem is to use an image postprocessing procedure in which the LR input frame is interpolated by a superresolution method [1–7], yielding a resolution-enhanced HR frame. This software-based technique does not change the existing image acquisition and data transmission systems and is thus of great value in fields such as videotelephony, virtual reality, augmented reality, and HD video games.

Natural images are highly structured, which reflects the strong spatiotemporal redundancy and self-similarity among pixels and plays a key role in solving inverse problems such as image denoising, deblurring, inpainting, and superresolution. Considering that the human visual system is sensitive to image edge structure [7–11], this paper proposes a novel digital video resolution enhancement algorithm via adaptive directional filtering, in which both the characteristics of the edge contours and the nonlocal self-similarity within the current frame and its adjacent frames are considered. We first establish the regularization equation that conforms to the prior model of a video frame and then take the classic bicubic interpolation result as the initial estimation to iteratively solve the restoration equation, where the edge structures and contours in the LR input are reconstructed to estimate and refine the desired HR output.

The rest of the paper is organized as follows. In Section 2, we introduce the core idea of the proposed adaptive directional interpolation scheme for estimating the missing details of the LR image and then use the nonlocal self-similarity prior to further improve the interpolation performance. The details of the video resolution enhancement algorithm are provided in Section 3. Section 4 presents the experimental validation of the proposed algorithm and a comparison with the classic bicubic interpolation method. Conclusions are drawn in Section 5.

2. The Core Idea

Directional regularity is widespread in the textures, edges, and contours of natural images (see Figure 1). Consider the vectorized image patch of fixed size centered on a given pixel, together with the filter matrix that corresponds to a directional filter at a given angle (in this paper, the directional controllable steerable filter [12] is used). The filtered patch is sparsest (namely, approximately zero) when the filter angle is parallel to the main direction of the patch. Because of its complexity, an image patch may include more than one main direction (examples are shown in Figures 1(c) and 1(d)); we can search for these direction angles using the following algorithm:

Algorithm 1: Main direction searching
Partition f into overlapping patches and, for each patch, do the following steps:
  • Initialization: Set the main direction angle set S, the candidate angle set, and the largest number of direction angles. Set the start point.
  • Main loop (repeat for the set number of iterations):
    - Calculate the filtering result for each candidate angle;
    - Find the best angle;
    - Update S and the start point for the next iteration.
  • Output: The main direction angle set S of the current image patch.
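The search can be illustrated with a minimal Python sketch. It is not the authors' implementation: the steerable filter [12] is replaced here by a simple central-difference directional derivative, the update of the start point is omitted, and the names `directional_energy` and `search_main_directions` are hypothetical. Angles are measured in image coordinates (rows grow downward), and the angle with the smallest filtered energy is taken as the best one.

```python
import math

def directional_energy(patch, angle_deg):
    """Energy of the finite-difference derivative of `patch` along `angle_deg`.

    Assumption: a central-difference derivative stands in for the steerable
    filter used in the paper. Only interior pixels are visited.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    energy = 0.0
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            dx = (patch[r][c + 1] - patch[r][c - 1]) / 2.0  # horizontal gradient
            dy = (patch[r + 1][c] - patch[r - 1][c]) / 2.0  # vertical gradient
            d = cos_t * dx + sin_t * dy                     # derivative along the angle
            energy += d * d
    return energy

def search_main_directions(patch, candidate_angles, max_directions):
    """Greedily pick up to `max_directions` angles with the smallest energy."""
    remaining = list(candidate_angles)
    selected = []
    for _ in range(min(max_directions, len(remaining))):
        best = min(remaining, key=lambda a: directional_energy(patch, a))
        selected.append(best)
        remaining.remove(best)  # each angle can be chosen only once
    return selected

# Usage: a patch whose values change only from row to row is constant
# along the horizontal direction.
stripes = [[r] * 5 for r in range(5)]
main_angles = search_main_directions(stripes, [0, 45, 90, 135], 1)
```

The greedy ranking used here is a simplification chosen for brevity; a faithful version would recompute the filtering result after each update, as the main loop of Algorithm 1 describes.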

In our previous works [3, 13], we have shown in detail how to construct a blurring matrix from its corresponding linear degradation operator (as well as the downsampling matrix). Here, we simply present the steps to construct the directional filter matrix from a 2-D filter kernel, as follows: (i) let the filter matrix be a zero matrix; (ii) for each pixel of the filtered image patch: (a) compute the 2-D coordinate of the pixel from its 1-D index; (b) for each element of the filter kernel, set the corresponding element of the filter matrix.
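Steps (i) and (ii) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the patch is vectorized in row-major order, pixels outside the patch are treated as zero, the kernel is applied as a correlation (no flipping), and the function names are hypothetical.

```python
def build_filter_matrix(kernel, patch_size):
    """Build the (n*n) x (n*n) matrix G with G @ vec(patch) == filtered patch.

    Assumptions: square odd-sized kernel, row-major vectorization,
    zero padding at the patch border, correlation rather than convolution.
    """
    n = patch_size
    k = len(kernel)
    half = k // 2
    G = [[0.0] * (n * n) for _ in range(n * n)]      # step (i): zero matrix
    for i in range(n * n):                            # step (ii): each output pixel
        r, c = divmod(i, n)                           # (a) 1-D index -> 2-D coordinate
        for u in range(k):                            # (b) each kernel element
            for v in range(k):
                rr, cc = r + u - half, c + v - half   # input pixel hit by kernel[u][v]
                if 0 <= rr < n and 0 <= cc < n:       # skip positions outside the patch
                    G[i][rr * n + cc] = kernel[u][v]
    return G

def apply_matrix(G, vec):
    """Plain matrix-vector product."""
    return [sum(g * x for g, x in zip(row, vec)) for row in G]

# Usage: a horizontal difference kernel applied to a 3 x 3 patch.
kernel = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]
patch_vec = [1, 2, 3, 4, 5, 6, 7, 8, 9]
filtered = apply_matrix(build_filter_matrix(kernel, 3), patch_vec)
```

Each row of the matrix holds at most as many nonzero entries as the kernel has elements, so the matrix is sparse and banded.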

The structure of the filter matrix is presented in Figure 2.

Figure 3 shows the main direction searching results of test images barbara and butterfly using the algorithm above.

Denote the LR image patch as the HR patch multiplied by the downsampling matrix [3]; when the downsampling factor is an integer, the corresponding LR input frame can be represented in the same way. With the directional regularity constraint posed above, the original HR patch can be estimated by a regularized interpolation equation (Equation (1)), which involves a regularization parameter and the adaptive directional filter matrix. This equation has the well-known closed-form solution given in expression (2).
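For orientation, a regularized least-squares problem of this kind and its closed-form solution can be written as below. The notation is an assumption introduced only for this illustration and is not taken from the paper: $x_i$ is the HR patch, $y_i = D x_i$ the LR patch, $D$ the downsampling matrix, $G_i$ the adaptive directional filter matrix, and $\lambda$ the regularization parameter.

```latex
% Assumed form of the interpolation problem (illustrative notation)
\hat{x}_i = \arg\min_{x} \; \| y_i - D x \|_2^2 + \lambda \, \| G_i x \|_2^2

% Setting the gradient to zero:
%   -2 D^T (y_i - D x) + 2 \lambda G_i^T G_i x = 0
% gives the closed-form solution
\hat{x}_i = \left( D^T D + \lambda \, G_i^T G_i \right)^{-1} D^T y_i
```

In this form, the matrix $D^T D + \lambda G_i^T G_i$ plays the role of the restoration kernel, and its conditioning depends on both terms.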

It is easy to see from the structure of the downsampling matrix that its product with its own transpose is diagonal; for a given downsampling factor, this product has an explicit block form that contains a zero matrix. Plugging the SVD decomposition into expression (2) leads to an equivalent form of the solution.

Recall that the filtered patch is approximately zero along its main direction; the directional filter term is therefore approximately singular, which implies that one or more of its singular values are close to zero, so inverting the restoration kernel is ill-posed and cannot be handled reliably. To solve this problem, we exploit the self-similarity prior that widely exists in natural images to further improve the interpolation performance. In this paper, the nonlocal autoregressive (NAR) model of images [14] is used to add an additional constraint to the restoration kernel and to reduce the degrees of freedom of the unknown pixels, which helps to yield a more stable result.

According to our previous works [15–17], each patch in an image can be approximately represented as a linear combination of its nonlocal neighbors at different locations (see Figure 4), as expressed in Equation (5).

The neighbor set consists of nonlocal patches around the target patch and can be seen as an adaptive local dictionary for the target vector. The corresponding representation coefficient can be easily computed by ridge regression (Equation (6)), where the regularization parameter is set manually to give the best results. Moreover, we have also proved in [15, 17] that the representation coefficient is sparse when the atoms of the dictionary are similar to the target patch in terms of normalized inner products. Since sparsity is a powerful prior that is broadly used in solving various inverse problems and has proved able to handle the image superresolution task [3, 6, 14, 15, 18], we propose the following algorithm (Algorithm 2) to construct the adaptive dictionary:

Algorithm 2: Adaptive dictionary construction
Partition f into overlapping patches and, for each patch, do the following steps:
  • Initialization: Set the nonlocal neighbor number and the search window size.
  • Dictionary construction:
    - Sweep over all possible patches in the search window centered on the target patch, and compute the normalized candidate atom set;
    - Compute the vector of normalized inner products between the candidate atoms and the target patch;
    - Select the atoms with the largest inner-product values to construct the dictionary.
  • Output: The adaptive dictionary of the current image patch.
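The construction above can be sketched in plain Python as follows. This is a simplified illustration for a single frame, not the authors' implementation: the function names are hypothetical, the target patch itself is excluded from the candidates (an assumption), and a full sort is used to pick the best atoms.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def extract_patch(image, top, left, size):
    """Vectorize the size x size patch whose top-left corner is (top, left)."""
    return [image[r][c] for r in range(top, top + size) for c in range(left, left + size)]

def build_dictionary(image, top, left, size, window, num_atoms):
    """Return the `num_atoms` candidates most similar to the target patch.

    Each result is (score, (row, col), atom), where score is the normalized
    inner product between the candidate atom and the target patch.
    """
    target = normalize(extract_patch(image, top, left, size))
    rows, cols = len(image), len(image[0])
    scored = []
    for r in range(max(0, top - window), min(rows - size, top + window) + 1):
        for c in range(max(0, left - window), min(cols - size, left + window) + 1):
            if (r, c) == (top, left):
                continue  # assumption: the target is not its own neighbor
            atom = normalize(extract_patch(image, r, c, size))
            score = sum(a * t for a, t in zip(atom, target))
            scored.append((score, (r, c), atom))
    scored.sort(key=lambda item: item[0], reverse=True)  # largest similarity first
    return scored[:num_atoms]

# Usage: vertical stripes, so patches two columns apart are identical.
image = [[1 if c % 2 == 0 else 0 for c in range(6)] for _ in range(6)]
atoms = build_dictionary(image, top=2, left=2, size=2, window=2, num_atoms=3)
```

For a video sequence, the same sweep would simply run over the search windows of the adjacent frames as well and pool all candidates before the selection step.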

Figure 5 shows the dictionary construction results for two patches of the test image lena using the algorithm above. For a video sequence, the same algorithm is adapted to construct the dictionary of an image patch from several frames: each atom then comes from the nonlocal neighbors in the current frame and its adjacent frames, as shown in Figure 6. Because a video scene changes smoothly most of the time, the differences between neighboring frames are small; it is therefore easier to find more similar candidate patches, which leads to a sparser and better representation coefficient and further improves the interpolation performance.
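Once a dictionary is available, the ridge-regression step that yields the representation coefficient can be sketched as below. This is a generic ridge solver written for illustration, not the authors' code; the names `ridge_coefficients` and `gamma` are hypothetical, and a small Gaussian elimination routine is included only to keep the example free of external libraries.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_coefficients(atoms, target, gamma):
    """Minimize ||target - sum_k alpha_k * atoms[k]||^2 + gamma * ||alpha||^2.

    Solves the normal equations (A^T A + gamma I) alpha = A^T target,
    where the columns of A are the dictionary atoms.
    """
    K = len(atoms)
    gram = [[sum(a * b for a, b in zip(atoms[i], atoms[j])) + (gamma if i == j else 0.0)
             for j in range(K)] for i in range(K)]
    rhs = [sum(a * t for a, t in zip(atoms[i], target)) for i in range(K)]
    return solve_linear_system(gram, rhs)

# Usage: two orthonormal atoms, so the Gram matrix is the identity.
alpha = ridge_coefficients([[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0], gamma=1.0)
```

With a positive `gamma`, the regularized Gram matrix is positive definite, so the system always has a unique solution even when the atoms are nearly identical.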

Replacing the constraint posed in (5) by an equivalent penalty and adding it to Equation (1), we obtain

Combining this equation with Equation (6), we get the desired HR patch estimator

Contrasting the expression above with formula (2), we can see that the restoration kernel is now full rank while remaining diagonal, which makes the matrix inversion cheap to compute.

3. Video Resolution Enhancement Algorithm

To sum up, we use the interpolation algorithm (Algorithm 3) listed below for digital video resolution enhancement:

Algorithm 3: Resolution enhancement algorithm
For each LR frame of the input digital video sequence, do the following steps:
  • Initialization: Set the initial estimate to the bicubic interpolation of the LR frame.
  • Main loop (repeat for the set number of iterations):
    - Use Algorithm 1 to search the main directions for each patch of the current estimate, and calculate the corresponding adaptive directional filter matrix;
    - Use Algorithm 2 to construct the adaptive dictionary;
    - Taking the current estimate as the initial estimation of the desired HR output, use Equation (8) to compute the resolution enhancement result;
    - Update the current estimate for the next iteration once all image patches have been restored.
  • Output: The resolution-enhanced output.

A graphic demonstration of this algorithm is displayed in Figure 7.

In each interpolation loop, the time consumption consists mainly of three parts: the main direction search, the adaptive dictionary construction, and the HR output estimation, each of which depends on the size of the LR input frame. The total time is the sum of these three terms (Equation (9)).

For the first term, we know from Algorithm 1 that searching each direction for every target patch needs a bounded number of filtering operations. Since filtering an image patch of fixed size can be done in constant time, the first term grows linearly with the frame size (Equation (10)).

For the second term, we need to sweep over the candidate patches around each target LR patch when searching for atoms. Since the normalization and inner-product computation can also be finished in constant time, the cost per target patch is the sweep cost plus a selection cost (Equation (11)).

In the above expression, the selection cost is the time needed to select the largest elements of the inner-product vector. This task can be implemented by a fast sorting algorithm whose cost depends only on the number of candidates, and this leads to Equation (12).

For the last term, the time consumption is determined mainly by the computation of the inverse matrices. Because the sizes of the matrices involved are fixed and independent of the frame size, these operations can also be done in constant time per patch, which gives Equation (13).

Plugging Equations (10), (12), and (13) into (9), we obtain

The equation above means that the computational complexity of the proposed interpolation algorithm is proportional to the number of pixels in the LR input frame.

For color video sequences, the YUV color model can be used: we split the input color frame into luminance and chrominance channels and then enhance them using the proposed algorithm and classic bicubic interpolation, respectively. The final resolution-enhanced frame is obtained by converting these channels back to the RGB color space. The diagram is shown in Figure 8.
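The channel handling can be sketched in Python as follows. The paper does not state which YUV variant is used, so the BT.601 analog coefficients below are an assumption, and `enhance_luma` and `upscale_chroma` are hypothetical placeholders for the proposed algorithm and bicubic interpolation, respectively.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (assumed BT.601 analog coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv."""
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

def enhance_color_frame(frame, enhance_luma, upscale_chroma):
    """Enhance luminance and chrominance separately, then merge back to RGB.

    frame: 2-D list of (r, g, b) tuples. Both callables take and return a
    2-D list of numbers and must produce outputs of the same size.
    """
    yuv = [[rgb_to_yuv(*px) for px in row] for row in frame]
    y = enhance_luma([[p[0] for p in row] for row in yuv])
    u = upscale_chroma([[p[1] for p in row] for row in yuv])
    v = upscale_chroma([[p[2] for p in row] for row in yuv])
    return [[yuv_to_rgb(y[r][c], u[r][c], v[r][c]) for c in range(len(y[0]))]
            for r in range(len(y))]

# Usage: identity placeholders leave the frame unchanged up to rounding.
frame = [[(200.0, 100.0, 50.0), (128.0, 128.0, 128.0)]]
result = enhance_color_frame(frame, lambda ch: ch, lambda ch: ch)
```

Applying the more expensive method only to the luminance channel is a common choice, since the human visual system is less sensitive to chrominance detail.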

4. Experimental Results

In this section, experimental results of the proposed resolution enhancement algorithm are reported and compared with the widely used bicubic interpolation method in terms of subjective image quality and objective PSNR. The LR input image/video frame is generated by directly decimating the original HR one by the downsampling factor along each axis and is then interpolated back to the original size for performance evaluation. The algorithm parameters, the candidate angle set for main direction searching, and the Gaussian kernel standard deviation of the steerable filter are fixed in advance; the width of the directional controllable steerable filter is 5. According to our tests, interpolating a single frame costs about 2.3 seconds in typical settings on an Intel Core i7 8750H with 6 cores at 3.9 GHz, running 64-bit Windows and Matlab 2017b accelerated by the C-MEX interface. A GPU-accelerated architecture (CUDA or OpenCL) may reduce the computation time substantially; we shall study this in future research.
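The evaluation protocol (direct decimation followed by a PSNR comparison against the original) can be sketched in Python as below. This is a minimal illustration with hypothetical function names; keeping the first pixel of every block and using a peak value of 255 are assumptions.

```python
import math

def decimate(image, factor):
    """Keep every `factor`-th pixel along each axis, with no prefiltering."""
    return [row[::factor] for row in image[::factor]]

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for a, b in zip(ref_row, est_row)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

# Usage: decimate a 4 x 4 image by a factor of 2.
hr = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
lr = decimate(hr, 2)
```

In a full evaluation, `lr` would be interpolated back to the size of `hr` by each method, and `psnr(hr, interpolated)` would be reported for every frame.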

Figures 9–12 present the resolution enhancement results on the test still images leaves, airplane, butterfly, and peppers for the tested settings. Figures 13 and 14 further show the interpolation results for the test video sequences foreman and ice, using adjacent frames as references. These figures show that the proposed algorithm works very well in reconstructing image contours and fine details, with few noticeable staircase artifacts in tiny structures, whereas the bicubic interpolation method produces a large amount of aliasing in edges and textures and therefore performs poorly. Moreover, Figure 15 gives the objective quality evaluation of foreman and ice for the first 50 frames. As expected, our method achieves satisfying PSNR values (about 2 dB higher than bicubic on average), which is consistent with the subjective visual quality shown above.

5. Conclusion

In this paper, we present an effective algorithm for enhancing the resolution of digital video and still images based on directional regularization and the nonlocal self-similarity structure, where the missing pixels of an image patch are estimated from its nonlocal neighbors via an adaptive directional filtering operation. The appeal of this work is its simplicity: it does not require solving complex optimization equations and is easy to implement. Experimental results show that the proposed algorithm can effectively improve digital video quality in terms of clarity and resolution and is thus of value in both theory and application.

Data Availability

Please contact the first author ([email protected]) to obtain the Matlab demo codes.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 62071001), the Anhui Natural Science Foundation of China (Nos. 2008085MF192 and 2008085MF183), the Key Science Project of Anhui Education Department of China (Nos. KJ2018A0012, KJ2019A0023, and KJ2019A0022), and the CERNET Innovation Project of China (Nos. NGII20180612, NGII20180312, and NGII20180624).