Learning to predict the quality of distorted-then-compressed images via a deep neural network☆,☆☆
Introduction
Overall mobile data traffic is expected to grow to 77 exabytes per month by 2022, a sevenfold increase over 2017 [1]. Online service providers like Netflix, YouTube, and Facebook are woven into the fabric of people’s lives. Enormous volumes of visual signals are acquired, transmitted, and stored on their way to end-users through mobile devices, high definition television (HDTV), large photo-centric social networking websites, etc. With such a great load of images, quality degradation and perceptual information loss caused by ubiquitous distortions inevitably occur in these image communication and processing systems, and can make the viewing experience annoying. However, assessing perceptual quality solely by human observers is impractical because subjective studies are time-consuming to organize, require carefully formulated criteria, and are expensive. Hence, objective image quality assessment (IQA), which measures perceived visual quality automatically by mimicking human perception, has become an effective substitute for subjective methods and is desirable in a wide range of computer vision applications.
According to the degree of dependence on the original information, objective IQA metrics can generally be classified into three mainstream categories: full-reference (FR), reduced-reference (RR), and no-reference (NR) [2], [3], [4]. With access to high-quality reference images, such as those in the LIVE [5], CSIQ [6], and TID2013 [7] databases, prevalent FR models [8], [9], [10] perform reliably in measuring the perceptual quality of distorted images. Nonetheless, FR approaches have limited use in most real-world scenarios [11], where neither full nor partial prior knowledge of high-quality references is available. In contrast, NR models do not rely on source information to assess the perceptual quality of images and hence attract a significant amount of research interest.
In particular, there is a common scenario in which hundreds of billions of user-generated images of imperfect quality (pre-distorted images) are continually uploaded to social media and subsequently compressed (re-compressed images). It is well known that the digitization, compression, storage, transmission, and display processes introduce modifications to the original image. However, little attention has been paid to the acquisition stage, which is affected by several factors such as lens limitations, aperture, lighting, and noise sensitivity. In addition, casual, inexpert users with unsteady eyes and hands produce large numbers of digital pictures with annoying artifacts during acquisition. When faced with these reference images of imperfect perceptual quality and their compressed derivatives, FR algorithms may perform poorly, since the majority of them assume high-quality references. Although NR algorithms can make predictions without referring to the source images, few of them are capable of handling both authentic distortions and synthetic compressions effectively. Moreover, existing NR-IQA solutions suffer from several limitations when applied to distorted-then-compressed image quality assessment. On the one hand, traditional NR-IQA methods mainly adapt their features to simulated distortions rather than authentic ones, whereas the perceived quality of distorted-then-compressed images depends not only on the introduced compression but also on the process of photography. For example, the distortion type identification sub-network of MEON [12] tends to be of little help when facing complicated authentic distortions. On the other hand, although existing general-purpose NR-IQA methods can be used to evaluate the perceptual quality of distorted-then-compressed images, their predictions are less consistent with the corresponding subjective records, because some of their underlying assumptions (such as the Gaussian assumption in NIQE [13] and GM-LOG [14]) may not hold in the distorted-then-compressed scenario.
To highlight the above problem, a high-quality and a low-quality reference along with their associated compressed versions sampled from the LIVE Wild Compressed Picture Quality Database (LIVE Compressed) [15] are presented in Fig. 1. These images are displayed with their subjective mean opinion scores (MOS) along with the quality scores predicted by several objective IQA models, where we use MS-SSIM [8] (in a range of [0, 1], higher values indicate better quality) as an exemplar FR model, and NIQE and DB-CNN [16] (both in a range of [0, 100], higher values indicate worse quality) as two exemplar NR models. We observe that the MS-SSIM, NIQE, and DB-CNN values are all in monotonic agreement with the MOSs within Fig. 1(a)-(c) or within Fig. 1(d)-(e), which indicates that these IQA models handle each group well when it is considered on its own. Nevertheless, a cross-comparison between Fig. 1(b) and Fig. 1(e) reveals a disagreement: a lower MOS corresponds to a higher MS-SSIM score. Indeed, the ground truths strongly indicate that the perceptual quality of Fig. 1(b) is better than that of Fig. 1(e). Similar observations can be made from Fig. 1(e) and Fig. 1(f), where the NIQE model becomes as unreliable as MS-SSIM. DB-CNN is no exception in Fig. 1(f), and Fig. 1(d). The reason for these inconsistencies is that most existing general-purpose IQA metrics are designed for images with either simulated or authentic distortions, so they struggle to reliably predict the perceptual quality of distorted-then-compressed images. This practical dilemma motivates us to seek a fundamental solution for accurately predicting the quality of such images.
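The within-group versus cross-group behaviour described above can be made concrete with a rank correlation. The following is a minimal sketch, assuming made-up MOS and FR scores for two hypothetical groups (a high-quality and a low-quality reference, each with compressed versions); the numbers are illustrative only and are not taken from Fig. 1 or from the LIVE Compressed database. The helper `srcc` is a simple Spearman rank correlation that assumes no tied values.

```python
import numpy as np

def srcc(x, y):
    """Spearman rank correlation coefficient (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    # Pearson correlation of the ranks equals Spearman's rho without ties.
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical values, for illustration only (higher = better for both).
mos_high = np.array([80.0, 70.0, 50.0])   # images derived from a good reference
fr_high = np.array([0.99, 0.96, 0.90])    # FR score against that reference
mos_low = np.array([45.0, 40.0, 30.0])    # images derived from a poor reference
fr_low = np.array([0.98, 0.95, 0.92])     # FR score against the poor reference

within_high = srcc(mos_high, fr_high)
within_low = srcc(mos_low, fr_low)
pooled = srcc(np.concatenate([mos_high, mos_low]),
              np.concatenate([fr_high, fr_low]))

print(f"within high-quality group: {within_high:.3f}")
print(f"within low-quality group:  {within_low:.3f}")
print(f"pooled across both groups: {pooled:.3f}")
```

In this toy setting the FR score ranks each group perfectly, yet the pooled correlation drops, because a score measured against a poor reference says nothing about how poor that reference already was.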
In this paper, we devise an end-to-end learnable framework for this authentically-distorted then synthetically-compressed scenario. On the one hand, we employ a deep residual neural network [17] initialized on ImageNet [18] and further pre-trained on the IQA database KonIQ-10k [19] to account for authentic distortions. This serves as a good initialization because the network has seen a large number of realistic images of varying quality during transfer learning. On the other hand, similar to previous work [12], [16], we establish a large-scale synthetic training set based on the Waterloo Exploration database [20] and the KADIS-700k dataset [21], which takes the compression levels into account. We then build and pre-train a CNN on a multi-class classification task to identify the severity of synthetic compression. Next, the activations from the authentic-distortion-aware and synthetic-compression-aware branches are bilinearly pooled to obtain the final image representation. Furthermore, when assessing the ultimately compressed images, we fine-tune the overall architecture on an elaborately designed auxiliary dataset and run a cross-database test to demonstrate the robustness of the proposed algorithm. Last but not least, we combine the output of the authentic-distortion-aware pipeline on the reference image with the output of the overall architecture on the compressed image to predict the quality in a two-step fashion. With these strategies, extensive experiments on the distorted-then-compressed scenario show that the proposed method produces more reliable objective prediction scores than state-of-the-art IQA algorithms. The main contributions of this work can be summarized as follows:
- We propose a new stand-alone NR-IQA model for the distorted-then-compressed scenario, without using the corresponding authentically-distorted reference images. It is straightforward to pre-train and fine-tune the proposed network to accommodate hybrid distortions in this scenario with easily accessible training data.
- We annotate a large-scale self-collected distorted-then-compressed training set with proxy quality scores, which effectively bypasses expensive and time-consuming subjective testing. With this strategy, our model can be trained and make predictions in an opinion-unaware manner, which significantly improves the generalization of our method.
- We implement the two-step methodology in a unified framework to further boost the performance, which does not rely on any other FR-IQA or NR-IQA metrics.
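The bilinear pooling of the two branch activations mentioned above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name `bilinear_pool`, the channel counts, and the spatial size are hypothetical, random arrays stand in for real CNN activations, and the signed square root with L2 normalization is a common post-processing step for bilinear features that is assumed here rather than taken from the paper.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Bilinearly pool two feature maps of shape (C1, H, W) and (C2, H, W).

    Returns a vector of length C1 * C2.
    """
    c1, h, w = feat_a.shape
    c2, h_b, w_b = feat_b.shape
    if (h, w) != (h_b, w_b):
        raise ValueError("both branches must share the same spatial size")
    a = feat_a.reshape(c1, h * w)
    b = feat_b.reshape(c2, h * w)
    # Outer product of the two channel vectors at each spatial location,
    # averaged over all H * W locations: result has shape (C1, C2).
    pooled = (a @ b.T) / (h * w)
    vec = pooled.reshape(-1)
    # Assumed post-processing: signed square root, then L2 normalization.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)

# Random stand-ins for the two branch activations (hypothetical sizes).
rng = np.random.default_rng(0)
authentic_feat = rng.standard_normal((8, 4, 4))   # authentic-distortion branch
synthetic_feat = rng.standard_normal((5, 4, 4))   # synthetic-compression branch

representation = bilinear_pool(authentic_feat, synthetic_feat)
print(representation.shape)
```

The pooled vector captures pairwise interactions between every channel of one branch and every channel of the other, which is why its length is the product of the two channel counts. In a full model, a regression layer on top of this vector would map it to a quality score.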
The remainder of this paper is organized as follows. In Section 2 we give an overview of related metrics including distortion-related NR-IQA methods and the distorted-then-compressed scenario. Section 3 describes the proposed method in detail. Section 4 exhibits the experimental results, corresponding analysis, and comparisons to related algorithms on mainstream IQA databases. Section 5 concludes the paper.
Related work
In this section, we briefly review the closely related literature, focusing on distortion-related NR-IQA models and the distorted-then-compressed scenario.
Motivation and framework
Being confronted with the challenge of assessing the quality of authentically-distorted then synthetically-compressed images, our motivation lies in exploiting visual features that are sensitive to both authentic and synthetic distortions and thus can be used to measure the perceptual quality of images in this scenario.
To handle the distortion-then-compression IQA problem, we propose a Two-Stream network for both Authentic and Synthetic distortions (TSAS), which is displayed in Fig. 2. First, a
Description of databases
To validate the effectiveness of the proposed IQA models, we conduct experiments on the LIVE Compressed database [15], which is an authentically-distorted then synthetically-compressed IQA database, also described as a distorted-then-compressed database. It contains 80 authentically-distorted reference images selected from the LIVE Challenge database [11] and 320 compressed images generated from the 80 reference images with JPEG compression at four degradation levels. MOS in the range [0, 100]
Conclusion
Real-world reference images may already be corrupted with authentic distortions during the acquisition stage and then be transmitted with different compressions to the end-users through multimedia pipelines. We introduce a two-stream CNN architecture for both authentic distortions and synthetic compressions, yielding an NR-IQA metric (TSAS) for this scenario. We also develop a two-step framework () that can effectively predict the perceptual quality of realistic
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Bowen Li received the B.E. degree from the Electronic Information School, Jiangsu University of Science and Technology, Zhenjiang, China, in 2015 and the M.S. degree from the Electronic Information School, Wuhan University, Wuhan, China, in 2018.
He is currently pursuing the Ph.D. degree with Wuhan University. His research interests include image and video quality/aesthetics assessment, image enhancement, and image recognition.
References (78)
- Cisco Visual Networking Index, Global mobile data traffic forecast update, 2017–2022. Cisco: San Jose, CA, USA,...
- et al., A comprehensive performance evaluation of image quality assessment algorithms, IEEE Access (2019)
- et al., A highly efficient blind image quality assessment metric of 3d-synthesized images using outlier detection, IEEE Trans. Ind. Inform. (2019)
- et al., Learning a no-reference quality assessment model of enhanced images with big data, IEEE Trans. Neural Netw. Learn. Syst. (2018)
- et al., A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Trans. Image Process. (2006)
- Eric Cooper Larson, Damon Michael Chandler, Most apparent distortion: full-reference image quality assessment and the...
- Nikolay Ponomarenko, Lina Jin, Oleg Ieremeiev, Vladimir Lukin, Karen Egiazarian, Jaakko Astola, Benoit Vozel, Kacem...
- Zhou Wang, Eero P Simoncelli, Alan C. Bovik, Multiscale structural similarity for image quality assessment, in: Conf....
- et al., FSIM: A feature similarity index for image quality assessment, IEEE Trans. Image Process. (2011)
- et al., VSI: A visual saliency-induced index for perceptual image quality assessment, IEEE Trans. Image Process. (2014)
- Massive online crowdsourced study of subjective and objective picture quality, IEEE Trans. Image Process.
- End-to-end blind image quality assessment using deep neural networks, IEEE Trans. Image Process.
- Making a ‘completely blind’ image quality analyzer, IEEE Signal Process. Lett.
- Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features, IEEE Trans. Image Process.
- Predicting the quality of images compressed after distortion in two steps, IEEE Trans. Image Process.
- Blind image quality assessment using a deep bilinear convolutional neural network, IEEE Trans. Circuits Syst. Video Technol.
- Deep residual learning for image recognition
- Waterloo Exploration Database: New challenges for image quality assessment models, IEEE Trans. Image Process.
- Kadid-10k: A large-scale artificially distorted iqa database
- A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB), IEEE Trans. Image Process.
- The analysis of image contrast: From quality assessment to automatic enhancement, IEEE T. Cybern.
- Hybrid no-reference quality metric for singly and multiply distorted images, IEEE Trans. Broadcast.
- Using free energy principle for blind image quality assessment, IEEE Trans. Multimedia
- No-reference quality assessment for multiply-distorted images in gradient domain, IEEE Signal Process. Lett.
- Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks
- Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment, IEEE Signal Process. Mag.
- On the use of deep learning for blind image quality assessment, Signal, Image, Video Process. (SIViP)
- Learning deep features for scene recognition using places database
- dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs, IEEE Trans. Image Process.
- Learning to blindly assess image quality in the laboratory and wild
Meng Tian received the B.S. and Ph.D. degrees from the Electronic Information School, Wuhan University, Wuhan, China, in 2011 and 2016, respectively. He holds a Postdoctoral position with Wuhan University and is a Visiting Scholar with Southern Methodist University.
His research interests include security of cyber physical power systems and cascading failures of multilayer networks.
Weixia Zhang received the B.E. degree from Wuhan University, Wuhan, China, in 2011 and the M.S. degree in electrical and computer engineering from the University of Rochester, NY, USA, in 2013. He then received the Ph.D. degree from Wuhan University, Wuhan, China, in 2018.
He is currently a Postdoctoral Fellow with the Artificial Intelligence Institute, Shanghai Jiao Tong University. His research interests include perceptual image processing and computer vision.
Hongtai Yao received the B.S. degree from the School of Mathematics and Statistics, Zhengzhou University, Zhengzhou, China, in 2014, and the M.S. degree from the School of Mathematics and Statistics, Henan University, Kaifeng, in 2018. He is currently pursuing the Ph.D. degree with Wuhan University.
His research interests include remote sensing image segmentation and computer vision.
Xianpei Wang received the B.S. degree from North China Electric Power University, in 1984, and the M.S. and Ph.D. degrees from Wuhan University, in 1991 and 1999, respectively.
He is currently a Professor with the Electronic Information School, Wuhan University. He has authored or co-authored over 100 papers in international and domestic journals. His research interests include computer vision, intelligent monitoring technique for power system, system reliability analysis, and fault diagnosis of high-voltage equipment.
☆ This paper has been recommended for acceptance by Zicheng Liu.