Learning to predict the quality of distorted-then-compressed images via a deep neural network

https://doi.org/10.1016/j.jvcir.2020.103004

Highlights

  • A stand-alone NR-IQA model is proposed for the distorted-then-compressed scenario.

  • It is straightforward to accommodate hybrid distortions with easily-accessible data.

  • A self-collected set bypasses the expensive and time-consuming subjective testing.

  • The proposed model can be trained and make predictions in an opinion-unaware manner.

  • The two-step methodology is realized in a unified framework without other metrics.

Abstract

Captured by amateur photographers, repeatedly propagated through multimedia pipelines, and compressed at different levels, real-world images usually suffer from a wide variety of hybrid distortions. In this scenario, full-reference (FR) image quality assessment (IQA) algorithms cannot deliver reliable predictions because the references themselves are of inferior quality. Meanwhile, existing no-reference (NR) IQA algorithms remain limited in their ability to deal with different distortion types. To address this obstacle, we explore an NR-IQA metric that predicts the perceptual quality of distorted-then-compressed images using a deep neural network (DNN). First, we propose a novel two-stream DNN to handle both authentic distortions and synthetic compressions, and adopt effective strategies to pre-train the two branches of the network. Specifically, we transfer the knowledge learned from in-the-wild images to account for authentic distortions by using a pre-trained deep convolutional neural network (CNN) to provide meaningful initializations. Meanwhile, we build a CNN for synthetic compressions and pre-train it on a dataset of synthetically compressed images. Subsequently, we bilinearly pool these two sets of features as the image representation. The overall network is fine-tuned on an elaborately designed auxiliary dataset, which is annotated by a reliable objective quality metric. Furthermore, we integrate the output of the authentic-distortion-aware branch with that of the overall network in a two-step prediction manner to boost the prediction performance, which can be applied in the distorted-then-compressed scenario when the reference image is available. Extensive experimental results on several databases, especially the LIVE Wild Compressed Picture Quality Database, show that the proposed method achieves state-of-the-art performance with good generalizability and moderate computational complexity.

Introduction

Overall mobile data traffic is expected to grow to 77 exabytes per month by 2022, a sevenfold increase over 2017 [1]. Online service providers like Netflix, YouTube, and Facebook are ingrained in the fabric of people's lives. An enormous volume of visual signals is acquired, transmitted, and stored on its way to end-users through mobile devices, high-definition television (HDTV), large photo-centric social networking websites, etc. With such a load of images, quality degradation and perceptual information loss caused by ubiquitous distortions inevitably arise in these image communication and processing systems, and can make the viewing experience annoying. However, assessing perceptual quality solely with human observers is impractical because subjective studies are time-consuming to organize, require carefully formulated criteria, and are expensive. Hence, objective image quality assessment, which measures perceived visual quality automatically by mimicking human perception, has become an effective substitute for subjective methods and is desirable in a wide range of computer vision applications.

According to the degree of dependence on the original information, objective IQA metrics can generally be classified into three mainstream categories: FR, reduced-reference (RR), and NR [2], [3], [4]. With access to high-quality reference images, such as those in the LIVE [5], CSIQ [6], and TID2013 [7] databases, prevalent FR models [8], [9], [10] perform reliably in measuring the perceptual quality of distorted images. Nonetheless, FR approaches have limited use in most real-world scenarios [11], where neither full nor partial prior knowledge of high-quality references is available. In contrast, NR models do not rely on source information to assess the perceptual quality of images and hence attract a significant amount of research interest.

In particular, there is a common scenario in which hundreds of billions of user-generated images of imperfect quality (pre-distorted images) are continually uploaded to social media and subsequently compressed (re-compressed images). It is well known that digitization, compression, storage, transmission, and display introduce modifications to the original image. However, little attention has been paid to the acquisition stage, which is affected by several factors such as lens limitations, aperture, lighting, and noise sensitivity. In addition, casual, inexpert users with unsteady eyes and hands produce large numbers of digital pictures with annoying artifacts during acquisition. Faced with these reference images of imperfect perceptual quality and their compressed derivatives, FR algorithms may perform poorly, since the majority of them are designed for high-quality references. Although NR algorithms can make predictions without referring to the source images, few of them are capable of handling both authentic distortions and synthetic compressions effectively. Moreover, existing NR-IQA solutions suffer from several limitations when used for distorted-then-compressed image quality assessment. On the one hand, traditional NR-IQA methods mainly adapt their features to handle simulated distortions rather than authentic ones, whereas the perceived quality of distorted-then-compressed images depends not only on the introduced compression but also on the process of photography. For example, the distortion type identification sub-network of MEON [12] tends to be ineffective against complicated authentic distortions. On the other hand, although existing general-purpose NR-IQA methods can be used to evaluate the perceptual quality of distorted-then-compressed images, their predictions are less consistent with the corresponding subjective scores, since some of their underlying assumptions (such as the Gaussian assumption in NIQE [13] and GM-LOG [14]) may not hold in the distorted-then-compressed scenario.

To highlight the above problem, a high-quality and a low-quality reference image, along with their associated compressed versions sampled from the LIVE Wild Compressed Picture Quality Database (LIVE Compressed) [15], are presented in Fig. 1. These images are displayed with their subjective mean opinion scores (MOS) along with the quality scores predicted by several objective IQA models, where we use MS-SSIM [8] (in a range of [0, 1], higher values indicate better quality) as an exemplar FR model, and NIQE and DB-CNN [16] (both in a range of [0, 100], higher values indicate worse quality) as two exemplar NR models. We observe that the MS-SSIM, NIQE, and DB-CNN values are all in monotonic agreement with the MOSs within Fig. 1(a)-(c) or within Fig. 1(d)-(e), which indicates that these IQA models handle each group well when it is considered in isolation. Nevertheless, a cross-comparison between Fig. 1(b) and Fig. 1(e) reveals a disagreement: a lower MOS corresponds to a higher MS-SSIM score. Indeed, the ground truths strongly indicate that the perceptual quality of Fig. 1(b) is better than that of Fig. 1(e). Similar observations can be made from Fig. 1(e) and Fig. 1(f), where the NIQE model becomes as unreliable as MS-SSIM. DB-CNN is no exception in Fig. 1(f) and Fig. 1(d). The reason for these inconsistencies is that most existing general-purpose IQA metrics are designed for images with either simulated or authentic distortions, so they struggle to reliably predict the perceptual quality of distorted-then-compressed images. This practical dilemma motivates us to seek a fundamental solution for accurately predicting the quality of such images.
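The distinction between agreement within one reference group and agreement across groups can be made concrete with a rank correlation. The following is a minimal sketch, assuming Spearman rank correlation as the measure of monotonic agreement; the scores are invented for illustration and are not the values shown in Fig. 1, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented scores (NOT from Fig. 1): three images derived from a high-quality
# reference and three derived from a low-quality reference.
mos_high = [80.0, 65.0, 40.0]
fr_high = [0.99, 0.96, 0.90]   # FR-style score, higher means better
mos_low = [45.0, 35.0, 20.0]
fr_low = [0.98, 0.95, 0.91]    # measured against the degraded reference

within_high = spearman(mos_high, fr_high)
within_low = spearman(mos_low, fr_low)
pooled = spearman(mos_high + mos_low, fr_high + fr_low)
print(within_high, within_low, pooled)
```

In this toy setting the metric ranks the images perfectly inside each group, yet the pooled correlation drops, because a score measured against a degraded reference says nothing about how degraded that reference already was.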

In this paper, we devise an end-to-end learnable framework for this authentically-distorted then synthetically-compressed scenario. On the one hand, we employ a deep residual neural network [17] initialized on ImageNet [18] and further pre-trained on the IQA database KonIQ-10k [19] to account for authentic distortions; this serves as a good initialization, since the network has seen a large number of realistic images of varying quality during transfer learning. On the other hand, similar to previous work [12], [16], we establish a large-scale synthetic training set based on the Waterloo Exploration database [20] and the KADIS-700k dataset [21], which takes the compression levels into account. We then build a CNN and pre-train it on a multi-class classification task to identify the severity of synthetic compression. Next, the activations from the authentic-distortion-aware and synthetic-compression-aware branches are bilinearly pooled to obtain the final image representation. Furthermore, to assess the ultimately compressed images, we fine-tune the overall architecture on an elaborately designed auxiliary dataset and run a cross-database test to demonstrate the robustness of the proposed algorithm. Last but not least, we combine the output of the authentic-distortion-aware pipeline on the reference image with that of the overall architecture on the compressed image to predict the quality in a two-step fashion. With these strategies, extensive experiments on the distorted-then-compressed scenario show that the proposed method produces more reliable objective predictions than state-of-the-art IQA algorithms. The main contributions of this work can be summarized as follows:

  • We propose a new stand-alone NR-IQA model for the distorted-then-compressed scenario, without using the corresponding authentically-distorted reference images. It is straightforward to pre-train and fine-tune the proposed network to accommodate hybrid distortions in this scenario with easily accessible training data.

  • We annotate a large-scale self-collected distorted-then-compressed training set with proxy quality scores, which effectively bypasses expensive and time-consuming subjective testing. With this strategy, our model can be trained and make predictions in an opinion-unaware manner, which significantly improves the generalizability of our method.

  • We implement the two-step methodology in a unified framework to further boost the performance, which does not rely on any other FR-IQA or NR-IQA metrics.
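The bilinear pooling of the two branches mentioned in the introduction can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: it assumes each branch yields a feature map flattened to channels by spatial locations, and it applies the signed square root and L2 normalization that are common in bilinear CNNs but are not stated in the text above. The function name `bilinear_pool` is hypothetical.

```python
import math


def bilinear_pool(feat_a, feat_b):
    """Bilinearly pool two feature maps that share the same spatial locations.

    feat_a: list of C1 channels, each a list of N spatial activations.
    feat_b: list of C2 channels, each a list of N spatial activations.
    Returns a flat vector of length C1 * C2.
    """
    n = len(feat_a[0])
    if any(len(channel) != n for channel in feat_a + feat_b):
        raise ValueError("both branches must cover the same N spatial locations")

    # Sum of outer products over locations: entry (i, j) is <a_i, b_j>.
    pooled = [
        sum(a[loc] * b[loc] for loc in range(n))
        for a in feat_a
        for b in feat_b
    ]

    # Assumed post-processing: signed square root, then L2 normalization.
    rooted = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in rooted)) or 1.0
    return [v / norm for v in rooted]


# Toy activations: 2 channels from one branch, 3 from the other, 2 locations.
authentic_branch = [[1.0, 2.0], [0.0, 1.0]]
compression_branch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
representation = bilinear_pool(authentic_branch, compression_branch)
print(len(representation))
```

Because every channel of one branch is multiplied with every channel of the other, the pooled vector captures pairwise interactions between the two kinds of features; in a real network the inputs would be tensors produced by the two CNN branches and the output would feed a regression layer.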

The remainder of this paper is organized as follows. In Section 2 we give an overview of related metrics including distortion-related NR-IQA methods and the distorted-then-compressed scenario. Section 3 describes the proposed method in detail. Section 4 exhibits the experimental results, corresponding analysis, and comparisons to related algorithms on mainstream IQA databases. Section 5 concludes the paper.


Related work

In this section, we briefly review the closely related literature, focusing on distortion-related NR-IQA models and the distorted-then-compressed scenario.

Motivation and framework

Being confronted with the challenge of assessing the quality of authentically-distorted then synthetically-compressed images, our motivation lies in exploiting visual features that are sensitive to both authentic and synthetic distortions and thus can be used to measure the perceptual quality of images in this scenario.

To handle the distortion-then-compression IQA problem, we propose a Two-Stream network for both Authentic and Synthetic distortions (TSAS), which is displayed in Fig. 2. First, a

Description of databases

To validate the effectiveness of the proposed IQA models, we conduct experiments on the LIVE Compressed database [15], an authentically-distorted then synthetically-compressed IQA database, also described as a distorted-then-compressed database. It contains 80 authentically-distorted reference images selected from the LIVE Challenge database [11] and 320 compressed images generated from the 80 reference images with JPEG compression at four degradation levels. MOS in the range [0, 100]

Conclusion

Real-world reference images may already be corrupted with authentic distortions during the acquisition stage and then be transmitted with different compressions to the end-users through multimedia pipelines. We introduce a two-stream CNN architecture for both authentic distortions and synthetic compressions, yielding an NR-IQA metric (TSAS) for this scenario. We also develop a two-step framework (TSASf) that can effectively predict the perceptual quality of realistic

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Bowen Li received the B.E. degree from the Electronic Information School, Jiangsu University of Science and Technology, Zhenjiang, China, in 2015 and the M.S. degree from the Electronic Information School, Wuhan University, Wuhan, China, in 2018.

He is currently pursuing the Ph.D. degree with Wuhan University. His research interests include image and video quality/aesthetics assessment, image enhancement, and image recognition.

References (78)

  • Cisco Visual Networking Index, Global mobile data traffic forecast update, 2017–2022. Cisco: San Jose, CA, USA,...
  • Shahrukh Athar et al., A comprehensive performance evaluation of image quality assessment algorithms, IEEE Access (2019)
  • Vinit Jakhetiya et al., A highly efficient blind image quality assessment metric of 3D-synthesized images using outlier detection, IEEE Trans. Ind. Inform. (2019)
  • Ke Gu et al., Learning a no-reference quality assessment model of enhanced images with big data, IEEE Trans. Neural Netw. Learn. Syst. (2018)
  • Hamid R. Sheikh et al., A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Trans. Image Process. (2006)
  • Eric Cooper Larson, Damon Michael Chandler, Most apparent distortion: full-reference image quality assessment and the...
  • Nikolay Ponomarenko, Lina Jin, Oleg Ieremeiev, Vladimir Lukin, Karen Egiazarian, Jaakko Astola, Benoit Vozel, Kacem...
  • Zhou Wang, Eero P. Simoncelli, Alan C. Bovik, Multiscale structural similarity for image quality assessment, in: Conf....
  • Lin Zhang et al., FSIM: A feature similarity index for image quality assessment, IEEE Trans. Image Process. (2011)
  • Lin Zhang et al., VSI: A visual saliency-induced index for perceptual image quality assessment, IEEE Trans. Image Process. (2014)
  • Deepti Ghadiyaram et al., Massive online crowdsourced study of subjective and objective picture quality, IEEE Trans. Image Process. (2016)
  • Kede Ma et al., End-to-end blind image quality assessment using deep neural networks, IEEE Trans. Image Process. (Mar. 2018)
  • Anish Mittal et al., Making a ‘completely blind’ image quality analyzer, IEEE Signal Process. Lett. (2013)
  • Wufeng Xue et al., Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features, IEEE Trans. Image Process. (2014)
  • Xiangxu Yu et al., Predicting the quality of images compressed after distortion in two steps, IEEE Trans. Image Process. (2019)
  • Weixia Zhang et al., Blind image quality assessment using a deep bilinear convolutional neural network, IEEE Trans. Circuits Syst. Video Technol. (2020)
  • Kaiming He et al., Deep residual learning for image recognition
  • Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Fei-Fei Li, ImageNet: A large-scale hierarchical image database,...
  • Hanhe Lin, Vlad Hosu, Dietmar Saupe, KonIQ-10K: Towards an ecologically valid and large-scale IQA database, arXiv...
  • Kede Ma et al., Waterloo Exploration Database: New challenges for image quality assessment models, IEEE Trans. Image Process. (2017)
  • Hanhe Lin et al., KADID-10k: A large-scale artificially distorted IQA database
  • Zhou Wang, Hamid R. Sheikh, Alan C. Bovik, No-reference perceptual quality assessment of JPEG compressed images, in:...
  • Hamid R. Sheikh, Alan C. Bovik, Lawrence Cormack, No-reference quality assessment using natural scene statistics: JPEG...
  • Nabil G. Sadaka, Lina J. Karam, Rony Ferzli, Glen P. Abousleman, A no-reference perceptual image sharpness metric based...
  • Rony Ferzli et al., A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB), IEEE Trans. Image Process. (2009)
  • Ke Gu, Guangtao Zhai, Xiaokang Yang, Wenjun Zhang, Min Liu, Subjective and objective quality assessment for images with...
  • Ke Gu et al., The analysis of image contrast: From quality assessment to automatic enhancement, IEEE Trans. Cybern. (2016)
  • Dinesh Jayaraman, Anish Mittal, Anush K. Moorthy, Alan C. Bovik, Objective quality assessment of multiply distorted...
  • Ke Gu, Guangtao Zhai, Min Liu, Xiaokang Yang, Wenjun Zhang, Xianghui Sun, Wanhong Chen, Ying Zuo, FISBLIM: A FIve-Step...
  • Ke Gu et al., Hybrid no-reference quality metric for singly and multiply distorted images, IEEE Trans. Broadcast. (Sep. 2014)
  • Ke Gu et al., Using free energy principle for blind image quality assessment, IEEE Trans. Multimedia (Jan. 2015)
  • Qiaohong Li et al., No-reference quality assessment for multiply-distorted images in gradient domain, IEEE Signal Process. Lett. (Apr. 2016)
  • Le Kang et al., Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks
  • Jongyoo Kim et al., Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment, IEEE Signal Process. Mag. (2017)
  • Simone Bianco et al., On the use of deep learning for blind image quality assessment, Signal, Image, Video Process. (SIViP) (2018)
  • Bolei Zhou et al., Learning deep features for scene recognition using places database
  • Xialei Liu, Joost van de Weijer, Andrew D. Bagdanov, RankIQA: Learning from rankings for no-reference image quality...
  • Kede Ma et al., dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs, IEEE Trans. Image Process. (Aug. 2017)
  • Weixia Zhang et al., Learning to blindly assess image quality in the laboratory and wild

Meng Tian received the B.S. and Ph.D. degrees from the Electronic Information School, Wuhan University, Wuhan, China, in 2011 and 2016, respectively. He holds a Postdoctoral position with Wuhan University and is a Visiting Scholar with Southern Methodist University.

His research interests include security of cyber physical power systems and cascading failures of multilayer networks.

Weixia Zhang received the B.E. degree from Wuhan University, Wuhan, China, in 2011 and the M.S. degree in electrical and computer engineering from the University of Rochester, NY, USA, in 2013. He then received the Ph.D. degree from Wuhan University, Wuhan, China, in 2018.

He is currently a Postdoctoral Fellow with the Artificial Intelligence Institute, Shanghai Jiao Tong University. His research interests include perceptual image processing and computer vision.

Hongtai Yao received the B.S. degree from the School of Mathematics and Statistics, Zhengzhou University, Zhengzhou, China, in 2014, and the M.S. degree from the School of Mathematics and Statistics, Henan University, Kaifeng, in 2018. He is currently pursuing the Ph.D. degree with Wuhan University.

His research interests include remote sensing image segmentation and computer vision.

Xianpei Wang received the B.S. degree from North China Electric Power University, in 1984, and the M.S. and Ph.D. degrees from Wuhan University, in 1991 and 1999, respectively.

He is currently a Professor with the Electronic Information School, Wuhan University. He has authored or co-authored over 100 papers in international and domestic journals. His research interests include computer vision, intelligent monitoring techniques for power systems, system reliability analysis, and fault diagnosis of high-voltage equipment.

This paper has been recommended for acceptance by Zicheng Liu.

This document is the result of a research project funded in part by the National Natural Science Foundation of China under Grant 61901262 and Grant 51707135.
