Learning to predict the quality of distorted-then-compressed images via a deep neural network

doi:10.1016/j.jvcir.2020.103004

Journal of Visual Communication and Image Representation

Volume 76, April 2021, 103004

https://doi.org/10.1016/j.jvcir.2020.103004

Highlights

• A stand-alone NR-IQA model is proposed for the distorted-then-compressed scenario.
• It is straightforward to accommodate hybrid distortions with easily-accessible data.
• A self-collected set bypasses the expensive and time-consuming subjective testing.
• The proposed model can be trained and make predictions in an opinion-unaware manner.
• The two-step methodology is realized in a unified framework without other metrics.

Abstract

Captured by amateur photographers, reciprocally propagated through multimedia pipelines, and compressed at different levels, real-world images usually suffer from a wide variety of hybrid distortions. In this scenario, full-reference (FR) image quality assessment (IQA) algorithms cannot deliver promising predictions because the references themselves are of inferior quality. Meanwhile, existing no-reference (NR) IQA algorithms remain limited in their ability to deal with different distortion types. To address this obstacle, we explore an NR-IQA metric that predicts the perceptual quality of distorted-then-compressed images using a deep neural network (DNN). First, we propose a novel two-stream DNN to handle both authentic distortions and synthetic compressions, and adopt effective strategies to pre-train the two branches of the network. Specifically, we transfer the knowledge learned from in-the-wild images to account for authentic distortions by using a pre-trained deep convolutional neural network (CNN) to provide meaningful initializations. Meanwhile, we build a CNN for synthetic compressions and pre-train it on a dataset of synthetically compressed images. Subsequently, we bilinearly pool these two sets of features as the image representation. The overall network is fine-tuned on an elaborately designed auxiliary dataset, which is annotated by a reliable objective quality metric. Furthermore, we integrate the output of the authentic-distortion-aware branch with that of the overall network in a two-step prediction manner to boost the prediction performance, which can be applied in the distorted-then-compressed scenario when the reference image is available. Extensive experimental results on several databases, especially on the LIVE Wild Compressed Picture Quality Database, show that the proposed method achieves state-of-the-art performance with good generalizability and moderate computational complexity.

Introduction

Overall mobile data traffic is expected to grow to 77 exabytes per month by 2022, a sevenfold increase over 2017 [1]. Online service providers like Netflix, YouTube, and Facebook are ingrained in the fabric of people’s lives. An enormous volume of visual signals makes its way through acquisition, transmission, and storage to end-users via mobile devices, high definition television (HDTV), large photo-centric social networking websites, etc. With such a great load of images, quality degradation and perceptual information loss caused by ubiquitous distortions inevitably arise in these image communication and processing systems, which may introduce a certain level of annoyance into the viewing experience. However, assessing perceptual quality by humans alone is impractical because of its time-consuming organization, troublesome criterion formulation, and high cost. Hence, objective image quality assessment, which measures perceived visual quality automatically by mimicking human perception, has become an effective substitute for subjective methods and is desirable in a wide range of computer vision applications.

According to the degree of dependence on the original information, objective IQA metrics can generally be classified into three mainstream categories: FR, reduced-reference (RR), and NR [2], [3], [4]. With access to high-quality reference images, such as those in the LIVE [5], CSIQ [6], and TID2013 [7] databases, prevalent FR models [8], [9], [10] perform reliably in measuring the perceptual quality of the distorted images. Nonetheless, FR approaches have limited use in most real-world scenarios [11], where neither full nor partial prior knowledge of high-quality references is available. In contrast, NR models do not rely on source information to assess the perceptual quality of images and hence attract a significant amount of research interest.

In particular, there is a common scenario in which hundreds of billions of user-generated images with imperfect quality (pre-distorted images) are continually uploaded to social media and subsequently compressed (re-compressed images). It is well known that digitization, compression, storage, transmission, and display introduce modifications to the original image. However, little attention has been paid to the acquisition stage, which is affected by factors such as lens limitations, aperture, lighting, and noise sensitivity. In addition, casual, inexpert users with unsteady eyes and hands produce large numbers of digital pictures with annoying artifacts during acquisition. When faced with these imperfect reference images and their compressed derivatives, FR algorithms may perform poorly, since most of them are designed for high-quality references. Although NR algorithms can make predictions without referring to the source images, few of them handle both authentic distortions and synthetic compressions effectively. Moreover, existing NR-IQA solutions suffer from several limitations when applied to distorted-then-compressed image quality assessment. On the one hand, traditional NR-IQA methods mainly adapt their features to simulated distortions rather than authentic ones, whereas the perceived quality of distorted-then-compressed images depends not only on the introduced compression but also on the process of photography. For example, the distortion type identification sub-network of MEON [12] tends to be ineffective against complicated authentic distortions. On the other hand, although existing general-purpose NR-IQA methods can be used to evaluate the perceptual quality of distorted-then-compressed images, their predictions are less consistent with the corresponding subjective scores, since some of their underlying assumptions (such as the Gaussian assumption in NIQE [13] and GM-LOG [14]) may not hold in the distorted-then-compressed scenario.

To highlight the above problem, Fig. 1 presents a high-quality and a low-quality reference, along with their associated compressed versions, sampled from the LIVE Wild Compressed Picture Quality Database (LIVE Compressed) [15]. The images are displayed with their subjective mean opinion scores (MOSs) and with the quality scores predicted by several objective IQA models: MS-SSIM [8] (in the range [0, 1], where higher values indicate better quality) as an exemplar FR model, and NIQE and DB-CNN [16] (both in the range [0, 100], where higher values indicate worse quality) as two exemplar NR models. We observe that the MS-SSIM, NIQE, and DB-CNN values are all in monotonic agreement with the MOSs within Fig. 1(a)-(c) or within Fig. 1(d)-(e), which indicates that these IQA models handle images derived from the same reference well. Nevertheless, a cross-comparison between Fig. 1(b) and Fig. 1(e) reveals a disagreement: a lower MOS corresponds to a higher MS-SSIM score. Indeed, the ground truths strongly indicate that the perceptual quality of Fig. 1(b) is better than that of Fig. 1(e). Similar observations can be made from Fig. 1(e) and Fig. 1(f), where the NIQE model becomes as unreliable as MS-SSIM. DB-CNN is no exception, as seen in Fig. 1(f) and Fig. 1(d). The reason for these inconsistencies is that most existing general-purpose IQA metrics are designed for images with either simulated or authentic distortions, so they struggle to reliably predict the perceptual quality of distorted-then-compressed images. This practical dilemma motivates us to seek fundamental solutions that accurately predict the quality of such images.
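The notion of monotonic agreement used above can be made concrete with the Spearman rank-order correlation coefficient (SRCC), a common way to compare objective scores with MOSs. The sketch below is only an illustration: the function names are hypothetical, and the MOS and FR-style scores are made-up numbers, not the values shown in Fig. 1. It shows how a metric can rank images perfectly within each reference group and still rank them poorly once the groups are pooled.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(scores: Sequence[float], mos: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(scores), average_ranks(mos))


# Hypothetical data: one reference plus two compressed versions per group.
# The FR-style score is computed against each group's own reference, so the
# low-quality reference still receives a perfect score of 1.00.
mos_high = [80.0, 65.0, 50.0]   # group from a high-quality reference
mos_low = [55.0, 40.0, 25.0]    # group from a low-quality reference
fr_high = [1.00, 0.95, 0.85]
fr_low = [1.00, 0.97, 0.90]

within_high = srcc(fr_high, mos_high)
within_low = srcc(fr_low, mos_low)
pooled = srcc(fr_high + fr_low, mos_high + mos_low)
```

With these illustrative numbers, both within-group correlations are perfect while the pooled correlation drops noticeably, which mirrors the cross-comparison problem described for Fig. 1.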

In this paper, we devise an end-to-end learnable framework for this authentically-distorted then synthetically-compressed scenario. On the one hand, we employ a deep residual neural network [17] initialized on ImageNet [18] and further pre-trained on the IQA database KonIQ-10k [19] to account for authentic distortions. This serves as a good initialization, since the network has seen a large number of realistic images of varying quality during the transfer learning process. On the other hand, similar to previous work [12], [16], we establish a large-scale synthetic training set based on the Waterloo Exploration database [20] and the KADIS-700k dataset [21], which takes the compression levels into account. We then build a CNN and pre-train it through a multi-class classification task to identify the severity of synthetic compressions. Next, the two activations from the authentic-distortion-aware and synthetic-compression-aware branches are bilinearly pooled to obtain the final image representation. Furthermore, for assessing the ultimately compressed images, we fine-tune the overall architecture on an elaborately designed auxiliary dataset for a cross-database test to demonstrate the robustness of the proposed algorithm. Last but not least, we combine the outputs of the authentic-distortion-aware pipeline on the reference image and the overall architecture on the compressed image to predict the quality in a two-step fashion. With the above-mentioned strategies, extensive experiments on the distorted-then-compressed scenario show that the proposed method produces more reliable objective prediction scores than state-of-the-art IQA algorithms. The main contributions of this work can be summarized as follows:

• We propose a new stand-alone NR-IQA model for the distorted-then-compressed scenario, without using the corresponding authentically-distorted reference images. It is straightforward to pre-train and fine-tune the proposed network to accommodate hybrid distortions in this scenario with easily-accessible training data.
• We annotate a large-scale self-collected distorted-then-compressed training set with proxy quality scores, which effectively bypasses the expensive and time-consuming subjective testing. With this strategy, our model can be trained and make predictions in an opinion-unaware manner, which significantly improves the generalization of our method.
• We implement the two-step methodology in a unified framework to further boost the performance, which does not rely on any other FR-IQA or NR-IQA metrics.
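The bilinear pooling of the two branch activations mentioned in the introduction can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bilinear_pool` is hypothetical, the feature maps are assumed to be flattened to one vector per spatial position, and the averaging over positions, signed square root, and L2 normalization are common practice in bilinear CNNs that this text does not confirm for TSAS.

```python
import math
from typing import List, Sequence


def bilinear_pool(
    feat_a: Sequence[Sequence[float]],
    feat_b: Sequence[Sequence[float]],
) -> List[float]:
    """Bilinearly pool two feature maps that share the same spatial positions.

    feat_a: N positions x Ca channels (e.g. authentic-distortion branch).
    feat_b: N positions x Cb channels (e.g. synthetic-compression branch).
    Returns a flattened vector of length Ca * Cb.
    """
    if not feat_a or len(feat_a) != len(feat_b):
        raise ValueError("both feature maps need the same, non-zero number of positions")
    n = len(feat_a)
    ca, cb = len(feat_a[0]), len(feat_b[0])

    # Sum of outer products a_n b_n^T over all spatial positions.
    pooled = [0.0] * (ca * cb)
    for a, b in zip(feat_a, feat_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += a[i] * b[j]

    # Assumed post-processing: average, signed square root, L2 normalization.
    pooled = [v / n for v in pooled]
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Toy example: 2 spatial positions, 2 channels in branch A, 1 channel in branch B.
toy_a = [[1.0, 0.0], [0.0, 2.0]]
toy_b = [[3.0], [1.0]]
representation = bilinear_pool(toy_a, toy_b)
```

The resulting vector captures pairwise interactions between the channels of the two branches; in a full model it would typically feed a small regression head that outputs the quality score.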

The remainder of this paper is organized as follows. In Section 2 we give an overview of related metrics including distortion-related NR-IQA methods and the distorted-then-compressed scenario. Section 3 describes the proposed method in detail. Section 4 exhibits the experimental results, corresponding analysis, and comparisons to related algorithms on mainstream IQA databases. Section 5 concludes the paper.

Related work

In this section, we briefly review the closely related literature, focusing on distortion-related NR-IQA models and the distorted-then-compressed scenario.

Motivation and framework

Being confronted with the challenge of assessing the quality of authentically-distorted then synthetically-compressed images, our motivation lies in exploiting visual features that are sensitive to both authentic and synthetic distortions and thus can be used to measure the perceptual quality of images in this scenario.

To handle the distortion-then-compression IQA problem, we propose a Two-Stream network for both Authentic and Synthetic distortions (TSAS), which is displayed in Fig. 2. First, a

Description of databases

To validate the effectiveness of the proposed IQA models, we conduct experiments on the LIVE Compressed database [15], which is an authentically-distorted then synthetically-compressed IQA database, also described as distorted-then-compressed database. It contains 80 authentically-distorted reference images selected from the LIVE Challenge database [11] and 320 compressed images generated from the 80 reference images with JPEG compression at four degradation levels. MOS in the range [0, 100]

Conclusion

Real-world reference images may already be corrupted with authentic distortions during the acquisition stage and then be transmitted with different compressions to the end-users through multimedia pipelines. We introduce a two-stream CNN architecture for both authentic distortions and synthetic compressions, yielding an NR-IQA metric (TSAS) for this scenario. We also develop a two-step framework ($\mathrm{TSAS}_f$) that can effectively predict the perceptual quality of realistic

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Bowen Li received the B.E. degree from the Electronic Information School, Jiangsu University of Science and Technology, Zhenjiang, China, in 2015 and the M.S. degree from the Electronic Information School, Wuhan University, Wuhan, China, in 2018.

He is currently pursuing the Ph.D. degree with Wuhan University. His research interests include image and video quality/aesthetics assessment, image enhancement, and image recognition.

References (78)

Cisco Visual Networking Index, Global mobile data traffic forecast update, 2017–2022. Cisco: San Jose, CA, USA,...
Shahrukh Athar et al., A comprehensive performance evaluation of image quality assessment algorithms, IEEE Access (2019)
Vinit Jakhetiya et al., A highly efficient blind image quality assessment metric of 3d-synthesized images using outlier detection, IEEE Trans. Ind. Inform. (2019)
Ke Gu et al., Learning a no-reference quality assessment model of enhanced images with big data, IEEE Trans. Neural Netw. Learn. Syst. (2018)
Hamid R. Sheikh et al., A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Trans. Image Process. (2006)
Eric Cooper Larson, Damon Michael Chandler, Most apparent distortion: full-reference image quality assessment and the...
Nikolay Ponomarenko, Lina Jin, Oleg Ieremeiev, Vladimir Lukin, Karen Egiazarian, Jaakko Astola, Benoit Vozel, Kacem...
Zhou Wang, Eero P Simoncelli, Alan C. Bovik, Multiscale structural similarity for image quality assessment, in: Conf....
Lin Zhang et al., FSIM: A feature similarity index for image quality assessment, IEEE Trans. Image Process. (2011)
Lin Zhang et al., VSI: A visual saliency-induced index for perceptual image quality assessment, IEEE Trans. Image Process. (2014)
Deepti Ghadiyaram et al., Massive online crowdsourced study of subjective and objective picture quality, IEEE Trans. Image Process. (2016)
Kede Ma et al., End-to-end blind image quality assessment using deep neural networks, IEEE Trans. Image Process. (Mar. 2018)
Anish Mittal et al., Making a ‘completely blind’ image quality analyzer, IEEE Signal Process. Lett. (2013)
Wufeng Xue et al., Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features, IEEE Trans. Image Process. (2014)
Xiangxu Yu et al., Predicting the quality of images compressed after distortion in two steps, IEEE Trans. Image Process. (2019)
Weixia Zhang et al., Blind image quality assessment using a deep bilinear convolutional neural network, IEEE Trans. Circuits Syst. Video Technol. (2020)
Kaiming He et al., Deep residual learning for image recognition
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Fei-Fei Li, Imagenet: A large-scale hierarchical image database,...
Hanhe Lin, Vlad Hosu, Dietmar Saupe, KonIQ-10K: Towards an ecologically valid and large-scale iqa database, arXiv...
Kede Ma et al., Waterloo Exploration Database: New challenges for image quality assessment models, IEEE Trans. Image Process. (2017)
Hanhe Lin et al., Kadid-10k: A large-scale artificially distorted iqa database
Zhou Wang, Hamid R. Sheikh, Alan C. Bovik, No-reference perceptual quality assessment of jpeg compressed images, in:...
Hamid R. Sheikh, Alan C. Bovik, Lawrence Cormack, No-reference quality assessment using natural scene statistics: Jpeg...
Nabil G. Sadaka, Lina J. Karam, Rony Ferzli, Glen P. Abousleman, A no-reference perceptual image sharpness metric based...
Rony Ferzli et al., A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB), IEEE Trans. Image Process. (2009)
Ke Gu, Guangtao Zhai, Xiaokang Yang, Wenjun Zhang, Min Liu, Subjective and objective quality assessment for images with...
Ke Gu et al., The analysis of image contrast: From quality assessment to automatic enhancement, IEEE Trans. Cybern. (2016)
Dinesh Jayaraman, Anish Mittal, Anush K. Moorthy, Alan C. Bovik, Objective quality assessment of multiply distorted...
Ke Gu, Guangtao Zhai, Min Liu, Xiaokang Yang, Wenjun Zhang, Xianghui Sun, Wanhong Chen, Ying Zuo, FISBLIM: A FIve-Step...
Ke Gu et al., Hybrid no-reference quality metric for singly and multiply distorted images, IEEE Trans. Broadcast. (Sep. 2014)
Ke Gu et al., Using free energy principle for blind image quality assessment, IEEE Trans. Multimedia (Jan. 2015)
Qiaohong Li et al., No-reference quality assessment for multiply-distorted images in gradient domain, IEEE Signal Process. Lett. (Apr. 2016)
Le Kang et al., Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks
Jongyoo Kim et al., Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment, IEEE Signal Process. Mag. (2017)
Simone Bianco et al., On the use of deep learning for blind image quality assessment, Signal, Image, Video Process. (SIViP) (2018)
Bolei Zhou et al., Learning deep features for scene recognition using places database
Xialei Liu, Joost van de Weijer, Andrew D. Bagdanov, RankIQA: Learning from rankings for no-reference image quality...
Kede Ma et al., dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs, IEEE Trans. Image Process. (Aug. 2017)
Weixia Zhang et al., Learning to blindly assess image quality in the laboratory and wild

Meng Tian received the B.S. and Ph.D. degrees from the Electronic Information School, Wuhan University, Wuhan, China, in 2011 and 2016, respectively. He holds a Postdoctoral position with Wuhan University and is a Visiting Scholar with Southern Methodist University.

His research interests include security of cyber physical power systems and cascading failures of multilayer networks.

Weixia Zhang received the B.E. degree from Wuhan University, Wuhan, China, in 2011 and the M.S. degree in electrical and computer engineering from the University of Rochester, NY, USA, in 2013. He then received the Ph.D. degree from Wuhan University, Wuhan, China, in 2018.

He is currently a Postdoctoral Fellow with the Artificial Intelligence Institute, Shanghai Jiao Tong University. His research interests include perceptual image processing and computer vision.

Hongtai Yao received the B.S. degree from the School of Mathematics and Statistics, Zhengzhou University, Zhengzhou, China, in 2014, and the M.S. degree from the School of Mathematics and Statistics, Henan University, Kaifeng, in 2018. He is currently pursuing the Ph.D. degree with Wuhan University.

His research interests include remote sensing image segmentation and computer vision.

Xianpei Wang received the B.S. degree from North China Electric Power University, in 1984, and the M.S. and Ph.D. degrees from Wuhan University, in 1991 and 1999, respectively.

He is currently a Professor with the Electronic Information School, Wuhan University. He has authored or co-authored over 100 papers in international and domestic journals. His research interests include computer vision, intelligent monitoring technique for power system, system reliability analysis, and fault diagnosis of high-voltage equipment.

^☆: This paper has been recommended for acceptance by Zicheng Liu.

^☆☆: This document is the result of a research project funded in part by the National Natural Science Foundation of China under Grant 61901262 and Grant 51707135.
