
To what extent do DNN-based image classification models make unreliable inferences?

Published in Empirical Software Engineering (2021)

Abstract

Deep Neural Network (DNN) models are widely used for image classification. While they offer high accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) classification results with different labels, or the same labels but lower certainty, after the relevant features of an image are corrupted, and (b) classification results with the same labels after irrelevant features are corrupted. Inferences that violate either metamorphic relation are regarded as unreliable. Our evaluation demonstrates that our approach can effectively identify unreliable inferences for single-label classification models, with average precisions of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precisions for MR-1 and MR-2 are 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences and found that they can cause significant degradation of a model's overall accuracy; after excluding these unreliable inferences from the test set, the model's accuracy can change significantly. We therefore recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between unreliable inferences and the size of the target objects, and found that inferences on inputs containing smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but do not necessarily prevent them from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
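The sketch below illustrates how such metamorphic relations could be checked in code. It is a minimal sketch, not the authors' implementation: it assumes a pre-trained torchvision ResNet-50 classifier and corrupts features by zeroing the pixels inside (relevant) or outside (irrelevant) a target object's bounding box; the helper names (corrupt, check_mrs) and the corruption strategy are illustrative assumptions.

```python
# Minimal sketch of MR-1/MR-2 checks for a single-label classifier.
# Assumptions: torchvision ResNet-50, bounding boxes in original image
# coordinates, and pixel zeroing as the corruption operator.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(pretrained=True).eval()
preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def predict(img):
    """Return (top-1 label index, softmax confidence) for a PIL image."""
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
        probs = torch.softmax(logits, dim=1)
        conf, label = probs.max(dim=1)
    return label.item(), conf.item()

def corrupt(img, box, inside=True):
    """Zero pixels inside (relevant) or outside (irrelevant) the object box."""
    tensor = T.ToTensor()(img)
    x1, y1, x2, y2 = box
    if inside:
        tensor[:, y1:y2, x1:x2] = 0.0          # wipe out the object itself
    else:
        kept = torch.zeros_like(tensor)
        kept[:, y1:y2, x1:x2] = tensor[:, y1:y2, x1:x2]
        tensor = kept                           # keep only the object, wipe background
    return T.ToPILImage()(tensor)

def check_mrs(img, box):
    """Return (mr1_violated, mr2_violated) for one image and its object box."""
    label, conf = predict(img)
    # MR-1: corrupting relevant (object) pixels should change the label
    # or lower the confidence; if not, the inference looks unreliable.
    label_r, conf_r = predict(corrupt(img, box, inside=True))
    mr1_violated = (label_r == label) and (conf_r >= conf)
    # MR-2: corrupting irrelevant (background) pixels should keep the label.
    label_i, _ = predict(corrupt(img, box, inside=False))
    mr2_violated = (label_i != label)
    return mr1_violated, mr2_violated

# Usage (hypothetical file and box):
# v1, v2 = check_mrs(Image.open("dog.jpg").convert("RGB"), (40, 30, 180, 200))
```

In practice, the bounding boxes would come from dataset annotations (e.g., ImageNet or COCO), and a more realistic corruption operator, such as image inpainting, could replace the simple pixel zeroing used here.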


Notes

  1. The latter study refers to this concept as “prediction confidence”.

  2. https://github.com/yqtianust/PaperUnreliableInference

  3. In the chi-square test, it is usually referred to as Cramér’s V (Cramer 1946); see the formula given after these notes.

  4. TResNet-L: https://github.com/Alibaba-MIIL/ASL, ResNet-50: https://github.com/ARiSE-Lab/DeepInspect

  5. https://github.com/pytorch/examples

  6. https://pytorch.org/docs/stable/torchvision/models.html

  7. MR-3/4/5/6 are only our initial proposals. Their detailed definitions should be refined and their effectiveness thoroughly evaluated.
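For reference, Cramér’s V for an r-by-c contingency table with chi-square statistic χ² and sample size n is typically defined as:

```latex
V = \sqrt{\frac{\chi^2}{n\,\bigl(\min(r, c) - 1\bigr)}}
```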

References

  • Aggarwal A, Lohia P, Nagar S, Dey K, Saha D (2019) Black box fairness testing of machine learning models. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, association for computing machinery, ESEC/FSE 2019, New York, NY, USA, pp 625–635. https://doi.org/10.1145/3338906.3338937

  • Barr ET, Harman M, McMinn P, Shahbaz M, Yoo S (2015) The oracle problem in software testing: A survey. IEEE Trans Softw Eng 41 (5):507–525


  • Ben-Baruch E, Ridnik T, Zamir N, Noy A, Friedman I, Protter M, Zelnik-Manor L (2020) Asymmetric loss for multi-label classification. arXiv:2009.14119

  • Benesty J, Chen J, Huang Y, Cohen I (2009) Pearson correlation coefficient. In: Noise reduction in speech processing. Springer, pp 1–4

  • Carlini N, Wagner DA (2017) Towards evaluating the robustness of neural networks. In: 2017 IEEE symposium on security and privacy, SP 2017, May 22-26, 2017. IEEE Computer Society, San Jose, CA, USA, pp 39–57. https://doi.org/10.1109/SP.2017.49

  • Chen TY, Cheung SC, Yiu SM (1998) Metamorphic testing: a new approach for generating next test cases. Tech. Rep. HKUST-CS98-01 Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong

  • Chen TY, Kuo FC, Liu H, Poon PL, Towey D, Tse TH, Zhou ZQ (2018) Metamorphic testing: A review of challenges and opportunities. ACM Comput Surv 51(1):4:1–4:27. https://doi.org/10.1145/3143561


  • Chollet F, et al. (2015a) Keras. https://keras.io

  • Chollet F, et al. (2015b) Keras applications. https://keras.io/api/applications/

  • Cochran W (1963) Sampling techniques, 2nd edn. [Wiley Publications in Statistics.], John Wiley & Sons, New York


  • Cramer H (1946) Mathematical methods of statistics. Princeton University Press, Princeton


  • Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: A large-scale hierarchical image database. In: CVPR09

  • Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, pp 4171–4186. https://doi.org/10.18653/v1/n19-1423

  • Ding J, Kang X, Hu X (2017) Validating a deep learning framework by metamorphic testing. In: 2017 IEEE/ACM 2nd international workshop on metamorphic testing (MET), pp 28–34. https://doi.org/10.1109/MET.2017.2

  • Dwarakanath A, Ahuja M, Sikand S, Rao RM, Bose RPJC, Dubash N, Podder S (2018) Identifying implementation bugs in machine learning based image classifiers using metamorphic testing. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, ISSTA 2018. ACM, New York, NY, USA, pp 118–128. https://doi.org/10.1145/3213846.3213858

  • Fahmy H, Pastore F, Bagherzadeh M, Briand L (2020) Supporting dnn safety analysis and retraining through heatmap-based unsupervised learning. arXiv:2002.00863

  • Fellbaum C (2006) Wordnet(s). In: Brown K (ed) Encyclopedia of language & linguistics, 2nd edn. Elsevier, Oxford, pp 665–670. https://doi.org/10.1016/B0-08-044854-2/00946-9 http://www.sciencedirect.com/science/article/pii/B0080448542009469

  • Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In: Vitányi P (ed) Computational learning theory. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 23–37

  • Pearson K (1900) X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond Edinb Dublin Philos Mag J Sci 50(302):157–175. https://doi.org/10.1080/14786440009463897


  • Geirhos R, Rubisch P, Michaelis C, Bethge M, Wichmann FA, Brendel W (2019) Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. In: 7th International conference on learning representations, ICLR 2019, May 6-9, 2019, OpenReview.net, New Orleans, LA, USA. https://openreview.net/forum?id=Bygh9j09KX

  • Gu T, Liu K, Dolan-Gavitt B, Garg S (2019) Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access 7:47230–47244. https://doi.org/10.1109/ACCESS.2019.2909068


  • Guo J, Jiang Y, Zhao Y, Chen Q, Sun J (2018) Dlfuzz: Differential fuzzing testing of deep learning systems. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, association for computing machinery, ESEC/FSE 2018, New York, NY, USA, pp 739–743. https://doi.org/10.1145/3236024.3264835

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  • Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861

  • Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition, CVPR 2017, July 21-26, 2017. IEEE Computer Society, Honolulu, HI, USA, pp 2261–2269. https://doi.org/10.1109/CVPR.2017.243

  • Krasin I, Duerig T, Alldrin N, Ferrari V, Abu-El-Haija S, Kuznetsova A, Rom H, Uijlings J, Popov S, Kamali S, Malloci M, Pont-Tuset J, Veit A, Belongie S, Gomes V, Gupta A, Sun C, Chechik G, Cai D, Feng Z, Narayanan D, Murphy K (2017) Openimages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://storage.googleapis.com/openimages/web/index.html

  • Krizhevsky A, Nair V, Hinton G (2009) The cifar-10 dataset. http://www.cs.toronto.edu/~kriz/cifar.html

  • Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems 25. Curran Associates, Inc., pp 1097–1105. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  • Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174


  • LeCun Y, Cortes C (2010) MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/

  • Lin T, Maire M, Belongie SJ, Bourdev LD, Girshick RB, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. arXiv:1405.0312

  • Lin Y, Lv F, Zhu S, Yang M, Cour T, Yu K, Cao L, Huang T (2011) Large-scale image classification: Fast feature extraction and svm training. In: CVPR 2011, pp 1689–1696

  • Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: Single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision – ECCV 2016. Springer International Publishing, Cham, pp 21–37

  • Ma L, Juefei-Xu F, Zhang F, Sun J, Xue M, Li B, Chen C, Su T, Li L, Liu Y, Zhao J, Wang Y (2018a) Deepgauge: Multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, ASE 2018. ACM, New York, NY, USA, pp 120–131. https://doi.org/10.1145/3238147.3238202

  • Ma L, Zhang F, Sun J, Xue M, Li B, Juefei-Xu F, Xie C, Li L, Liu Y, Zhao J, Wang Y (2018b) Deepmutation: Mutation testing of deep learning systems. In: Ghosh S, Natella R, Cukic B, Poston R, Laranjeiro N (eds) 29th IEEE international symposium on software reliability engineering, ISSRE 2018, October 15-18, 2018. IEEE Computer Society, Memphis, TN, USA, pp 100–111. https://doi.org/10.1109/ISSRE.2018.00021

  • Ma S, Liu Y, Lee WC, Zhang X, Grama A (2018c) Mode: Automated neural network model debugging via state differential analysis and input selection. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, association for computing machinery, ESEC/FSE 2018, New York, NY, USA, pp 175–186. https://doi.org/10.1145/3236024.3236082

  • Montavon G, Binder A, Lapuschkin S, Samek W, Müller KR (2019) Layer-wise relevance propagation: an overview. In: Explainable AI: interpreting, explaining and visualizing deep learning. Springer, pp 193–209

  • Moosavi-Dezfooli S, Fawzi A, Frossard P (2016) Deepfool: A simple and accurate method to fool deep neural networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2574–2582. https://doi.org/10.1109/CVPR.2016.282

  • Nejadgholi M, Yang J (2019) A study of oracle approximations in testing deep learning libraries. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE), pp 785–796. https://doi.org/10.1109/ASE.2019.00078

  • Odena A, Olsson C, Andersen D, Goodfellow IJ (2019) Tensorfuzz: Debugging neural networks with coverage-guided fuzzing. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, PMLR, Proceedings of machine learning research, vol 97, pp 4901–4911. http://proceedings.mlr.press/v97/odena19a.html

  • Pei K, Cao Y, Yang J, Jana S (2017) Deepxplore: Automated whitebox testing of deep learning systems. In: Proceedings of the 26th symposium on operating systems principles, SOSP ’17. ACM, New York, NY, USA, pp 1–18. https://doi.org/10.1145/3132747.3132785

  • Pham HV, Lutellier T, Qi W, Tan L (2019) CRADLE: cross-backend validation to detect and localize bugs in deep learning libraries. In: Proceedings of the 41st international conference on software engineering, ICSE ’19. IEEE Press, pp 1027–1038. https://doi.org/10.1109/ICSE.2019.00107

  • Qin G, Vrusias B, Gillam L (2010) Background filtering for improving of object detection in images. In: 2010 20th international conference on pattern recognition, pp 922–925. https://doi.org/10.1109/ICPR.2010.231

  • Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788. https://doi.org/10.1109/CVPR.2016.91

  • Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031


  • Ribeiro MT, Singh S, Guestrin C (2016) “why should I trust you?”: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, August 13-17, 2016, San Francisco, CA, USA, pp 1135–1144

  • Roobaert D, Zillich M, Eklundh J (2001) A pure learning approach to background-invariant object recognition using pedagogical support vector learning. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, vol 2, pp II–II. https://doi.org/10.1109/CVPR.2001.990982

  • Rosenfeld A, Zemel RS, Tsotsos JK (2018) The elephant in the room. arXiv:1808.03305

  • Sanchez J, Perronnin F (2011) High-dimensional signature compression for large-scale image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR ’11. IEEE Computer Society, USA, pp 1665–1672. https://doi.org/10.1109/CVPR.2011.5995504

  • Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: IEEE international conference on computer vision, ICCV 2017, October 22-29, 2017. IEEE Computer Society, Venice, Italy, pp 618–626. https://doi.org/10.1109/ICCV.2017.74

  • Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, May 7-9, 2015, conference track proceedings, San Diego, CA, USA

  • Stock P, Cissé M (2018) Convnets and imagenet beyond accuracy: Understanding mistakes and uncovering biases. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision - ECCV 2018 - 15th european conference, September 8-14, 2018, Proceedings, Part VI, Lecture Notes in Computer Science, vol 11210. Springer, Munich, Germany, pp 504–519. https://doi.org/10.1007/978-3-030-01231-1_31

  • Tian Y, Pei K, Jana S, Ray B (2018) Deeptest: Automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th international conference on software engineering, ICSE ’18. ACM, New York, NY, USA, pp 303–314. https://doi.org/10.1145/3180155.3180220

  • Tian Y, Zeng Z, Wen M, Liu Y, Kuo Ty, Cheung SC (2020a) Evaldnn: A toolbox for evaluating deep neural network models. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering: companion proceedings, association for computing machinery, ICSE ’20, New York, NY, USA, pp 45–48. https://doi.org/10.1145/3377812.3382133

  • Tian Y, Zhong Z, Ordonez V, Kaiser G, Ray B (2020b) Testing dnn image classifiers for confusion & bias errors. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, association for computing machinery, ICSE ’20, New York, NY, USA, pp 1122–1134. https://doi.org/10.1145/3377811.3380400

  • Tramèr F, Atlidakis V, Geambasu R, Hsu D, Hubaux J, Humbert M, Juels A, Lin H (2017) Fairtest: Discovering unwarranted associations in data-driven applications. In: 2017 IEEE european symposium on security and privacy (EuroS P), pp 401–416. https://doi.org/10.1109/EuroSP.2017.29

  • Wang S, Su Z (2020) Metamorphic object insertion for testing object detection systems. In: Proceedings of the 35th ACM/IEEE international conference on automated software engineering, ASE 2020. ACM, New York, NY, USA, pp 1053–1065. https://doi.org/10.1145/3324884.3416584

  • Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83


  • Wu G, Zhu J (2020) Multi-label classification: do hamming loss and subset accuracy really conflict with each other? In: Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H (eds) Advances in neural information processing systems 33: Annual conference on neural information processing systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. https://proceedings.neurips.cc/paper/2020/hash/20479c788fb27378c2c99eadcf207e7f-Abstract.html

  • Xie X, Ho JW, Murphy C, Kaiser G, Xu B, Chen TY (2011) Testing and validating machine learning classifiers by metamorphic testing. J Syst Softw 84(4):544–558 (the Ninth International Conference on Quality Software). https://doi.org/10.1016/j.jss.2010.11.920 http://www.sciencedirect.com/science/article/pii/S0164121210003213


  • Xie X, Ma L, Juefei-Xu F, Xue M, Chen H, Liu Y, Zhao J, Li B, Yin J, See S (2019a) Deephunter: a coverage-guided fuzz testing framework for deep neural networks. In: Møller A, Zhang D (eds) Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, ISSTA 2019, July 15-19, 2019. ACM, Beijing, China, pp 146–157. https://doi.org/10.1145/3293882.3330579

  • Xie X, Ma L, Wang H, Li Y, Liu Y, Li X (2019b) Diffchaser: Detecting disagreements for deep neural networks. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI-19, International joint conferences on artificial intelligence organization, pp 5772–5778. https://doi.org/10.24963/ijcai.2019/800

  • Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018) Generative image inpainting with contextual attention. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, June 18-22, 2018. IEEE Computer Society, Salt Lake City, UT, USA, pp 5505–5514. https://doi.org/10.1109/CVPR.2018.00577

  • Zhang JM, Harman M, Ma L, Liu Y (2020) Machine learning testing: Survey, landscapes and horizons. IEEE Trans Softw Eng, pp 1–1. https://doi.org/10.1109/TSE.2019.2962027

  • Zhang M, Zhang Y, Zhang L, Liu C, Khurshid S (2018) Deeproad: Gan-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, ASE 2018. ACM, New York, NY, USA, pp 132–142. https://doi.org/10.1145/3238147.3238187

  • Zhang P, Wang J, Sun J, Dong G, Wang X, Wang X, Dong JS, Ting D (2020a) White-box fairness testing through adversarial sampling. In: Proceedings of the 42nd international conference on software engineering, association for computing machinery, ICSE ’20, New York, NY, USA

  • Zhang X, Xie X, Ma L, Du X, Hu Q, Liu Y, Zhao J, Sun M (2020b) Towards characterizing adversarial defects of deep learning software from the lens of uncertainty. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, association for computing machinery, ICSE ’20, New York, NY, USA, pp 739–751. https://doi.org/10.1145/3377811.3380368

  • Zhao J, Wang T, Yatskar M, Ordonez V, Chang KW (2017) Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In: Proceedings of the 2017 conference on empirical methods in natural language processing, pp 2941–2951. https://www.aclweb.org/anthology/D17-1319

  • Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2016) Learning deep features for discriminative localization. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2921–2929. https://doi.org/10.1109/CVPR.2016.319

  • Zhou ZQ, Sun L (2019) Metamorphic testing of driverless cars. Commun ACM 62(3):61–67. https://doi.org/10.1145/3241979


  • Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 8697–8710. https://doi.org/10.1109/CVPR.2018.00907


Acknowledgement

We want to thank all reviewers for their constructive comments and suggestions on the manuscript. We would also like to thank the editors for their coordination. We express our deep gratitude to Miss Yao Feng for her significant contribution to the manual check. We also appreciate the proofreading by our labmates, Mr. Wuqi Zhang, Mr. Meiziniu Li, Mr. Hao Guan, and Miss Lei Liu.

Funding

This work was supported by the National Key Research and Development Program of China (Grant No. 2019YFE0198100), National Natural Science Foundation of China (Grant No. 61932021, 62002125 and 61802164), Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), Hong Kong RGC/RIF (Grant No. R5034-18), Hong Kong ITF (Grant No: MHP/055/19), Hong Kong PhD Fellowship Scheme, MSRA Collaborative Research Grant, Microsoft Cloud Research Software Fellow Award 2019, NSF 1901242, NSF 1910300, and IARPA TrojAI W911NF19S0012. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.

Author information


Corresponding author

Correspondence to Shing-Chi Cheung.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Shin Yoo

Availability of data and material

Data is available at https://github.com/yqtianust/PaperUnreliableInference.

Code availability

Code is available at https://github.com/yqtianust/PaperUnreliableInference.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tian, Y., Ma, S., Wen, M. et al. To what extent do DNN-based image classification models make unreliable inferences?. Empir Software Eng 26, 84 (2021). https://doi.org/10.1007/s10664-021-09985-1

