Abstract
Sharing a pre-trained machine learning model, particularly a deep neural network, via prediction APIs is becoming a common practice on machine learning as a service (MLaaS) platforms. Although deep neural networks (DNNs) have shown remarkable success in many tasks, they are also criticized for their lack of interpretability and transparency. Interpreting a shared DNN model faces two additional challenges compared with interpreting a general model: (1) only limited training data can be disclosed to users, and (2) the internal structure of the model may not be available. These two challenges impede the application of most existing interpretability approaches, such as saliency maps or influence functions, to DNN models. Case-based reasoning methods have been used for interpreting decisions; however, how to select and organize the data points under the constraints of shared DNN models has not been addressed. Moreover, simply providing cases as explanations may not be sufficient to support instance-level interpretability. Meanwhile, existing interpretation methods for DNN models generally lack the means to evaluate the reliability of the interpretation. In this article, we propose a framework named Shared Model INTerpreter (SMINT) to address the above limitations. We propose a new data structure called a boundary graph to organize training points so as to mimic the predictions of DNN models. We integrate local features, such as saliency maps and interpretable input masks, into the data structure to help users infer the model's decision boundaries. We show that the boundary graph is able to address the reliability issues in many local interpretation methods. We further design an algorithm named hidden-layer aware p-test to measure the reliability of the interpretations. Our experiments show that SMINT achieves above 99% fidelity to the corresponding DNN models on both MNIST and ImageNet while sharing only a tiny fraction of the training data to make these models interpretable.
A human pilot study demonstrates that SMINT provides better interpretability than existing methods. Moreover, we demonstrate that SMINT is able to assist model tuning for better performance on different user data.
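As a rough illustration of the case-based idea behind the boundary graph, the sketch below shows a minimal boundary-graph-style classifier. The class name, greedy traversal rule, and insertion criterion are our assumptions for illustration (loosely inspired by boundary-tree-style structures), not the paper's exact algorithm; in the SMINT setting, the stored labels would be the shared DNN's API predictions, so the graph mimics the model's decisions while retaining only boundary-relevant training points.

```python
# Illustrative sketch of a boundary-graph-style case-based classifier.
# Each node stores one training point (a shared "case") and its label.
# A query walks greedily to the closest neighboring node; a new point is
# stored only if the current graph would mispredict it, i.e., it lies
# near a decision boundary. All names here are hypothetical.
import math


class BoundaryGraph:
    def __init__(self):
        self.points = []   # retained training examples (the shared cases)
        self.labels = []   # e.g., labels obtained from the DNN prediction API
        self.edges = {}    # adjacency: node index -> set of neighbor indices

    @staticmethod
    def _dist(a, b):
        return math.dist(a, b)

    def _traverse(self, x):
        """Greedy walk: hop to the closest neighbor until none is closer."""
        if not self.points:
            return None
        cur = 0
        while True:
            best = min(self.edges[cur] | {cur},
                       key=lambda i: self._dist(self.points[i], x))
            if best == cur:
                return cur
            cur = best

    def fit_point(self, x, y):
        """Store (x, y) only if the current graph mispredicts it."""
        node = self._traverse(x)
        if node is not None and self.labels[node] == y:
            return False  # already predicted correctly; no case stored
        idx = len(self.points)
        self.points.append(x)
        self.labels.append(y)
        self.edges[idx] = set()
        if node is not None:
            # Link the new case to the node that mispredicted it, so the
            # edge crosses a decision boundary of the mimicked model.
            self.edges[node].add(idx)
            self.edges[idx].add(node)
        return True

    def predict(self, x):
        return self.labels[self._traverse(x)]
```

Because points that the graph already classifies correctly are discarded, only a small fraction of the training data needs to be shared, which is consistent with the high-fidelity, low-disclosure behavior the abstract describes.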
Index Terms
- SMINT: Toward Interpretable and Robust Model Sharing for Deep Neural Networks