Abstract
Sharing a pre-trained machine learning model, particularly a deep neural network, via prediction APIs is becoming a common practice on machine learning as a service (MLaaS) platforms. Although deep neural networks (DNNs) have shown remarkable success in many tasks, they are also criticized for their lack of interpretability and transparency. Interpreting a shared DNN model faces two additional challenges compared with interpreting a general model: (1) only limited training data can be disclosed to users, and (2) the internal structure of the model may not be available. These two challenges impede the application of most existing interpretability approaches, such as saliency maps or influence functions, to DNN models. Case-based reasoning methods have been used for interpreting decisions; however, how to select and organize the data points under the constraints of shared DNN models has not been addressed. Moreover, simply providing cases as explanations may not be sufficient to support instance-level interpretability. Meanwhile, existing interpretation methods for DNN models generally lack the means to evaluate the reliability of the interpretation. In this article, we propose a framework named Shared Model INTerpreter (SMINT) to address the above limitations. We propose a new data structure called a boundary graph to organize training points so as to mimic the predictions of DNN models. We integrate local features, such as saliency maps and interpretable input masks, into the data structure to help users infer the model's decision boundaries. We show that the boundary graph is able to address the reliability issues in many local interpretation methods. We further design an algorithm named hidden-layer aware p-test to measure the reliability of the interpretations. Our experiments show that SMINT achieves above 99% fidelity to the corresponding DNN models on both MNIST and ImageNet while sharing only a tiny fraction of the training data to make these models interpretable.
A human pilot study demonstrates that SMINT provides better interpretability than existing methods. Moreover, we demonstrate that SMINT is able to assist model tuning for better performance on different user data.
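As a rough illustration of the case-based idea behind the boundary graph, the sketch below shows a minimal boundary-graph-style classifier. The class name, greedy traversal rule, and insertion criterion are our assumptions for illustration (loosely inspired by boundary-tree-style structures), not the paper's exact algorithm; in the SMINT setting, the stored labels would be the shared DNN's API predictions, so the graph mimics the model's decisions while retaining only boundary-relevant training points.

```python
# Illustrative sketch of a boundary-graph-style case-based classifier.
# Each node stores one training point (a shared "case") and its label.
# A query walks greedily to the closest neighboring node; a new point is
# stored only if the current graph would mispredict it, i.e., it lies
# near a decision boundary. All names here are hypothetical.
import math


class BoundaryGraph:
    def __init__(self):
        self.points = []   # retained training examples (the shared cases)
        self.labels = []   # e.g., labels obtained from the DNN prediction API
        self.edges = {}    # adjacency: node index -> set of neighbor indices

    @staticmethod
    def _dist(a, b):
        return math.dist(a, b)

    def _traverse(self, x):
        """Greedy walk: hop to the closest neighbor until none is closer."""
        if not self.points:
            return None
        cur = 0
        while True:
            best = min(self.edges[cur] | {cur},
                       key=lambda i: self._dist(self.points[i], x))
            if best == cur:
                return cur
            cur = best

    def fit_point(self, x, y):
        """Store (x, y) only if the current graph mispredicts it."""
        node = self._traverse(x)
        if node is not None and self.labels[node] == y:
            return False  # already predicted correctly; no case stored
        idx = len(self.points)
        self.points.append(x)
        self.labels.append(y)
        self.edges[idx] = set()
        if node is not None:
            # Link the new case to the node that mispredicted it, so the
            # edge crosses a decision boundary of the mimicked model.
            self.edges[node].add(idx)
            self.edges[idx].add(node)
        return True

    def predict(self, x):
        return self.labels[self._traverse(x)]
```

Because points that the graph already classifies correctly are discarded, only a small fraction of the training data needs to be shared, which is consistent with the high-fidelity, low-disclosure behavior the abstract describes.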
Index Terms
- SMINT: Toward Interpretable and Robust Model Sharing for Deep Neural Networks