Interpretability in healthcare: A comparative study of local machine learning interpretability techniques
Computational Intelligence (IF 2.8), Pub Date: 2020-11-24, DOI: 10.1111/coin.12410
Radwa ElShawi, Youssef Sherif, Mouaz Al‑Mallah, Sherif Sakr

Although complex machine learning models (e.g., random forests, neural networks) commonly outperform traditional, simple, interpretable models (e.g., linear regression, decision trees), clinicians in the healthcare domain find these complex models hard to understand and trust because their predictions lack intuition and explanation. With the new General Data Protection Regulation (GDPR), the plausibility and verifiability of predictions made by machine learning models have become essential. Hence, interpretability techniques for machine learning models are an active focus of research. In general, the main aim of these interpretability techniques is to shed light on the prediction process of machine learning models and to explain how their predictions are generated. A major problem in this context is that both the quality of interpretability techniques and trust in machine learning model predictions are challenging to measure. In this article, we propose four fundamental quantitative measures for assessing the quality of interpretability techniques: similarity, bias detection, execution time, and trust. We present a comprehensive experimental evaluation of six recent and popular local model-agnostic interpretability techniques, namely LIME, SHAP, Anchors, LORE, ILIME, and MAPLE, on different types of real-world healthcare data. Building on previous work, our experimental evaluation covers several aspects of comparison, including identity, stability, separability, similarity, execution time, bias detection, and trust. The results of our experiments show that MAPLE achieves the highest performance on the identity metric across all data sets included in this study, while LIME achieves the lowest. LIME achieves the highest performance on the separability metric across all data sets. SHAP has the smallest average time to output an explanation across all data sets included in this study. For bias detection, SHAP and MAPLE enable participants to better detect the bias. For the trust metric, Anchors achieves the highest performance on all data sets included in this work.
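To make the kind of local, model-agnostic explanation discussed above more concrete, the following is a minimal sketch (not the authors' evaluation code) of how two of the compared techniques, LIME and SHAP, can produce explanations for a single prediction of a complex model on tabular data. The dataset, model, and feature names are placeholders, not the healthcare data used in the study.

```python
# Minimal, hypothetical sketch: local explanations with LIME and SHAP
# for one prediction of a random forest on synthetic tabular data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from lime.lime_tabular import LimeTabularExplainer
import shap

# Placeholder data: rows are "patients", columns are numeric features.
rng = np.random.default_rng(0)
X = rng.random((500, 8))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A complex, hard-to-interpret model standing in for the black box.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# LIME: fit a sparse local surrogate around the instance being explained.
lime_explainer = LimeTabularExplainer(
    X_train,
    feature_names=[f"feature_{i}" for i in range(X.shape[1])],
    class_names=["negative", "positive"],
    discretize_continuous=True,
)
lime_exp = lime_explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print("LIME explanation:", lime_exp.as_list())

# SHAP: attribute the same prediction to features via Shapley values.
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X_test[:1])
print("SHAP values:", shap_values)
```

Both explainers return per-feature contributions for the single instance, which is the kind of local output the quantitative measures in this article (e.g., identity, stability, similarity) are designed to assess.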
