Exploring the cloud of variable importance for the set of all good models, Nature Machine Intelligence

Exploring the cloud of variable importance for the set of all good models
Nature Machine Intelligence (IF 23.8), Pub Date: 2020-12-10, DOI: 10.1038/s42256-020-00264-0
Jiayun Dong, Cynthia Rudin

Variable importance is central to scientific studies, including the social sciences and causal inference, healthcare and other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there were multiple well-performing predictive models, and a specific variable is important to some of them but not to others? In that case, we cannot tell from a single well-performing model if a variable is always important, sometimes important, never important or perhaps only important when another variable is not important. Ideally, we would like to explore variable importance for all approximately equally accurate predictive models within the same model class. In this way, we can understand the importance of a variable in the context of other variables, and for many good models. This work introduces the concept of a variable importance cloud, which maps every variable to its importance for every good predictive model. We show properties of the variable importance cloud and draw connections to other areas of statistics. We introduce variable importance diagrams as a projection of the variable importance cloud into two dimensions for visualization purposes. Experiments with criminal justice, marketing data and image classification tasks illustrate how variables can change dramatically in importance for approximately equally accurate predictive models.
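
The idea of a variable importance cloud can be illustrated with a small sketch. This is not the authors' method: it assumes synthetic data with two correlated features, a linear model class, random sampling of candidate models around the least-squares fit, and a permutation-based loss ratio as the importance measure. The names `mse`, `reliance`, `good` and `cloud`, and the 5% loss tolerance, are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): two strongly correlated
# features, so different good models can lean on either one.
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * rng.normal(size=n)


def mse(w):
    """Mean squared error of the linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))


def reliance(w, j):
    """Loss with feature j shuffled, divided by the unshuffled loss."""
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return float(np.mean((X_perm @ w - y) ** 2)) / mse(w)


# Best model in the class, and a tolerance defining "good" models
# (within 5% of the best loss; the 5% figure is an arbitrary choice).
w_best, *_ = np.linalg.lstsq(X, y, rcond=None)
threshold = 1.05 * mse(w_best)

# Sample candidate models near the best one and keep the good ones.
candidates = w_best + rng.normal(scale=0.5, size=(2000, 2))
good = np.array([w for w in candidates if mse(w) <= threshold])

# The "cloud": one importance vector per good model.
cloud = np.array([[reliance(w, 0), reliance(w, 1)] for w in good])

print("good models:", len(good))
print("importance range, feature 0:", cloud[:, 0].min(), cloud[:, 0].max())
print("importance range, feature 1:", cloud[:, 1].min(), cloud[:, 1].max())
```

Under these assumptions, a scatter plot of `cloud[:, 0]` against `cloud[:, 1]` would play the role of a two-dimensional variable importance diagram: because the features are correlated, models with nearly the same loss can shift their reliance from one feature to the other.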

A preprint version of the article is available on arXiv.

Updated: 2020-12-11
