Fast derivation of Shapley based feature importances through feature extraction methods for nanoinformatics,Machine Learning: Science and Technology



Machine Learning: Science and Technology (IF 6.3), Pub Date: 2021-07-09, DOI: 10.1088/2632-2153/ac0167
Tommy Liu, Amanda S Barnard

This work presents an alternative model-agnostic attribution method for computing feature importance rankings for high dimensional data that requires dimension reduction. We use Shapley values within the Shapley additive explanation (SHAP) framework to determine the importance value of each feature in the dataset. We then demonstrate that the computational complexity of ranking features in high dimensional spaces can be significantly reduced by first applying principal component analysis. In conjunction with our normalisation approach, this transformation into lower dimensional spaces does not yield a significant loss of information when performing feature selection tasks beyond a threshold. The efficacy of our approach is demonstrated on several examples of nanomaterial data, in particular graphene oxide. Our approach is ideal for the applied physical science communities, where datasets are of high dimensionality and computational complexity is a concern.
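The pipeline described in the abstract (reduce dimensionality with principal component analysis, compute Shapley values on the components, then rank the original features) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the model, the Shapley estimator, or the normalisation, so the following are all assumptions made for illustration: a linear least-squares model on the components, exact Shapley values with absent components set to their mean, a back-projection that weights each component's importance by the absolute PCA loadings, and a normalisation to unit sum. The function names `pca_fit`, `exact_shapley`, and `pca_shapley_importances` are hypothetical.

```python
import itertools
import math

import numpy as np


def pca_fit(X, k):
    """Return the column means and the top-k principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def exact_shapley(predict, z, baseline):
    """Exact Shapley values of predict at z.

    Assumption: a feature that is absent from a coalition takes its
    baseline value. The cost grows as 2**len(z), which is why working on
    a few components instead of many raw features is cheaper.
    """
    k = len(z)
    phi = np.zeros(k)
    for j in range(k):
        others = [i for i in range(k) if i != j]
        for size in range(k):
            # Shapley weight |S|! (k - |S| - 1)! / k! for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(k - size - 1)
                      / math.factorial(k))
            for subset in itertools.combinations(others, size):
                without_j = np.array(baseline, dtype=float)
                for i in subset:
                    without_j[i] = z[i]
                with_j = without_j.copy()
                with_j[j] = z[j]
                phi[j] += weight * (predict(with_j) - predict(without_j))
    return phi


def pca_shapley_importances(X, y, k):
    """Importance of each original feature, derived from k principal components."""
    mean, axes = pca_fit(X, k)
    Z = (X - mean) @ axes.T  # component scores, shape (n_samples, k)

    # Assumption: a linear least-squares model stands in for the real model.
    design = np.column_stack([Z, np.ones(len(Z))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    def predict(z):
        return float(z @ coef[:-1] + coef[-1])

    baseline = Z.mean(axis=0)
    # Mean absolute Shapley value per component over all samples.
    component_importance = np.mean(
        [np.abs(exact_shapley(predict, z, baseline)) for z in Z], axis=0)

    # Assumption: spread each component's importance over the original
    # features in proportion to the absolute loadings, then scale to sum to 1.
    feature_importance = np.abs(axes).T @ component_importance
    return feature_importance / feature_importance.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.5, 0.5])
    y = 2.0 * X[:, 0]  # synthetic target that depends on the first feature only
    print(pca_shapley_importances(X, y, k=3))
```

With six raw features an exact computation enumerates coalitions over six players, whereas with three components it enumerates coalitions over three, which is the kind of saving the abstract refers to. In practice an approximate estimator from a SHAP library would replace `exact_shapley`, and the back-projection step would follow the paper's own normalisation.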


Updated: 2021-07-09
