CHIRPS: Explaining random forest classification
Artificial Intelligence Review (IF 10.7), Pub Date: 2020-06-04, DOI: 10.1007/s10462-020-09833-6
Julian Hatwell, Mohamed Medhat Gaber, R. Muhammad Atif Azad

Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet demand for them has been increasing in Human-in-the-Loop processes, that is, processes that require a human agent to verify, approve or reason about automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. A simple conjunctive rule is then constructed, whose antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned with estimates of its precision and coverage on the training data, as well as counterfactual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per-instance) explanation setting.
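The pipeline in the abstract (collect paths from the trees that vote with the majority, mine frequent split conditions, grow a conjunctive rule, score it on the training data) can be sketched in plain Python. This is an illustrative simplification, not the authors' implementation: the `(feature, is_greater_than, threshold)` encoding and every function name are hypothetical, frequent pattern mining is reduced to counting single conditions (frequent 1-itemsets), and the greedy step keeps a condition only if it raises training precision, standing in for CHIRPS's own objective functions. Extracting the paths from a fitted forest is assumed to have been done already.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical encoding of one split condition: (feature_index, is_greater_than, threshold).
# is_greater_than=False means "x[feature] <= threshold"; True means "x[feature] > threshold".
Condition = Tuple[int, bool, float]
Rule = List[Condition]


def satisfies(x: Sequence[float], cond: Condition) -> bool:
    """Return True if instance x meets a single split condition."""
    feature, greater, threshold = cond
    return x[feature] > threshold if greater else x[feature] <= threshold


def frequent_conditions(paths: List[List[Condition]], min_support: float) -> List[Tuple[Condition, float]]:
    """Simplified frequent pattern mining: support of single conditions only.

    Support is the fraction of paths containing the condition; results are
    sorted by descending support, ties broken by the condition tuple itself.
    """
    counts = Counter(cond for path in paths for cond in set(path))
    n = len(paths)
    frequent = [(cond, k / n) for cond, k in counts.items() if k / n >= min_support]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


def precision_coverage(rule: Rule, X: List[Sequence[float]], y: List[int], target: int) -> Tuple[float, float]:
    """Coverage: share of training rows satisfying every term of the rule.
    Precision: share of those covered rows whose label equals target."""
    covered = [label for x, label in zip(X, y) if all(satisfies(x, c) for c in rule)]
    coverage = len(covered) / len(X)
    precision = covered.count(target) / len(covered) if covered else 0.0
    return precision, coverage


def build_rule(paths: List[List[Condition]], X: List[Sequence[float]], y: List[int],
               target: int, min_support: float = 0.5) -> Rule:
    """Greedily add frequent conditions, keeping each one only if it raises training precision."""
    rule: Rule = []
    best_precision, _ = precision_coverage(rule, X, y, target)
    for cond, _support in frequent_conditions(paths, min_support):
        precision, _ = precision_coverage(rule + [cond], X, y, target)
        if precision > best_precision:
            rule, best_precision = rule + [cond], precision
    return rule


# Toy training data: two numeric features, binary labels.
X_train = [[1.0, 5.0], [2.0, 6.0], [1.5, 2.0], [4.0, 6.5], [1.2, 7.0], [3.5, 1.0]]
y_train = [1, 1, 0, 0, 1, 0]

# Hypothetical paths taken by instance [1.1, 6.0] through the trees that voted for class 1.
majority_paths = [
    [(0, False, 2.5), (1, True, 4.0)],
    [(1, True, 3.0), (0, False, 2.5)],
    [(0, False, 2.5), (1, True, 4.0)],
    [(1, True, 4.0)],
]

rule = build_rule(majority_paths, X_train, y_train, target=1)
print(rule, precision_coverage(rule, X_train, y_train, target=1))
# [(0, False, 2.5), (1, True, 4.0)] (1.0, 0.5)
```

Because every candidate condition comes from a path the explained instance itself followed, the resulting rule always covers that instance. Each added term trades coverage for precision: here coverage falls from 1.0 (empty rule) to 4/6 and then 0.5, while precision rises from 0.5 to 0.75 and then 1.0.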

Updated: 2020-06-04