Learning comprehensible and accurate hybrid trees
Expert Systems with Applications ( IF 7.5 ) Pub Date : 2020-09-16 , DOI: 10.1016/j.eswa.2020.113980
Rok Piltaver , Mitja Luštrek , Sašo Džeroski , Martin Gjoreski , Matjaž Gams

Finding the best classifiers according to different criteria is often performed by a multi-objective machine learning algorithm. This study considers the two criteria that are usually treated as the most important when deciding which classifier to apply in practice: comprehensibility and accuracy. Because the two criteria conflict (increasing one decreases the other), a model that offers a broad range of trade-offs between them is introduced. The choice of the model is motivated by the fact that domain experts often formalize decisions using both knowledge that can be represented by comprehensible rules and some tacit knowledge. This approach is mimicked by a hybrid tree that consists of comprehensible parts that originate from a regular classification tree and incomprehensible parts that originate from an accurate black-box classifier. An empirical evaluation on 23 UCI datasets shows that the hybrid trees provide trade-offs between accuracy and comprehensibility that are not possible using traditional machine learning models. A corresponding hybrid-tree comprehensibility metric is also proposed. Furthermore, the paper presents a novel algorithm for learning hybrid trees, MAchine LeArning Classifiers with HybrId TrEes (MALACHITE), and proves that the algorithm finds a complete set of nondominated hybrid trees with regard to their accuracy and comprehensibility. The algorithm is shown to be faster than the well-known multi-objective evolutionary optimization algorithm NSGA-II for trees of moderate size; moderate size is itself a prerequisite for comprehensibility. Conversely, MALACHITE can generate considerably larger hybrid trees in a reasonable amount of time than a naïve exhaustive search algorithm can. In addition, an interactive, iterative data mining process based on the algorithm is proposed that enables inspection of the Pareto set of hybrid trees. In each iteration, the domain expert analyzes the current set of nondominated hybrid trees, infers domain relations, and sets the parameters for the next machine learning step accordingly.
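To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not the MALACHITE algorithm itself. It assumes a hybrid tree can be modelled as a binary tree whose leaves either return a fixed class label (the comprehensible part) or defer to a black-box classifier (the incomprehensible part), and that each candidate tree is scored by an (accuracy, comprehensibility) pair where higher is better for both. All names (`Leaf`, `BlackBoxLeaf`, `Split`, `predict`, `nondominated`) are hypothetical and chosen for illustration only; the paper's actual comprehensibility metric is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, Union

# Hypothetical types for illustration; not taken from the paper.
Example = Sequence[float]


@dataclass
class Leaf:
    label: str  # comprehensible leaf: a fixed class label


@dataclass
class BlackBoxLeaf:
    model: Callable[[Example], str]  # incomprehensible leaf: defer to a black box


@dataclass
class Split:
    feature: int      # index of the tested attribute
    threshold: float  # go left when x[feature] <= threshold
    left: "Node"
    right: "Node"


Node = Union[Leaf, BlackBoxLeaf, Split]


def predict(node: Node, x: Example) -> str:
    """Follow the splits, then answer with a label or the black-box model."""
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    if isinstance(node, Leaf):
        return node.label
    return node.model(x)


def nondominated(scores: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the (accuracy, comprehensibility) pairs that no other pair dominates.

    A pair dominates another if it is at least as good on both criteria
    and strictly better on at least one (both criteria are maximized).
    """
    front = []
    for i, (acc, comp) in enumerate(scores):
        dominated = any(
            a >= acc and c >= comp and (a > acc or c > comp)
            for j, (a, c) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append((acc, comp))
    return front
```

For example, a tree built as `Split(0, 0.5, Leaf("a"), BlackBoxLeaf(some_model))` answers comprehensibly on the left branch and hands every other example to `some_model`. The quadratic-time filter in `nondominated` is only meant to illustrate what "nondominated" means; it says nothing about how MALACHITE searches the space of hybrid trees.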




Updated: 2020-09-16