A Practical Tutorial for Decision Tree Induction
ACM Computing Surveys (IF 16.6), Pub Date: 2021-01-21, DOI: 10.1145/3429739
Víctor Adrián Sosa Hernández 1 , Raúl Monroy 1 , Miguel Angel Medina-Pérez 1 , Octavio Loyola-González 2 , Francisco Herrera 3

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10 × 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.
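
As a rough illustration of what an "evaluation measure for candidate splits" computes, the sketch below scores two candidate partitions of a toy label set with information gain and gain ratio, the measure commonly associated with standard C4.5. It is not taken from the article: the function names, the toy labels, and the two candidate splits are assumptions made for illustration, and the article's 21 measures are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, partition):
    """Entropy reduction obtained by splitting `labels` into `partition`.

    `partition` is a list of label groups that together contain exactly
    the items of `labels` (hypothetical representation of a split).
    """
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition if g)
    return entropy(labels) - remainder


def gain_ratio(labels, partition):
    """Information gain normalised by the split information of the partition."""
    n = len(labels)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in partition if g)
    if split_info == 0:  # all items fall in one branch: the split is useless
        return 0.0
    return information_gain(labels, partition) / split_info


# Toy data (assumed): six examples, two classes.
labels = ["yes", "yes", "no", "no", "yes", "no"]
pure_split = [["yes", "yes", "yes"], ["no", "no", "no"]]
mixed_split = [["yes", "no", "yes"], ["yes", "no", "no"]]

for name, split in [("pure", pure_split), ("mixed", mixed_split)]:
    print(name, round(gain_ratio(labels, split), 4))
```

An induction algorithm of this kind would evaluate every candidate split at a node and keep the one with the best score; swapping `gain_ratio` for a different measure is what yields a different variant of the tree learner.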

Updated: 2021-01-21