Handling Missing Data in Decision Trees: A Probabilistic Approach
arXiv - CS - Machine Learning Pub Date : 2020-06-29 , DOI: arxiv-2006.16341
Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

Decision trees are a popular family of models due to attractive properties such as interpretability and the ability to handle heterogeneous data. At the same time, missing data is a prevalent occurrence that hinders the performance of machine learning models. As such, handling missing data in decision trees is a well-studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune the parameters of already learned trees by minimizing their "expected prediction loss" with respect to our density estimators. We provide brief experiments showcasing the effectiveness of our methods compared to a few baselines.
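
As a rough illustration of the "expected prediction" idea, the sketch below averages a tree's output over the missing features instead of imputing them. It is not the authors' implementation: it assumes binary features, a tree that tests each feature at most once per path, and a fully factorized distribution (independent per-feature probabilities) standing in for the tractable density estimator. The names `Leaf`, `Node`, `expected_prediction`, and `p_true` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    value: float  # prediction stored at this leaf


@dataclass
class Node:
    feature: str       # name of the binary feature tested here
    if_false: "Tree"   # subtree followed when the feature is False
    if_true: "Tree"    # subtree followed when the feature is True


Tree = Union[Leaf, Node]


def expected_prediction(
    tree: Tree,
    observed: Dict[str, Optional[bool]],
    p_true: Dict[str, float],
) -> float:
    """Expected tree output given the observed features.

    Assumption: features are independent, so a missing feature's
    distribution does not depend on the observed ones. A richer
    density estimator would condition on the observed values.
    """
    if isinstance(tree, Leaf):
        return tree.value
    x = observed.get(tree.feature)  # None means the value is missing
    if x is not None:
        branch = tree.if_true if x else tree.if_false
        return expected_prediction(branch, observed, p_true)
    # Missing feature: weight both branches by their probability.
    p = p_true[tree.feature]
    return (
        p * expected_prediction(tree.if_true, observed, p_true)
        + (1.0 - p) * expected_prediction(tree.if_false, observed, p_true)
    )


# Toy tree and toy marginals (illustrative values only).
tree = Node("a", if_false=Leaf(0.1),
            if_true=Node("b", if_false=Leaf(0.2), if_true=Leaf(0.9)))
p_true = {"a": 0.5, "b": 0.25}

all_missing = expected_prediction(tree, {}, p_true)
b_missing = expected_prediction(tree, {"a": True}, p_true)
fully_observed = expected_prediction(tree, {"a": True, "b": True}, p_true)
```

With every feature observed, the function reduces to ordinary tree traversal; with features missing, it returns a probability-weighted average over the leaves that remain reachable.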

Updated: 2020-07-01