


Handling Missing Data in Decision Trees: A Probabilistic Approach
arXiv - CS - Machine Learning, Pub Date: 2020-06-29, DOI: arxiv-2006.16341
Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

Decision trees are a popular family of models due to their attractive properties, such as interpretability and the ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders the performance of machine learning models. As such, handling missing data in decision trees is a well-studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune the parameters of already learned trees by minimizing their "expected prediction loss" with respect to our density estimators. We provide brief experiments showcasing the effectiveness of our methods compared to a few baselines.
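
As a rough illustration of the "expected prediction" idea in the abstract, the sketch below averages a tree's output over the missing features, conditioned on the observed ones. It is not the authors' implementation. It assumes binary features and a fully factorized (independent Bernoulli) density, which is the simplest tractable density estimator; the abstract does not say which estimators the paper uses. The names `Leaf`, `Node`, `expected_prediction`, and `p_one` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    value: float  # prediction stored at this leaf


@dataclass
class Node:
    feature: int   # index of the binary feature tested at this node
    left: "Tree"   # branch taken when the feature equals 0
    right: "Tree"  # branch taken when the feature equals 1


Tree = Union[Leaf, Node]


def expected_prediction(tree: Tree,
                        observed: Dict[int, int],
                        p_one: Dict[int, float]) -> float:
    """Average the tree output over missing features.

    observed maps feature index -> 0/1 for the features that are present.
    p_one maps feature index -> P(feature = 1) under the ASSUMED
    independent-Bernoulli density (an illustrative choice only).
    """
    if isinstance(tree, Leaf):
        return tree.value
    x = observed.get(tree.feature)
    if x is not None:
        # Feature is observed: follow the single matching branch.
        branch = tree.right if x == 1 else tree.left
        return expected_prediction(branch, observed, p_one)
    # Feature is missing: weight both branches by their probability.
    # The chosen value is recorded so that any later test on the same
    # feature along this path stays consistent.
    p = p_one[tree.feature]
    e_right = expected_prediction(tree.right, {**observed, tree.feature: 1}, p_one)
    e_left = expected_prediction(tree.left, {**observed, tree.feature: 0}, p_one)
    return p * e_right + (1.0 - p) * e_left


# Toy tree: predict 0 if x0 == 0, otherwise 1 or 3 depending on x1.
tree = Node(0, Leaf(0.0), Node(1, Leaf(1.0), Leaf(3.0)))
p_one = {0: 0.5, 1: 0.25}

all_missing = expected_prediction(tree, {}, p_one)               # 0.75
x0_observed = expected_prediction(tree, {0: 1}, p_one)           # 1.5
fully_observed = expected_prediction(tree, {0: 1, 1: 1}, p_one)  # 3.0
```

When every feature is observed, the function reduces to an ordinary tree prediction. With a density that models dependencies between features, the branch probabilities would have to be conditioned on the observed values, which this factorized sketch does not do.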


Updated: 2020-07-01
