DPPred: An Effective Prediction Framework with Concise Discriminative Patterns
IEEE Transactions on Knowledge and Data Engineering ( IF 8.9 ) Pub Date : 2018-07-01 , DOI: 10.1109/tkde.2017.2757476
Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han

Two families of models have been proposed in the literature for prediction problems, including classification and regression. Simple models, such as generalized linear models, offer only ordinary performance but are highly interpretable over a set of simple features. The other family, which includes tree-based models, organizes numerical, categorical, and high-dimensional features into a comprehensive structure that carries rich interpretable information about the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework ($\sf {DPPred}$) that combines the advantages of both families, achieving both effectiveness and interpretability. Specifically, $\sf {DPPred}$ adopts concise discriminative patterns that lie on the prefix paths from the root to the leaf nodes of tree-based models. It then selects a limited number of useful discriminative patterns by searching for the most effective pattern combination with which to fit generalized linear models. Extensive experiments show that in many scenarios $\sf {DPPred}$ achieves accuracy competitive with the state of the art while providing valuable interpretability for developers and experts. In particular, in a case study on a clinical application dataset, $\sf {DPPred}$ outperforms the baselines while using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.
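
To make the idea of patterns on prefix paths concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hand-built binary decision tree; the `Node` class, the helper names (`prefix_patterns`, `matches`, `encode`), and the toy features `age` and `bmi` are all hypothetical. Each prefix of a root-to-leaf path is treated as one conjunctive pattern, and an instance is encoded as a binary vector that records which patterns it satisfies. The pattern-selection step and the generalized linear model fit described in the abstract are not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical tree node; feature=None marks a leaf.
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None  # branch taken when x[feature] > threshold


def prefix_patterns(node, prefix=()):
    """Yield every non-empty prefix path below `node` as a tuple of conditions."""
    if node is None or node.feature is None:
        return
    for op, child in (("<=", node.left), (">", node.right)):
        pattern = prefix + ((node.feature, op, node.threshold),)
        yield pattern
        yield from prefix_patterns(child, pattern)


def matches(pattern, x):
    """Return True if instance `x` satisfies every condition in `pattern`."""
    return all((x[f] <= t) if op == "<=" else (x[f] > t) for f, op, t in pattern)


def encode(patterns, x):
    """Map an instance to a binary vector with one entry per pattern."""
    return [int(matches(p, x)) for p in patterns]


# Toy tree (assumed for illustration): split on age, then on bmi for older patients.
tree = Node("age", 50.0,
            left=Node(),
            right=Node("bmi", 30.0, left=Node(), right=Node()))

patterns = list(prefix_patterns(tree))
features = encode(patterns, {"age": 62, "bmi": 33.5})
```

In this sketch the binary vectors produced by `encode` would serve as the input features for a linear model; a real pipeline would typically draw its patterns from many learned trees and keep only a small, effective subset.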

Updated: 2018-07-01