Bayesian network classifiers using ensembles and smoothing
Knowledge and Information Systems ( IF 2.5 ) Pub Date : 2020-03-30 , DOI: 10.1007/s10115-020-01458-z
He Zhang , François Petitjean , Wray Buntine

Bayesian network classifiers are, functionally, an interesting class of models because they can be learnt out-of-core, i.e. without needing to hold the whole training data in main memory. The selective K-dependence Bayesian network classifier (SKDB) is the state of the art in this class of models and has been shown to rival random forest (RF) on problems with categorical data. In this paper, we introduce an ensembling technique for SKDB, called ensemble of SKDB (ESKDB). We show that ESKDB significantly outperforms RF on categorical and numerical data and rivals XGBoost. ESKDB combines three main components: (1) an effective strategy for varying the networks built by the single classifiers (to make it an ensemble), (2) a stochastic discretization method that both handles numerical data and further increases the variance between the components of our ensemble, and (3) a superior smoothing technique that ensures proper calibration of ESKDB's probabilities. We conduct a large set of experiments on 72 datasets to study the properties of ESKDB (through a sensitivity analysis) and show its competitiveness with the state of the art.
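
To make the ensemble idea behind the abstract concrete, the sketch below shows a much simpler stand-in: an ensemble of Laplace-smoothed naive Bayes members over categorical data, each diversified by a random feature subset, with predicted probabilities averaged across members. This is only an illustrative analogy under assumed simplifications, not the authors' ESKDB (no K-dependence structure learning, stochastic discretization, or the paper's smoothing technique); the class name `EnsembleNB` and all parameters are hypothetical.

```python
import numpy as np

class EnsembleNB:
    """Toy ensemble of smoothed naive Bayes classifiers (illustrative only)."""

    def __init__(self, n_members=10, alpha=1.0, seed=0):
        self.n_members = n_members      # number of ensemble members
        self.alpha = alpha              # Laplace smoothing pseudo-count
        self.rng = np.random.default_rng(seed)
        self.members = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        n_features = X.shape[1]
        self.members = []
        for _ in range(self.n_members):
            # Diversify members with a random feature subset, loosely analogous
            # to varying the network built by each single classifier.
            feats = self.rng.choice(n_features, size=max(1, n_features // 2),
                                    replace=False)
            member = {"feats": feats, "log_prior": {}, "log_cond": {}}
            for c in self.classes_:
                Xc = X[y == c]
                member["log_prior"][c] = np.log(
                    (len(Xc) + self.alpha) / (len(y) + self.alpha * len(self.classes_)))
                cond = []
                for f in feats:
                    # Smoothed P(x_f = v | c) for every value seen in training.
                    cnt = {v: self.alpha for v in np.unique(X[:, f])}
                    for v in Xc[:, f]:
                        cnt[v] += 1
                    total = sum(cnt.values())
                    cond.append({v: np.log(k / total) for v, k in cnt.items()})
                member["log_cond"][c] = cond
            self.members.append(member)
        return self

    def predict_proba(self, X):
        X = np.asarray(X)
        probs = np.zeros((len(X), len(self.classes_)))
        for member in self.members:
            scores = np.zeros((len(X), len(self.classes_)))
            for j, c in enumerate(self.classes_):
                s = np.full(len(X), member["log_prior"][c])
                for k, f in enumerate(member["feats"]):
                    cond = member["log_cond"][c][k]
                    # Crude fallback log-probability for values unseen in training.
                    fallback = np.log(self.alpha / (self.alpha * len(cond) + 1))
                    s += np.array([cond.get(v, fallback) for v in X[:, f]])
                scores[:, j] = s
            # Normalise each member's class scores, then average across members.
            scores -= scores.max(axis=1, keepdims=True)
            p = np.exp(scores)
            probs += p / p.sum(axis=1, keepdims=True)
        return probs / self.n_members

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

The design point the sketch tries to convey is the one the abstract emphasises: ensemble accuracy comes from deliberately varying what each member learns, while smoothing of the probability estimates keeps the averaged predictions well calibrated.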
