Knowledge-Based Systems

Volume 203, 5 September 2020, 106085

Instance-based weighting filter for superparent one-dependence estimators

https://doi.org/10.1016/j.knosys.2020.106085

Abstract

Bayesian network classifiers have remained of great interest in recent years, among which semi-naive Bayesian classifiers that utilize superparent one-dependence estimators (SPODEs) have shown superior predictive power. Linear weighting schemes are effective and efficient for combining SPODEs, whereas it is a challenging task for averaged one-dependence estimators (AODE) to find globally optimal, fixed weights for its SPODE members. The joint probability distribution of a SPODE may not fit different test instances to the same extent, so a flexible rather than rigid weighting scheme is a feasible way for the final AODE to approximate the true joint probability distribution. Based on this premise, we propose a novel instance-based weighting filter, which can flexibly assign discriminative weights to each single SPODE for different test instances. The weight considers not only the mutual dependence between the superparent and the class variable, but also the conditional dependence between the superparent and non-superparent attributes. Experimental comparison on 30 publicly available datasets shows that SPODE with the instance-based weighting filter outperforms state-of-the-art BNCs with and without weighting methods in terms of zero–one loss, bias and variance, with minimal additional computation.

Introduction

Classification is one of the most active research areas in both the machine learning and data mining communities. Researchers have proposed numerous classification algorithms [1], [2], [3], [4], [5], among which Bayesian network classifiers (BNCs) have received much attention in recent years, especially after the success of naive Bayes (NB) [6]. However, in practice, the conditional independence assumption of NB rarely holds, and as a result its probability estimates may be suboptimal. A large literature addresses approaches to relaxing NB's conditional independence assumption, which can be broadly placed into two categories: semi-naive Bayes approaches and weighting approaches. The first category aims to enhance the accuracy of NB by introducing a limited number of arcs that represent additional dependence relationships [7], [8]. The second category, attribute weighting, is usually viewed as a means of increasing the influence of highly predictive attributes and discounting attributes that have little predictive value [9], [10].

A superparent one-dependence estimator (SPODE) [11] is a one-dependence estimator in which every attribute depends on a single shared superparent attribute in addition to the class variable. Since it is a challenging task for a SPODE to find a globally optimal superparent, ensemble methods are often applied: each attribute in turn serves as the superparent of one SPODE, and the resulting SPODEs are combined for classification. The averaged one-dependence estimator (AODE) [12] is the representative SPODE ensemble, and has demonstrated excellent classification accuracy at very little extra computational cost. However, the averaging strategy of AODE (i.e., treating each attribute equally) neglects the different roles that attributes play in various learning tasks, which may harm generalization performance. Existing learning approaches addressing this issue can be broadly placed into four categories:

(1) Attribute weighting [9], [10], [13], [14], [15], [16], [17], [18];

(2) Attribute selection [19], [20], [21], [22];

(3) Model selection [23], [24], [25], [26];

(4) Lazy learning [27], [28], [29], [30].

In this paper, we focus our attention on attribute weighting methods. The linear weighting method can be described as follows: $\hat{P}(c,\mathbf{x}) = \sum_{i=1}^{n} w_i P_i(c,\mathbf{x})$, where $w_i$ is the weight assigned to the $i$th SPODE and $P_i(c,\mathbf{x})$ is that SPODE's estimate of the joint probability.
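
As a concrete illustration, this linear combination can be sketched in a few lines of Python. The function name and the toy probability values below are our own; in practice the per-SPODE joint probabilities would come from each SPODE's conditional probability tables.

```python
def weighted_aode_joint(spode_joints, weights):
    """Combine per-SPODE joint probability estimates P_i(c, x) into
    P_hat(c, x) = sum_i w_i * P_i(c, x), one value per class label.

    spode_joints: list of lists, one inner list per SPODE giving that
                  SPODE's joint probability for each class label.
    weights:      one weight per SPODE, assumed to sum to 1.
    """
    n_classes = len(spode_joints[0])
    return [sum(w * joints[c] for w, joints in zip(weights, spode_joints))
            for c in range(n_classes)]

# Uniform weights recover AODE's plain averaging strategy.
joints = [[0.10, 0.30], [0.20, 0.20], [0.30, 0.10]]
uniform = weighted_aode_joint(joints, [1/3, 1/3, 1/3])
```

With uniform weights $w_i = 1/n$ this reduces to AODE's averaging; an instance-based scheme would instead recompute the weight vector for each test instance.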

Thus the joint probability of weighted AODE is the weighted sum of the joint probabilities of its SPODE members. If the SPODE members are assigned fixed weights, then the extent to which the corresponding joint probability distributions fit different data points always remains the same. Fixed weights can be regarded as a globally optimal solution to weighting. However, if a globally optimal solution is very sensitive to perturbations in the data, there may be cases where it is not good to use this solution [31]. Take the dataset Phoneme for example: Fig. 1 presents the distributions of the joint probability for SPODEs on every instance. As can be seen, the distributions differ greatly for different SPODEs, and the probability distribution of the same SPODE does not fit different instances to the same extent. Thus, we argue that fixed weights (globally optimal solutions) may result in a biased estimate of the joint probability distribution of the weighted AODE and, in turn, suboptimal classification performance. Finding a locally optimal weighting for each test instance is a feasible alternative. In this paper we therefore propose a novel filter weighting approach, called instance-based weighting, which self-adaptively assigns appropriate weights to the same SPODE when classifying different instances. Meanwhile, the method not only considers the mutual dependence between the superparent and the class variable, but also takes into account the differences among sub-models. The method can be applied to various SPODE ensembles; in this paper we take AODE as an example. We demonstrate experimentally on 30 publicly available datasets [32] that the instance-based weighting AODE (IWAODE) outperforms other state-of-the-art BNCs with and without weighting methods in terms of zero–one loss, bias and variance, without incurring high computational cost.

The rest of this paper is organized as follows. Section 2 introduces AODE and provides a brief survey of related weighting approaches. Our proposed novel techniques for instance-based weighting are described in Section 3. Section 4 presents experimental results of IWAODE and its comparison with other state-of-the-art learners. Section 5 draws conclusions.

Section snippets

Related work

The structure of a BNC [33] is a directed acyclic graph, where nodes denote the predictive attributes and arcs denote the dependence relationships between child nodes and their parent nodes. NB is the simplest BNC; it calculates the probability of a class label given the data under the conditional independence assumption that all attributes are independent given the label. That is, let each instance $\mathbf{x}$ be characterized by $n$ values $\{x_1, \ldots, x_n\}$ for attributes $\{X_1, \ldots, X_n\}$, and class label $c \in \{c_1, \ldots, c_m\}$
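
The NB factorization just described, $P(c, \mathbf{x}) = P(c) \prod_i P(x_i \mid c)$, can be sketched by plain frequency counting. This is a minimal version without smoothing; the function names and toy data are illustrative only, not the paper's implementation.

```python
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Fit a categorical naive Bayes model by frequency counting.
    Returns class priors P(c) and per-attribute conditional counts."""
    n = len(y)
    priors = {c: k / n for c, k in Counter(y).items()}
    cond = defaultdict(Counter)   # (attribute index, class) -> value counts
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return priors, cond

def nb_joint(priors, cond, x, c):
    """P(c, x) = P(c) * prod_i P(x_i | c), the NB factorization."""
    p = priors[c]
    for i, v in enumerate(x):
        counts = cond[(i, c)]
        p *= counts[v] / sum(counts.values())
    return p

# Toy training set: two attributes, two classes.
X = [['a', 'u'], ['a', 'v'], ['b', 'u'], ['b', 'u']]
y = [0, 0, 1, 1]
priors, cond = nb_fit(X, y)
```

A real implementation would add Laplace smoothing so that unseen attribute values do not zero out the product.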

The instance-based weighting filter

Since Kullback–Leibler (KL) divergence is often used to measure attribute weights, we first explain the differences among the sub-models of AODE from the viewpoint of KL divergence. KL divergence, also known as relative entropy, measures the distance between two discrete probability distributions (denoted $P(d)$ and $Q(d)$) and can be defined as follows [40]: $KL(P\|Q) = \sum_{d} P(d) \log \frac{P(d)}{Q(d)} = H_Q(D) - H_P(D)$, where $D = X \cup C$. Taking the expectation with respect to $P(d)$, we can see that KL is the expectation of the logarithmic
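
For small discrete distributions the KL divergence defined above can be computed directly; a minimal sketch (the dict-based representation is our own choice):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_d P(d) * log(P(d) / Q(d)) for discrete
    distributions given as dicts mapping outcome -> probability.
    Terms with P(d) == 0 contribute nothing, by the usual convention."""
    return sum(pd * math.log(pd / q[d]) for d, pd in p.items() if pd > 0)

p = {'x': 0.5, 'y': 0.5}
q = {'x': 0.9, 'y': 0.1}
```

KL divergence is non-negative and zero exactly when $P = Q$, which is what makes it usable as a distance-like quantity for weighting.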

Experimental results

To illustrate the effectiveness of our proposed instance-based weighting filter, we conduct a number of experiments on 30 datasets from the UCI machine learning repository. A description of these 30 datasets, including the number of attributes, classes and instances, is given in Table 1. For each dataset, numeric attributes are discretized using the Minimum Description Length (MDL) method [44]. Missing values for qualitative attributes are replaced with the most frequently appearing value. For
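
The most-frequent-value imputation step mentioned above can be sketched as follows. The `'?'` missing-value marker follows the common UCI file convention and is an assumption here; the MDL discretization step is not shown.

```python
from collections import Counter

def impute_most_frequent(column, missing='?'):
    """Replace missing entries in a qualitative attribute column with
    the value that appears most often among the observed entries."""
    observed = [v for v in column if v != missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == missing else v for v in column]

col = ['red', '?', 'blue', 'red', '?']
filled = impute_most_frequent(col)  # 'red' is the most frequent value
```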

Conclusions

Linear weighting schemes provide effective and efficient solutions for linearly combining SPODEs. However, the joint probability distribution of a SPODE may not fit different test instances to the same extent, and fixed weights may lead to a suboptimal estimate of the joint probability distribution of the final AODE. In this paper we propose a flexible weighting scheme, the instance-based weighting filter, which can self-adaptively assign weights to SPODEs while dealing with different test

CRediT authorship contribution statement

Zhiyi Duan: Conceptualization, Methodology, Software, Writing - original draft. Limin Wang: Validation, Writing - review & editing, Visualization. Shenglei Chen: Formal analysis, Investigation, Project administration. Minghui Sun: Resources, Data curation, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272209 and 61872164).

Zhiyi Duan received the BEng degree from Jilin University, Changchun, China in 2014 and is currently a Ph.D. student in the Department of Computer Science and Technology at Jilin University. His research interests include data mining and Bayesian networks.

References (51)

  • K. Nguyen, T. Le, T.D. Nguyen, D. Phung, G.I. Webb, Robust Bayesian kernel machine via Stein variational gradient...
  • T. Vandal, E. Kodra, J. Dy, S. Ganguly, R. Nemani, A.R. Ganguly, Quantifying uncertainty in discrete-continuous and...
  • L.P. Zhou et al., Learning discriminative Bayesian networks from high-dimensional continuous neuroimaging data, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • D.D. Lewis, Naive Bayes at forty: The independence assumption in information retrieval, in: ECML-98: Proceedings of the...
  • M. Sahami, Learning limited dependence Bayesian classifiers, in: Proceedings of Second ACM SIGKDD Conference on...
  • A.M. Martinez et al., Scalable learning of Bayesian network classifiers, J. Mach. Learn. Res. (2013)
  • N.A. Zaidi et al., Alleviating naive Bayes attribute independence assumption by attribute weighting, J. Mach. Learn. Res. (2013)
  • E.J. Keogh et al., Learning the structure of augmented Bayesian classifiers, Int. J. Artif. Intell. Tools (2002)
  • G.I. Webb et al., Not so naive Bayes: Aggregating one-dependence estimators, Mach. Learn. (2005)
  • J. Wu et al., Attribute weighting via differential evolution algorithm for attribute weighted naive Bayes (WNB), J. Comput. Inform. Syst. (2011)
  • L. Jiang et al., Weighted average of one-dependence estimators, J. Exp. Theor. Artif. Intell. (2012)
  • Z.L. Xiang et al., Attribute weighting for averaged one-dependence estimators, Appl. Intell. (2016)
  • L.X. Jiang et al., A correlation-based feature weighting filter for naive Bayes, IEEE Trans. Knowl. Data Eng. (2019)
  • B. Tang et al., Toward optimal feature selection in naive Bayes for text categorization, IEEE Trans. Knowl. Data Eng. (2016)
  • S.L. Chen, A.M. Martinez, G.I. Webb, Highly scalable attribute selection for averaged one-dependence estimators, in:...


Limin Wang received the Ph.D. degree in computer science from Jilin University in 2005. He is currently a professor in the College of Computer Science and Technology, Jilin University, China. His research interests include probabilistic logic inference and Bayesian networks. He has published innovative papers in journals such as Knowledge-Based Systems, Expert Systems with Applications, and Progress in Natural Science.

Minghui Sun received the Ph.D. degree in computer science from Kochi University of Technology, Japan, in 2011. He is currently an assistant professor in the College of Computer Science and Technology, Jilin University, China. He is interested in using HCI methods to solve challenging real-world computing problems in many areas, including tactile interfaces, pen-based interfaces and tangible interfaces.
