Knowledge-Based Systems

Volume 203, 5 September 2020, 106085

Instance-based weighting filter for superparent one-dependence estimators

https://doi.org/10.1016/j.knosys.2020.106085

Abstract

Bayesian network classifiers have remained of great interest in recent years, among which semi-naive Bayesian classifiers that utilize superparent one-dependence estimators (SPODEs) have shown superior predictive power. Linear weighting schemes are effective and efficient for combining SPODEs, whereas it is a challenging task for averaged one-dependence estimators (AODE) to find globally optimal, fixed weights for its SPODE members. The joint probability distribution of a SPODE may not fit different test instances to the same extent, so a flexible rather than rigid weighting scheme is a feasible way for the final AODE to approximate the true joint probability distribution. Based on this premise, we propose a novel instance-based weighting filter, which can flexibly assign discriminative weights to each single SPODE for different test instances. The weight considers not only the mutual dependence between the superparent and the class variable, but also the conditional dependence between the superparent and non-superparent attributes. Experimental comparison on 30 publicly available datasets shows that SPODE with the instance-based weighting filter outperforms state-of-the-art BNCs with and without weighting methods in terms of zero–one loss, bias and variance, with minimal additional computation.

Introduction

Classification is one of the most active research areas in both the machine learning and data mining communities. Researchers have proposed numerous classification algorithms [1], [2], [3], [4], [5], among which Bayesian network classifiers (BNCs) have received much attention in recent years, especially after the success of naive Bayes (NB) [6]. However, in practice, the conditional independence assumption of NB rarely holds, and as a result its probability estimates may be suboptimal. A large literature addresses approaches to relaxing NB's conditional independence assumption, which can be broadly placed into two categories: semi-naive Bayes approaches and weighting approaches. The first category aims to enhance the accuracy of NB by introducing a limited number of arcs that represent additional dependence relationships [7], [8]. The second category, attribute weighting, is usually viewed as a means of increasing the influence of highly predictive attributes and discounting attributes that have little predictive value [9], [10].

A superparent one-dependence estimator (SPODE) [11] is a one-dependence estimator in which every attribute depends on a single shared superparent attribute in addition to the class variable. Since it is a challenging task for a SPODE to find a globally optimal superparent, ensemble methods are often applied: each attribute in turn serves as the superparent of one SPODE, and the resulting SPODEs are combined for classification. The averaged one-dependence estimator (AODE) [12] is the representative SPODE ensemble, and has demonstrated excellent classification accuracy at very little extra computational cost. However, the averaging strategy of AODE (i.e., treating each attribute equally) neglects the different roles that attributes play in various learning tasks, which may harm generalization performance. Existing learning approaches addressing this issue can be broadly placed into four categories:

(1) Attribute weighting [9], [10], [13], [14], [15], [16], [17], [18];

(2) Attribute selection [19], [20], [21], [22];

(3) Model selection [23], [24], [25], [26];

(4) Lazy learning [27], [28], [29], [30].

In this paper, we focus our attention on attribute weighting methods. The linear weighting method can be described as follows: $\hat{P}(c,\mathbf{x}) = \sum_{i=1}^{n} w_i P_i(c,\mathbf{x})$, where $w_i$ is the weight assigned to the $i$th SPODE and $P_i(c,\mathbf{x})$ is that SPODE's estimate of the joint probability.
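
As a concrete illustration, this linear combination can be sketched in a few lines of Python. The function name and the toy probability values below are our own; in practice the per-SPODE joint probabilities would come from each SPODE's conditional probability tables.

```python
def weighted_aode_joint(spode_joints, weights):
    """Combine per-SPODE joint probability estimates P_i(c, x) into
    P_hat(c, x) = sum_i w_i * P_i(c, x), one value per class label.

    spode_joints: list of lists, one inner list per SPODE giving that
                  SPODE's joint probability for each class label.
    weights:      one weight per SPODE, assumed to sum to 1.
    """
    n_classes = len(spode_joints[0])
    return [sum(w * joints[c] for w, joints in zip(weights, spode_joints))
            for c in range(n_classes)]

# Uniform weights recover AODE's plain averaging strategy.
joints = [[0.10, 0.30], [0.20, 0.20], [0.30, 0.10]]
uniform = weighted_aode_joint(joints, [1/3, 1/3, 1/3])
```

With uniform weights $w_i = 1/n$ this reduces to AODE's averaging; an instance-based scheme would instead recompute the weight vector for each test instance.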

Thus the joint probability of weighted AODE is the weighted sum of the joint probabilities of its SPODE members. If the SPODE members are assigned fixed weights, then the extent to which the corresponding joint probability distributions fit different data points always remains the same. Fixed weights can be regarded as a globally optimal solution to weighting. However, if a globally optimal solution is very sensitive to perturbations in the data, there may be cases where it is not good to use this solution [31]. Take the dataset Phoneme for example: Fig. 1 presents the distributions of the joint probability for SPODEs on every instance. As can be seen, the distributions differ greatly for different SPODEs, and the probability distribution of the same SPODE does not fit different instances to the same extent. Thus, we argue that fixed weights (globally optimal solutions) may result in a biased estimate of the joint probability distribution of the weighted AODE and, in turn, suboptimal classification performance. Finding a locally optimal weighting for each test instance is a feasible alternative. In this paper we therefore propose a novel filter weighting approach, called instance-based weighting, which self-adaptively assigns appropriate weights to the same SPODE when classifying different instances. Meanwhile, the method not only considers the mutual dependence between the superparent and the class variable, but also takes into account the differences among sub-models. The method can be applied to various SPODE ensembles; in this paper we take AODE as an example. We demonstrate experimentally on 30 publicly available datasets [32] that the instance-based weighting AODE (IWAODE) outperforms other state-of-the-art BNCs with and without weighting methods in terms of zero–one loss, bias and variance, without incurring high computational cost.

The rest of this paper is organized as follows. Section 2 introduces AODE and provides a brief survey of related weighting approaches. Our proposed novel techniques for instance-based weighting are described in Section 3. Section 4 presents experimental results of IWAODE and its comparison with other state-of-the-art learners. Section 5 draws conclusions.

Section snippets

Related work

The structure of a BNC [33] is a directed acyclic graph, where nodes denote the predictive attributes and arcs denote the dependence relationships between child nodes and their parent nodes. NB is the simplest BNC; it calculates the probability of a class label given the data under the conditional independence assumption that all attributes are independent given the label. That is, let each instance $\mathbf{x}$ be characterized by $n$ values $\{x_1, \ldots, x_n\}$ for attributes $\{X_1, \ldots, X_n\}$, and class label $c \in \{c_1, \ldots, c_m\}$
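
The NB factorization just described, $P(c, \mathbf{x}) = P(c) \prod_i P(x_i \mid c)$, can be sketched by plain frequency counting. This is a minimal version without smoothing; the function names and toy data are illustrative only, not the paper's implementation.

```python
from collections import Counter, defaultdict

def nb_fit(X, y):
    """Fit a categorical naive Bayes model by frequency counting.
    Returns class priors P(c) and per-attribute conditional counts."""
    n = len(y)
    priors = {c: k / n for c, k in Counter(y).items()}
    cond = defaultdict(Counter)   # (attribute index, class) -> value counts
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return priors, cond

def nb_joint(priors, cond, x, c):
    """P(c, x) = P(c) * prod_i P(x_i | c), the NB factorization."""
    p = priors[c]
    for i, v in enumerate(x):
        counts = cond[(i, c)]
        p *= counts[v] / sum(counts.values())
    return p

# Toy training set: two attributes, two classes.
X = [['a', 'u'], ['a', 'v'], ['b', 'u'], ['b', 'u']]
y = [0, 0, 1, 1]
priors, cond = nb_fit(X, y)
```

A real implementation would add Laplace smoothing so that unseen attribute values do not zero out the product.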

The instance-based weighting filter

Since Kullback–Leibler (KL) divergence is often used to measure attribute weights, we first explain the differences among the sub-models of AODE from the viewpoint of KL divergence. KL divergence, also known as relative entropy, measures the distance between two discrete probability distributions (denoted $P(d)$ and $Q(d)$) and can be defined as follows [40]: $KL(P\|Q) = \sum_{d} P(d) \log \frac{P(d)}{Q(d)} = H_Q(D) - H_P(D)$, where $D = X \cup C$. Taking the expectation with respect to $P(d)$, we can see that KL is the expectation of the logarithmic
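
For small discrete distributions the KL divergence defined above can be computed directly; a minimal sketch (the dict-based representation is our own choice):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = sum_d P(d) * log(P(d) / Q(d)) for discrete
    distributions given as dicts mapping outcome -> probability.
    Terms with P(d) == 0 contribute nothing, by the usual convention."""
    return sum(pd * math.log(pd / q[d]) for d, pd in p.items() if pd > 0)

p = {'x': 0.5, 'y': 0.5}
q = {'x': 0.9, 'y': 0.1}
```

KL divergence is non-negative and zero exactly when $P = Q$, which is what makes it usable as a distance-like quantity for weighting.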

Experimental results

To illustrate the effectiveness of our proposed instance-based weighting filter, we conduct a number of experiments on 30 datasets from the UCI machine learning repository. A description of these 30 datasets, including the number of attributes, classes and instances, is given in Table 1. For each dataset, numeric attributes are discretized using the Minimum Description Length (MDL) method [44]. Missing values for qualitative attributes are replaced with the most frequently appearing value. For
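
The most-frequent-value imputation step mentioned above can be sketched as follows. The `'?'` missing-value marker follows the common UCI file convention and is an assumption here; the MDL discretization step is not shown.

```python
from collections import Counter

def impute_most_frequent(column, missing='?'):
    """Replace missing entries in a qualitative attribute column with
    the value that appears most often among the observed entries."""
    observed = [v for v in column if v != missing]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == missing else v for v in column]

col = ['red', '?', 'blue', 'red', '?']
filled = impute_most_frequent(col)  # 'red' is the most frequent value
```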

Conclusions

Linear weighting schemes provide effective and efficient solutions for linearly combining SPODEs. However, the joint probability distribution of a SPODE may not fit different test instances to the same extent, and fixed weights may lead to a suboptimal estimate of the joint probability distribution of the final AODE. In this paper we propose a flexible weighting scheme, the instance-based weighting filter, which can self-adaptively assign weights to SPODEs while dealing with different test

CRediT authorship contribution statement

Zhiyi Duan: Conceptualization, Methodology, Software, Writing - original draft. Limin Wang: Validation, Writing - review & editing, Visualization. Shenglei Chen: Formal analysis, Investigation, Project administration. Minghui Sun: Resources, Data curation, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272209 and 61872164).

Zhiyi Duan received the BEng degree from Jilin University, Changchun, China in 2014 and is currently a Ph.D. student in the Department of Computer Science and Technology at Jilin University. His research interests include data mining and Bayesian networks.

References (51)

  • K. Nguyen, T. Le, T.D. Nguyen, D. Phung, G.I. Webb, Robust Bayesian kernel machine via Stein variational gradient...
  • T. Vandal, E. Kodra, J. Dy, S. Ganguly, R. Nemani, A.R. Ganguly, Quantifying uncertainty in discrete-continuous and...
  • L.P. Zhou et al., Learning discriminative Bayesian networks from high-dimensional continuous neuroimaging data, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • D.D. Lewis, Naive Bayes at forty: The independence assumption in information retrieval, in: ECML-98: Proceedings of the...
  • M. Sahami, Learning limited dependence Bayesian classifiers, in: Proceedings of Second ACM SIGKDD Conference on...
  • A.M. Martinez et al., Scalable learning of Bayesian network classifiers, J. Mach. Learn. Res. (2013)
  • N.A. Zaidi et al., Alleviating naive Bayes attribute independence assumption by attribute weighting, J. Mach. Learn. Res. (2013)
  • E.J. Keogh et al., Learning the structure of augmented Bayesian classifiers, Int. J. Artif. Intell. Tools (2002)
  • G.I. Webb et al., Not so naive Bayes: Aggregating one-dependence estimators, Mach. Learn. (2005)
  • J. Wu et al., Attribute weighting via differential evolution algorithm for attribute weighted naive Bayes (WNB), J. Comput. Inform. Syst. (2011)
  • L. Jiang et al., Weighted average of one-dependence estimators, J. Exp. Theor. Artif. Intell. (2012)
  • Z.L. Xiang et al., Attribute weighting for averaged one-dependence estimators, Appl. Intell. (2016)
  • L.X. Jiang et al., A correlation-based feature weighting filter for naive Bayes, IEEE Trans. Knowl. Data Eng. (2019)
  • B. Tang et al., Toward optimal feature selection in naive Bayes for text categorization, IEEE Trans. Knowl. Data Eng. (2016)
  • S.L. Chen, A.M. Martinez, G.I. Webb, Highly scalable attribute selection for averaged one-dependence estimators, in:...


Limin Wang received the Ph.D. degree in computer science from Jilin University in 2005. He is currently a professor in the College of Computer Science and Technology, Jilin University, China. His research interests include probabilistic logic inference and Bayesian networks. He has published innovative papers in journals such as Knowledge-Based Systems, Expert Systems with Applications, and Progress in Natural Science.

Minghui Sun received the Ph.D. degree in computer science from Kochi University of Technology, Japan, in 2011. He is currently an assistant professor in the College of Computer Science and Technology, Jilin University, China. He is interested in using HCI methods to solve challenging real-world computing problems in many areas, including tactile interfaces, pen-based interfaces and tangible interfaces.
