AH3: An adaptive hierarchical feature representation model for three-way decision boundary processing
Introduction
Classification is a key problem in many real-world applications, such as disease detection and spam detection. In traditional binary classification, a decision has only two outcomes: acceptance and rejection. In practice, uncertain or incomplete information often makes it impossible to confidently accept or reject some samples, and these samples are easily misclassified. How to handle such uncertain samples properly is therefore an important problem in binary classification. To better describe datasets and process uncertain samples, Yao [1], [2] proposed three-way decision theory, which extends two-way decision theory with an additional choice: the boundary decision. Each sample receives one of three decisions (positive, negative, or boundary), which divides the samples into a positive region (POS), a negative region (NEG), and a boundary region (BND).
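As a minimal illustration of the three regions, the sketch below partitions samples by an estimated acceptance probability using a pair of thresholds, in the style of probabilistic three-way decisions. The function name `three_way_partition`, the threshold values `alpha = 0.7` and `beta = 0.3`, and the probabilities are assumptions chosen for illustration; this is not the covering-based model used later in this article.

```python
from typing import List, Sequence, Tuple


def three_way_partition(
    probs: Sequence[float], alpha: float = 0.7, beta: float = 0.3
) -> Tuple[List[int], List[int], List[int]]:
    """Split sample indices into (POS, NEG, BND) by acceptance probability.

    alpha and beta are illustrative thresholds (assumed, not from the paper):
    accept if p >= alpha, reject if p <= beta, defer the decision otherwise.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    pos, neg, bnd = [], [], []
    for i, p in enumerate(probs):
        if p >= alpha:
            pos.append(i)   # enough evidence to accept
        elif p <= beta:
            neg.append(i)   # enough evidence to reject
        else:
            bnd.append(i)   # uncertain: boundary decision
    return pos, neg, bnd


# Hypothetical acceptance probabilities for six samples.
probs = [0.95, 0.10, 0.55, 0.70, 0.30, 0.42]
pos, neg, bnd = three_way_partition(probs)
print("POS:", pos, "NEG:", neg, "BND:", bnd)
```

Samples that land in BND are exactly those that a two-way classifier would be forced to accept or reject despite insufficient evidence.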
In three-way decision theory, the POS and NEG regions contain the samples without uncertainty or fuzziness, and uncertain samples are moved into one of these certain regions (POS or NEG) once enough information about them is available. This process resembles the human decision strategy. Mining information for boundary processing is therefore especially important, and it is key to improving the results of binary classification. Research on three-way decision theory now focuses more on solving practical problems. For example, Chen et al. [3] extended the notions of a characteristic relation and a characteristic set to systems with four types of characteristic relations and characteristic sets for incomplete data processing, based on the wide sense of three-way decision theory. Afridi et al. [4] used a three-way clustering approach to deal with clusters that have overlapping regions, and proposed different variance-based criteria for determining the thresholds, which play a crucial role in accurately estimating the overlapping region. In terms of theoretical innovation, Liu et al. [5] proposed a new three-way decision model with intuitionistic fuzzy numbers to solve multiple-attribute decision-making problems. Zhang et al. [6] constructed and investigated quantitative three-way class-specific attribute reducts based on region preservations in three-way decision theory, with the aim of forming three-way types of quantitative optimization that match probabilistic rough sets. In addition, three-way decision theory has been applied in various fields, such as disease detection [7], [8], credit cards [9], social networks [10], recommendation systems [11], text classification [12], image data analysis [13], inconsistent information [14], [15], and cloud computing [16].
To better solve the binary classification problem, researchers have developed a number of techniques for processing the BND region. Li et al. [17] proposed a method based on the tri-training algorithm that reduces the BND region by building three classifiers on the basis of a three-way decision. Li et al. [18] proposed a three-way decision model, based on rough set techniques and a centroid solution, that handles the uncertain boundary region to improve binary text classification performance. Ma and Yao [19] argued that negative rules are as important as acceptance rules for boundary processing, and proposed three types of class-specific attribute reducts in probabilistic rough set models. We have also worked on boundary processing. First, we [20] mined definite information from the POS region and the NEG region, and proposed a multiview decision model based on constructive three-way decision theory, which uses global information to classify boundary samples. Then, we [21] used a cost-sensitive method to deal with the BND region based on the three-way decision model.
However, most of these methods are oriented toward the data themselves and not toward the decision problem. Humans, by contrast, reason about why a problem arises: they look for answers in existing knowledge using conventional methods, and they may draw different conclusions depending on the basic knowledge or criteria they start from. Therefore, on the basis of three-way decision theory, mining useful decision rules from the POS region and the NEG region is an important step, and our goal is to use these two kinds of decision rules to properly divide the boundary samples and thereby increase binary classification accuracy.
In this article, we propose an adaptive hierarchical feature representation model based on three-way decision theory for boundary processing, named AH3. Our contributions are as follows:
- We propose a BND region processing method (AH3) for problem-oriented decision making. We select the adaptive feature representation for boundary processing using a validated BND region, which properly improves the results of binary classification.
- On the basis of fuzzy quotient space theory (FQST), we construct two kinds of hierarchical feature representation from the POS region and the NEG region, respectively. Then, we adaptively decompose the feature representation with the highest accuracy between the upper layer and the lower layer to finer granularity, and two adaptive granular spaces are selected from the POS region and the NEG region to properly process boundary samples.
- We combine variance with mutual information to form a new method (variance-mutual information [VMI] method) that can better represent the relationship between different features to highlight some representative features and remove some redundant features.
- To demonstrate the effectiveness of our algorithm (AH3), we experiment on five University of California, Irvine (UCI) datasets: the Spambase dataset, the Chess dataset, and three medical datasets: Breast Cancer Wisconsin (Original) (WBC), Breast Cancer Wisconsin (Diagnostic) (WDBC), and Breast Cancer Wisconsin (Prognostic) (WPBC). The results demonstrate that our algorithm has good classification performance, especially in dealing with the three real medical datasets.
The remainder of this work is organized as follows. In Section 2, we introduce our preliminary work. In Section 3, the process of constructing a hierarchical feature representation based on FQST is introduced. In Section 4, we introduce our algorithm of an adaptive hierarchical feature representation model based on three-way decision theory (AH3). The experimental results are analyzed in Section 5. In Section 6, we present our conclusions.
Preliminary work
In this section, we introduce our preliminary work on a three-way decision model based on a minimum covering algorithm (MinCA) and FQST.
Hierarchical feature representation based on FQST
In this section, we first describe the importance of the variance-mutual information (VMI) method and show how to obtain a new fuzzy equivalence relation. Then, we construct a hierarchical feature representation for the certain regions (POS and NEG) to find the categorical information based on FQST. Finally, we introduce the process of selecting the high-precision feature representation by validating BND samples.
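The way a fuzzy equivalence relation induces a hierarchy can be sketched as follows: cutting the relation at a threshold yields a crisp partition, and raising the threshold refines it, so a sequence of thresholds gives nested layers from coarse to fine. The similarity matrix `sim`, the threshold values, and the function name `lambda_cut_partition` below are hypothetical; in particular, the matrix is hand-made rather than derived from VMI.

```python
from typing import Dict, List, Sequence


def lambda_cut_partition(sim: Sequence[Sequence[float]], lam: float) -> List[List[int]]:
    """Group indices i, j whenever they are linked by similarities >= lam.

    Uses union-find, so the result is the partition induced by the
    transitive closure of the thresholded relation.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= lam:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical fuzzy equivalence relation on four samples (max-min transitive).
sim = [
    [1.0, 0.9, 0.6, 0.2],
    [0.9, 1.0, 0.6, 0.2],
    [0.6, 0.6, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]
# Each threshold gives one layer; higher thresholds give finer granularity.
hierarchy = {lam: lambda_cut_partition(sim, lam) for lam in (0.2, 0.6, 0.9, 1.0)}
for lam, layer in hierarchy.items():
    print(lam, layer)
```

Each layer is a refinement of the one before it, which is the property that allows a coarse layer to be decomposed into finer granules.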
Adaptive feature representation selection
According to Algorithm 1, we have obtained a hierarchical feature representation and the high-precision layer. However, because different feature representation layers are discrete rather than continuous, a difference will exist between different layers. From the granulation point of view, the high-precision layer may be coarse, and it may not be the adaptive feature representation for global classification.
In this section, we adaptively decompose the high-precision
Experiments
In this section, to evaluate the effectiveness of our algorithm, we first introduce the basic experimental information, including the evaluation index, the datasets, and the state-of-the-art baseline methods. Then we describe in detail the feature representations of the high-precision layer and the adaptive layer, and the difference between them. Finally, we present the experimental results of our binary classification, and compare them with the results obtained with other
Conclusions
In this article, we propose the AH3 algorithm, which improves binary classification by properly dealing with boundary samples. It first produces two certain regions (POS and NEG) and an uncertain region (BND) on the basis of the three-way decision model. To better describe the relationship between different features, it constructs the fuzzy equivalence relation on the basis of VMI. Then, it adaptively constructs a hierarchical feature representation for the POS region and the NEG region,
CRediT authorship contribution statement
Jie Chen: Conceptualization, Data curation, Funding acquisition, Methodology, Project administration, Software, Writing – review & editing. Yang Xu: Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft. Shu Zhao: Funding acquisition, Project administration, Supervision, Writing – review & editing. Yanping Zhang: Funding acquisition, Supervision, Validation, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grants no. 61602003, no. 61673020, and no. 61876001) and the Provincial Natural Science Foundation of Anhui Province (grant no. 1708085QF156).
References (48)
Three-way decisions with probabilistic rough sets. Inf. Sci. (2010)
Extending characteristic relations on an incomplete data set by the three-way decision theory. Int. J. Approx. Reason. (2020)
Variance based three-way clustering approaches for handling overlapping clustering. Int. J. Approx. Reason. (2020)
A multiple attribute decision making three-way model for intuitionistic fuzzy numbers. Int. J. Approx. Reason. (2020)
Quantitative three-way class-specific attribute reducts based on region preservations. Int. J. Approx. Reason. (2020)
On transformations from semi-three-way decision spaces to three-way decision spaces based on triangular norms and triangular conorms. Inf. Sci. (2018)
Proximal three-way decisions: theory and applications in social networks. Knowl.-Based Syst. (2016)
Three-way recommender systems based on random forests. Knowl.-Based Syst. (2016)
Cost-sensitive sequential three-way decision modeling using a deep neural network. Int. J. Approx. Reason. (2017)
Three-way decisions based on neutrosophic sets and AHP-QFD framework for supplier selection problem. Future Gener. Comput. Syst. (2018)
Adaptive thresholds determination for saving cloud energy using three-way decisions. Clust. Comput.
Three-way decision perspectives on class-specific attribute reducts. Inf. Sci.
The superiority of three-way decisions in probabilistic rough set models. Inf. Sci.
Swarm intelligent based online feature selection (OFS) and weighted entropy frequent pattern mining (WEFPM) algorithm for big data analysis. Clust. Comput.
A knowledge-based system for breast cancer classification using fuzzy logic method. Telemat. Inform.
Optimal breast cancer classification using Gauss-Newton representation based algorithm. Expert Syst. Appl.
Breast cancer classification using deep belief networks. Expert Syst. Appl.
Breast cancer diagnosis using genetically optimized neural network model. Expert Syst. Appl.
Three-way decision: an interpretation of rules in rough set theory.
Modified bat algorithm for feature selection with the Wisconsin Diagnosis Breast Cancer (WDBC) dataset. Asian Pac. J. Cancer Prev.
On breast cancer detection: an application of machine learning algorithms on the Wisconsin Diagnostic Dataset.
Enhancing binary classification by modeling uncertain boundary in three-way decisions. IEEE Trans. Knowl. Data Eng.
An oversampling method for imbalance data based on three-way decision model. Acta Electron. Sin.
A method to reduce boundary regions in three-way decision theory.