Accessing Imbalance Learning Using Dynamic Selection Approach in Water Quality Anomaly Detection, Symmetry



Symmetry (IF 2.2), Pub Date: 2021-05-07, DOI: 10.3390/sym13050818
Eustace M. Dogo, Nnamdi I. Nwulu, Bhekisipho Twala, Clinton Aigbavboa

Automatic anomaly detection monitoring plays a vital role in water utilities’ distribution systems, reducing the risk that unclean water poses to consumers. One of the major problems in anomaly detection is imbalanced datasets. Dynamic selection techniques combined with ensemble models have proven effective for imbalanced dataset classification tasks. In this paper, water quality anomaly detection is formulated as a classification problem in the presence of class imbalance. To tackle this problem, considering the asymmetric dataset distribution between the majority and minority classes, the performance of sixteen previously proposed single and static ensemble classification methods embedded with resampling strategies is first optimised and compared. After that, six dynamic selection techniques, namely Modified Class Rank (Rank), Local Class Accuracy (LCA), Overall-Local Accuracy (OLA), K-Nearest Oracles Eliminate (KNORA-E), K-Nearest Oracles Union (KNORA-U) and Meta-Learning for Dynamic Ensemble Selection (META-DES), are proposed and evaluated in combination with homogeneous and heterogeneous ensemble models, three SMOTE-based resampling algorithms (SMOTE, SMOTE+ENN and SMOTE+Tomek Links) and one missing data method (missForest). A binary real-world drinking-water quality anomaly detection dataset is utilised to evaluate the models. The experimental results show that all the models benefit from the combined optimisation of both the classifiers and the resampling methods. Considering the three performance measures (balanced accuracy, F-score and G-mean), the results also show that the dynamic classifier selection (DCS) techniques, in particular the missForest+SMOTE+RANK and missForest+SMOTE+OLA models based on homogeneous ensemble-bagging with a decision tree as the base classifier, exhibited better performance in terms of balanced accuracy and G-mean, while the Bg+mF+SMENN+LCA model based on homogeneous ensemble-bagging with random forest has a better overall F1-measure than the other models.
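
The three performance measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual binary definitions (balanced accuracy as the mean of sensitivity and specificity, F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the abstract does not spell out the exact formulas used in the paper. The function name `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Balanced accuracy, F1 and G-mean for binary labels.

    Assumption: standard textbook definitions; the minority (anomaly)
    class is treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty classes so the sketch never divides by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical toy labels: 8 normal readings (0) and 2 anomalies (1).
y_true = [0] * 8 + [1] * 2
always_normal = [0] * 10                       # predicts the majority class only
some_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]    # one hit, one miss, one false alarm

print(imbalance_metrics(y_true, always_normal))
print(imbalance_metrics(y_true, some_model))
```

On these toy labels both predictors get 8 of 10 samples right, so plain accuracy cannot separate them, while all three imbalance-aware measures do. That is the usual motivation for reporting such measures on data where anomalies are rare.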


Updated: 2021-05-07
