Two Outlier-Sensitive Measures for Semi-supervised Dynamic Ensemble Anomaly Detection Models
Neural Processing Letters (IF 3.1), Pub Date: 2022-09-01, DOI: 10.1007/s11063-022-11017-y
Shiyuan Fu, Xin Gao, Baofeng Li, Bing Xue, Xin Jia, Zijian Huang, Guangyao Zhang, Xu Huang

Semi-supervised anomaly detection has received wide interest because it does not require counterexamples during training. Existing competence measures for semi-supervised dynamic ensemble anomaly detection models do not consider the imbalanced nature of the training samples, which results in serious overfitting on normal samples. This paper proposes two outlier-sensitive measures to estimate the competence of base classifiers in dynamic ensemble models. When a normal sample is correctly classified, both measures give a higher positive score to base classifiers whose confidence is closer to 0.5, in contrast to the conventional idea that base classifiers with higher confidence should obtain higher scores. When a sample is misclassified, the Output-based Outlier-Sensitive measure calculates a negative score from the confidence output by the base classifier, while the Cost-Sensitive-based Outlier-Sensitive measure assigns a negative score according to the category of the sample. Multiple experiments are carried out on 30 datasets from public repositories under the unified framework proposed in this paper. The results show that dynamic ensemble models using these competence measures can outperform a number of typical ensemble models in terms of G-mean and F1, regardless of the pseudo outlier labeling method and base classifier selection method used in the model.
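The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration using the standard definitions of G-mean (geometric mean of recall and specificity) and F1 (harmonic mean of precision and recall); it is not code from the paper. The function name `g_mean_and_f1` and the convention that the anomaly class is the positive class with label 1 are assumptions made for this example.

```python
import math


def g_mean_and_f1(y_true, y_pred, positive=1):
    """Return (G-mean, F1) for binary labels.

    Assumption for this sketch: `positive` marks the anomaly class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Guard against empty classes so the ratios never divide by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    g_mean = math.sqrt(recall * specificity)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return g_mean, f1


# Toy labels (made up for illustration): 4 normal samples, 2 anomalies.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
g_mean, f1 = g_mean_and_f1(y_true, y_pred)
```

A model that labels every sample as normal gets a G-mean and an F1 of 0 under these definitions, however high its accuracy, which is one reason such metrics suit imbalanced anomaly detection problems.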




Updated: 2022-09-02