A drift detection method based on dynamic classifier selection
Data Mining and Knowledge Discovery (IF 2.8), Pub Date: 2019-10-11, DOI: 10.1007/s10618-019-00656-w
Felipe Pinagé, Eulanda M. dos Santos, João Gama

Machine learning algorithms can be applied to many practical problems, such as spam, fraud and intrusion detection, and customer preferences, among others. In most of these problems, data arrive in streams, which means that the data distribution may change over time, leading to concept drift. The literature offers many supervised methods based on error monitoring for explicit drift detection. However, these methods may be infeasible in real-world applications where fully labeled data are not available, and they may need a significant decrease in accuracy before they can detect a drift. There are also blind approaches, in which the decision model is updated constantly. However, this may lead to unnecessary system updates. To overcome these drawbacks, we propose in this paper a semi-supervised drift detector that uses an ensemble of classifiers based on self-training online learning and dynamic classifier selection. For each unknown sample, a dynamic selection strategy chooses, among the ensemble's component members, the classifier most likely to classify it correctly. The prediction assigned by the chosen classifier is used to compute an estimate of the error produced by the ensemble members. The proposed method monitors this pseudo-error in order to detect drifts and updates the decision model only after a drift is detected. The method is relevant in that it allows drift detection and reaction and is applicable to several practical problems. The experiments conducted indicate that the proposed method attains high performance and detection rates while reducing the amount of labeled data used to detect drift.
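
The pipeline described in the abstract (select one classifier per sample, treat its prediction as a pseudo-label, estimate the ensemble's error against it, and monitor that estimate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the selection rule or the monitoring test, so the sketch assumes a local-accuracy rule over the k nearest labeled validation points and a DDM-style threshold on the running mean. All names (select_classifier, pseudo_error, PseudoErrorMonitor) and parameters (k, min_samples, drift_sigmas) are hypothetical, and self-training and model updates are left out.

```python
import math


def select_classifier(x, classifiers, validation, k=3):
    """Return the member with the best accuracy on the k labeled
    validation points nearest to x (assumed local-accuracy rule)."""
    neighbours = sorted(validation, key=lambda pair: math.dist(x, pair[0]))[:k]

    def local_accuracy(clf):
        return sum(clf(xv) == yv for xv, yv in neighbours) / len(neighbours)

    # On ties, max() keeps the first member in the list.
    return max(classifiers, key=local_accuracy)


def pseudo_error(x, classifiers, validation, k=3):
    """Fraction of ensemble members that disagree with the pseudo-label,
    i.e. the prediction of the dynamically selected classifier."""
    pseudo_label = select_classifier(x, classifiers, validation, k)(x)
    disagreements = sum(clf(x) != pseudo_label for clf in classifiers)
    return disagreements / len(classifiers)


class PseudoErrorMonitor:
    """DDM-style monitor (an assumption, not taken from the abstract):
    signal drift when mean + std exceeds the best level seen so far
    by drift_sigmas standard deviations."""

    def __init__(self, min_samples=30, drift_sigmas=3.0):
        self.min_samples = min_samples
        self.drift_sigmas = drift_sigmas
        self.n = 0
        self.p = 0.0            # running mean of the pseudo-error
        self.p_min = math.inf   # mean at the lowest p + s seen so far
        self.s_min = math.inf   # std at the lowest p + s seen so far

    def update(self, err):
        """Consume one pseudo-error in [0, 1]; return True on drift."""
        self.n += 1
        self.p += (err - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return False
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        return self.p + s > self.p_min + self.drift_sigmas * self.s_min
```

In this sketch each classifier is a plain callable that maps a feature tuple to a label, and validation is a list of (features, label) pairs. A caller would feed pseudo_error(...) for every incoming unlabeled sample into PseudoErrorMonitor.update and retrain or replace the ensemble only when update returns True, which mirrors the "update only after drift detection" idea in the abstract.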

Updated: 2019-10-11