Concept Drift Detection in Streaming Classification of Mobile Application Traffic
Automatic Control and Computer Sciences (IF 0.6), Pub Date: 2021-07-19, DOI: 10.3103/s0146411621030093
O. I. Sheluhin, S. A. Sekretarev

Abstract

An important aspect of classifying applications under a priori uncertainty is running the algorithms in streaming mode, where measurement data arrive continuously. A distinctive feature of streaming classification is concept drift, which occurs when the phenomenon under study, for which the data were collected, changes over time. For non-stationary data streams, the classifier of mobile applications should therefore be paired with a concept drift detector (CDD). The paper proposes a two-stage algorithm for detecting a change of concept in the observed data stream. The algorithm relies on statistical characteristics of the attributes, analyzed with two sliding windows that track changes in the current statistics of the mobile application attributes. The first stage applies key statistics according to the Fisher criterion; the second stage applies the Page–Hinkley test. Experiments on an artificial data set yielded dependencies that make it possible to evaluate the performance of the proposed two-stage drift detection algorithm. The CDD is shown to reduce the probability of classification error at each change of concept by about 5%.
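The abstract does not give the details of the two stages, so the following Python sketch is only one plausible reading of the idea, not the authors' implementation. It assumes that the Fisher criterion means a variance-ratio (F) statistic computed over two adjacent sliding windows of a single attribute, and that a drift is reported only when this statistic and a one-sided Page–Hinkley test agree. All names (`variance_ratio`, `PageHinkley`, `detect_drift`) and all parameter values (`window`, `f_crit`, `delta`, `threshold`) are hypothetical; in particular, `f_crit` is an illustrative placeholder for the critical value that a real F-test would take from the F distribution.

```python
from collections import deque
from statistics import variance


def variance_ratio(ref, cur):
    """Stage 1 (assumed): ratio of the larger to the smaller sample variance."""
    v_ref, v_cur = variance(ref), variance(cur)
    lo, hi = min(v_ref, v_cur), max(v_ref, v_cur)
    if hi == 0:
        return 1.0  # both windows are constant: no evidence of change
    return float("inf") if lo == 0 else hi / lo


class PageHinkley:
    """Stage 2: Page-Hinkley test for an upward shift of the stream mean."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_min = 0.0          # running minimum of m_t

    def update(self, x):
        """Consume one value; return True if the test statistic exceeds the threshold."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def detect_drift(stream, window=20, f_crit=4.0, delta=0.005, threshold=5.0):
    """Return the stream indices at which both stages signal a change."""
    buf = deque(maxlen=2 * window)  # reference window followed by current window
    ph = PageHinkley(delta, threshold)
    alarms = []
    for i, x in enumerate(stream):
        buf.append(x)
        ph_alarm = ph.update(x)
        if len(buf) < 2 * window:
            continue  # not enough data to compare the two windows yet
        values = list(buf)
        ref, cur = values[:window], values[window:]
        if variance_ratio(ref, cur) > f_crit and ph_alarm:
            alarms.append(i)
            # Assumed reset policy: start both stages afresh on the new concept.
            buf.clear()
            ph = PageHinkley(delta, threshold)
    return alarms
```

Calling `detect_drift(values)` on a list of attribute values returns the positions where a change was confirmed. Requiring both stages to agree trades some detection delay for fewer false alarms; whether the paper combines the stages this way is not stated in the abstract.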




Updated: 2021-07-19