On-line Active Learning in Data Stream Regression using Uncertainty Sampling based on Evolving Generalized Fuzzy Models
IEEE Transactions on Fuzzy Systems ( IF 11.9 ) Pub Date : 2018-02-01 , DOI: 10.1109/tfuzz.2017.2654504
Edwin Lughofer , Mahardhika Pratama

In this paper, we propose three criteria for efficient sample selection in data stream regression problems within an online active learning context. Selection becomes important whenever the target values, which guide the update of the regressors as well as the implicit model structures, are costly or time-consuming to measure, and also when very fast model updates are required to cope with the real-time demands of stream mining. Reducing the number of selected samples as much as possible while keeping the predictive accuracy of the models at a high level is thus a central challenge. Ideally, this should be achieved in an unsupervised and single-pass manner. Our selection criteria rely on three aspects: 1) the extrapolation degree combined with the model's nonlinearity degree, which is measured in terms of a new specific homogeneity criterion among adjacent local approximators; 2) the uncertainty in model outputs, which can be measured in terms of confidence intervals using so-called adaptive local error bars (we integrate a weighted localization of an incremental noise level estimator and propose formulas for online merging of local error bars); 3) the uncertainty in model parameters, which is estimated by the so-called A-optimality criterion, which relies on the Fisher information matrix. The selection criteria are developed in combination with evolving generalized Takagi–Sugeno (TS) fuzzy models (containing rules in arbitrarily rotated position), as previous publications have shown that these outperform conventional evolving TS models (containing axis-parallel rules). The results based on three high-dimensional real-world streaming problems show that a model update based on only 10%–20% selected samples can still achieve accumulated model errors over time similar to those obtained when performing a full model update on all samples. This can be achieved with negligible sensitivity to the size of the active learning latency buffer. Random sampling with the same percentages of selected samples, however, produced much higher error rates. Hence, the intelligence in our sample selection concept leads to an economic balance between model accuracy and measurement as well as computational costs for model updates.
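The third criterion, parameter uncertainty measured by A-optimality, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single plain linear regressor with unit noise variance (so the Fisher information matrix is the accumulated sum of outer products of the inputs) instead of an evolving generalized TS fuzzy model, and the names `a_optimality`, `select_by_a_optimality`, `ridge`, and `rel_gain` are hypothetical. It only shows the general idea of selecting a sample, in a single pass and without looking at its target value, when it would noticeably shrink the trace of the inverse Fisher information matrix.

```python
import numpy as np


def a_optimality(fisher: np.ndarray) -> float:
    """A-optimality value: trace of the inverse Fisher information matrix.

    Smaller values indicate lower overall parameter uncertainty.
    """
    return float(np.trace(np.linalg.inv(fisher)))


def select_by_a_optimality(X: np.ndarray, ridge: float = 1e-3,
                           rel_gain: float = 0.05) -> list[int]:
    """Single-pass, unsupervised sample selection (illustrative assumption).

    A sample is selected when adding it would reduce the A-optimality
    value by more than `rel_gain` (relative reduction). Only inputs are
    inspected; target values are never needed for the decision.
    """
    d = X.shape[1]
    # Small ridge term keeps the matrix invertible before any sample arrives.
    fisher = ridge * np.eye(d)
    selected = []
    for i, x in enumerate(X):
        before = a_optimality(fisher)
        # For a linear model with unit noise, one sample adds x x^T.
        candidate = fisher + np.outer(x, x)
        after = a_optimality(candidate)
        if (before - after) / before > rel_gain:
            fisher = candidate      # "query" the target and update the model
            selected.append(i)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stream = rng.normal(size=(500, 3))   # synthetic input stream
    chosen = select_by_a_optimality(stream)
    print(f"selected {len(chosen)} of {len(stream)} samples")
```

In this simplified setting, the threshold `rel_gain` plays the role of the knob that trades the number of queried targets against parameter uncertainty; the fraction of samples it selects on synthetic data says nothing about the 10%–20% figures reported in the abstract.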

Updated: 2018-02-01