A Review of Off-Line Mode Dataset Shifts
IEEE Computational Intelligence Magazine (IF 10.3), Pub Date: 2020-08-01, DOI: 10.1109/mci.2020.2998231
Carla C. Takahashi, Antonio P. Braga

Dataset shifts arise in many real-world applications because data generation is not always fully controlled and is subject to noise, degradation, and other natural variations. In machine learning, this lack of regularity in data can degrade performance by breaching error constraints. Various methods have been proposed to handle shift problems; however, shifts in the off-line learning mode have received less attention. Off-line shifts are problems in which drift occurs only in unlabeled data. Most methods aimed at dataset shifts assume that new labeled data can be received after training, which is not always the case. Here, a review of dataset shift characteristics and causes is presented as a tool for analyzing and implementing machine learning methods that target off-line mode dataset shift problems. In this context, a relationship between statistical learning risk functions and the error degradation caused by variation in the data distribution is derived. Moreover, this paper provides a consistent survey of recent popular machine learning methods that address off-line mode dataset shift problems, focusing on the main characteristics of unlabeled data shifts.

Updated: 2020-08-01