Adaptive Sampling for Active Learning with Genetic Programming
Cognitive Systems Research (IF 3.9), Pub Date: 2021-01-01, DOI: 10.1016/j.cogsys.2020.08.008
Sana Ben Hamida, Hmida Hmida, Amel Borgi, Marta Rukoz

Abstract: Active learning is a machine learning paradigm in which the learner decides which inputs to use for training. It was introduced to Genetic Programming (GP) mainly through dynamic data sampling, which is used to address known issues such as computational cost, over-fitting, and imbalanced databases. Traditional dynamic sampling for GP gives the algorithm a new sample periodically, often at every generation, without considering the state of the evolution. As a result, individuals do not have enough time to extract the hidden knowledge from a sample before it is replaced. An alternative approach is to use information about the learning state to adapt how often the training data changes. In this work, we propose an adaptive sampling strategy for classification tasks based on the state of solved fitness cases throughout learning. It is a flexible approach that can be applied with any dynamic sampling method. We implemented several sampling algorithms extended with dynamic and adaptive control of the re-sampling frequency. We tested them with GP on the KDD intrusion detection and Adult income prediction problems. The experimental study shows that controlling the sampling frequency preserves the power of dynamic sampling, with possible improvements in learning time and quality. We also show that adaptive sampling can be an alternative to multi-level sampling. This work opens many relevant paths for extension.
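The idea of tying the re-sampling frequency to the state of solved fitness cases can be illustrated with a minimal sketch. The abstract does not give the authors' actual rule, so everything below is an assumption made for illustration: the names `solved_rate` and `should_resample`, the `threshold` and `max_age` parameters, and the criterion itself (a case counts as solved when at least one individual classifies it correctly). Individuals are modelled as plain Python callables rather than GP trees.

```python
from typing import Callable, Sequence, Tuple

# Toy fitness case: (feature value, class label). Hypothetical representation.
Case = Tuple[float, int]
# An individual is modelled as any callable mapping a feature to a label.
Classifier = Callable[[float], int]


def solved_rate(population: Sequence[Classifier], sample: Sequence[Case]) -> float:
    """Fraction of cases in `sample` classified correctly by at least one individual."""
    solved = sum(
        1 for x, y in sample
        if any(individual(x) == y for individual in population)
    )
    return solved / len(sample)


def should_resample(
    population: Sequence[Classifier],
    sample: Sequence[Case],
    generations_on_sample: int,
    threshold: float = 0.8,   # assumed value, not taken from the paper
    max_age: int = 10,        # assumed cap on how long one sample is kept
) -> bool:
    """Decide whether to draw a new training sample.

    Assumed rule: change the sample once enough of its cases are solved,
    or once it has been used for `max_age` generations.
    """
    if generations_on_sample >= max_age:
        return True
    return solved_rate(population, sample) >= threshold
```

In a GP loop, a check like this would run once per generation in place of an unconditional re-sampling step, so a sample stays in use until the population has learned enough from it. The `max_age` cap is one possible way to avoid stalling on a sample the population cannot solve.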


Updated: 2021-01-01
