EvoPreprocess—Data Preprocessing Framework with Nature-Inspired Optimization Algorithms
Mathematics (IF 2.3) Pub Date: 2020-06-02, DOI: 10.3390/math8060900
Sašo Karakatič

The quality of machine learning models can suffer when inappropriate data is used, a problem that is especially prevalent in high-dimensional and imbalanced data sets. Data preparation and preprocessing can mitigate some of these problems and thus result in better models. The use of meta-heuristic and nature-inspired methods for data preprocessing has become common, but these approaches are still not readily available to practitioners through a simple and extendable application programming interface (API). This paper presents EvoPreprocess, an open-source Python framework that preprocesses data using evolutionary and nature-inspired optimization algorithms. The main problems addressed by the framework are data sampling (simultaneous over- and under-sampling of data instances), feature selection, and data weighting for supervised machine learning problems. The EvoPreprocess framework provides a simple, object-oriented, and parallelized API for the preprocessing tasks and can be used with the scikit-learn and imbalanced-learn Python machine learning libraries. The framework uses well-known self-adaptive nature-inspired meta-heuristic algorithms and can easily be extended with custom optimization and evaluation strategies. The paper presents the architecture of the framework, its use, experimental results, and a comparison with other common preprocessing approaches.
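To make the idea of optimization-driven feature selection concrete, the following is a minimal sketch under stated assumptions. It does not use the EvoPreprocess API and is not the paper's method: it substitutes a plain genetic algorithm over binary feature masks, a hand-written nearest-centroid classifier as the evaluator, and synthetic data. Every function name and parameter here (`make_data`, `centroid_accuracy`, `evolve_feature_mask`, the 0.01 size penalty) is hypothetical and chosen only for illustration.

```python
import random


def make_data(rng, n=120, informative=2, noise=6):
    """Synthetic binary data: only the first `informative` features carry signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so every split stays balanced
        row = [rng.gauss(3.0 * label, 1.0) for _ in range(informative)]
        row += [rng.gauss(0.0, 3.0) for _ in range(noise)]
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(mask, X_train, y_train, X_val, y_val):
    """Validation accuracy of a nearest-centroid classifier on the kept features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_val)


def evolve_feature_mask(X, y, generations=20, pop_size=20, seed=0):
    """Evolve a boolean feature mask with elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n_features = len(X[0])
    split = len(X) * 2 // 3  # simple holdout split (assumption, not cross-validation)
    X_tr, y_tr, X_va, y_va = X[:split], y[:split], X[split:], y[split:]

    def fitness(mask):
        # Accuracy minus a small penalty per selected feature (illustrative weight).
        return centroid_accuracy(mask, X_tr, y_tr, X_va, y_va) - 0.01 * sum(mask)

    population = [
        [rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]  # one-point crossover
            flip = rng.randrange(n_features)
            child[flip] = not child[flip]  # mutate exactly one bit
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


X, y = make_data(random.Random(42))
mask, score = evolve_feature_mask(X, y)
print(mask, round(score, 3))
```

The mask plays the role of a candidate solution and the penalized validation accuracy plays the role of the evaluation strategy. Swapping in a different optimizer or evaluator changes only `evolve_feature_mask` or `fitness`, which is the kind of extension point the abstract describes.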

Updated: 2020-06-02