Evolving simple and accurate symbolic regression models via asynchronous parallel computing, Applied Soft Computing



Applied Soft Computing (IF 7.2), Pub Date: 2021-02-24, DOI: 10.1016/j.asoc.2021.107198
Aliyu Sani Sambo, R. Muhammad Atif Azad, Yevgeniya Kovalchuk, Vivek Padmanaabhan Indramohan, Hanifa Shah

In machine learning, reducing the complexity of a model can help to improve its computational efficiency and avoid overfitting. In genetic programming (GP), model complexity is often reduced by reducing the size of evolved expressions. However, previous studies have demonstrated that reducing expression size does not necessarily prevent model overfitting. Therefore, this paper uses the evaluation time – the computational time required to evaluate a GP model on data – as the estimate of model complexity. The evaluation time depends not only on the size of evolved expressions but also on their composition, thus acting as a more nuanced measure of model complexity than the expression size alone. To discourage complexity, this study employs a novel method called asynchronous parallel GP (APGP) that introduces a race condition in the evolutionary process of GP; the race offers an evolutionary advantage to the simple solutions when their accuracy is competitive. To evaluate the proposed method, it is compared to the standard GP (GP) and GP with bloat control (GP+BC) methods on six challenging symbolic regression problems. APGP produced models that are significantly more accurate (on 6/6 problems) than those produced by both GP and GP+BC. In terms of complexity control, APGP prevailed over GP but not over GP+BC; however, GP+BC produced simpler solutions at the cost of test-set accuracy. Moreover, APGP took a significantly lower number of evaluations than both GP and GP+BC to meet a target training fitness in all tests. Our analysis of the proposed APGP also involved: (1) an ablation study that separated the proposed measure of complexity from the race condition in APGP and (2) a study of an initialisation scheme that encourages functional diversity in the initial population, which improved the results for all the GP methods. These results question the overall benefits of bloat control and endorse the employment of both the evaluation time as an estimate of model complexity and the proposed APGP method for controlling it.
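
The two ideas in the abstract, evaluation time as a complexity estimate and a race that favours quickly evaluated models, can be illustrated with a small sketch. This is not the authors' APGP implementation, whose details the abstract does not give. The names `measure_eval_time` and `race`, the bounded pool, and the rule that a late arrival must be strictly more accurate to displace an earlier one are all assumptions made for illustration, and the race is simulated with fixed evaluation times rather than run on real parallel workers.

```python
import heapq
import math
import time


def measure_eval_time(model, xs, repeats=5):
    """Best-of-`repeats` wall-clock seconds to evaluate `model` on all of `xs`."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        for x in xs:
            model(x)
        best = min(best, time.perf_counter() - start)
    return best


def race(candidates, n_workers, pool_size):
    """Simulate asynchronous evaluation; return surviving names by finish order.

    candidates: list of (name, eval_time, error); lower error is better.
    Hypothetical rule: the pool fills in order of completion, and a later
    arrival enters only if its error is strictly lower than the worst member's.
    """
    # Each heap entry is the time at which one worker becomes free.
    workers = [0.0] * n_workers
    heapq.heapify(workers)

    arrivals = []
    for order, (name, eval_time, error) in enumerate(candidates):
        start = heapq.heappop(workers)   # earliest free worker takes the job
        finish = start + eval_time
        heapq.heappush(workers, finish)
        arrivals.append((finish, order, name, error))
    arrivals.sort()                      # process in order of completion

    pool = []                            # entries: (error, finish, name)
    for finish, _, name, error in arrivals:
        if len(pool) < pool_size:
            pool.append((error, finish, name))
            continue
        worst = max(pool)                # highest error; ties pick the latest finisher
        if error < worst[0]:             # equal accuracy does not displace
            pool[pool.index(worst)] = (error, finish, name)
    return [name for _, _, name in sorted(pool, key=lambda e: e[1])]


if __name__ == "__main__":
    # Two expressions with five nodes each but different composition.
    xs = [i / 100 for i in range(1000)]
    print(measure_eval_time(lambda x: x * x + x, xs))
    print(measure_eval_time(lambda x: math.sin(math.exp(x)) + x, xs))

    # Equal accuracy: the faster model keeps the only slot.
    print(race([("slow", 5.0, 0.10), ("fast", 1.0, 0.10)], 2, 1))  # ['fast']
    # The slower model wins only when it is strictly more accurate.
    print(race([("slow", 5.0, 0.05), ("fast", 1.0, 0.10)], 2, 1))  # ['slow']
```

In this sketch a model that takes longer to evaluate reaches the pool later, so it has to be more accurate than the members already there to survive, whereas size-based bloat control would treat the two five-node expressions above as equally complex.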

Updated: 2021-03-01
