Choosing function sets with better generalisation performance for symbolic regression models
Genetic Programming and Evolvable Machines (IF 1.7) Pub Date: 2020-05-12, DOI: 10.1007/s10710-020-09391-4
Miguel Nicolau, Alexandros Agapitos

Supervised learning by means of Genetic Programming (GP) aims at the evolutionary synthesis of a model that achieves a balance between approximating the target function on the training data and generalising to new data. The model space searched by the Evolutionary Algorithm is populated by compositions of primitive functions defined in a function set. Since the target function is unknown, the choice of a function set's constituent elements is primarily guided by the makeup of function sets traditionally used in the GP literature. Our work builds upon previous research into the effects of protected arithmetic operators (i.e. division, logarithm, power) on the output value of an evolved model for input data points not encountered during training. The goal is to benchmark the approximation/generalisation of models evolved using different function set choices across a range of 43 symbolic regression problems. The salient outcomes are as follows. Firstly, Koza's protected operators of division and exponentiation have a detrimental effect on generalisation, and should therefore be avoided. This result is invariant to the use of moderately sized validation sets for model selection. Secondly, the performance of the recently introduced analytic quotient operator is on average comparable to that of the sinusoidal operator, and their combination is advantageous for both approximation and generalisation. These findings are consistent across two different system implementations: standard expression-tree GP and linear Grammatical Evolution. We highlight that this study employed very large test sets, which create confidence when benchmarking the effect of different combinations of primitive functions on model generalisation. Our aim is to encourage GP researchers and practitioners to use similarly stringent means of assessing the generalisation of evolved models where possible, and to avoid certain primitive functions that are known to be inappropriate.
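For concreteness, the Python sketch below shows the standard definitions of the operators discussed in the abstract, as they commonly appear in the GP literature: Koza's protected division returns 1 when the denominator is zero, while the analytic quotient (Ni et al.) computes a / sqrt(1 + b^2) and is smooth everywhere. The FUNCTION_SET registry is a hypothetical illustration of how such primitives and their arities might be declared in a GP system; it is not the implementation used in the paper.

```python
import math

def protected_div(a, b):
    # Koza-style protected division: returns 1.0 when the denominator
    # is zero. The resulting discontinuity can produce wildly wrong
    # outputs on unseen inputs near b = 0.
    return a / b if b != 0 else 1.0

def analytic_quotient(a, b):
    # Analytic quotient: a / sqrt(1 + b^2). Defined and smooth for
    # every b, so no protection and no asymptote is needed.
    return a / math.sqrt(1.0 + b * b)

# Illustrative function set pairing the analytic quotient with the
# sinusoidal operator, the combination the abstract reports as
# advantageous for both approximation and generalisation.
# Each entry maps a primitive name to (callable, arity).
FUNCTION_SET = {
    "add": (lambda a, b: a + b, 2),
    "sub": (lambda a, b: a - b, 2),
    "mul": (lambda a, b: a * b, 2),
    "aq":  (analytic_quotient, 2),
    "sin": (math.sin, 1),
}
```

Swapping protected_div for analytic_quotient removes the discontinuity at b = 0 from every evolved expression, which is one intuition for the generalisation advantage reported above.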
