Contemporary Symbolic Regression Methods and their Relative Performance
arXiv - CS - Neural and Evolutionary Computing, Pub Date: 2021-07-29, DOI: arxiv-2107.14351
William La Cava, Patryk Orzechowski, Bogdan Burlacu, Fabrício Olivetti de França, Marco Virgolin, Ying Jin, Michael Kommenda, Jason H. Moore

Many promising approaches to symbolic regression have been presented in recent years, yet progress in the field continues to suffer from a lack of uniform, robust, and transparent benchmarking standards. In this paper, we address this shortcoming by introducing an open-source, reproducible benchmarking platform for symbolic regression. We assess 14 symbolic regression methods and 7 machine learning methods on a set of 252 diverse regression problems. Our assessment includes both real-world datasets with no known model form and ground-truth benchmark problems, including physics equations and systems of ordinary differential equations. For the real-world datasets, we benchmark the ability of each method to learn models with low error and low complexity relative to state-of-the-art machine learning methods. For the synthetic problems, we assess each method's ability to find exact solutions in the presence of varying levels of noise. From these controlled experiments, we conclude that the best-performing methods for real-world regression combine genetic algorithms with parameter estimation and/or semantic search drivers. When tasked with recovering exact equations in the presence of noise, we find that deep learning and genetic algorithm-based approaches perform similarly. We provide a detailed guide to reproducing this experiment and contributing new methods, and encourage other researchers to collaborate with us on a common and living symbolic regression benchmark.
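The abstract does not specify how noise is injected into the ground-truth problems or how error is scored. The sketch below is a minimal illustration under two stated assumptions that are not taken from the text above: Gaussian noise whose standard deviation is a chosen fraction of the target's root-mean-square, and R² as the error metric. It shows why even the exact equation cannot reach a perfect score once noise is added, which is what makes recovering exact solutions harder at higher noise levels. The equation `x0 * x1**2`, the sampling ranges, and the noise levels are hypothetical choices for illustration only.

```python
import math
import random


def add_target_noise(y, level, rng):
    """Return a copy of y with Gaussian noise whose standard deviation is
    `level` times the root-mean-square of y (assumed noise model)."""
    rms = math.sqrt(sum(v * v for v in y) / len(y))
    return [v + rng.gauss(0.0, level * rms) for v in y]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ground-truth problem (illustrative only): y = x0 * x1**2
def true_model(x0, x1):
    return x0 * x1 ** 2


rng = random.Random(0)
X = [(rng.uniform(1, 5), rng.uniform(1, 5)) for _ in range(1000)]
y = [true_model(a, b) for a, b in X]

# Hypothetical noise levels, expressed as fractions of the target RMS.
for level in (0.0, 0.001, 0.01, 0.1):
    y_noisy = add_target_noise(y, level, rng)
    # Score the exact equation against the noisy targets: any shortfall
    # from 1.0 comes from the noise, not from the model form.
    print(f"noise={level}: R^2 of true model = {r2_score(y_noisy, y):.4f}")
```

Scaling the noise by the target's RMS keeps a given noise level comparable across problems whose outputs differ by orders of magnitude; an absolute standard deviation would make the same level trivial for some equations and overwhelming for others.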
