Multi-region symbolic regression: combining functions under a multi-objective approach

Natural Computing

Abstract

This paper introduces Multi-Region Symbolic Regression (MR-SR), a general framework that divides the input data space of a symbolic regression problem into subspaces (regions), generates different solutions to fit these regions, and then combines them. MR-SR has three main components: (1) a strategy for finding the regions of the input data space; (2) a method for generating the functions for each region; and (3) a strategy for combining the models found by (2). The main contribution of this paper is the way we generate the functions for each region. We model function generation as a multi-objective problem in which each objective corresponds to the quality of the evolved function in one region, so the number of objectives equals the number of regions of the input data space. We test MR-SR in two scenarios with different goals. In the first, we use the approach to solve the symbolic regression problem with standard genetic programming (GP), with the main goal of reducing the error. In the second, we use it for a different purpose: to reduce the dimensionality of the semantic space of a GP variant, namely Geometric Semantic Genetic Programming (GSGP). Results on 10 datasets show that the method using k-means clustering and a model switching strategy (which makes predictions using the best evolved function for the region of interest) obtained better results in 5 out of 10 datasets for GP with 2 regions. For GSGP, the framework was less effective because the evolved solutions lacked diversity.
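The three components described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper evolves functions with GP or GSGP, whereas this sketch assumes a small hand-written pool of candidate functions, a toy target y = |x|, and a plain k-means written from scratch. All names (`kmeans`, `region_errors`, `switching_predictor`, the candidate functions) are hypothetical and chosen only for illustration.

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on tuples; centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids

def region_errors(f, X, y, centroids):
    """Objective vector of f: one RMSE per region (0.0 for a region with no data)."""
    sq = [[] for _ in centroids]
    for x, t in zip(X, y):
        sq[nearest(x, centroids)].append((f(x) - t) ** 2)
    return [math.sqrt(sum(s) / len(s)) if s else 0.0 for s in sq]

def dominates(a, b):
    """Pareto dominance for minimisation: a is no worse everywhere, better somewhere."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))

def switching_predictor(models, errors, centroids):
    """Model switching: in each region, predict with the model of lowest error there."""
    best = [min(range(len(models)), key=lambda m: errors[m][r])
            for r in range(len(centroids))]
    return lambda x: models[best[nearest(x, centroids)]](x)

# Toy data (assumption): y = |x|, which no single candidate below fits everywhere.
X = [(float(v),) for v in (-5, -4, -3, -2, -1, 1, 2, 3, 4, 5)]
y = [abs(x[0]) for x in X]

# Stand-ins for evolved functions (assumption: a fixed pool instead of GP evolution).
models = [lambda x: x[0], lambda x: -x[0], lambda x: 0.0, lambda x: 10.0]

centroids = kmeans(X, k=2)                                  # (1) find the regions
errors = [region_errors(f, X, y, centroids) for f in models]  # (2) one objective per region
front = [i for i, e in enumerate(errors)
         if not any(dominates(o, e) for o in errors)]       # non-dominated candidates
predict = switching_predictor(models, errors, centroids)    # (3) combine by switching
```

In this toy setting each of the first two candidates is perfect in one region and poor in the other, so both stay on the non-dominated front, and the switching predictor picks whichever is best for the region a query point falls in. The constant candidate `10.0` is dominated and would be discarded by a Pareto-based selection.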

Acknowledgements

This work was partially supported by CNPq and Fapemig.

Author information

Corresponding author

Correspondence to Gisele L. Pappa.

Appendices

Appendix 1: GP and GSGP results of training error

See Table 8.

Table 8 Median training RMSE for different strategies for combining models trained by NSGA-II with GP

Appendix 2: GP and GSGP results of k-GP using other combination strategies

See Tables 9, 10, 11.

Table 9 Median training RMSE for different strategies for combining models trained by NSGA-II with GSGP
Table 10 Median test RMSE for k-GP using unweighted (UA) and weighted average (WA) strategies for combining models
Table 11 Median test RMSE for k-GSGP using unweighted (UA) and weighted average (WA) strategies for combining models

About this article

Cite this article

Casadei, F., Pappa, G.L. Multi-region symbolic regression: combining functions under a multi-objective approach. Nat Comput 20, 753–773 (2021). https://doi.org/10.1007/s11047-021-09851-5

  • DOI: https://doi.org/10.1007/s11047-021-09851-5
