Rule-based Bayesian regression

Statistics and Computing

Abstract

We introduce a novel rule-based approach for handling regression problems. The new methodology carries elements from two frameworks: (i) it provides information about the uncertainty of the parameters of interest using Bayesian inference, and (ii) it allows the incorporation of expert knowledge through rule-based systems. Blending these two frameworks can be particularly beneficial in domains such as engineering, where the importance of uncertainty quantification motivates a Bayesian approach but there is no simple way to incorporate researcher intuition into the model. We validate our models on synthetic applications: a simple linear regression problem and two more complex structures based on partial differential equations. We also illustrate their use through two cases derived from real data. Finally, we review the advantages of our methodology, which include the simplicity of the implementation, the uncertainty reduction due to the added information and, on some occasions, better point predictions. We also outline limitations, mainly computational, such as the difficulty of choosing an appropriate algorithm and the added computational burden.

Notes

  1. https://github.com/themisbo/Rule-based-Bayesian-regr.

Acknowledgements

This work was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the Digital Twins for Complex Engineering Systems theme within that grant, and by The Alan Turing Institute. IP acknowledges funding from the Imperial College Research Fellowship scheme. We acknowledge Dr. Daya Shankar Pandey of the University of Huddersfield, UK, a power plant expert who helped with the rule elicitation in Sect. 4.5.3.

Author information

Corresponding author

Correspondence to Themistoklis Botsas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Botsas, T., Mason, L.R. & Pan, I. Rule-based Bayesian regression. Stat Comput 32, 44 (2022). https://doi.org/10.1007/s11222-022-10100-7

  • DOI: https://doi.org/10.1007/s11222-022-10100-7
