Introduction

The formal departure from linear modeling arguably starts with the development of derivatives of basic functions by Newton (1687) and Leibniz (1920). Nevertheless, the seminal work of Newton and Leibniz on nonlinearity was applied mainly to relatively simple formulations, such as trigonometric, power, or exponential functions. A significant advancement in modeling occurred in 1715, when Taylor (1715) presented an approximation of any locally continuously differentiable function with a polynomial function. Even after the introduction of the Taylor series, application to environmental processes was limited until 1877, when Galton (1877) developed linear regression. One of the main advancements of regression was the ability to represent nonlinearity by transforming the variables (Schumacher and Hall 1933; Warton and Hui 2011), in most instances the predictors (Neter et al. 1996). The linear regression coefficients were estimated for almost two hundred years with the least squares method (Cotes 1722). However, even when the assumptions of the least squares method are met, the transformation of the predictors does not necessarily supply the desired results. In those cases, the predicted variable was transformed to improve the results. However, the bias induced by the transformation of the dependent variable was formally addressed more than half a century later by Williams (1937) and Cochran (1938). Nevertheless, bias correction for the case when the predicted variable was transformed was developed only for a few transformations, such as the logarithm function (Finney 1941). Transformation of the dependent variable without correcting for the induced bias was present even after the seminal paper of Neyman and Scott (1960), who proposed a bias-correction framework for almost all functions.
The complex implementation of the Neyman and Scott framework (Neyman and Scott 1960) was overcome by the development of generalized linear models (GLM) by Nelder and Wedderburn (1972). However, the assumptions of GLM limited its application to a reduced number of functions, such as the logistic function.

The development of information technology at the end of the second millennium, which allowed massive computations in a short amount of time, prompted new procedures for modeling complex nonlinear functions. For more than 50 years, the main estimators were nonlinear least squares, as proposed by Levenberg (1944) and Marquardt (1963), and restricted maximum likelihood, as proposed by Bartlett (1937) and formalized by Patterson and Thompson (1971). Both methods are suboptimal, as they either consider only a portion of the data (the case of restricted maximum likelihood) or do not search the entire solution space (the case of nonlinear least squares). Therefore, new procedures were proposed based on complex heuristics (Hoos and Stutzle 2005; Talbi 2009). Heuristic methods, such as simulated annealing, genetic algorithms, or particle swarm optimization, can either find the actual values of the parameters defining the nonlinear model or supply values close to the actual ones in a reasonable amount of time (Aledo et al. 2016; Prieto-Escobar et al. 2018; Özsoy et al. 2020). A wave of heuristic algorithms aimed at estimating the parameters of nonlinear relationships appeared at the beginning of the third millennium, such as Pujol (2007), Yuan (2011, 2015), or Chen et al. (2008), to cite just a few. However, irrespective of their sophistication, heuristic techniques supply only an approximation of the solution. In many instances, the heuristic solutions are so close to the actual solution that there is no practical reason to spend more effort attaining better results (Bettinger et al. 2002). However, process-based modeling (Korzukhin et al. 1996) is sensitive to the solution supplied by heuristic techniques, as incorrect relationships can fundamentally alter the behavior, and consequently the interpretation, of ecosystem dynamics.

In many instances, common algorithms based on heuristics (e.g., steepest descent, Gauss–Newton, or Marquardt) estimate the parameters of a nonlinear relationship with signs opposite to the actual ones. Bayesian approaches (Gelman et al. 2003) can lead to similar results, as proven by Amarioarei et al. (2020). To illustrate the impact of the estimation procedure on the parameters to be estimated, we use an example. Let us assume that a process can be modeled with the equation:

$$y = \arctan \left( {\frac{{5 - 5 \times \exp \left( {\frac{{\sqrt x - x}}{1000}} \right)}}{1.2}} \right) + e,$$
(1)

where x is the predictor, y is the response variable, and tan(y) at a given x is normally distributed with mean \(\mu_{x}\) and variance \(\sigma_{x}^{2}\).

If synthetic y is generated for x varying from 1 to 500, assuming a normal distribution of the residuals of tan(y) with variance 0.01 (Fig. 1), then the parameters of Eq. 1 can be estimated using the nonlinear model

$$y = \arctan \left( {\frac{{b_{0} + b_{1} \times \exp \left( {b_{2} \sqrt x + b_{3} x} \right)}}{{b_{4} }}} \right) + e.$$
(2)
Fig. 1

Generated data based on Eq. 1
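The data generation just described can be sketched as follows; the random seed is arbitrary, and the residuals (variance 0.01) are added on the tan(y) scale, as specified above:

```python
import numpy as np

rng = np.random.default_rng(42)  # illustrative seed

x = np.arange(1, 501, dtype=float)                         # x varying from 1 to 500
mu = (5 - 5 * np.exp((np.sqrt(x) - x) / 1000)) / 1.2       # mean of tan(y) from Eq. 1
tan_y = mu + rng.normal(0.0, np.sqrt(0.01), size=x.size)   # residual variance 0.01
y = np.arctan(tan_y)                                       # synthetic response
```

Note that at x = 1 the mean of tan(y) is exactly 0, so the generated curve starts near the origin.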

The solution of Eq. 2 varies with the estimation algorithm: the SAS implementation of the Marquardt algorithm supplies the values b0 = −0.122, b1 = 0.0868, b2 = 0.2262, b3 = −0.00305, and b4 = 1.3712, whereas the Gauss–Newton algorithm leads to b0 = 2.4274, b1 = −2.3739, b2 = 0.0078, b3 = −0.0023, and b4 = 1.0988. The difference between the computed values and the actual values is not necessarily important for this argument; what is important is the change in the sign of b0 and b1 between algorithms, which triggers a different model interpretation. The generating model has b0 positive and b1 negative; the Marquardt algorithm supplies the opposite sign for both, whereas Gauss–Newton is consistent with the signs of the model. Therefore, the heuristics employed in the estimation of nonlinear models can supply incorrect results simply because of the algorithm. However, if the dependent variable is transformed using the tangent function, then the coefficients would have the correct signs, but back-transforming the predicted variable would produce biased results. Nevertheless, if the bias induced by the nonlinear transformation of the dependent variable is corrected, then the back-transformation produces unbiased values.
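The retransformation bias mentioned above is easy to reproduce numerically: when tan(y) is modeled as a normal variable Z, plugging the mean of Z into arctan is not the same as the mean of arctan(Z). A minimal Monte Carlo sketch, with illustrative values of the mean and standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5                    # illustrative values on the tan(y) scale
z = rng.normal(mu, sigma, 1_000_000)    # Z plays the role of tan(y)

naive = np.arctan(mu)                   # naive back-transform of the mean
unbiased = np.arctan(z).mean()          # Monte Carlo estimate of E[arctan(Z)]
bias = naive - unbiased                 # positive, since arctan is concave for z > 0
```

The gap is roughly the Jensen term \(f''(\mu)\sigma^2/2\), which is why a bias correction is required after back-transforming.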

The reduced number of nonlinear functions for which unbiased estimation exists (Nelder and Wedderburn 1972), the difficult-to-implement framework proposed by Neyman and Scott (1960) for bias correction when the dependent variable is transformed, and the lack of accuracy associated with heuristics prompted the development of unbiased estimates for 10 transformations of the predicted variable that are commonly encountered in forest modeling (Strimbu et al. 2017). For these 10 functions, complex nonlinear models can be developed exactly, as the parameters are estimated without heuristics. Furthermore, a sequence of transformations of the dependent variable can now be applied, as the estimated values are accurate and precise.

The approach proposed by Strimbu et al. (2017), which avoids heuristic estimations, provides unbiased results when transforming the predicted variable, Y, with a differentiable function f. The method for correcting the bias induced by the change of the dependent variable is based on the assumption that between f(Y) and a set of predictor variables, X, there is a linear relationship

$$f\left( {\varvec{Y}} \right) = {\varvec{Xb}} + {\varvec{\varepsilon}},$$
(3)

where b is the vector of coefficients of the independent variables X and ε is the vector of residuals, which are normally distributed with mean 0 and variance \(\sigma^{2}\), \({\varvec{\varepsilon}}\sim N\left( {0,\sigma^{2} } \right)\).

The bias correction based on Eq. 3 computes explicitly the mean of Y|X for 10 commonly used functions. Because the estimates for eight of these functions (i.e., the sine, cosine, tangent, arcsine, arccosine, arctangent, hyperbolic sine, and hyperbolic tangent functions) contain a Taylor series expansion, the formulas contain an infinite number of terms. The simulated data used by the authors to guide the selection of the number of terms retained from the Taylor series expansion is likely to be challenged by real problems, which are more complex than simulated data. Also, the method presented by Strimbu et al. (2017) does not provide unbiased estimates for the variance of the back-transformed Y. Estimation of the variance of the predicted values is mandatory for computing confidence intervals. Therefore, the objective of the present study is twofold: first, to present unbiased estimates of the variance of the back-transformed variable, and second, to identify a computationally efficient Taylor series expansion that would provide unbiased results for the transformations involving such an expansion. To illustrate the nonlinear estimation method advocated by this study, we present three forestry applications: one on site productivity, one on stem taper, and one on straw decomposition.

Methods

Foundation

Equation 3 can be rewritten as \(f_{1} \left( {\varvec{Y}} \right) = {\varvec{Y}}_{1} = {\varvec{Xb}} + {\varvec{\varepsilon}}\), for which an unbiased estimate of Y given X = x is:

$${\varvec{Y}}|_{x} = f_{1}^{ - 1} \left( {{\varvec{Xb}} + {\varvec{\varepsilon}}} \right)|_{x}.$$
(4)

When another transformation is applied to the first transformation, f2 ○ f1, then Eq. 3 becomes

$$f_{2} \left( {{\varvec{Y}}_{1} } \right) = f_{2} \left( {f_{1} \left( {\varvec{Y}} \right)} \right) = {{\varvec{X}}^{\prime}b^{\prime}} + {{\varvec{\varepsilon}}^{\prime}}$$
(5)

for which an unbiased estimator of Y at xʹ is according to Shanks and Gambill (1973):

$${\varvec{Y}}|_{{x^{\prime}}} = f_{1}^{ - 1} \left( f_{2}^{ - 1} \left( {{\varvec{X}}^{\prime}b^{\prime} + {\varvec{\varepsilon}}^{\prime} } \right) \right)|_{{x^{\prime}}},$$
(6)

where \({\varvec{\varepsilon}} ^{\prime}\) is a normally distributed residual with mean 0 and variance \(\sigma ^{{\prime}{2}}\).

A series of unbiased back-transformations of nonlinear functions, each containing a finite number of terms, concludes with an unbiased estimate. Nevertheless, when an infinite number of terms is required but only a few are used, the compounding from Eq. 5 could lead to computational errors (Amarioarei et al. 2020). Significant deviation from the actual model can occur because of the truncation, even when only one transformation is present, as in Eq. 4. To assess the impact of truncation on the transformations, we focused on the functions that contain a Taylor series expansion, as computed by Strimbu et al. (2017). However, the identification of an efficient selection of terms for the Taylor series expansion that would produce unbiased numerical estimates is not sufficient for modeling purposes; the variance of the predicted values is also required. In this study, we have therefore included formulas for the variance \({\text{Var}}\left( Y \right) = {\mathbb{E}}\left[ {Y^{2} } \right] - {\mathbb{E}}^{2} \left[ Y \right]\) (Grimmett and Stirzaker 2002), which requires the estimation of the mean, \({\mathbb{E}}\left[ Y \right]\), and of the second moment, \({\mathbb{E}}\left[ {Y^{2} } \right]\). The formulas for the expectation of Y were proven by Strimbu et al. (2017); therefore, in this study we include results for \({\mathbb{E}}\left[ {Y^{2} } \right]\). The final formulations for a series of common transformations are (proof in “Appendix”):

  • Power: \(f(y) = y^{a}\) with \(a > 0\) and \(y \in (0,\infty )\), for which \(\alpha = - \xi /\sigma\)

$$\begin{aligned}{\mathbb{E}}\left[ Y \right] &= \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{k = 0}^{\infty } \left( {\begin{array}{*{20}c} \frac{1}{a} \\ k \\ \end{array} } \right)\xi^{{\frac{1}{a} - k}} \sigma^{k} \mathop \int \nolimits_{{ - \frac{\xi }{\sigma }}}^{\infty } t^{k} e^{{ - \frac{{t^{2} }}{2}}} \,{\text{d}}t \\ & = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{k = 0}^{\infty } \left( {\begin{array}{*{20}c} \frac{1}{a} \\ k \\ \end{array} } \right)\xi^{{\frac{1}{a} - k}} \sigma^{k} I\left( {\alpha ,k} \right) \end{aligned}$$
(7)
$$\begin{aligned} {\mathbb{E}}\left[ {Y^{2} } \right] & = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{k = 0}^{\infty } \left( {\begin{array}{*{20}c} \frac{2}{a} \\ k \\ \end{array} } \right)\xi^{{\frac{2}{a} - k}} \sigma^{k} \mathop \int \nolimits_{{ - \frac{\xi }{\sigma }}}^{\infty } t^{k} e^{{ - \frac{{t^{2} }}{2}}} \,{\text{d}}t \\ & = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{k = 0}^{\infty } \left( {\begin{array}{*{20}c} \frac{2}{a} \\ k \\ \end{array} } \right)\xi^{{\frac{2}{a} - k}} \sigma^{k} I\left( {\alpha ,k} \right) \end{aligned}$$
(8)
  • Sine: \(f(y) = \sin (y)\) and \(y \in \left[ { - \frac{\pi }{2},\frac{\pi }{2}} \right]\), for which \(\alpha = - (1 + \xi )/\sigma\) and \(\beta = (1 - \xi )/\sigma\)

$$\begin{aligned}{\mathbb{E}}\left[ Y \right] & = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{\left( {2n - 1} \right)!!}}{{\left( {2n} \right)!!}}\frac{1}{2n + 1}\\& \quad \mathop \sum \limits_{k = 0}^{2n + 1} \left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right) \end{aligned}$$
(9)
$$\begin{aligned}{\mathbb{E}}\left[ {Y^{2} } \right] & = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{4^{n} (n!)^{2} }}{{\left( {2n + 1} \right)!\left( {n + 1} \right)}}\\ & \quad \mathop \sum \limits_{k = 0}^{2n + 2} \left( {\begin{array}{*{20}c} {2n + 2} \\ k \\ \end{array} } \right)\xi^{2n + 2 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right) \end{aligned}$$
(10)
  • Cosine: \(f(y) = \cos (y)\) and \(y \in \left[ {0,\pi } \right]\), for which \(\alpha = - (1 + \xi )/\sigma\) and \(\beta = (1 - \xi )/\sigma\)

$${\mathbb{E}}\left[ Y \right] = \frac{\pi }{2}\left( {\Phi \left( \beta \right) - \Phi \left( \alpha \right)} \right) - \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{\left( {2n - 1} \right)!!}}{{\left( {2n} \right)!!}}\frac{1}{2n + 1}\mathop \sum \limits_{k = 0}^{2n + 1} \left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(11)
$$\begin{gathered} {\mathbb{E}}\left[ {Y^{2} } \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \left[ { - \frac{{\left( {2n - 1} \right)!!}}{{\left( {2n} \right)!!}}\frac{\pi }{2n + 1}\mathop \sum \limits_{k = 0}^{2n + 1} \left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)} \right. \hfill \\ \left. { + \frac{{4^{n} (n!)^{2} }}{{\left( {2n + 1} \right)!\left( {n + 1} \right)}}\mathop \sum \limits_{k = 0}^{2n + 2} \left( {\begin{array}{*{20}c} {2n + 2} \\ k \\ \end{array} } \right)\xi^{2n + 2 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)} \right] + \frac{{\pi^{2} }}{4}\left( {\Phi \left( \beta \right) - \Phi \left( \alpha \right)} \right) \hfill \\ \end{gathered}$$
(12)
  • Tangent: \(f(y) = \tan (y)\) and \(y \in \left[ {0,\frac{\pi }{4}} \right]\), for which \(\alpha = - \xi /\sigma\) and \(\beta = (1 - \xi )/\sigma\)

$${\mathbb{E}}\left[ Y \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n} }}{2n + 1}\mathop \sum \limits_{k = 0}^{2n + 1} \left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(13)
$${\mathbb{E}}\left[ {Y^{2} } \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n} \left( {2H_{2n + 1} - H_{n} } \right)}}{2n + 1}\mathop \sum \limits_{k = 0}^{2n + 2} \left( {\begin{array}{*{20}c} {2n + 2} \\ k \\ \end{array} } \right)\xi^{2n + 2 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right),$$
(14)

where \(H_{n} = \mathop \sum \limits_{k = 1}^{n} \frac{1}{k}\) is the nth harmonic number.

  • Arcsine: \(f(y) = \arcsin (y)\) and \(y \in \left[ { - 1,1} \right]\), for which \(\alpha = - (\pi /2 + \xi )/\sigma\) and \(\beta = (\pi /2 - \xi )/\sigma\)

$${\mathbb{E}}\left[ Y \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n} }}{{\left( {2n + 1} \right)!}}\mathop \sum \limits_{k = 0}^{2n + 1} \left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(15)
$${\mathbb{E}}\left[ {Y^{2} } \right] = \frac{1}{2}\left[ {\Phi \left( \beta \right) - \Phi \left( \alpha \right) - \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n} 4^{n} }}{{\left( {2n} \right)!}}\mathop \sum \limits_{k = 0}^{2n} \left( {\begin{array}{*{20}c} {2n} \\ k \\ \end{array} } \right)\xi^{2n - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)} \right]$$
(16)
  • Arccosine: \(f\left( y \right) = \arccos (y)\) and \(y \in \left[ { - 1,1} \right]\), for which \(\alpha = - \xi /\sigma\) and \(\beta = (\pi - \xi )/\sigma\)

$${\mathbb{E}}\left[ Y \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n} }}{{\left( {2n} \right)!}}\mathop \sum \limits_{k = 0}^{2n} \left( {\begin{array}{*{20}c} {2n} \\ k \\ \end{array} } \right)\xi^{2n - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(17)
$${\mathbb{E}}\left[ {Y^{2} } \right] = \frac{1}{2}\left[ {\Phi \left( \beta \right) - \Phi \left( \alpha \right) + \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n} 4^{n} }}{{\left( {2n} \right)!}}\mathop \sum \limits_{k = 0}^{2n} \left( {\begin{array}{*{20}c} {2n} \\ k \\ \end{array} } \right)\xi^{2n - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)} \right]$$
(18)
  • Arctangent: \(f\left( y \right) = \arctan (y)\) with \(y \in \left[ { - 1,1} \right]\), for which \(\alpha = \frac{{ - \frac{\pi }{4} - \xi }}{\sigma }\) and \(\beta = \frac{{\frac{\pi }{4} - \xi }}{\sigma }\)

$${\mathbb{E}}\left[ Y \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n - 1} 2^{2n} \left( {2^{2n} - 1} \right)B_{2n} }}{{\left( {2n} \right)!}}\mathop \sum \limits_{k = 0}^{2n - 1} \left( {\begin{array}{*{20}c} {2n - 1} \\ k \\ \end{array} } \right)\xi^{2n - 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right),$$
(19)

where Bn is the nth Bernoulli number, computed recursively starting from n = 0 and B0 = 1 with the formula \(B_{n} = - \sum\limits_{k = 0}^{n - 1} {\left( {\begin{array}{*{20}c} n \\ k \\ \end{array} } \right)\frac{{B_{k} }}{n - k + 1}} ,\,\,n \ge 1\).

  • Hyperbolic sine: \(f\left( y \right) = {\text{sinh}}\left( y \right) = \frac{{e^{y} - e^{ - y} }}{2}\) with \(y \in \left[ {0,a} \right]\), for which \(\alpha = - \xi /\sigma\) and \(\beta = (1 - \xi )/\sigma\)

$${\mathbb{E}}\left[ Y \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 1)^{n} \left( {2n} \right)!}}{{2^{2n} (n!)^{2} \left( {2n + 1} \right)}}\mathop \sum \limits_{k = 0}^{2n + 1} \left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(20)
$${\mathbb{E}}\left[ {Y^{2} } \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{( - 4)^{n} (n!)^{2} }}{{\left( {n + 1} \right)\left( {2n + 1} \right)!}}\mathop \sum \limits_{k = 0}^{2n + 2} \left( {\begin{array}{*{20}c} {2n + 2} \\ k \\ \end{array} } \right)\xi^{2n + 2 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(21)
  • Hyperbolic arcsine: \(f\left( y \right) = \mathrm{arcsinh}\left( y \right) = \ln \left( {y + \sqrt {y^{2} + 1} } \right),\quad y \in \left[ {0,\infty } \right)\)

$${\mathbb{E}}\left[ Y \right] = \frac{1}{2}e^{{\frac{{\sigma^{2} }}{2}}} \left[ {2\sinh \left( \xi \right) + e^{ - \xi } \Phi \left( { - \frac{\xi }{\sigma } - \sigma } \right) - e^{\xi } \Phi \left( { - \frac{\xi }{\sigma } + \sigma } \right)} \right]$$
(22)
$${\mathbb{E}}\left[ {Y^{2} } \right] = - \frac{1}{2}\left[ {1 - \Phi \left( { - \frac{\xi }{\sigma }} \right)} \right] + \frac{{e^{{2\sigma^{2} }} }}{2}\left[ {2\cosh \left( {2\xi } \right) - e^{2\xi } \Phi \left( { - \frac{\xi }{\sigma } - 2\sigma } \right) - e^{ - 2\xi } \Phi \left( { - \frac{\xi }{\sigma } + 2\sigma } \right)} \right].$$
(23)
  • Hyperbolic tangent: \(f(y) = \tanh \left( y \right) = \frac{{e^{y} - e^{ - y} }}{{e^{y} + e^{ - y} }}\) with \(y \in \left[ {0,\infty } \right)\), for which \(\alpha = - \xi /\sigma\) and \(\beta = (1 - \xi )/\sigma\)

$${\mathbb{E}}\left[ Y \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{1}{2n + 1}\mathop \sum \limits_{k = 0}^{2n + 1} \left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\;\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(24)
$${\mathbb{E}}\left[ {Y^{2} } \right] = \frac{1}{{\sqrt {2\pi } }}\mathop \sum \limits_{n = 0}^{\infty } \frac{{2H_{2n + 1} - H_{n} }}{2n + 1}\mathop \sum \limits_{k = 0}^{2n + 2} \left( {\begin{array}{*{20}c} {2n + 2} \\ k \\ \end{array} } \right)\xi^{2n + 2 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right)$$
(25)

with \(H_{n}\) being the nth harmonic number.
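The Bernoulli numbers entering Eq. 19 can be generated with the standard recursion \(\sum_{k = 0}^{n} \binom{n + 1}{k}B_{k} = 0\); a minimal sketch (sign convention \(B_{1} = -1/2\); function name is ours):

```python
from math import comb

def bernoulli(n_max):
    # B_0..B_n_max from sum_{k=0}^{n} C(n+1, k) B_k = 0, i.e.
    # B_n = -(1/(n+1)) * sum_{k=0}^{n-1} C(n+1, k) B_k, with B_0 = 1
    B = [1.0]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B
```

With this convention, \(B_{2} = 1/6\) and \(B_{4} = -1/30\), while the odd-index Bernoulli numbers beyond \(B_{1}\) vanish, so only the even-index ones contribute to Eq. 19.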

The symbols used in Eqs. 9–25 are:

n, k are natural numbers;

\(\left( {\begin{array}{*{20}c} n \\ k \\ \end{array} } \right) = \frac{n!}{{k!\left( {n - k} \right)!}}\) is the binomial coefficient;

\(\left( {2n - 1} \right)!! = 1 \times 3 \times \cdots \times \left( {2n - 1} \right)\) and \(\left( {2n} \right)!! = 2 \times 4 \times \cdots \times \left( {2n} \right)\) are the products of the odd, or even, positive integers ≤ 2n − 1 or 2n, respectively;

$$\xi = x^{T} b,\quad {\text{with }}x^{T} {\text{ being the transpose of the vector }}x;$$
(26)
$$I\left( {\alpha ,\beta ,n} \right) = \mathop \int \limits_{\alpha }^{\beta } t^{n} e^{{ - \frac{{t^{2} }}{2}}} \,{\text{d}}t\quad {\text{and}}\quad I\left( {\alpha ,n} \right) = \lim_{\beta \to \infty } I\left( {\alpha ,\beta ,n} \right).$$
(27)

Denoting by \(\phi \left( x \right) = \frac{1}{{\sqrt {2\pi } }}e^{{ - \frac{{x^{2} }}{2}}}\) the standard normal density function and by \(\Phi \left( x \right) = \mathop \int \limits_{ - \infty }^{x} \phi \left( t \right)\,{\text{d}}t\) the standard normal cumulative distribution function, it can be shown (for example, by mathematical induction) that the integral \(I(\alpha ,\beta ,n)\) is given by

$$\left\{ \begin{gathered} \sqrt {2\pi } \left( {n - 1} \right)!!\left[ {\phi \left( \alpha \right)\left( {1 + \mathop \sum \limits_{k = 0}^{{\frac{n - 1}{2} - 1}} \frac{{\alpha^{2k + 2} }}{{\left( {2k + 2} \right)!!}}} \right) - \phi \left( \beta \right)\left( {1 + \mathop \sum \limits_{k = 0}^{{\frac{n - 1}{2} - 1}} \frac{{\beta^{2k + 2} }}{{\left( {2k + 2} \right)!!}}} \right)} \right],\quad n{\text{ odd}} \hfill \\ \sqrt {2\pi } \left( {n - 1} \right)!!\left[ {\Phi \left( \beta \right) - \Phi \left( \alpha \right) + \phi \left( \alpha \right)\mathop \sum \limits_{k = 0}^{{\frac{n}{2} - 1}} \frac{{\alpha^{2k + 1} }}{{\left( {2k + 1} \right)!!}} - \phi \left( \beta \right)\mathop \sum \limits_{k = 0}^{{\frac{n}{2} - 1}} \frac{{\beta^{2k + 1} }}{{\left( {2k + 1} \right)!!}}} \right],\quad n{\text{ even}} \hfill \\ \end{gathered} \right.$$
(28)

with initial values \(I\left( {\alpha ,\beta ,0} \right) = \sqrt {2\pi } \left( {\Phi \left( \beta \right) - \Phi \left( \alpha \right)} \right)\) and \(I\left( {\alpha ,\beta ,1} \right) = \sqrt {2\pi } \left( {\phi \left( \alpha \right) - \phi \left( \beta \right)} \right)\).
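The closed form of Eq. 28 can be checked against the antiderivatives of \(t^{n} e^{-t^{2}/2}\); a sketch (function names are ours), using, for n = 3, \(\int t^{3} e^{-t^{2}/2}\,{\text{d}}t = -(t^{2} + 2)e^{-t^{2}/2}\):

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def dfact(n):
    # double factorial, with (-1)!! = 0!! = 1
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out

def I(a, b, n):
    # closed form of Eq. 28 for ∫_a^b t^n exp(-t^2/2) dt
    c = math.sqrt(2 * math.pi) * dfact(n - 1)
    if n % 2 == 1:
        sa = 1 + sum(a**(2 * k + 2) / dfact(2 * k + 2) for k in range((n - 1) // 2))
        sb = 1 + sum(b**(2 * k + 2) / dfact(2 * k + 2) for k in range((n - 1) // 2))
        return c * (phi(a) * sa - phi(b) * sb)
    sa = sum(a**(2 * k + 1) / dfact(2 * k + 1) for k in range(n // 2))
    sb = sum(b**(2 * k + 1) / dfact(2 * k + 1) for k in range(n // 2))
    return c * (Phi(b) - Phi(a) + phi(a) * sa - phi(b) * sb)
```

For n = 0 and n = 1 the function reduces to the initial values stated above.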

For \(\beta \to \infty\), the integral from Eq. 28, \(\lim_{\beta \to \infty } I\left( {\alpha ,\beta ,n} \right) = I\left( {\alpha ,n} \right)\), simplifies to:

$$I\left( {\alpha ,n} \right) = \left\{ {\begin{array}{*{20}l} {\sqrt {2\pi } \phi \left( \alpha \right)\left( {n - 1} \right)!!\left[ {1 + \mathop \sum \limits_{k = 0}^{{\frac{n - 1}{2} - 1}} \frac{{\alpha^{2k + 2} }}{{\left( {2k + 2} \right)!!}}} \right],\quad {\text{for }}n{\text{ odd}}} \hfill \\ {\sqrt {2\pi } \left( {n - 1} \right)!!\left[ {1 - \Phi \left( \alpha \right) + \phi \left( \alpha \right)\mathop \sum \limits_{k = 0}^{{\frac{n}{2} - 1}} \frac{{\alpha^{2k + 1} }}{{\left( {2k + 1} \right)!!}}} \right],\quad {\text{for }}n{\text{ even}}} \hfill \\ \end{array} } \right.$$
(29)

where \(I\left( {\alpha ,0} \right) = \sqrt {2\pi } \left( {1 - \Phi \left( \alpha \right)} \right)\) and \(I\left( {\alpha ,1} \right) = \sqrt {2\pi } \phi \left( \alpha \right)\).
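As an illustration of how these integrals enter the moment formulas, the truncated series of Eq. 7 can be checked against direct numerical integration for the power transformation. The values of ξ, σ, and a below are illustrative, and the quadrature and generalized-binomial helpers (`quad`, `gbinom`) are ours:

```python
import math

def quad(f, a, b, m=20000):
    # composite midpoint rule on [a, b]
    h = (b - a) / m
    return h * sum(f(a + (j + 0.5) * h) for j in range(m))

def gbinom(r, k):
    # generalized binomial coefficient C(r, k) for real r
    out = 1.0
    for j in range(k):
        out *= (r - j) / (j + 1)
    return out

xi, sigma, a = 4.0, 0.5, 2.0        # illustrative: f(y) = y^2, so Y = sqrt(Z)
alpha = -xi / sigma

def I(lo, k):
    # I(alpha, k) = ∫_alpha^∞ t^k exp(-t^2/2) dt, with the tail cut at t = 10
    return quad(lambda t: t**k * math.exp(-t * t / 2), lo, 10.0)

# truncated series of Eq. 7 versus direct numerical integration of E[Y]
series = sum(gbinom(1 / a, k) * xi**(1 / a - k) * sigma**k * I(alpha, k)
             for k in range(25)) / math.sqrt(2 * math.pi)
direct = quad(lambda t: (xi + sigma * t)**(1 / a) * math.exp(-t * t / 2),
              alpha, 10.0) / math.sqrt(2 * math.pi)
```

With these values the two computations agree closely, and both fall slightly below \(\sqrt{\xi} = 2\), as the concavity of the square root requires.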

We considered a Taylor series expansion efficient when the addition of a new term fulfills two conditions: first, the improvement of the existing estimate is < \(10^{-5}\), and second, the line constructed from the estimates computed with two consecutive numbers of terms is almost horizontal (i.e., its slope is < 2°). The conditions mirror the two common approaches used for selecting the terms of a series: one based on a preset value (LeVeque 2007), and one based on the scree method (Cattell 1966; Tabachnick and Fidell 2001; Zhu and Ghodsi 2006).

By truncation, the expressions of the means from Eqs. 9–25 can be rewritten in matrix form such that the Taylor series expansion includes only the first N terms:

$$\hat{y} = \mathop \sum \limits_{k = 0}^{2N + 1} \mathop \sum \limits_{n = 0}^{N} c_{k,n} \Xi_{n,k} = {\text{Tr}}\left( {{\mathbf{C\Xi }}} \right),$$
(30)

where \(\hat{y}\) is the predicted, back-transformed variable

$${\mathbf{C}} = \left( {c_{k + 1,n + 1} } \right)_{{\begin{array}{*{20}c} {0 \le k \le 2N + 1} \\ {0 \le n \le N} \\ \end{array} }} \in {\mathcal{M}}_{2N + 2,N + 1} ,$$
(31)
$${{\varvec{\Xi}}} = \left( {\Xi_{n + 1,k + 1} } \right)_{{\begin{array}{*{20}c} {0 \le n \le N} \\ {0 \le k \le 2N + 1} \\ \end{array} }} \in {\mathcal{M}}_{N + 1,2N + 2}$$
(32)
$$\Xi_{n,k} = \left\{ {\begin{array}{*{20}l} {\xi^{2n - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right){\mathbf{1}}_{{\left\{ {0, \ldots ,2n} \right\}}} \left( k \right),} \hfill & {{\text{arccosine}}} \hfill \\ {\xi^{2n - 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right){\mathbf{1}}_{{\left\{ {0, \ldots ,2n - 1} \right\}}} \left( k \right),} \hfill & {{\text{arctangent}}} \hfill \\ {\xi^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right){\mathbf{1}}_{{\left\{ {0, \ldots ,2n + 1} \right\}}} \left( k \right),} \hfill & {{\text{other}}} \hfill \\ \end{array} } \right.,{\text{for}}\;0 \le n \le N.$$
(33)

The terms \(c_{k,n}\) from the triangular matrix \(\mathbf{C}\) depend on the transformation, and are given by

$$c_{k,n} = \left\{ {\begin{array}{*{20}l} {\frac{{\left( {2n - 1} \right)!!}}{{\left( {2n} \right)!!}}\frac{1}{2n + 1}\left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right),} \hfill & {{\text{sine}},{\text{ cosine}}} \hfill \\ {\frac{{( - 1)^{n} }}{{\left( {2n + 1} \right)!}}\left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right),} \hfill & {{\text{arcsine}}} \hfill \\ {\frac{{( - 1)^{n} }}{{\left( {2n} \right)!}}\left( {\begin{array}{*{20}c} {2n} \\ k \\ \end{array} } \right),} \hfill & {{\text{arccosine}}} \hfill \\ {\frac{{( - 1)^{n - 1} 2^{2n} \left( {2^{2n} - 1} \right)B_{2n} }}{{\left( {2n} \right)!}}\left( {\begin{array}{*{20}c} {2n - 1} \\ k \\ \end{array} } \right),} \hfill & {{\text{arctangent}}} \hfill \\ {\frac{{( - 1)^{n} }}{2n + 1}\left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right),} \hfill & {{\text{tangent}}} \hfill \\ {\frac{{( - 1)^{n} \left( {2n} \right)!}}{{2^{2n} (n!)^{2} \left( {2n + 1} \right)}}\left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right),} \hfill & {\text{hyperbolic sine}} \hfill \\ {\frac{1}{2n + 1}\left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right),} \hfill & {\text{hyperbolic tangent}} \hfill \\ \end{array} } \right..$$
(34)

The elements \(\Xi_{n,k}\) of the matrix Ξ depend on the variance of the linear model. Therefore, the selection of the significant terms reduces the back-transformation of the predicted variable to the product of the constant matrix C and the matrix Ξ, which contains the linear regression values. It should be noticed that the matrix C is completely defined by the transformation and the number of terms selected from the Taylor series expansion, and is independent of the data for which the model is developed.
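For the hyperbolic tangent, for instance, the matrix form of Eq. 30 can be assembled and checked against the double sum of Eq. 24. The values of ξ, σ, and N below are illustrative, and I(α, β, k) is evaluated here by brute-force quadrature:

```python
import math
import numpy as np

def I(a, b, k, m=20000):
    # midpoint-rule approximation of ∫_a^b t^k exp(-t^2/2) dt
    h = (b - a) / m
    t = a + (np.arange(m) + 0.5) * h
    return float(np.sum(t**k * np.exp(-t**2 / 2)) * h)

N = 10                          # number of Taylor terms retained
xi, sigma = 0.6, 0.1            # illustrative linear-model values
alpha, beta = -xi / sigma, (1 - xi) / sigma

# C[k, n] and Xi[n, k] for the hyperbolic tangent (Eqs. 31-34)
C = np.zeros((2 * N + 2, N + 1))
Xi = np.zeros((N + 1, 2 * N + 2))
Ik = [I(alpha, beta, k) for k in range(2 * N + 2)]
for n in range(N + 1):
    for k in range(2 * n + 2):
        C[k, n] = math.comb(2 * n + 1, k) / (2 * n + 1)
        Xi[n, k] = xi**(2 * n + 1 - k) * sigma**k * Ik[k]

trace_form = np.trace(C @ Xi) / math.sqrt(2 * math.pi)      # Tr(C Xi) of Eq. 30
double_sum = sum(C[k, n] * Xi[n, k]
                 for n in range(N + 1)
                 for k in range(2 * n + 2)) / math.sqrt(2 * math.pi)
```

The two expressions are the same sum in different order, which is the content of Eq. 30.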

Considering that the first and second moments from Eqs. 9–25 depend on the variance, Amarioarei et al. (2020) used a factorial experiment to prove that relatively few terms (i.e., fewer than 10) are required to represent the expectation without bias. However, real applications revealed that more than 10 terms, likely 20 or even 30, could be needed for unbiased estimates. The large number of terms could nevertheless be associated with the implementation, as computational algorithms were proven to play a significant role in the results (Seppelt and Richter 2005; Paun et al. 2020).
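The sensitivity to the number of terms is easy to reproduce: the series behind the hyperbolic tangent back-transformation (the artanh expansion underlying Eq. 24) converges slowly when the transformed values approach 1, so 10 terms can be insufficient while 30 may suffice. A sketch at an illustrative point z = 0.9:

```python
import math

z = 0.9                    # transformed value close to the edge of the domain
exact = math.atanh(z)

def partial(N):
    # first N + 1 terms of artanh(z) = sum z^(2n+1) / (2n+1)
    return sum(z**(2 * n + 1) / (2 * n + 1) for n in range(N + 1))

err_10 = exact - partial(10)   # truncation error with 11 terms
err_30 = exact - partial(30)   # truncation error with 31 terms
```

With 10 terms the truncation error exceeds 0.01, whereas with 30 terms it drops below 0.001.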

Stand-level models for height of dominant Norway spruce (Picea abies L.)

The main economic species in the Carpathian Mountains is Norway spruce (Tudoran and Zotta 2020), which has triggered significant efforts in modeling the height of dominant trees. A polymorphic model with six parameters for height was developed by Giurgiu and Draghiciu (2004), based on field measurements published in 1957 (Popescu-Zeletin 1957). The polymorphic model, which is enforced by the Romanian regulations and consequently has a significant impact on forest management, is

$${\text{height}}_{{{\text{dominant}}}} = a_{0} \times {\text{SI}} \times \left( {t_{{\text{Lorey height}}} + t_{{\text{Lorey height}}}^{{a_{1} }} \times e^{{a_{2} \times t_{{\text{Lorey height}}} }} } \right),$$
(35)

where heightdominant is the height of dominant trees, tLorey height is the transformed Lorey’s height, modeled with the expression \(e^{{b_{0} \left( {{\text{age}}^{{b_{1} }} - 100^{{b_{1} }} } \right)}}\), a0, a1, a2, b0, b1, b2 are parameters determined by species, and SI is the site index.

The models proposed by Giurgiu and Draghiciu (2004) are not justifiable, as they identify a regression with a causal relationship, an identification which was proven to be false (Neter et al. 1996; Duursma and Robinson 2003). The biased models of Giurgiu and Draghiciu were replaced in 2020 by Amarioarei et al. (2020), who developed parsimonious unbiased models. Using the same data as Amarioarei et al. (2020), we improved the existing models not only in terms of estimators but also by supplying estimates for the variance. Moreover, we expanded the existing models by including the most productive sites, the ones labeled class I, which were not included in the original study of Amarioarei et al. Therefore, we propose a new set of polymorphic equations for the height of dominant and codominant trees based on the hyperbolic tangent of the ratio between height and the preset site index, SI:

$$f\left( {{\text{height}}} \right) = \tanh \left( {\frac{{{\text{height}}}}{{{\text{SI}}}}} \right) = \frac{{\exp \left( {\frac{{{\text{height}}}}{{{\text{SI}}}}} \right) - \exp \left( { - \frac{{{\text{height}}}}{{{\text{SI}}}}} \right)}}{{\exp \left( {\frac{{{\text{height}}}}{{{\text{SI}}}}} \right) + \exp \left( { - \frac{{{\text{height}}}}{{{\text{SI}}}}} \right)}},$$
(36)

where SI is the height at age 100 years, which is 36.9 m for class I, 31.8 m for class II, 26.9 m for class III, and 21.9 m for class IV.

Amarioarei et al. found that the reciprocal of \(\sqrt {{\text{age}}}\) is linearly related to \(\tanh \left( {\frac{{{\text{height}}}}{{{\text{SI}}}}} \right)\). Therefore, the height model is:

$$f\left( {{\text{height}}} \right) = b_{0} + b_{1} \times \frac{1}{{\sqrt {{\text{age}}} }} + e,$$
(37)

where \(e\sim N\left( {0,\sigma_{{{\text{height}}}}^{2} } \right)\).
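On data, the linear form of Eq. 37 can be fitted by ordinary least squares on the transformed response; a synthetic sketch, in which the site index, the coefficients b0 and b1, and the noise level are illustrative, not fitted, values:

```python
import numpy as np

rng = np.random.default_rng(1)
SI = 31.8                                   # class II site index (m at age 100)
age = np.linspace(20, 120, 200)
b0, b1 = 1.1, -4.0                          # hypothetical coefficients
f = b0 + b1 / np.sqrt(age) + rng.normal(0, 0.02, age.size)
height = SI * np.arctanh(f)                 # invert Eq. 36 to obtain heights

# recover the coefficients from the (age, height) pairs
X = np.column_stack([np.ones_like(age), 1 / np.sqrt(age)])
bhat, *_ = np.linalg.lstsq(X, np.tanh(height / SI), rcond=None)
```

The linearity on the transformed scale is what allows the unbiased back-transformation of Eqs. 24–25 to be applied afterwards.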

The number of terms needed to avoid numerical bias was identified by considering 5, 10, 20, and 30 terms in the Taylor series expansion. To assess the parsimonious height model represented by Eq. 37, we compared it with the model currently used by the Romanian Forest Administration (Giurgiu and Draghiciu 2004), namely Eq. 35. To prove that the models obtained with the proposed approach are not only unbiased but also efficient, we supplemented the parametric estimation with the Bayesian approach proposed by Stow et al. (2006) to correct the retransformation bias. The Bayesian approach to nonlinear modeling is not new and was advocated by many studies as an alternative to parametric estimates (van Oijen 2017; Golivets et al. 2019; Kansanen et al. 2019). We used the R (Gentleman and Ihaka 2014) implementation of Stan (Stan Development Team 2016a) for Bayesian inference, namely the rstan package (Stan Development Team 2016b). The Bayesian estimates produced by rstan, which is based on Markov chain Monte Carlo, require a series of parameters, of which the number of iterations is the most important. We followed the recommendation of the Stan Development Team (2016b), which suggested four chains. The lack of knowledge on the data distribution recommended the usage of a non-informative prior distribution in the computations, as suggested by Stow et al. (2006).

Stem taper models for loblolly pine (Pinus taeda L.) in east Louisiana

Loblolly pine (Pinus taeda L.), one of the most important commercial tree species in the USA, was the subject of many stem taper models (Max and Burkhart 1976; Cao et al. 1980; Fang et al. 2000; McClure and Czaplewski 2011; Fang and Strimbu 2017; Nicoletti et al. 2020). The majority of stem taper models have at least two parameters, an exception being the model developed by Lenhart et al. (1987), which has one parameter. Amarioarei et al. (2020) proposed a trigonometric model for describing the stem taper of loblolly pine using the data from Fang and Strimbu (2017). The dataset contains 18 trees from Vernon Parish, Louisiana, with diameters measured every meter along the stem, plus the diameter at breast height (dbh, 1.3 m). We found that the cosine of the ratio between diameter, d, and double dbh (i.e., \(\cos \left( {\frac{d}{{2 \times {\text{dbh}}}}} \right)\)) is linearly related to the ratio of total height and the square root of dbh and to the product of the logarithm of total height [i.e., ln(Total height)] and the square root of relative height (i.e., height of diameter d/total height):

$$f\left( d \right) = \cos \left( {\frac{d}{{2 \times {\text{dbh}}}}} \right) = b_{0} + b_{1} \times \ln \left( {\text{Total height}} \right) \times \sqrt {\frac{{{\text{height}}}}{{\text{Total height}}}} + b_{2} \times \frac{{\text{Total height}}}{{\sqrt {{\text{dbh}}} }} + e,$$
(38)

where \(e\sim N\left( {0,\sigma_{d}^{2} } \right)\).
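A minimal sketch of fitting the linear form of Eq. 38 and back-transforming the prediction with arccos follows; the synthetic trees and coefficients are assumptions for illustration, not the loblolly pine estimates.

```python
import numpy as np

rng = np.random.default_rng(7)

# Build the design matrix of Eq. 38 for a few hypothetical trees
rows = []
for dbh, th in [(20.0, 18.0), (30.0, 25.0), (40.0, 28.0)]:
    h = np.linspace(0.5, th - 1.0, 20)            # measurement heights (m)
    x1 = np.log(th) * np.sqrt(h / th)             # ln(Total height) * sqrt(rel. height)
    x2 = np.full_like(h, th / np.sqrt(dbh))       # Total height / sqrt(dbh)
    rows.append(np.column_stack([np.ones_like(h), x1, x2]))
X = np.vstack(rows)

true_b = np.array([0.45, 0.10, 0.03])             # hypothetical b0, b1, b2
f = X @ true_b + rng.normal(0.0, 0.005, X.shape[0])
b = np.linalg.lstsq(X, f, rcond=None)[0]

# Back-transform: cos(d / (2*dbh)) = f_hat  =>  d = 2*dbh*arccos(f_hat)
dbh, th = 30.0, 25.0
h = np.linspace(0.5, 24.0, 40)
f_hat = b[0] + b[1] * np.log(th) * np.sqrt(h / th) + b[2] * th / np.sqrt(dbh)
d_hat = 2.0 * dbh * np.arccos(np.clip(f_hat, -1.0, 1.0))
```

The naive arccos back-transformation shown in the last line is the biased one; Eqs. 11 and 12, as stated below, supply the unbiased diameter and variance.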

Considering that the transformation of the predicted variable is cosine, the unbiased diameter along the stem was computed with Eq. 11, and the unbiased variance as the difference between Eqs. 12 and 11. We identified the number of terms needed to avoid numerical bias by considering 5, 10, 20, and 30 terms in the Taylor series expansion.

The model 02 of Kozak (2004) was demonstrated by Fang and Strimbu (2017) to be the most suitable for the 18 trees used in the present study. Therefore, we compared the parsimonious model obtained with Eq. 38 with the Kozak 02 model. To ensure consistency of the results, we determined the Bayesian estimates of the model presented by Eq. 38, using Stan’s probabilistic language (Stan Development Team 2016a) as implemented in the R programming language (Gentleman and Ihaka 2014). Similarly to the site index models, we used four Markov chains and a non-informative prior distribution in the computations. Stem taper models are subject to autocorrelation (Lindstrom and Bates 1990), which violates the independence assumption (Valentine and Gregoire 2001). However, when the residuals are white noise, the models are considered complete (Brockwell and Davis 1996). Therefore, we tested autocorrelation with the Durbin–Watson test, as implemented in SAS 9.4 (Ansley et al. 1992). To ensure that the models developed are efficient, we used Bartlett’s test to assess homoskedasticity.
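The two residual diagnostics described above can be sketched in a few lines; the residual vector here is simulated white noise rather than the loblolly pine residuals, and the four height groups are an arbitrary split (both are assumptions for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 1.0, 200)      # simulated white-noise residuals

# Durbin-Watson statistic: values near 2 indicate no first-order autocorrelation
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Bartlett's test of equal variances across (here, four equal) height classes;
# a large p value is consistent with homoskedasticity
groups = np.array_split(residuals, 4)
bartlett_stat, bartlett_p = stats.bartlett(*groups)
```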

Models assessment

The key requirement of the proposed nonlinear modeling framework (Eqs. 9–25) is the normal distribution of the residuals produced by the linear relationship between the transformed variable of interest and the independent variables. To ensure that the normality condition was fulfilled, we used the Shapiro–Wilk test, as implemented in the base R programming language. Assuming normal distribution of the residuals, the performance of the three models was assessed using four metrics: pseudo-R2, bias, mean absolute error (MAE), and root mean square error (RMSE), similarly to Bilskie and Hagen (2013), Montealegre et al. (2015), and Stängle et al. (2017):

$${\text{bias}} = \mathop \sum \limits_{i = 1}^{n} \frac{{\hat{y}_{i} - y_{i} }}{n}$$
(39)
$${\text{MAE}} = \mathop \sum \limits_{i = 1}^{n} \frac{{\left| {\hat{y}_{i} - y_{i} } \right|}}{n}$$
(40)
$${\text{RMSE}} = \sqrt {\mathop \sum \limits_{i = 1}^{n} \frac{{\left( {\hat{y}_{i} - y_{i} } \right)^{2} }}{n}} ,$$
(41)

where \(\hat{y}_{i} , y_{i} , \overline{y}_{i}\) are the predicted, measured, and mean values at moment i (i.e., age, height, or day), and n is the number of observations.
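The three metrics of Eqs. 39 to 41 translate directly into code; the example vectors below are arbitrary illustrations.

```python
import numpy as np

def bias(y_hat, y):
    """Eq. 39: mean signed difference between predictions and measurements."""
    return np.mean(y_hat - y)

def mae(y_hat, y):
    """Eq. 40: mean absolute error."""
    return np.mean(np.abs(y_hat - y))

def rmse(y_hat, y):
    """Eq. 41: root mean square error."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

y = np.array([10.0, 12.0, 15.0, 18.0])
y_hat = np.array([10.5, 11.5, 15.5, 18.5])
# bias = 0.25, MAE = 0.5, RMSE = 0.5
```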

All the computations were executed in the R programming language (Gentleman and Ihaka 2014). We considered that an efficient Taylor series expansion is a series for which MAE < 1%, similarly to Amarioarei et al. (2020).

Results

Foundation

The factorial experiment focused on the Taylor series expansion showed that an efficient Taylor series could have < 10 terms, regardless of the variance. The number of terms depends on the selection approach, with the preset value supplying more terms than the scree plot. However, the two approaches agreed when the number of terms was chosen the first time the difference between the expectations computed with consecutive numbers of terms (i.e., one term apart) reached the preset value, except for the hyperbolic tangent case (i.e., two terms apart). When the preset value was surpassed at least two times, the scree plot and the preset value approaches could differ by six terms (Amarioarei et al. 2020). Considering that the expectation of the first-order moment depends on the variance, the use of fewer terms should be avoided, as it can lead to biased results. To ensure computational unbiasedness, we considered that the number of terms should be selected according to the preset value, for the case when it was met at least two times.

The second-order moment, and consequently the variance, depends on the same three estimates as the first-order moment, namely ξ, σ, and I(α, β, n). Moreover, the formulas for the first- and second-order moments are similar, the main difference being in the power of ξ or σ (Eqs. 7–25). Therefore, not surprisingly, the number of terms needed for unbiased estimates of the variance is in the same range as for the first-order moment, with the observation that usually an extra term is present. Therefore, a Taylor series expansion with 10 terms will likely ensure the lack of computational bias for the expectations, the predicted values, and the variance. Nevertheless, in complex applications with larger variance, there is the possibility that a larger number of terms would be needed in the Taylor series expansion.

Models of dominant height of Romanian Norway spruce

According to Eq. 24, the height of dominant and codominant trees for the Romanian Norway spruce developed from Eq. 37 is

$$\widehat{{{\text{height}}}} = \mathop \sum \limits_{k = 0}^{2N + 1} \mathop \sum \limits_{n = 0}^{N} \frac{1}{2n + 1}\left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right)\left( {b_{0} - \frac{{b_{1} }}{{\sqrt {{\text{age}}} }}} \right)^{2n + 1 - k} \sigma^{k} I\left( {\alpha ,\beta ,k} \right){\mathbf{1}}_{{\left\{ {0, \ldots ,2n + 1} \right\}}} \left( k \right).$$
(42)

The coefficients b0 and b1 vary with site index (Table 1), which suggests that the height of dominant and codominant Norway spruce should be modeled with polymorphic equations.

Table 1 Coefficients of the dominant and codominant tree height model according to Eq. 42

All the proposed models are significant (p value < 0.001) and have a coefficient of determination > 0.9, which supports their suitability for modeling the total height of dominant and codominant Norway spruce from Romania (Fig. 2). The validity of the models is supported by the Shapiro–Wilk test, which suggests normality of the residuals (p value > 0.25) and, consequently, the appropriateness of Eq. 37 for modeling tree height. As expected, the correlation between the predicted and original data increases with the number of terms (Table 2), but minutely (i.e., < 1% from 10 to 30 terms). The Bayesian model based on Eq. 37 is almost identical to the one from Eq. 42; therefore, the assessment metrics were similar (Table 2), with bias, MAE, and RMSE being almost undetectably larger than the back-transformed values. The model currently used in Romania (Giurgiu and Draghiciu 2004), not surprisingly, had all six variables significant (p value < 0.001). The correlation coefficient between the predicted and original data is superior to the one supplied by Eq. 42 (Table 2), but the difference is < 1% for all site indices. Regardless of the site index, the bias, MAE, and RMSE were the smallest for the Giurgiu and Draghiciu (2004) models (Table 2), but the differences were minuscule (< 1%). While the values predicted by the Giurgiu and Draghiciu (2004) models were infinitesimally superior to those of the Bayesian and proposed methods, the situation is completely reversed when the focus is on variance. Irrespective of the assessment metric, the Bayesian and parsimonious models always exhibit a smaller variance than the Giurgiu and Draghiciu (2004) models (Table 2). For bias, the variance of the parsimonious models is three times smaller than the one supplied by the Bayesian or Giurgiu and Draghiciu (2004) models, sometimes by almost one order of magnitude (e.g., 95% vs −9.7 for SI = 26.9 m).
The same conclusion is reached for MAE and RMSE, with the parsimonious models consistently providing smaller variance, sometimes almost three times smaller (i.e., 95% vs 29% for the MAE of SI = 26.9 m). Considering that the Giurgiu and Draghiciu (2004) models are less parsimonious than Eq. 42, exhibit significantly larger variances for the three metrics (inferior precision), and reveal only an infinitesimally superior accuracy (i.e., < 1%), it can be inferred that they are no longer justified in applications.

Fig. 2
figure 2

Model for the dominant height of Norway spruce in the Carpathian Mountains: a class IV (SI = 21.9 m), b class III (SI = 26.9 m), c class II (SI = 31.8 m), and d class I (SI = 36.9 m)

Table 2 Assessment of the dominant/codominant tree height models for the Norway spruce in the Carpathian Mountains. The abbreviations are: Giurgiu Giurgiu–Draghiciu, Pars. Parsimonious, Var variance

Stem taper models for loblolly pine

The stem taper model for loblolly pine developed from Eq. 38 is

$$\begin{aligned} & \frac{{\hat{d}}}{{2{\text{dbh}}}} = \mathop \sum \limits_{k = 0}^{2N + 1} \mathop \sum \limits_{n = 0}^{N} \frac{1}{2n + 1}\left( {\begin{array}{*{20}c} {2n + 1} \\ k \\ \end{array} } \right) \\ & \quad \left( {0.783 + 0.0318 \times \log \left( {\text{Total height}} \right) \times \sqrt {\frac{h}{{\text{Total height}}}} + 0.01867 \times \frac{{\text{Total height}}}{{\sqrt {{\text{dbh}}} }}} \right)^{2n + 1 - k} \\ & \quad \sigma^{k} I\left( {\alpha ,\beta ,k} \right){\mathbf{1}}_{{\left\{ {0, \ldots ,2n + 1} \right\}}} \left( k \right), \\ \end{aligned}$$
(43)

where \(\hat{d}\) is the predicted diameter measured at height h.

Equation 38 has both terms significant (p value < 0.0001) and supplied a correlation coefficient between the predicted and original data of −0.96. The normality test did not suggest a lack of normal distribution of the residuals (p value = 0.25), which provides evidence that the parsimonious model is appropriate for stem taper modeling. The correlation coefficient between the predicted and original data is virtually the same when the Taylor series expansion contains 10 or 30 terms, namely 0.96 (Table 3). Mirroring the height modeling, the Bayesian estimates were similar to the parsimonious model in terms of bias but not in terms of variability, which was always two times larger (Table 3).

Table 3 Comparison of the stem taper models for loblolly pine from east Louisiana

The nine-parameter model of Kozak (2004) was the most appropriate to describe the stem taper (Eq. 44), with a correlation coefficient between the predicted and original data of more than 98%. However, even though the model was significant as a whole (p value < 0.0001), some of the variables were deemed insignificant, such as b2, the coefficient of the inverse of the slenderness coefficient, or b3, the coefficient of \(X^{0.1}\) (p values > 0.26):

$$\begin{aligned} & \frac{{\hat{d}}}{{2{\text{dbh}}}} = 17.7256{\text{dbh}}^{0.573} \left( {\text{Total height}} \right)^{ - 1.832} X^{{g\left( {{\text{dbh}}, h,{\text{Total height}},X, Q} \right)}} \\ & \quad {\text{where}}\;g\left( {{\text{dbh}}, h,{\text{Total height}},X, Q} \right) \\ & \quad = - 0.394\left( {\frac{h}{{\text{Total height}}}} \right)^{4} - 0.08e^{{ - \frac{{{\text{dbh}}}}{{\text{Total height}}}}} \\ & \quad \quad + 0.056X^{0.1} + \frac{2.7}{{{\text{dbh}}}} + 0.1{\text{Total height}}^{Q} - 0.564X \\ \end{aligned}$$
(44)
$$X = \frac{{\left( {1 - \sqrt[3]{{\frac{h}{{\text{Total height}}}}}} \right)}}{{\left( {1 - \sqrt[3]{{\frac{1.3}{{\text{Total height}}}}}} \right)}}\; {\text{and}}\; Q = 1 - \sqrt[3]{{\frac{h}{{\text{Total height}}}}}.$$
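A direct transcription of Eq. 44 makes the fitted Kozak 02 form concrete; the coefficients are the reported point estimates, while the tree dimensions below are hypothetical, so the output is illustrative only.

```python
import math

def kozak02_ratio(h, dbh, total_height):
    """Evaluate the fitted Kozak 02 form of Eq. 44 at stem height h."""
    q = 1.0 - (h / total_height) ** (1.0 / 3.0)
    x = q / (1.0 - (1.3 / total_height) ** (1.0 / 3.0))
    g = (-0.394 * (h / total_height) ** 4
         - 0.08 * math.exp(-dbh / total_height)
         + 0.056 * x ** 0.1
         + 2.7 / dbh
         + 0.1 * total_height ** q
         - 0.564 * x)
    return 17.7256 * dbh ** 0.573 * total_height ** (-1.832) * x ** g

# Hypothetical tree: dbh = 30 cm, total height = 25 m, evaluated at h = 5 m
ratio = kozak02_ratio(h=5.0, dbh=30.0, total_height=25.0)
```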

The elimination of the insignificant variables did not alter the excellent performance of the Kozak 02 model, which had a correlation coefficient between the predicted and original data of 98% (Table 3). Among the considered stem taper models, the Kozak 02 model is the most suitable to describe the variation of diameter along the stem based on the first-order moments, as its bias, MAE, and RMSE were smaller than those of the parsimonious or Bayesian models (Table 3). Nevertheless, when the second-order moment was considered, the Kozak 02 model exhibited larger values for all three measures (i.e., bias, MAE, and RMSE), but the difference was minute (Table 3). Both models, parsimonious and Kozak, exhibited no significant autocorrelation, as the Durbin–Watson test supplied a p value > 0.05. A similar conclusion was reached for homoskedasticity, as Bartlett’s test indicated that the variance does not change with height (p value > 0.1).

The parsimonious models with more than 20 terms in the Taylor expansion, the Bayesian model, and the Kozak 02 model are almost identical on the lower half of the stem (Fig. 3), the differences occurring on the upper portion, which is inside the crown. Considering that the smallest volume is located inside the crown section of the stem, the differences between the models are operationally insignificant. The superior performance of the Kozak 02 model with respect to the first-order moments is not mirrored by the second-order moments, which place the emphasis on the parsimony of the model rather than on its performance. Therefore, the less parsimonious model (i.e., Kozak 02) is more accurate but less precise than the more parsimonious model (Eq. 43).

Fig. 3
figure 3

The stem taper models for planted loblolly pine

Discussion

The parsimonious framework that we are proposing in this study completes the work of Strimbu et al. (2017) and Amarioarei et al. (2020) by presenting not only the first-order expectations but also the second-order ones. Our models expand the findings of Neyman and Scott (1960) by providing an approach for solving nonlinear models that is fast and simple. Our parsimonious approach is suitable for modeling complex nonlinear relationships, as a Taylor series expansion with 20 terms will provide numerically unbiased results. The successive application of transformations is particularly suitable for functions for which a Taylor series expansion is not available, such as the hyperbolic arcsine. The improvement carried out by Nelder and Wedderburn (1972) on the work of Neyman and Scott (1960) relaxed the distribution of the residuals such that any distribution from the exponential family can be used. However, the generalization from the normal to an exponential-family distribution proposed by Nelder and Wedderburn (1972) is based on fusing two distributions: that of the residuals and that of the predicted variable. The merging of the residuals, which are a measure of the lack of knowledge, with a variable precludes the applicability of generalized linear models to any function (Fox 2008). Consequently, many popular functions, such as the trigonometric functions, cannot be used within the Nelder and Wedderburn (1972) framework because their expectation is difficult to compute. Our study restricts the distribution of the residuals to the normal distribution but allows the inclusion of all the standard trigonometric functions in modeling. One of the main attractions of the proposed parsimonious approach is the ability to compound functions, such that exponential or power functions can be combined with trigonometric functions without biasing the results. Consequently, the linear equation (Eq. 30) is arguably the simplest parsimonious method of solving nonlinear models.
However, the requirement that the variance of the residuals must be small with respect to the predicted variable indicates that the proposed parsimonious modeling framework is highly dependent on data fitting.

The models for the height of dominant and codominant Norway spruce and for the stem taper developed with the parsimonious framework presented in this study are as accurate as the existing non-parsimonious models but more precise. The accuracy varies with the predicted variable, as the stem taper model is less accurate than the parsimonious model, whereas for height the situation is reversed. Nevertheless, the differences are minute, which points toward the utility of the new framework. Even though the precision of the stem taper model seems to be superior to that of the parsimonious models, the variability of the non-parsimonious estimated RMSE suggests the opposite. A similar conclusion holds for height modeling, with the difference that the variability of the estimated assessment measures is significantly larger for the non-parsimonious models. Irrespective of the model, the non-parsimonious and the parsimonious models converge, but a large number of terms, at least 20, is needed in the Taylor series expansion (Figs. 2 and 3). The lack of parsimony is one aspect of the modeling process, another being the meaning of the variables predicting the variable of interest. From the meaning perspective, the parsimonious models include terms that can be interpreted [e.g., relative height (i.e., h/total height) or the slenderness coefficient (i.e., total height/dbh)], whereas the non-parsimonious models contain terms like \(\sqrt[{10}]{{\frac{{1 - \sqrt[3]{{{\text{relative}}\;{\text{height}}}}}}{{1 - \sqrt[3]{{\frac{1.3}{{{\text{Total}}\;{\text{height}}}}}}}}}}\), which are not only difficult to interpret or relate to tree processes, even though explanations for causality are argued to exist (LeMay 2018), but are usually present as part of a series (i.e., the 30th root of a variable).
Considering that the differences between the models are minute in terms of the first-order moments (i.e., accuracy), whereas the second-order moment is consistently smaller for the parsimonious models, model selection should be based on parsimony. As the natural alternative to the heuristic estimation of nonlinear models, the Bayesian approach provided results inferior to the parsimonious models for all assessment metrics and moment orders (i.e., linear or quadratic), which suggests that more research is needed in the Bayesian area to reach levels similar to the parametric approaches. One of the main challenges faced by the Bayesian approach is its reliance on computational resources, which for large datasets and complex nonlinear models could render solutions in an unfeasible amount of time.

Our study provides a parsimonious method of solving complex nonlinear problems by enforcing the normal distribution of the residuals. The examples used to show the performance of the proposed framework point toward superior results not only in terms of assessment metrics but also in terms of interpretability, particularly when the transformed linear model has a coefficient of determination larger than 0.95, preferably 0.99. The main limitation of the study rests in its normality assumption, as the formulas are valid only when the residuals are normally distributed. Therefore, evidence of normality is required; we found that a p value > 0.25 is recommended. Consequently, further studies are needed to expand the results to other distributions, such as the exponential family of distributions as defined by Nelder and Wedderburn (1972).

Conclusions

The general linear regression framework developed by Galton (1877) was enhanced for more than one century but was not suitable for complex nonlinear models. The main issue faced by nonlinear applications was the bias introduced by transforming the predicted variable. Our study builds on the work of Strimbu et al. (2017) and Amarioarei et al. (2020), and computes the bias corrections for the first- and second-order moments for some of the most popular transformations (i.e., power, trigonometric, and hyperbolic). Strimbu et al. (2017) proved that the estimated values are unbiased and that complex nonlinear models can be obtained by compounding various transformations. Because the unbiased estimates for eight functions (i.e., sine, cosine, tangent, arcsine, arccosine, arctangent, hyperbolic sine, and hyperbolic tangent) contain a Taylor series expansion, we have shown that in most situations fewer than 10 terms are required for results without bias.

We have proved that the parsimonious framework can be applied successfully in forestry by modeling the height of dominant and codominant Norway spruce from Romania and the stem taper of loblolly pine from east Louisiana. We compared our results with the current models used for stem taper and height, both non-parsimonious. The parsimonious model for height contains two terms, compared with the six terms of Giurgiu and Draghiciu (2004), and the one for stem taper contains three terms, compared with the nine of the Kozak 02 model (2004). The parsimonious models have similar first-order expectations for the three assessment metrics (i.e., bias, MAE, and RMSE) but a smaller second-order moment, sometimes by close to one order of magnitude. Therefore, the parsimonious models, by relying on the normality assumption, gain in precision without a significant loss, if any, of accuracy. The Bayesian approach to the parsimonious nonlinear models exhibits similar first-order moments for the assessment metrics, but the second-order moments were significantly larger. The attractiveness of the Bayesian solution is challenged by the size of the data and the complexity of the model, as it is computationally intensive. Subsequent studies will focus on generalizing the results beyond the normal distribution, for which the formulas are now available, by computing the same estimates for the exponential family of distributions.