Introduction

White birch (Betula platyphylla Sukaczev) is a valuable hardwood species of China and listed among the top ten species of the country. It covers an area of 10.38 × 106(M) hectares, with a total volume of 1033 M m3 (Xu et al. 2019). It has been registered as one of China’s science and technology research tree species. Stands of white birch are widely dispersed across the north and southwest. The largest area of this species with the largest stand volume lies in northeast China. This region contains abundant forest resources and is one of the most important areas of timber production. As per the latest National Forest Inventory (NFI-8), the northeast region has an area of 32.71 M ha under forest with 28,180 M m3 standing volume (Zeng et al. 2015). The Daxingʹan Mountains are located in Heilongjiang province and in the eastern part of Inner Mongolia, and white birch is the second major tree species in this area, the major species is Dahurian larch (Larix gmelinii Rupr.). The standing volume of white birch is distributed over 1.9 M ha and is 128 M m3, which is 25% of the total standing volume of tree species of the region (Samuelsson 2006).

White birch possesses excellent properties in terms of growth rate, adaptability, wood texture, structure, and durability. Additionally, it is the primary source for pulp in papermaking, for plywood and for furniture (Xu et al. 2016). Its commercial and economic uses extend globally, for example, Chinese white birch plywood is the principal item of wood imports into the USA. The wood contains exceptional physical characteristics that allow for natural and dyed finishes, cabinetry, and lamination (Far-East-American 2012). With the importance of white birch, a stem taper equation would be beneficial for sustainable management of the species. Stem taper functions have been recommended as an effective tool based on several studies to predict stem diameter (d) at any height as well as to estimate merchantable and total volumes (Rojo et al. 2005; Trincado and Burkhart 2006; Li and Weiskittel 2010; Özcelik et al. 2011). The majority of such functions require total tree height (H), diameter at breast height (D), and height (h) of d above the ground as independent variables (Berhe and Arnoldsson 2008; Hjelm 2013). Stem taper functions supersede the customary stem volume models as they estimate d, merchantable height to any diameter above the ground, the volume of a log of any length and at any height from the ground, in addition to merchantable and total stem volume (Kozak 2004).

Stem taper equations may be divided into three categories: simple polynomial, segmented polynomial, and variable-exponent equations. Simple taper equations describe the changes from base to crown tip in diameter by a single function (Behre 1923; Matte 1949; Osumi 1959; Kozak et al. 1969; Demaerschalk 1972). Simple fitting and easy integration to calculate volume are characteristic features of these functions. Such functions adequately account for the middle stem portion but are significantly biased in predicting diameter for upper and lower sections (Max and Burkhart 1976; Demaerschalk and Kozak 1977; Kozak 1988).

This inconsistency of simple taper functions was resolved by Max and Burkhart (1976) by introducing the first segmented polynomial model. Three parts of the stem, i.e., top, middle and bottom, are represented by different sub-functions (Kozak 1988; Rojo et al. 2005), assuming the top as a cone frustum, middle as a paraboloid frustum and the bottom as a neiloid frustum (Corral-Rivas et al. 2007; Li and Weiskittel 2010; Burkhart et al. 2019). Many researchers have successfully used this approach, e.g., Max and Burkhart (1976), Demaerschalk and Kozak (1977) and Fang et al. (2000). These models satisfactorily predicted the diameters at most parts of the trunk.

The third category i.e., variable exponent functions, was introduced by Kozak (1988) who defined the neiloid, paraboloid, and conic forms of the bole by a changing exponent from ground to top. As indicated by the term, the functions of this category are founded on the logic that variation in stem form is continuous from bottom to top (Lee et al. 2003). The most frequently used stem taper equations, allowing for the limitation of simple functions, are segmented polynomial and variable exponent (Berhe and Arnoldsson 2008; Li et al. 2012; Gómez-García et al. 2013).

In China, there have been numerous studies dealing with stem taper equations fitted for important tree species such as Larix gmelinii (Rupr.) Rupr., Cunninghamia lanceolata (Lamb.) Hook., Castanopsis hystrix Miq. (C. hystrix), Erythrophleum fordii Oliv., Tectona grandis L. f., Quercus variabilis Blume (Jiang and Liu 2011; Pang et al. 2016; Tang et al. 2016; Zheng et al. 2017). Different studies for Betula alnoides Buch.-Ham. ex D.Don. in south China and Betula species in Canada and Europe have also been carried out (Gál and Bella 1994; Zianis et al. 2005; Tang et al. 2017). However, as the stem taper function is always species-specific (Sharma and Zhang 2004; Subedi et al. 2011), a specific stem taper function has not been developed for white birch in northeast China. This study was carried out to evaluate the performance of three widely used taper functions and to select the best for stem diameter prediction of B. platyphylla.

The specific objectives were to develop a stem taper equation that delivers an appropriate description of the stem profile of white birch in 10 different sub-regions of the Daxing’an Mountains and to estimate dissimilarities in region-specific taper functions. Two main difficulties allied with the formation of taper functions are multicollinearity and auto-correlated errors. Viable statistical assumptions were used to address these issues.

Materials and methods

Study area and data description

The research area is located in the Daxing’an Mountains in Heilongjiang province and in the eastern part of Inner Mongolia (121° 12′ E to 127° 00′ E and 50° 10′ N to 53° 33′ N). This region is approximately 84,600 ha and is one of the main areas of high quality wood production in China. The major forest types are natural secondary forests with a high density of white birch. The topography is mountainous with diverse ecological conditions and distinct floristic composition. Elevation varies from 700 to 1000 m a.s.l. The prevailing climate is continental with summer monsoons and a long, dry severe winter. Average annual precipitation is 360 − 550 mm, of which 80% occurs in summer, and mean annual temperatures range from − 1.2  to − 5.6 °C. The typical soil type is brown coniferous forest soil (Burger and Shidong 1988).

Ten sub-regions (Songling, Jiagedaqi, Xinlin, Tahe, Huzhong, Shibazhan, Hanjiayuan, Xilinji, Tuqiang, and Amuer) were selected from the three main regions of the Daxing’an Mountains (Zhang et al. 1992). Data from the sub-regions were processed to assess the statistical properties of selected taper functions.

A total of 1344 trees were studied, with the data grouped into sub-regions to examine the differences in stem taper. The sample satisfactorily represented the distribution of trees with regards to diameter and height classes. Diameter at breast height (D, cm) to the nearest 0.1 cm was recorded Trees were felled to measure total height, diameter at the ground and at 2%, 4%, 6%, 8%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% of total height. Measurement intervals fluctuated from 14 to 2.4 m depending upon the total height. Extra measurements were taken for the lower bole to improve the accuracy of prediction. Summary statistics for tree diameter and total height for each sub-region are shown in Table 1.

Table 1 Descriptive statistics for sample trees of 10 sub-regions in the Daxingan Mountains

Functions selected for comparison

Three taper equations from segmented and variable exponent taper equations were selected and included the Max and Burkhart (1976), Fang et al. (2000) and Kozak (2004) models, later written as MB-76, Fang-2000 and Kozak-2. The choice of these models was made on the basis of previous studies (Rojo et al. 2005; Dieguez-Aranda et al. 2006; Barrio Anta et al. 2007; Corral-Rivas et al. 2007; Li and Weiskittel 2010; Özcelik and Crecente-Campo 2016). Mathematical expressions corresponding to the models are presented in Table 2.

Table 2 Analyzed taper functions

The segmented model of Fang-2000 was considered as the best model for Eucalyptus plantations in Turkey and oak species in Mexico (Pompa-García et al. 2009; Özcelik and Göceri 2015). This model was also ratified for different species in Spain (Crecente-Campo et al. 2009; Sevillano-Marco et al. 2009). Similarly, good results were provided by the Kozak-2 model of 11 conifer species in North America, lodgepole pine (Pinus contorta Douglas) and Siberian larch (Larix sibirica Ledeb.) in Iceland and white cedar (Cupressus lusitanica Mill.) plantations in Ethiopia (Berhe and Arnoldsson 2008; Heidarsson and Pukkala 2011; Li et al. 2012). Additionally, the model was used as a base model in a study of Scots pine (Pinus sylvestris L.) and Sitka spruce (Picea sitchensis (Bong.) Carr.) in northern Britain (Fonweban et al. 2011). The segmented function MB-76 produced excellent results for loblolly pine (Pinus taeda L.) in East Texas, for Japanese larch (Larix kaempferi (Lamb.) Carr.) in South Korea, and for Brutian pine (Pinus brutia Ten.), Cedar of Lebanon (Cedrus libani A. Rich.), and Cilicia fir (Abies cilicica Carr.) in Turkey (Coble and Hilpp 2006; Brooks et al. 2008; Doyog et al. 2017).

Model fitting

The model parameters were estimated by the generalized non-linear least-squares methods with the MODEL procedure of SAS (SAS Institute Inc. 2008). Fitting of the stem taper equation is accompanied by various statistical problems, including lack of independence of errors. Two additional issues of importance are multicollinearity and autocorrelation (Kozak 1997). Multicollinearity is the presence of high inter-correlations among predictor variables during the analysis of multiple regression. Dealing with excessively complicated models containing several polynomials and cross-product terms is a key reason for multicollinearity.

Autocorrelation is the occurrence of spatial correlations among the observations from the same tree that negates the assumption of independent error terms. Although least-squares estimates of regression coefficients are consistent and unbiased in conjunction with multicollinearity and autocorrelation, their efficiency is affected (Myers 1990). For this reason, suitable statistical measures need to be followed for model fitting to avoid autocorrelated errors and to reduce multicollinearity whenever possible (Kozak 1997).

The extent of multicollinearity in the models was assessed by condition number (CN), which is the square root of the quotient between maximum and minimum eigenvalues of the correlation matrix. As suggested by Myers (1990), a condition number greater than 10000.5 denotes the presence of multicollinearity- related problems. As another benchmark set by Belsley (1991), there should be no concern about multicollinearity, provided the CN ranges from 5–10. Multicollinearity-associated problems are formed if CN values are from 30–100 and the figure of CN from 1000–3000 signifies a high degree of multicollinearity-related problems.

During regression analysis, error terms are presumed to be independent, evenly distributed, and normal random variables (Kozak 1997). As the database for constructing taper functions consists of multiple observations pertaining to each tree, i.e., hierarchical data, it is rational to expect autocorrelation within the observations. Such correlation contravenes the assumption of independence. A continuous autoregressive error structure, CAR (1), was established to model the error terms for the adjustment of innate autocorrelation in the data. This specified error structure allows the pragmatic use of a model for unbalanced and irregularly spaced data (Gregoire et al. 1995; Dieguez-Aranda et al. 2006), both of which are attributes of many datasets in forestry (West et al. 1984).

Model comparison

The accuracy of diameter estimates for each model was judged by graphical and numerical assessment of the residuals. Two goodness-of-fit statistics, i.e., coefficient of determination (R2) and root mean square error (RMSE), were tested. Some deficiencies are connected with employing R2 in nonlinear regression, but its general expediency overrules such limitations (Thomas 1997). However, it is not advisable to use R2 as the sole criterion while choosing the best model (Myers 1990). The use of RMSE is advantageous since its measurement units are the same as those of the dependent variable, and therefore displays the average error of a model. The notations for these statistics are as under:

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{i}} = {\text{n}}}} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{i}} = {\text{n}}}} \left( {y_{i} - \overline{y}_{i} } \right)^{2} }}$$
(1)
$$R_{{\rm{MSE}}} = \sqrt {\frac{{\mathop \sum \nolimits_{{{\text{i}} = 1}}^{{{\text{i}} = {\text{n}}}} \left( {y_{i} - \hat{y}_{i} } \right)^{2} }}{n - p}}$$
(2)

where \(y_{\rm{i}} ,\hat{y}_{\rm{i}}\) and \(\overline{y}_{\rm{i}}\) are measured, predicted, and average values of the response variable; n is the total number of observations, p is the number of parameters, and RMSE is root mean square error.

Ordinary residuals indicate the excellence of fit but these residuals will not likely measure the quality of the predictions in the future (Myers 1990). Thus, it is mandatory to validate the model and only separate data will serve the purpose to some extent. As the chances to access validation data are limited, a number of methods have been suggested, (e.g., splitting the data for fitting and validation, double cross-validation), although it is rare to obtain any additional information beyond the statistics revealed from the models fitted to the entire dataset (Kozak and Kozak 2003). Therefore, in this study, it was decided to make decisions using the available data.

The box and whisker plots of d residuals against relative heights along the stem (5%, 15%, 25% and up to 95%) were also developed to assess the suitability of taper models. These graphs illustrate the domains where the functions deliver inadequate or acceptable predictions (Kozak and Smith 1993; Kozak 2004).

Mixed-effects modeling

Since the late twentieth century, the use of mixed-effects models has become popular in forest growth and yield modeling. Compared to the regression method, these models consist of fixed and random effects parameters to account for the between-tree and within-tree variations in the data (Fang and Bailey 2001; Garber and Maguire 2003; de-Miguel et al. 2013). Additionally, this technique enables the calibration of the taper equation for a specific site or tree, provided additional measurements are available.

After selecting the best taper model, several nonlinear mixed-effects models were developed using different combinations of random parameters. The combinations of these included only those that have an effect on the parameters of the best taper model. In this study, we used the NLMIXED procedure in SAS (SAS Institute Inc. 2008) to estimate the fixed and random parameters. Mixed models were compared using Akaike’s Information Criterion (Akaike 1974), Schwarz’s Information Criterion (Schwarz 1978), and twice the negative log-likelihood, –2Ln (L). To avoid over-parameterization and convergence problems, mixed models were fitted for different sub-regions.

Comparison of taper functions among regions

The non-linear extra sum of squares method was used to find whether different taper functions would be needed for different regions (Bates and Watts 1988). In this method, fitting of full and reduced models is required. It has repeatedly been implemented to judge the necessity of separate models for specified species or distinct geographical regions, e.g., Huang et al. 2000a, b; Zhang et al. 2002; Rivas et al. 2004. The full model constitutes a separate set of parameters for each region, while the reduced model involves the same set of parameters for all regions under consideration. The full model is attained by expanding all global parameters with a dummy variable and an associated parameter to distinguish the regions. The significance of the comparison between full and reduced models is based on the F-test of the formula:

$$F = \frac{{{{\left( {S_{{\rm{SER}}} - S_{{\rm{SEF}}} } \right)} \mathord{\left/ {\vphantom {{\left( {S_{{\rm{SER}}} - S_{{\rm{SEF}}} } \right)} {\left( {d_{{\rm{fR}}} - d_{{\rm{fF}}} } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {d_{{\rm{fR}}} - d_{{\rm{fF}}} } \right)}}}}{{{{S_{{\rm{SEF}}} } \mathord{\left/ {\vphantom {{S_{{\rm{SEF}}} } {d_{{\rm{fF}}} }}} \right. \kern-\nulldelimiterspace} {d_{{\rm{fF}}} }}}}$$
(3)

where SSER, SSEF, dfR, and dfF are the error sum of squares and degrees of freedom for reduced and full models, respectively. The non-linear extra sum of squares follows an F-distribution.

Provided the F-test reveals no differences among taper equations for different regions, a simple composite model fitted with the combined data is desired. If the F-test results indicate otherwise, i.e., taper equations are not the same across regions (P < 0.05), more tests are needed to evaluate whether the differences are due to as few as two or as many as all of the regions. In our case, a full model for all possible sub-regions paired comparisons should be matched with the corresponding reduced model by the F-test. Taper functions for these sub-regions should be considered alike and combined in situations only when an insignificant F-value (P > 0.005 considering the Bonferroni’s correction) is obtained.

Results

Most of the parameters were significant at P < 0.05 (Table 3, 4, 5). The exceptions were: b6 for sub-regions 1 and 5; b2 for sub-region 7; b4, b6 for sub-regions 6 and 8; b4, b8 for sub-regions 2 and 3; b2, b7, b8 for sub-region 9; b6, b7, b8 for sub-region 10, and b4, b6, b7, b8 for sub-region 4 in the model of Kozak-2.

Table 3 Parameter estimates for MB-76 equation for each sub-region. Standard errors of the estimated parameters are shown in in bracket
Table 4 Parameter estimates for Fang-2000 equation for each sub-region
Table 5 Parameter estimates for Kozak-2 equation for each sub-region

The values of the coefficient of determination (R2) and root mean squared error (RMSE) for all models are shown in Table 6. Above 98% of the total variance of d was explained by the models in seven sub-regions except sub-regions 3, 6, and 8, where almost 98% of the total variance was indicated. The values of RMSE varied between 0.98 and 1.37 cm depending upon the sub-region. The model of Kozak-2 with the smallest RMSE was the most stable taper function for all sub-regions except sub-region 4, where it was replaced by the MB-76 model. As per statistics, the leading model was Kozak-2, closely followed by MB-76. Among the three contesting models, the variability displayed by the model of Fang-2000 was relatively higher. Multicollinearity was noted in the models as implied by the condition numbers. The minimum value of CN was 38 − 100 from the MB-76 model, followed by the values of 106 − 152 and 70 − 110 for the Kozak-2 and Fang-2000 models, respectively.

Table 6 Goodness-of-fit statistics and condition number of the taper models analyzed
Table 7 Parameter estimates for the MB-76 model with mixed effects for sub-regions 1–5 of study area
Table 8 Parameter estimates for the MB-76 model with mixed effects for sub-regions 6–10 of study area

The box and whisker plots of d residuals versus relative height classes showed that the error was small, and its distribution along the stem was almost the same among different taper functions (Appendix Fig. S1). Plots showed overall satisfactory performance of the models and did not indicate any distinct inconsistencies among the models.

According to the fit-statistics and box plots, the models of MB-76 and Kozak-2 were more accurate than the Fang-2000 model in predicting diameters. The MB-76 equation was, however, suggested for diameter estimates due to the lowest CN and the provision of an integral solution for volume calculation. The results of the F-test of the MB-76 equation indicated consistent significant differences in different regions (not shown).

The adaptation of the MB-76 model was refitted using nonlinear mixed-effects techniques. The estimated parameters and fit statistics of the best mixed model are listed in Tables 7 and 8.

Discussion

Ample material is available on manifold functions and procedures for the precise estimation of diameter at any position along the trunk (Crecente-Campo et al. 2009; Heidarsson and Pukkala 2011; Hjelm 2013; Özcelik and Göceri 2015; Lumbres et al. 2016; Özcelik and Crecente-Campo 2016; Tang et al. 2017). In this study, three stem taper functions from two groups, segmented and variable exponent, were fitted to estimate stem diameters of white birch with the best possible accuracy.

Prospective variations among the regions were examined by grouping the data into ten sub-regions for which all models were fitted independently (Table 6). As noted earlier, all models exhibited above 98% of the total variance of d accounted for in most of the sub-regions. Further examination of Table 6 reveals that the models of MB-76 and Kozak-2 accounted for total variability better than the model of Fang-2000, which furnished the highest RMSE values. As per goodness-of-fit statistics, models of Kozak-2 and MB-76 demonstrated the lowest errors in predicting stem diameter in all regions with the model of Kozak-2 slightly ahead of MB-76. Although the prediction of merchantable or total stem volumes is beyond the scope of this study, it is important to note that variable form taper functions, to which category Kozak-2 belongs, cannot be integrated analytically. To overcome this difficulty, integration of the taper equation is required to estimate merchantable height (Gómez-García et al. 2013). Alternatively, analytical compatibility on the part of MB-76 is a point worth noting.

The box and whisker plots provide the mean, maximum and minimum errors of prediction, median, and interquartile range (IQR) of diameter residuals by relative height classes (Appendix Fig. S1). The narrowness of IQR demonstrates that the precision of prediction is relatively high in all regions. No significant differences were displayed among the models in the plots of d residuals versus relative height classes. As a whole, all models presented higher standard errors of estimates at 0 − 10% and 65% − 85% relative heights, which may be attributed to the association of these specific parts of the stem with butt swell and the point of the base of live crown (Jiang et al. 2005). The models performed well for the sections nearest the ground. Accurate diameter estimation in this section is vital considering the commercial value of the basal log.

All models portrayed similar and homogenous d residual distributions in general, as the medians are largely distributed near zero. Close observation, however, shows that the MB-76 model slightly underestimates the relative heights of 35 − 55% of trees in sub-regions 1, 2, 3, 6, and 7 and overestimates at relative height 15 − 25%. The middle bole sections are marginally underestimated by Kozak-2 in sub-regions 1, 2, 5, and 6, but the extent of error is lower than the MB-76 function, as it is restricted to the 35 − 45% relative height class. The top bole section is underestimated by Kozak-2 in all sub-regions. Being the least valued part of the stem, this error does not generally influence the performance of the models. The Fang-2000 model underestimated relative height classes 35 − 45% in sub-regions 1, 2, 5, 6, and 7 and overestimated by 15 − 25% in sub-regions 3 and 4. Data on all these plots were too close to decide the superiority of a model.

All models in this study are affected by multicollinearity to some extent. It is worth noting however, that the condition number (CN) of the MB-76 model is lower than for the Kozak-2 and Fang-2000 models. The minor CN value of the MB-76 model was previously noted in a stem taper study in southern China where it was as low as 7.6 (Tang et al. 2017) but still higher than the Kozak-2, Fang-2000 or other similar models (Rojo et al. 2005; Dieguez-Aranda et al. 2006; Özcelik and Crecente-Campo 2016). The CN values are not in an acceptable range for the Kozak-2 and Fang-2000 models, as noted in earlier studies (Barrio Anta et al. 2007; Corral-Rivas et al. 2007). In terms of multicollinearity, the MB-76 model ranked best among the models in this analysis.

The fit statistics and graphic illustrations showed marginal differences between the MB-76 and Kozak-2 models. At the same time, they were more reliable than the Fang-2000 equation for the diameter estimates in all sub-regions. For the models of MB-76 and Kozak-2, the multicollinearity was significantly lower in the former. Kozak (1997) suggested that a model with less severe multicollinearity would be preferable. Although this analysis was limited to diameter predictions, the MB-76 equation can also provide estimates of total and merchantable volumes. Besides, this equation contains all significant parameters compared with the Kozak-2 equation. While evaluating the suitability of a model, some studies have considered the multicollinearity, compatibility, and significance of parameters, e.g., Sakici et al. 2008; Crecente-Campo et al. 2009; Özcelik et al. 2016; Tang et al. 2017. Therefore, the MB-76 model was selected.

To test the reliability of our results, previous studies were reviewed and the MB-76 model has been popular for a variety of species (Martin 1981; Brooks et al. 2002, 2008; Jiang et al. 2005; Teshome 2005; Coble and Hilpp 2006; Özcelik and Brooks 2012; Doyog et al. 2017). Of five equations, this model was most consistent in a study of 18 Appalachian hardwood species in predicting diameter, height, and volume (Martin 1981). As an extensively used model for the prediction of taper and volume, Coble and Hilpp (2006) recommended it for diameter and volume estimation of loblolly pine (Pinus taeda L.) in East Texas, USA. In another study in northwestern Spain, it was a successful model of 14 equations analyzed for accurate diameter estimation at all positions along the stem in Scots pine plantations (Dieguez-Aranda et al. 2006). A study by Brooks et al. (2008), on compatible stem taper and volume equations in Turkey, found it as a reliable model for various statistical measures and sectional performance.

In some studies, the MB-76 equation was not the highest-ranking model. In appraising 33 stem taper equations, de-Miguel et al. (2012) found the model to be the best amongst eight equations. However, with further analysis, the MB-76 model was less accurate in volume prediction. In a comparison of several stem taper models for Lebanon cedar (Cedrus libani A. Rich.) the model’s performance was poor in predicting diameter and other variables (Özcelik and Crecente-Campo 2016). Another comparative study of 31 taper functions for the Bornmullerian fir (Abies nordmanniana subsp. bornmulleriana Mattf.), the MB-76 equation was third among the segmented functions (Sakici et al. 2008). These studies do not fully corroborate our findings, as taper functions are species-specific and their accuracy depends upon the particular species and its stem form (Schröder et al. 2014).

Nonconformity of one equation for different sub-regions was expected, due to the distinct geographical and environmental conditions in the regions.The Daxing’an Mountains extending from the northeast to southwest has different topography on eastern and western slopes; east slopes are steep and west slopes are relatively smooth (Zhang et al. 1992). There is considerable variation in temperatures and precipitation. Southern regions have higher temperatures and more precipitation, while northern areas have lower temperatures and less precipitation. Forest fires are another important ecological factor affecting natural regeneration of the birch forest. Density is increased with increasing fire intensity (Shi et al. 2010). The intensity of forest fires is also influenced by variations in topography and meteorological conditions in this region (Fan et al. 2017). Such contrasting geo-climatic factors are reflected in soil development. Lower elevations of the Daxing’an range have dark brown forest soils with high organic matter (16 − 20%). Towards the south of the range, the brown forest soils contain less organic matter (9%). In drier and colder areas, chestnut soils are also present (Burger and Shidong 1988). Therefore, a combination of biogeoclimatic conditions determines the variations in tree taper among the sub-regions. Zhang et al. (2002) and Özcelik et al. (2016) reported similar findings for jack pine (Pinus banksiana Lamb.), and brutian and black pines (Pinus brutia Ten. and Pinus nigra Arnold.) in Ontario and southern Turkey, respectively.

Similarly, some researchers have specifically investigated the effect of climate and other local factors on stem taper. Climate-induced changes in stem form were recorded for Korean red pine (Pinus densiflora Siebold & Zucc.) in Korea (Lee et al. 2006), for lodgepole pine (Pinus contorta Douglas) in British Columbia (Nigh and Smith 2012), and for several North American tree species (Schneider et al. 2018). Li et al. (2011) also found the influence of elevation on stem taper of Schrenk’s spruce (Picea schrenkiana Fisch. & C.A. Mey.) in northwest China. The quantification of climatic effects on stem taper was not in an objective of this study. Further research might support the decision to fit the separate models for different sub-regions.

The fixed-effects models, fitted by ordinary least squares or nonlinear least squares, minimize the sum of squared differences between the observed and predicted values of the data. The sum of squared errors is smaller than for mixed-effects models when the random parameters are not used in prediction. The fixed-effects models are more accurate when the random parameters of the mixed-effects models are supposed to be zero and additional measurements are not available to calibrate the model (Meng et al. 2009; Pukkala et al. 2009; Shater et al. 2011; Groom et al. 2012; Guzmán et al. 2012; de-Miguel et al. 2013; Arias-Rodil et al. 2015).

Consequently, for prediction purposes, many researchers suggest using the fixed-effects models in the absence of calibration data (de-Miguel et al. 2013; Arias-Rodil et al. 2015). However, de-Miguel et al. (2013) advised to record both fixed and mixed-effect forms of a model since calibration may be feasible in some cases. Calibration can assist in determining the best estimates with or without the prospect of model calibration. Accordingly, mixed-effects modeling of the MB-76 equation was carried out. Mixed-effects models would be used where additional upper stem diameter measurements are available for calibration.

Conclusion

Three widely used stem taper functions, Kozak-2, MB-76, and Fang-2000 were evaluated for Betula platyphylla in different sub-regions of the Daxing’an Mountains. Kozak-2 and MB-76 models provided good results and were similar to a certain extent. Without multicollinearity, Kozak-2 performed slightly better in the goodness of fit statistics and graphical representation. As recommended in the literature, models bearing less multicollinearity should be preferred. Lowest multicollinearity and all significant parameters are two substantial rationales to suggest the MB-76 model for diameter prediction of B. platyphylla in northeast China.