Introduction

Foresters, policy makers and scientists often use models to estimate forest biomass or wood production, carbon budgets, and the impacts of climate change, management, or different species mixtures on growth. Many models have been developed that vary greatly in spatial and temporal resolutions, model complexity and the degree to which they rely on statistical relationships between variables versus the physiological processes of the system (Battaglia and Sands 1998; Fontes et al. 2010; Korzukhin et al. 1996; Pretzsch et al. 2015). Statistical models, or statistical model components, use equations and parameters derived from data sets representative of the conditions of interest (Korzukhin et al. 1996; Pretzsch et al. 2015; Vanclay and Skovsgaard 1997). However, such data sets are often not available for the range of conditions of interest to the model users, for example in areas where a species has not previously been present, or when new silvicultural treatments are of interest, or as new climatic conditions arise. In contrast, process-based models, or models with some process-based components, have been developed to overcome this limitation by simulating the physiological processes that influence growth and how these processes are influenced by the environment.

Parameters used for process-based models are often derived from measurements of physiological processes, such as light absorption, transpiration, carbon partitioning and nutrient cycling (Minunno et al. 2019; Pretzsch et al. 2015). Intensive physiological measurements of many processes are often used to calibrate process-based models (e.g. Battaglia et al. 2004; Duursma and Medlyn 2012; Gonzalez-Benecke et al. 2016; Pietsch et al. 2005; Wei et al. 2014). However, measurements required to directly calculate all parameters are not often available. Therefore, many studies use trial and error to calibrate process-based models such that intensive physiological measurements and growth and yield data are used to calculate most parameters, while remaining parameters are tuned to maximize correlations between predictions and observations (e.g. Sands and Landsberg 2002). In other studies, the parameters are calculated using statistical fits of the given parameter as a function of stand characteristics that were calculated by the model (e.g. Gonzalez-Benecke et al. 2016).

A potential difficulty faced when measuring physiological processes is that measurements are not often possible for all species provenances within the region/s where the process-based models are to be applied and may therefore inadequately represent the physiology, morphology or phenology of the given species. Even when accurately measured parameter values are obtained, given that models are simplifications of reality, their parameters can play slightly different roles than their parameter name implies (van Oijen 2017). Therefore, the value of a parameter that produces the most realistic model behaviour may differ from the measured value (van Oijen 2017). To address this uncertainty, process-based model predictions can be improved when observations of outputs are used to constrain parameter values during model calibration (Hartig et al. 2012; Thomas et al. 2017; van Oijen 2017; van Oijen et al. 2005). While this means that the outputs will depend on the observed data in a similar way to statistical models (Minunno et al. 2019), it increases the reliability of the predictions for the region where the observed data were obtained. Several studies used inverse model-data assimilation methods such as Bayesian calibration to examine the uncertainty of parameters or model structures and to provide statistical distributions of parameter values (Fer et al. 2018; Gertner et al. 1999; Hartig et al. 2012; Minunno et al. 2019; Thomas et al. 2017; van Oijen 2017; van Oijen et al. 2013).

The forests of central Europe, such as in Switzerland, cover many environmental conditions and species compositions. Therefore, the objective of this study was to use Bayesian calibration to obtain parameters for the 3-PG model (Physiological Processes Predicting Growth; Forrester and Tang 2016; Landsberg and Waring 1997) for twelve central European tree species. To accomplish this, a literature review of potential parameter values was combined with analyses of data from 2499 forest inventory plots in Switzerland. Data from monospecific plots were used for the Bayesian calibration and validation to obtain the 3-PG parameters, while an additional validation was done using 13 mixed-species plots.

Methods

The 3-PG model

3-PG is a stand-level model with a monthly time step. It was initially developed for evergreen, even-aged monospecific forests (Landsberg and Waring 1997) and was recently further developed for deciduous, uneven-aged and mixed-species forests (Forrester and Tang 2016), i.e. as a cohort-based model. 3-PG consists of five sub-models: light, biomass production, water balance, allocation and mortality. The light sub-model calculates light absorption using species-specific light extinction coefficients, leaf area indices and, in the case of multi-species or multi-cohort stands, the vertical positioning of each cohort based on species-specific mean height and crown length (Forrester et al. 2014). The biomass production sub-model calculates gross primary productivity based on a species-specific canopy quantum efficiency (αC) that is reduced by limitations caused by temperature, frost, vapour pressure deficit, soil moisture, soil nutrient status, atmospheric CO2 and stand age (Almeida et al. 2009; Landsberg and Waring 1997; Sands and Landsberg 2002). Net primary productivity is calculated as a constant fraction of gross primary productivity (Waring et al. 1998). In the water balance sub-model, transpiration and soil evaporation are calculated using the Penman–Monteith equation (Monteith 1965; Penman 1948). These are added to canopy interception to predict evapotranspiration. Soil water is calculated as the difference between evapotranspiration and rainfall, while draining off any water in excess of the maximum soil water holding capacity (Sands and Landsberg 2002). The biomass allocation sub-model distributes net primary productivity to roots, stems and foliage depending on soil nutrient status, vapour pressure deficit, soil moisture and tree size. The mortality sub-model calculates density-dependent mortality based on the −3/2 self-thinning law by Yoda et al. (1963) and density-independent mortality, e.g. caused by pests, diseases or drought (Gonzalez-Benecke et al. 2014; Sands 2004). The simulated biomass is converted into output variables such as mean tree diameter, height, basal area, wood volume and size distributions using allometric relationships. All sub-models have been validated by comparing predicted outputs from the given sub-model with measurements of the same process, such as transpiration, light absorption, carbon partitioning and mortality (Gupta and Sharma 2019; Landsberg and Sands 2011). This piece-wise approach to validation reduces the likelihood of developing compensating errors among sub-models (Korzukhin et al. 1996; Sands 2004) that can be hidden by only validating a specific selection of the outputs, such as growth, stand density and mortality. 3-PG was run using the r3PG package (Trotsiuk et al. 2020) in R (R Core Team 2019). A description of all 3-PG parameters is provided in Table S1. When running r3PG, we used settings = list(light_model = 2, transp_model = 2, phys_model = 2, height_model = 2, correct_bias = 1, calculate_d13c = 0). Therefore, the parameter sets developed from this study are most appropriate for these settings.

Forest inventory data

We used data from three permanent plot networks: long-term forest monitoring plots from managed forests and experiments from the Swiss Experimental Forest Management (EFM) network (Forrester et al. 2019), un-managed plots from the Swiss Forest Reserve Network (FRN) (Hobi et al. 2020) and the Swiss National Forest Inventory (NFI) (Fischer and Traub 2019; WSL 2020) (Fig. 1).

Fig. 1
figure 1

Location of the monitoring plots used for the estimation of allometric or size-distribution parameters (e.g. based on regression analyses), as well as the Bayesian calibration and validation. Numbers in parentheses indicate the number of plots used for each category in the legend. Note that some EFM and FRN plots with different management strategies or site/topographic characteristics are located at the same location and are therefore overlapping on the maps

The EFM and FRN have been monitored following almost identical methodologies. They consist of long-term permanent plots, with the oldest plots monitored since 1888 and 1955, respectively. The EFM is used to examine silvicultural treatments across a range of species, climate and edaphic conditions. In contrast, the FRN acts as a reference for un-managed conditions. The average measurement intervals for the EFM data are 6.3 years (minimum of 1 and maximum of 29) and for the FRN plots are 12 years (minimum of 1 and maximum of 27). These intervals depend on the growth rates, stand age and research objectives. All individual trees with a diameter at a height of 1.3 m (d) ≥ 4 cm, for FRN plots, or ≥ 8 cm for EFM plots, are measured. For each tree, the d, status, and species are recorded. Tree height, crown diameter and crown length are measured for a subset of trees. For the EFM plots, age is calculated based on the planting or regeneration date, and measurements are taken at the same time as thinning to ensure an accurate recording of trees that are thinned and trees that die. Age is usually not available for the FRN plots, so we only used FRN data for purposes where age information was not required. In this study, the time span between the first and the last measurement (time series length) for a given plot of the EFM and FRN ranged between 2 and 121 years (mean 26 years). For the Bayesian calibration, we only used data collected after the 1930s because the climatic data (required by 3-PG) were less reliable prior to 1930. All plots and data were used for parameter estimation using regression analyses.

The NFI plots are distributed on a regular grid of 1.4 km including about 6,500 permanent monitoring plots that have been measured at approximately 10-year intervals since 1983. They are circular nested plots where every tree with a d ≥ 12 cm is recorded within an inner 200-m2 circle (horizontal radius = 7.98 m), and every tree with a d ≥ 36 cm is recorded within a 500-m2 circle (horizontal radius = 12.62 m). For every individual tree, the d, crown length, status and species are recorded. Tree height and diameter at a height of 7 m are measured for a subset of trees. Age is estimated using a regression model that was fit to the data obtained either from counting tree rings or from counting layers of branch whorls (for conifers) directly on the plot (Fischer and Traub 2019). The trees lost due to thinning or mortality are derived from inventory data. In this study, the time span between the first and the last measurements (time series length) for a given NFI plot ranged between 5 and 34 years (mean 21 years).

The EFM/FRN complements the NFI network in terms of their variability and accuracy. That is, parameter uncertainty, and hence output uncertainty, can be reduced by increasing the variety of data used for calibration, increasing the accuracy of the measurements and increasing the lengths of the time series (van Oijen et al. 2005). The NFI plots increase the range of climatic and edaphic conditions because they have been distributed on a regular grid through Swiss forests and therefore increase the range of conditions already accounted for by the EFM/FRN plots. Thomas et al. (2017) found that calibrations using environmental gradients can constrain parameters associated with water and nutrient sensitivity to a similar degree as nutrient, drought and irrigation experiments. The EFM/FRN plots increase the data accuracy compared with the NFI plots, which have less accurate information about tree status (e.g. whether a tree died or was thinned) and tree age. Minunno et al. (2019) showed that more accurate calibration can be obtained with more accurate inventory data, such as long-term experimental plots (e.g. EFM and FRN) rather than plots with less accurate information about variables such as ages and tree status. The EFM plots also increase the lengths of the time series. For example, although our data set did not include any CO2 fertilization experiments, which are useful for calibrating the 3-PG CO2 parameters (Thomas et al. 2017), the EFM data (pre-1950s to 2019) cover periods with atmospheric CO2 < 320 ppm to > 400 ppm. A summary of stand and site characteristics for all plot networks combined is shown in Table 1.

Table 1 Summaries of plot and site characteristics by species for all plots (EFM, FRN and NFI) used in this study. These species were selected because they were the twelve most abundant in our data set. Relative height is the mean height of the species divided by the mean height of all trees in the plot. Note that not all plots were used for any single purpose (Bayesian calibration, validation, parameter estimation using regression analyses)

Climate and soil data

Climate data were obtained by interpolation (100 m spatial resolution) using the DAYMET method (Thornton et al. 1997) by the Landscape Dynamics group (WSL, Switzerland) using data from MeteoSwiss stations (Swiss Federal Office of Meteorology and Climatology). Site-specific plant available soil water was retrieved from the European soil database derived data (Panagos et al. 2012). No site-specific information about soil fertility was available to estimate the 3-PG input variable that defines soil fertility (FR). Previous studies have used site indices, climate data and available soil water to calculate soil fertility (e.g. Forrester et al. 2017a). That is, by assuming the site index is mainly a function of soil fertility, available soil water and climate, then if only soil fertility is unknown, it can be calculated from the other variables. However, calculations of site index (based on age and height) were not considered reliable enough to use for this study because many plots lacked accurate age data, and therefore, we followed a simpler approach from several previous studies using 3-PG (Coops and Waring 2011; Mathys et al. 2014) where the FR for all sites and species was set to 0.5 for the Bayesian calibration and validation based on monospecific plots.

Plot selection

Subsets of EFM, FRN and NFI data were used for five main calculation steps: (1) 1418 EFM and 297 FRN plots were used for regression-based estimation of many allometric parameters, size-distribution parameters and parameters describing the fraction of mean single-tree foliage, stem or root biomass lost per dead tree (mF, mS, mR), (2) Bayesian calibration based on a subset of 161 EFM and 152 NFI monospecific plots, (3) validation based on a different subset of 138 EFM and 632 NFI monospecific plots that were not used for calibration, (4) Bayesian calibration after combining all plots used for steps 2 and 3 (i.e. to obtain a final parameter set) and (5) validation based on 13 mixed-species EFM plots and the parameter sets develop in step 4. These data sets focused on twelve major European species shown in Table 1.

Step 1—estimation of parameters based on regression analyses

For the first step, parameters associated with tree allometry and size distributions were calculated using regression analyses based on 1418 EFM and 297 FRN plots (Fig. 1). The allometric parameters of 3-PG are used to calculate mean tree height, crown diameter, crown length and volume from variables such as mean d, age, stand density and relative height. The selection criteria for these plots were that all variables required by the allometric equations were available (e.g. d, height, crown length, crown diameter, age, basal area). Since age was usually not available for FRN plots, these could only be used for regressions without age. An additional criterion for plots used to obtain parameters used to describe diameter distributions and stem mass distributions was that the given species was even-aged and hence had unimodal-shaped distributions. That is, the stand can be uneven-aged without a unimodal-shaped size distribution (e.g. where one species is older than another), but the trees from a given species must appear to be about the same age based on plot records and visual inspection of size distributions.

The height and live-crown length equations were fit as nonlinear equations using the nls function in R 3.5.1 (R Core Team 2019). The volume, crown diameter and Weibull parameter functions were fit as hierarchical mixed-effect models using the lme function of the nlme R package (Pinheiro et al. 2018). Initially all fixed effect variables were included before all non-significant (P > 0.05) variables were removed in order of decreasing P-value. Residual and normal quantile plots were assessed to ensure that residuals were centred at zero and approximately normally distributed. Plot was included as a random effect. For the mixed-effect models, a “pseudo” R2 and conditional “pseudo” R2 (R2c) were calculated using the function r.squaredGLMM in the R package MuMIn (Bartoń 2016).

We used an alternative height and live-crown length equation to the earlier equation described in Forrester and Tang (2016). This alternative equation was a Michajlow (or Schumacher) function (Michajlow 1952) (Eq. 1):

$$y=1.3+\mathrm{a}\times {e}^{-{n}_{B}/d}+{n}_{C}\times C\times d$$
(1)

where y is height or live-crown length in metres, C is the competition variable of 3-PG indicating stand density, and a, nB and nC are fitted parameters used for 3-PG.

3-PG calculates d and stem mass distributions as Weibull distributions using shape, scale and location variables. Each of these variables is calculated as a function of mean tree or stand characteristics, and the parameters of these functions are 3-PG parameters. These can be calculated using data from monocultures or mixtures, but in mixed-species plots, the number of trees in each size class is divided by the species proportion (in terms of basal area) before fitting the equations (Tables S6–S11). The fractions of mean single-tree foliage, stem or root biomass lost per dead tree (mF, mS, mR) (Table S1) were calculated from the slope of the relationship between the proportion of stand foliage, stem or root biomass of dead trees and the proportion of the number of trees lost due to mortality (Landsberg et al. 2005). Detailed descriptions of parameter estimation (based on regressions) or calculations using literature sources are provided in Forrester (2020).

Step 2—Bayesian calibration

Bayesian inference was used to derive parameter estimates and uncertainties for 18 parameters (Table 2). These 18 parameters were selected because they could not be calculated directly from our data, such as those in step 1, and sensitivity analyses have shown that 3-PG is sensitive to these parameters (Almeida et al. 2004; Esprey et al. 2004; Forrester and Tang 2016; Law et al. 2000; Mathys et al. 2014; Meyer et al. 2017; Navarro-Cerrillo et al. 2016; Pérez-Cruzado et al. 2011; Potithep and Yasuoka 2011; Xenakis et al. 2008). This step was based on 161 EFM and 152 NFI plots. The selection criteria were: (1) that the plots were monospecific and even-aged, (2) with at least two consecutive measurements, (3) no ingrowth, such that any pairs of consecutive measurements with ingrowth were excluded, (4) no obvious measurement errors, and (5) stand age was known. No FRN plots were used for calibration or validation because they are generally mixed-species and uneven-aged plots without age information. Monocultures were defined as plots where more than 80% of the basal area was composed of the object species. If the object species occupied > 80% but < 100% of plot basal area, its biomass stocks were adjusted by dividing by its proportional contribution of basal area to the plot. Mixed-species plots could not be used for Bayesian calibration because each species within a mixture would need to be calibrated simultaneously. Therefore, since most species occurred in different types of mixtures, most species would have needed to be calibrated simultaneously, which was beyond the scope of this study. An additional criterion for the NFI plots were that there was no thinning because thinning, as opposed to mortality, is difficult to measure accurately in the NFI plots. Thinned plots were not excluded from the EFM network.

Table 2 Prior ranges of each parameter included in the Bayesian calibration. These ranges were made by using information from the literature review (Table S12)

This Bayesian inference approach requires a prior range between which the parameter for the given species can occur. We assumed uniform (i.e. non-informative) prior distributions for each of the 18 model parameters. If many published values were available for a given parameter and species, the prior range was set as the minimum and maximum values (Table S12) plus or minus 15% (Table 2). For the less commonly measured parameters and species, we set the prior ranges to the mean value found in the literature for that species (Table S12) or other species considered similar in terms of the given parameter, plus or minus 20% (Table 2).

The first calibrations indicated that the prior ranges of the self-thinning parameters (wSx1000 and thinPower in Tables 2 and S1) were too narrow, despite many published studies informing the prior ranges. In contrast, the upper limit of the prior ranges for the Tmin parameter was too high for P. abies, P. cembra and P. menziesii and was reduced to 2.5, as informed by the literature (Table S12). Note that the Tmin parameter of 3-PG represents a mean monthly minimum temperature (i.e. monthly-Tmin) where the species can no longer grow. However, this value will probably be lower than the actual daily Tmin (daily-Tmin) that determines whether growth is possible for the given species, because even when the monthly-Tmin is lower than the daily-Tmin, there are likely to be several days during that month where the minimum temperature is higher than the daily-Tmin, and hence, some growth will still occur for that month. The reduction of Tmin to 2.5 was required because Tmin is multiplied by the canopy quantum efficiency parameter, αCx, to calculate gross primary production. The overestimation of Tmin restricted growth to warmer months, and the resulting reduction in annual growth was compensated for by an overestimation of αCx.

The likelihood function was constructed to be robust against outliers by modelling the residual error as a Student’s t distribution with sampled degrees of freedom (see Code S1; Lange et al. 1989), following Augustynczik et al. (2017). For each calibration, we parametrized the degrees of freedom of the output variable using the constant of the probability of having outliers in the dataset and estimated the parameter using a uniform prior distribution from 0 to 1. The variance of each observation was estimated using the uniform prior distribution specific for each variable: stem biomass (0–50), foliage biomass (0–5) and root biomass (0–15). The joint posterior distribution for the model parameters was estimated using a Differential Evolution Markov Chain Monte-Carlo algorithm (DEzs MCMC, ter Braak and Vrugt 2008) implemented in the BayesianTools R package (Hartig et al. 2019). For each species, three independent DEzs MCMC runs were made, each with three internal chains. Convergence was tested by visual inspection of the trace plots and using the Gelman–Rubin diagnostic (Gelman and Rubin 1992). Convergence was accepted when the multivariate potential scale reduction factor was ≤ 1.1. Three independent DEzs MCMC chains with 2 × 106 iterations were required to achieve convergence. All analyses and calculations were performed in the R language for statistical computing (R Core Team 2019).

For calibration, we used three variables that describe stand stocks: stem biomass (SB), foliage biomass (FB) and root biomass (RB). Stem, root and foliage biomass were calculated for each measured tree using equations developed for European forests (Forrester et al. 2017b) and summed up to the stand level in Mg dry matter ha−1. The fractions of mean single-tree foliage, stem or root biomass lost per thinned tree (F, S, R) were calculated for each plot and growth period as the ratio of the proportion of stand foliage, stem or root biomass of thinned trees and the proportion of the number of trees lost due to thinning. The first observations on each monitoring plot were used to initialize the 3-PG model runs (see below), while the subsequent observations were used to calculate likelihood for DEzs MCMC runs (Tables 3 and 4).

Table 3 Posterior ranges of each parameter included in the Bayesian calibration that was based only on the calibration plots
Table 4 Parameters and their sources: * calculated from published studies (for sources, see Table S12),  default (Forrester and Tang 2016; Sands and Landsberg 2002), calculated from EFM and FRN data (regression analyses), #posterior parameter estimates from Bayesian inference (using all calibration and validation plots; Step 4) and their quantiles (median and 2.5%–97.5% quantiles). For a description of the 3-PG parameters, see Table S1

Step 3—3-PG model evaluation and validation for monocultures

The validation of monospecific plots was based on the same criteria as the plots used for Bayesian calibration and included 138 EFM and 632 NFI plots that had not been used for the calibration. For species where the total number of plots was above 30, we randomly split the full set of monitoring data into two equally sized groups, resulting in a calibration and a validation set. For the two Quercus species, we used 70% of the total number of plots for calibration and 30% for validation. For the rest of the species, we used all available monitoring plots for calibration.

The skill of the 3-PG model to generate model predictions was assessed using posterior predictive distributions obtained by running the model with 1,000 random samples from the parameters’ posterior distribution. The model performance was evaluated using the percentage bias (pBias; Eq. 2), root mean squared error (RMSE; Eq. 3) and normalized root mean square error (NRMSE; Eq. 4). The statistics were calculated at the plot level and then averaged for each of the 1,000 parameter samples. The validation only used the most recent set of observations at each plot to maximize the time between initialization and validation, which ranged from 4 to 87 years.

The pBias, RMSE and NRMSE were calculated as

$$pBias=100\frac{\stackrel{\sim }{P}-\stackrel{\sim }{O}}{\stackrel{\sim }{O}}$$
(2)
$$RMSE=\sqrt{\frac{{\sum }_{i=1}^{n}{{(P}_{i}-{O}_{i})}^{2}}{n}}$$
(3)
$$NRMSE=\frac{RMSE}{{O}_{max}-{O}_{min}}$$
(4)

where Oi are the observed values, Pi are the predicted values from 3-PG, \(\stackrel{\sim }{O}\) and \(\stackrel{\sim }{P}\) are the means and Omax and Omin are the maximum and minimum of the observed values.

Step 4—3-PG model calibration using all plots

The parameter estimates obtained in Step 2 were based on only about half of the plots, with the other half used for validation (Step 3). Therefore, a final set of parameters were obtained by repeating Step 2 but after combining the calibration plots and validation plots.

Step 5—3-PG model evaluation and validation for mixtures

To test that the parameter sets also provided reliable predictions for mixed-species forests, 13 mixed-species plots were simulated by inputting information from the first inventory. The selection criteria for these plots were the same as the calibration plots, except that they needed to be mixed. Since site-specific soil fertility data were not available, the FR was adjusted to a value that gave satisfactory model performance. The parameters used were obtained in Step 4 and were based on all calibration and validation plots. pBias and RMSE were calculated for each species within the 13 mixed-species plots. Only the most recent set of observations for each plot were used to maximize the time between initialization and validation, which ranged from 16 to 47 years. The model performance was calculated using the pBias (Eq. 2) and the relationships between predictions and observations.

Results

Estimation of parameters based on regression analyses

By using the EFM and FRN data sets, large sample sizes were available for most regressions, and these samples generally included broad ranges in tree sizes and stand, or site conditions (Tables 1 and S2-S11). The statistical information for each regression is provided in Tables S2 to S11.

The mean tree height and live-crown length were often influenced by stand density, as well as d (Tables S2 and S3). Individual tree crown diameter was often influenced by stand density and relative height, in addition to d (Table S5). Mean tree volume was a function of d and height for all species (Table S4).

The scale, shape and location parameters that describe d and stem mass distributions varied between species in terms of which variables were significant (d, relative height, age and stand density). For several species with lower samples sizes (A. pseudoplatanus, B. pendula, F. excelsior), none of the explanatory variables were significant for at least one of the scale, shape and location parameters, and in those cases, the mean scale, shape or location parameter for the given species was used (Tables S6-S11).

Bayesian calibration

By using the Bayesian calibration, we were able to reduce the parametric uncertainty of 3-PG. The width of the posterior parameter distributions based only on the calibration plots (Step 2), measured by the 95% quantile range (see Table 5), was on average only 69% of the width of the prior ranges (which are shown in Tables 2 and 5). This was reduced to 63% when considering the posterior parameter distributions based on the calibration and validation plots (Step 4) (Table 5). This reduction in the uncertainty of the parameters was greater for more common species (e.g. 20% for P. abies) than for rarer species in the data set (e.g. 81–82% for F. excelsior and P. cembra) (Table 5). The largest reduction in uncertainty for all species was for parameters associated with biomass partitioning (pFS20), root or foliage litterfall (gammaF1, gammaR) and light-use efficiency (alphaCx), while the lowest reduction in uncertainty was for parameters defining the responses to vapour pressure deficit, temperature or CO2 (CoeffCond, Tmax, fCg700) and the canopy interception of precipitation (MaxIntcptn).

Table 5 Posterior parameter ranges as percentages of the prior parameter ranges. The prior range is 95% of the estimated range between which the parameter for the given species can occur (Table 2), which was based on the literature review (Table S12). The posterior ranges are narrower ranges produced by the Bayesian calibration based on the “observed” biomass of the plots. The percentages are shown for two sets of posterior ranges. The first set were obtained from Bayesian calibration with only the calibration plots (Step 2: posterior ranges shown in Table 3). The second set are based on Bayesian calibration using all calibration and validation plots (Step 4: posterior ranges shown in Table 4)

3-PG model evaluation and validation for monocultures

3-PG reliably predicted the biomass stocks of the monospecific plots used for validation. The pBias was generally less than 20%, for stems, foliage and roots, and for several species it was less than 10% (L. decidua, P. menziesii and Q. robur) (Fig. 2). In comparison, the pBias for the calibration plots was generally < 10%, except for species with sample sizes that were too low to do a validation (B. pendula, A. pseudoplatanus, F. excelsior, P. cembra) (Fig. 2). As expected, the RMSE was highest for stem mass because this stock is typically much greater than root or foliage biomass. After accounting for differences in the size of stem stocks compared with root and foliage stocks, the NRMSE indicated that the foliage mass predictions had the highest errors.

Fig. 2
figure 2

Statistics for predictive error (percent bias, normalized root mean squared error and root mean squared error). The posterior predictive uncertainty was calculated by drawing 1,000 parameter combinations from the posteriori distribution and calculating model predictions for the calibration (green dots), validation (orange dots) and calibration + validation (black dots) monitoring data subsets. The dots represent the median value of the posterior predictive distribution, while the horizontal lines represent the 95% confidence interval. Species are ordered by number of available monitoring plots for calibration and validation. Numbers under the species name indicate the number of monitoring plots used for calibration and validation. For B. pendula, A. pseudoplatanus, F. excelsior, P. cembra, the samples sizes were too small to do a validation

3-PG model evaluation and validation for mixtures

The parameters resulted in accurate predictions for mixed-species plots in terms of stem, foliage (not shown) and root biomass, as well as other outputs derived from them such as d, height and basal area (Fig. 3). The slopes of the relationships between the predicted and observed values were usually close to 1 and not significantly different to 1. The exceptions included A. alba, which often also had a high pBias, and P. abies in terms of stem and root mass (Fig. 3; Table 6).

Fig. 3
figure 3

Comparison of observed and predicted stem biomass (a), root biomass (b), arithmetic mean diameter (c) and mean height (d) for a selection of 6 species in 13 mixed-species plots. Note that the “observed” biomass was calculated using allometric biomass equations. Only data from the end of the simulations are shown; for a given species, only one point is shown for each plot where that species occurred at the end of the simulations. The solid lines are 1:1 lines, and the dashed lines are fitted to the data to pass through the origin. Sample size N = 24

Table 6 Statistical information that describes the relationships between the predicted and observed variables for mixtures as shown in Fig. 3. The statistical information includes the percentage bias (pBias%), the relative mean absolute error (MAE%), the slope of the relationship forced through the origin, the P-value for the test of whether the slope of the relationship is significantly different from 1, and the R2 values. The mean species proportion is in terms of the contribution made to total stand basal area

Discussion

For all twelve European tree species, the Bayesian calibration provided parameter sets with prediction bias < 20% and in several cases < 10%. Given the wide range of site, climate and stand structural conditions covered by the plots of many of these species, the parameters are expected to be generally applicable for central European forests. Similarly, 3-PG parameters for species in North America, South America and Thailand, developed from a sample of plots within a population, have been successfully applied across wide ranges of site conditions and management (Almeida et al. 2010; Gonzalez-Benecke et al. 2014, 2016; Hung et al. 2016; Thomas et al. 2017). Consistent with this study, Bayesian calibrations using outputs including basal area, volume, biomass stocks, mean tree diameter or height, tree and soil nutrition, leaf area index and net primary productivity have produced accurate predictions for other process-based forest models (Minunno et al. 2019; van Oijen et al. 2013, 2005).

The reduction in parametric uncertainty was often larger for the parameters to which output variables of 3-PG have been found to be highly sensitive (e.g. alphaCx, pFS20, gammaF1), and minimal to parameters to which 3-PG outputs are relatively insensitive (e.g. Tmax, CoeffCond, MaxIntcptn) (for 3-PG sensitivity analyses, see Almeida et al. 2004; Esprey et al. 2004; Forrester and Tang 2016; Law et al. 2000; Mathys et al. 2014; Meyer et al. 2017; Navarro-Cerrillo et al. 2016; Pérez-Cruzado et al. 2011; Potithep and Yasuoka 2011; Xenakis et al. 2008). The largest reduction in uncertainty for all species was for parameters associated with biomass partitioning (pFS20), root or foliage litterfall (gammaF1, gammaR) and light-use efficiency (alphaCx). The gammaR is only used to calculate the loss of root mass and has no influence on other variables, which explains why it was so easily constrained by the output variable root mass. The pFS20 (and pFS2) controls the partitioning of aboveground biomass growth between stems and foliage, and the gammaF1 controls the foliage litterfall rate. These are therefore important determinants of two output variables that we used to constrain 3-PG (stem mass and foliage mass). They are also important determinants of stand structure (e.g. leaf area index, mean tree size) and hence processes including light absorption and mortality. Similarly, for several species there was a high reduction in uncertainty in the light extinction parameter (k) and the potential light-use efficiency parameter (alphaCx), which are also very influential parameters in terms of biomass production. Parametric uncertainty was improved very little for Tmax, CoeffCond and MaxIntcptn. The former two parameters were also not improved in another study using inverse modelling with 3-PG, despite including Eddy covariance evapotranspiration data to constrain the model (Thomas et al. 2017). Therefore, reductions in parametric uncertainty appear to be higher for parameters to which the model is most sensitive as well as parameters that influence the output variables used for the inverse modelling.

Inter-specific differences in parameters often confirmed expected inter-specific differences in physiology. For example, the potential light-use efficiency parameter (alphaCx) was highest for species known to have potentially high growth rates (e.g. P. menziesii, L. decidua) and intermediate for species such as F. sylvatica and Q. robur/Q. petraea. Nevertheless, it is important to note that the inverse modelling approach only provides an indirect indication of the parameters in terms of their physiological and ecological meanings.

In general, species for which the parametric uncertainty was reduced the most were those with the highest sample sizes. However, the prediction bias did not always decline with increasing numbers of plots, such that some of the most abundant species in the data set (e.g. F. sylvatica, P. abies and P. sylvestris) had prediction errors of > 10%, at least for some variables. Given that the accuracy of 3-PG predictions can be improved by developing different parameter sets for different provenances or even for different clones (Almeida et al. 2004), the larger prediction errors may have resulted from a greater intra-specific genetic diversity within the populations of these species. For example, P. sylvestris has a large physiological and morphological plasticity (Rehfeldt et al. 2002), and our data set included several provenance trials, although none were abundant enough to develop provenance-specific parameter sets.

A parameterization of the BGC forest model for P. abies indicated that different parameter sets were required for populations at low and high elevations (means of 607 m vs. 1385 m) (Pietsch et al. 2005). We did not observe any elevation-related effect on the bias of P. abies predictions and used a single parameter set for all elevations (282–2065 m). A single parameter set was also found to be appropriate in Finnish P. abies parameterizations of the PREBAS model (Minunno et al. 2019). This may indicate that 3-PG and PREBAS account for a cause of the elevation differences that was not accounted for by the BGC model.

Since all models are simplifications of reality, they inevitably do not include all the processes that influence their outputs. Consequently, some of the parameters within the model may compensate for missing information, and as a result, they will not represent exactly what is implied by their parameter name (van Oijen 2017). Preliminary calibrations showed that the self-thinning parameter (wSx1000) required different values than indicated by the means of many published values, and the prior ranges therefore needed to be widened. Self-thinning parameters were also required to vary between experiments for Bayesian calibrations of 3-PG for Pinus taeda (Thomas et al. 2017). This may indicate that improvements can be made to 3-PG in relation to quantifying the carrying capacity of a site, although the effects of site and climate on self-thinning rates do not appear large enough in Swiss forests to account for this (Forrester et al. 2021). This also indicates the value of validating each sub-model by comparing predicted outputs from the given sub-model with measurements of the same process, rather than indirect approaches that, for example, validate a light or water balance sub-model using growth and yield data (Korzukhin et al. 1996; Sands 2004). This provides confidence that the sub-models perform as their names imply and therefore that deviations in parameter estimates based on Bayesian calibrations, compared with field measurements of the parameters, are likely to indicate a missing process within the model, rather than significant problems with existing model components.

Another source of error is the calculation of biomass. The “observed” biomass was predicted using allometric equations developed from an independent European-wide data set (Forrester et al. 2017b). However, even though these equations accounted for variables such as tree diameter and stand density, they are unlikely to be as accurate as site-specific destructively sampled biomass measurements, and they will not reflect the short-term (monthly or annual) variability in biomass allocation that is predicted by 3-PG. In this study, we did not account for the errors associated with the allometric biomass equations or the allometric equations used to obtain 3-PG parameters for height, crown length and crown diameter. Therefore, the ranges of the parameter posterior distributions may have been underestimated, and when estimates of errors are required when forecasting, the variance related to the allometric equations would need to be considered in the outputs.

By setting the soil fertility for all plots to 0.5, some of the other parameters would have been forced to account for variability in biomass that was actually due to soil fertility. The resulting influence of this approach on the parameter estimates is assumed to be relatively low because of the large number of plots used and because many of the plots are likely to have FR close to 0.5 assuming a roughly normal distribution of FR. However, the parameters could still be improved when reliable soil fertility information becomes available.

Lastly, the parameter sets generally provided accurate predictions for mixed-species forests. The validation for mixtures was based on small sample sizes (13 mixed-species plots) due to the difficulty in finding even-aged mixtures that only contained the species that had been parameterized. Nevertheless, these results confirm that 3-PG can be calibrated using monocultures and applied to mixtures, as previously found in Europe (Bouwman et al. 2021; Forrester et al. 2017a) and China (Forrester and Tang 2016).

In conclusion, the combination of a literature review, direct estimation of many allometric parameters, several inventory plot networks and Bayesian calibration resulted in reliable 3-PG parameters for twelve major European species. These parameters are also applicable for mixed-species forests. The information sources used to develop the parameters included a wide range of climatic, edaphic and management conditions and long time spans (from 1930 to present). Given this and the process-based structure of the 3-PG model, these parameters are likely to be applicable for most central European forests and conditions.