1 Introduction

Water shortages are increasingly severe throughout the world, and the imbalance between water supply and demand has seriously restricted regional social and economic development. Agricultural water use dominates the consumption of the total water resources. If the water supply cannot meet the crop demands, food security will not be guaranteed (Zhang et al. 2019). Global climate change has a great influence on the global ecosystem as well as social and economic systems, and will continue to have far-reaching negative or adverse effects (Dobson et al. 1997; Jun et al. 2012; Seaby et al. 2015; D’Amato and Akdis 2020). The increase in atmospheric CO2 concentration and temperature as well as the change of precipitation are the most important ecological factors impacting agricultural production as they influence the yield, pests, and diseases of crops as well as agricultural water resources (Zilli et al. 2020). Although the water resources in Southwest China are relatively abundant, climate change has contributed to serious water scarcity in Southwest China in the 2000s (Qiu 2010; Tang et al. 2014; Xu et al. 2015). Precipitation has decrease and drought frequently has increased (Wang et al. 2013; Li et al. 2019). As a principal agricultural district, it is necessary to accurately analyze the risk of agricultural water shortages under natural precipitation conditions such that the irrigation water can be determined. Furthermore, it can provide an objective support for irrigation system formulation and regional water resources allocation.

The authentic data series of natural water supply and demand is the key to understand the risk of regional agricultural water shortages, which was generally expressed by the reference crop evapotranspiration (ET0) and precipitation (P) in previous research. Several studies have analyzed the correlation between ET0 and P, revealing an imbalance between natural water demand and supply to some extent (Ding et al. 2011; Vangelis et al. 2011; Zhang et al. 2017a, b). However, the crop water requirement (ETc) which considers the physiological characteristics of crops and effective precipitation (Pe) subtracts the surface runoff and deep leakage more accurately represent the natural water demand and supply (Gong et al. 2020). Zhang et al. (2007) reported ETc and Pe could more accurately evaluate the risk probability of natural water shortages. Few studies conducted a risk analysis of water shortages using ETc and Pe, especially in the Sichuan Hilly Area which is a main grain producing area in Southwest China.

The data series of ETc and Pe obtained from historical data may have errors or missing. Moreover, there may be deviation between future and historical data due to the influence of climate change and human activities. Considering these challenges, the stochastic simulation methods are rapidly developed to overcome the uncertainty and randomness of variables. The Monte Carlo method, a statistical experiment method, has been extensively applied in hydrological stochastic simulation analysis (Yu et al. 2013; Caballero and Rahman 2014; Watson et al. 2015; Yazdi et al. 2016; Peng et al. 2017; Goharian et al. 2018; Montaseri et al. 2018). Yu et al. (2013) developed a joint Monte Carlo and fuzzy possibilistic simulation method (MC-FPS) for flood risk assessment, and the results showed the flood damage estimated by MC-FPS was higher compared to the conventional method. Montaseri et al. (2018) simulated the precipitation time series of 1000 sequences by Monte-Carol procedure based on the 50-years measured data of 10 stations in the world, then performed drought monitoring by using the Standardized Precipitation Index (SPI). The result reinforced the advantages of the Monte Carlo procedure in an investigation of long-term drought characteristics. The Monte Carlo method can make up the weaknesses that historical data may be unavailable or missing, and obtain accurate results. To our knowledge, only Zhang et al. (2019) made use of two-dimensional Monte Carlo simulation to assess the risk of water scarcity based on the P and ET0 data series during 1970–2013 in the Luhun irrigation district of China.

The stochastic simulation of climate data should maintain the internal properties of each variable and the dependence between them (Srikanthan and McMahon 2001), and they can be realized by the copula function and Markov process. The copula function, which reflects the dependence structure independently of the marginal distributions of variables, has been widely applied in the hydrological field of drought (Shiau 2006; Serinaldi et al. 2009; Song and Singh, 2010; Wong et al. 2010; Zhang et al. 2013; Nguyen-Huy et al. 2019), precipitation frequency analysis (De Michele and Salvadori 2003), streamflow (Chen et al. 2015; Jeong and Lee 2015; Pereira and Veiga, 2018), flood risk analysis (Favre et al. 2004; Tatiana et al. 2012; Requena et al. 2016) and storms (De Michele et al. 2007; Ariff et al. 2012). Recently, due to the advantages of copula function that refers to the unrestricted marginal distribution, diversified forms and easy calculation, the risk analyses based on the copula function has been a focal point (De Michele and Salvadori 2003). Naz et al. (2019) analyzed the impact of the flooding from the Indus river and its tributaries on the region of Pakistan using Archimedean copulas. Botai et al. (2020) researched the dependency between duration and severity of drought based on the copula function, and provided a robust approach to drought risk analysis at the provincial level. Risk analysis integrated with the copula function has a satisfactory effect, but the studies focusing on regional water shortages are still lacking. In addition, the Markov process, which is suitable for dynamic variables prediction changing with time and can combine with the copula function to resolve relevance issues, has been commonly used to perform the hydrological risk analysis such as flood and drought (Farahat and Asad 2012). But the applications based on the agricultural water resources shortage were rare.

Therefore, the objective of this study is to develop a stochastic simulation model by integrating the Monte Carlo, Copula and Markov process, and generate long ETc and Pe time series, so that they can be used in agricultural water shortages risk analysis. Firstly, based on the ETc and Pe data series of wheat-rice during 1961–2017 in the Sichuan Hilly Area, the contemporaneous dependence relationship between measured ETc and Pe series and the temporal dependence relationship of ETc or Pe series considering the Markov process were analyzed using the copula function. Secondly, a stochastic simulation model (MCMP-Copula) was constructed to simulate more data by Monte Carlo comprehensively considering the contemporaneous dependence and the temporal dependence of the measured ETc and Pe series. Thirdly, the joint probability distributions of measured and simulated data were used to assess the risk probabilities of natural water shortages in the Sichuan Hilly Area, and the assessment results of measured and simulated data were compared to estimate the modeling capabilities of the MCMP-Copula.

2 Materials and methods

2.1 Study area and data series

Sichuan Hilly Area is a typical hilly area in Southwest China that lies between longitude 104°2′ E–107°7′ E and latitude 30°3′ N–32°2′ N. The region has an area of 10,500 km2, and administratively covers Bazhong, Yibin, Leshan and Nanchong, etc. (Chen et al. 2013; Mathieu et al. 2016). The study area has a subtropical monsoon climate, and is characterized by warm and moist throughout the whole year. The annual precipitation is above 1000 mm, with 70% of precipitation occurring in summer. Precipitation tends to concentrate at the edge of region (Zhu et al. 2009), so the seasonal and regional water shortages exist. As an area with a high intensity of agricultural production, there are various cropping systems in the Sichuan Hilly Area. In our study, the wheat-rice cropping system was selected to analyze the risk of the imbalance between water supply and demand from October 9th of the first year (wheat sowing) to September 10th of the next year (rice harvest).

Nine representative stations locating in the agricultural production areas of the Sichuan Hilly Area were selected for the risk analysis of agricultural water shortages based on the data availability and computation cost. Table 1 describes the meteorological variables of the nine selected stations in detail. In addition, the Thiessen polygon put forward by A.H. Thiessen was applied to acquire an average surface data based on the point data of meteorological stations (Daffi and Wamyil 2017; Seong et al. 2019; Bai et al. 2019) (Fig. 1). The daily meteorological data during 1961–2017 were obtained from the China meteorological science data sharing service network (http://data.cma.cn), and includes sunshine duration (n), relative humidity (RH), maximum temperature (Tmax), minimum temperature (Tmin), precipitation (P) and wind speed at 2 m above the ground (u2) transformed from u10.

Table 1 The meteorological variables of the nine selected stations in the study
Fig. 1
figure 1

Geographic location, stations distribution, and Thiessen polygon analysis of the study area

2.2 Methods

The primary methods used in this study includes four stages: (1) The marginal distribution of ETc and Pecalculated based on the meteorological data was determined by hypothesis testing; (2) The common copula function was employed to analyze the dependence relationships of measured data, and the optimal copula function was selected utilizing goodness-of-fit evaluation. The contemporaneous dependence was performed by the copula function, and the temporal dependence not only used the copula function, but also considered the Markov process; (3) Based on the dependence relationships of measured data, the stochastic simulation model (MCMP-Copula) was constructed by the Monte Carlo method, and then a mass of simulated data was generated. Moreover, the joint distribution of simulated data was determined by using the copula function; (4) The water scarcity risk can be analyzed according to the joint distribution of the measured and simulated data series in different scenarios. The general framework used for the development of this study is shown in Fig. 2.

Fig. 2
figure 2

The general framework of this study

2.2.1 Crop water requirements (ETc) and effective precipitation (P e)

The crop water requirements (ETc) can be determined by the reference crop evapotranspiration (ET0) and crop coefficients (kc). In this study, the Penman–Monteith formula recommended by the Food and Agriculture Organization (FAO) was adopted to estimate ET0 (Allen et al. 1998). The value for kc was taken from the results of Wang et al. (2012), who revised the value of kc using the Subsection Mean Method raised by FAO. The effective precipitation (Pe) is the part of precipitation that can be used directly and can be estimated by the precipitation and the effective utilization coefficient (Nie et al. 2019). According to the Pe and ETc of the nine selected meteorological stations, the average surface ETc and Pe can be acquired by the Thiessen polygon.

2.2.2 Copula method

The copula can flexibly structure multivariate joint distribution with arbitrary marginal distributions, and is applied to reflect the positive and negative correlation among variables. Sklar’s theory (1959) is the basis of copula. Let H be the joint distribution function, F and G be the continuous marginal distribution function, then a unique copula function C exists and is expressed as follows:

$$H(x,y) = C(F(x),G(y)),\quad \forall x,y$$

Nelsen (1999) defined the two-dimensional copula function as a mapping C: [0, 1]2 → [0, 1]. Many types of copula functions can be used to establish multidimensional joint distribution of hydrological variables, among which the Archimedean copula family is applied widely because of its simple structure and easy calculation. The elliptic copula family which can well fit multivariate extreme value distributions and non-normal structures is also a popular copula family (Fang et al. 2002). In our study, five copula functions were used to analyze the water shortages risk, including three Archimedean copulas (Gumbel–Hougaard, Frank and Clayton copula) and two elliptic copulas (Gaussian and Student t copula).

Table 2 Three commonly used two-dimensional Archimedean copula functions

The optimal copula was selected using the following steps:

  1. (1)

    Parameter estimation of copula functions.

    The parameter θ of three Archimedean copulas can be estimated by the Kendall’s rank coefficient τ which reveals the nonlinear correlations between ETc and Pe.

    $$\tau = \frac{1}{{\mathop C\nolimits_{n}^{2} }}\sum\limits_{i < j} {sign[(\mathop x\nolimits_{i} - \mathop x\nolimits_{j} )(\mathop y\nolimits_{i} - \mathop y\nolimits_{j} )]}$$
    $${\text{sign}}[(\mathop x\nolimits_{i} - \mathop x\nolimits_{j} )(\mathop y\nolimits_{i} - \mathop y\nolimits_{j} )] = \left\{ {\begin{array}{*{20}c} 1 \\ 0 \\ { - 1} \\ \end{array} } \right.\begin{array}{*{20}c} {(\mathop x\nolimits_{i} - \mathop x\nolimits_{j} )(\mathop y\nolimits_{i} - \mathop y\nolimits_{j} ) > 0} \\ {(\mathop x\nolimits_{i} - \mathop x\nolimits_{j} )(\mathop y\nolimits_{i} - \mathop y\nolimits_{j} ) > 0} \\ {(\mathop x\nolimits_{i} - \mathop x\nolimits_{j} )(\mathop y\nolimits_{i} - \mathop y\nolimits_{j} ) > 0} \\ \end{array}$$

    where n is the number of measured hydrology variables. The relationship between θ and τ is described in Table 2 (Abdollahi et al. 2019).

    The detailed formulas of two elliptic copula functions and their parameters that are calculated by the maximum likelihood method can be found in Nadarajah and Kotz (2005) and Genest et al. (2007).

  2. (2)

    Hypothesis testing.

    The nonparametric Kolmogorov–Smirnov (K–S) test can perform a fit test between the joint distribution frequency and joint measured samples of hydrological variables, and its statistical magnitude D can be expressed as:

    $$D{ = }\mathop {\max }\limits_{1 \le k \le n} \left\{ {\left| {\mathop c\nolimits_{k} - \frac{{\mathop m\nolimits_{k} }}{n}} \right|,\left| {\mathop c\nolimits_{k} - \frac{{\mathop m\nolimits_{k} - 1}}{n}} \right|} \right\}$$

    where Ck is the value of joint measured samples (xk, yk) of the copula function and mk is the number of measured pairs (xk, yk) satisfying x ≤ xk and y ≤ yk. If the D is lower than its critical value, the assumed copula function passes the K–S test.

  3. (3)

    Goodness-of-fit evaluation.

    The most appropriate copula was selected according to the ordinary least square (OLS) and the Akaike information criterion (AIC) (Zhang and Singh 2007; Gao et al. 2018), and they are defined as:

    $$OLS = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\mathop {(\mathop P\nolimits_{i} - \mathop P\nolimits_{ei} )}\nolimits^{2} } }$$
    (5)
    $$AIC{ = }n\ln \left( {\frac{{\sum\nolimits_{i = 1}^{n} {\mathop {(\mathop P\nolimits_{i} - \mathop P\nolimits_{ei} )}\nolimits^{2} } }}{n}} \right) + 2k$$
    (6)

    where n is the number of samples, Pi and Pei are the calculated and empirical frequency of the copula function, respectively, and k is the number of parameters of the joint distribution. According to the joint empirical probability and theorical probability for the data series, the copula with lower OLS and AIC is determined as the reasonable joint probability distribution.

2.2.3 Markov process

The Markov process is a method used to study dynamic system state-space of random events, which has the non-aftereffect property, and the state value of the next moment is obtained according to the transfer rule between the adjacent states in space. Assume that {Xt, t = 1, 2, …, T} is a time sequence of hydrology variable and xt (t = 1, 2, …, T) is the value of observed sample. The first-order Markov chain in space can be given as:

$$P(\mathop X\nolimits_{t + 1} \le \mathop x\nolimits_{t + 1} \left| {\mathop X\nolimits_{0} } \right. = \mathop x\nolimits_{0} ,\mathop X\nolimits_{1} = \mathop x\nolimits_{1} ,...,\mathop X\nolimits_{t} = \mathop x\nolimits_{t} ) = P(\mathop X\nolimits_{t + 1} \le \mathop x\nolimits_{t + 1} \left| {\mathop X\nolimits_{t} } \right. = \mathop x\nolimits_{t} ) = k(\mathop x\nolimits_{t + 1} \left| {\mathop x\nolimits_{t} } \right.)$$
(7)

where k (xt+1|xt) represents the core of state transition, and it is the basic statistic to describe the continuous state space Markov chain.

$$f(\mathop x\nolimits_{t + 1} \left| {\mathop x\nolimits_{t} } \right.) = \frac{{\frac{{\mathop \partial \nolimits^{2} M(\mathop x\nolimits_{t + 1} ,\mathop x\nolimits_{t} )}}{{\partial \mathop x\nolimits_{t + 1} \partial \mathop x\nolimits_{t} }}}}{{\frac{{\partial F(\mathop x\nolimits_{t} )}}{{\partial \mathop x\nolimits_{t} }}}}$$
(8)

where f(·|·) represents the transfer density function of the Markov chain and M(∵) is the probability distribution function of variables at two adjacent moments.

Darsow et al. (1992) introduced the copula theory into Markov process. The one-dimensional copula-based Markov time-series process can be written as:

$$k(x_{t + 1} |x_{t} ,y_{t + 1} |y_{t} ) = C(F_{X}^{^{\prime}} ,F_{Y}^{^{\prime}} )$$
(9)
$$f(\mathop x\nolimits_{t + 1} \left| {\mathop x\nolimits_{t} } \right.,\mathop y\nolimits_{t + 1} \left| {\mathop y\nolimits_{t} } \right.) = c(\mathop {F^{\prime}}\nolimits_{X} ,\mathop {F^{\prime}}\nolimits_{Y} ) \cdot \mathop c\nolimits_{X} (\mathop F\nolimits_{X} (\mathop x\nolimits_{t + 1} ),\mathop F\nolimits_{X} (\mathop x\nolimits_{t} )) \cdot f(\mathop x\nolimits_{t} ) \cdot \mathop c\nolimits_{Y} (\mathop F\nolimits_{Y} (\mathop y\nolimits_{t + 1} ),\mathop F\nolimits_{Y} (\mathop y\nolimits_{t} )) \cdot f(\mathop y\nolimits_{t} )$$
(10)

where ut is the marginal distribution function and C is the copula function.

The two-dimensional copula-based Markov time-series process needs to consider the contemporaneous and temporal dependence relationships for data series. In the same way, its functions can be given as:

$$k(x_{t + 1} |x_{t} ,y_{t + 1} |y_{t} ) = C(F_{X}^{^{\prime}} ,F_{Y}^{^{\prime}} )$$
(11)
$$f(\mathop x\nolimits_{t + 1} \left| {\mathop x\nolimits_{t} } \right.,\mathop y\nolimits_{t + 1} \left| {\mathop y\nolimits_{t} } \right.) = c(\mathop {F^{\prime}}\nolimits_{X} ,\mathop {F^{\prime}}\nolimits_{Y} ) \cdot \mathop c\nolimits_{X} (\mathop F\nolimits_{X} (\mathop x\nolimits_{t + 1} ),\mathop F\nolimits_{X} (\mathop x\nolimits_{t} )) \cdot f(\mathop x\nolimits_{t} ) \cdot \mathop c\nolimits_{Y} (\mathop F\nolimits_{Y} (\mathop y\nolimits_{t + 1} ),\mathop F\nolimits_{Y} (\mathop y\nolimits_{t} )) \cdot f(\mathop y\nolimits_{t} )$$
(12)

where cX is the joint probability density function of Xt and Xt+1 and cY is the joint probability density function of Yt and Yt+1. FX and FY are the marginal distributions of X and Y, and F'X and F'Y refer to the corresponding conditional distributions, respectively.

2.2.4 Two-dimensional Monte Carlo simulation

The dependent relationships of two-dimension random variables include contemporaneous dependence which refers to the correlation between Xt and Yt and temporal dependence which refers to the correlation between Xt − 1 (Yt − 1) and Xt (Yt). The simulation by the copula function is the key point for generating a two-dimension time series in here. This indicates the simulating random numbers should be produced by the joint distribution of measured data considering the contemporaneous and temporal dependence relationships as both of them are indispensable. The two-dimensional Monte Carlo simulation can be achieved by the following two procedures: (1) The frequency sequences of random variables are generated based on the contemporaneous dependence function C; and (2) the two series of random variables in the previous stage are further simulated to satisfy the temporal dependence functions (Cx and Cy). Next, the stochastic simulation model (MCMP-Copula) is constructed and a long two-dimensional sequence that satisfies the Eq. (12) for natural water supply and demand can be obtained. Figure 3 presents the modeling processes and formulas in detail.

Fig. 3
figure 3

The flow chart of the two-dimensional Monte Carlo simulation

3 Results

3.1 The marginal distribution of measured P e and ETc

Based on the meteorological data in the Sichuan Hilly Area of China during 1961–2017, the data series of Pe and ETc (measured data) were achieved, as shown in Fig. 4. The annual average Pe and ETc were 756 and 845 mm, respectively, suggesting that the water demand (ETc) and supply (Pe) was unbalanced under natural precipitation conditions. Five univariable distributions (i.e., Normal, Lognormal, Gamma, Generalized Extreme Value, and Weibull distributions) were used to fit the Pe and ETc data series, and the corresponding parameters were obtained using the maximum likelihood method. When n is 56, the critical value for the K–S test is 0.1786. Five univariable distributions all passed the K–S test, and the Normal distribution and the Generalized Extreme Value distribution had the smallest D values for Pe and ETc, respectively (Table 3). Their correlation between the theoretical and empirical values were also the highest with correlation coefficients R2 of 0.9942 and 0.9908, respectively. Therefore, they were selected as the corresponding marginal distribution and the equations could be expressed as:

$$\mathop {u = F}\nolimits_{X} (x) = \frac{1}{{95.87\sqrt {2\pi } }}\int_{ - \infty }^{x} {\mathop e\nolimits^{{\frac{{ - \mathop {(t - 755.52)}\nolimits^{2} }}{18382.11}}} dt}$$
(13)
$$\mathop {v = F}\nolimits_{Y} (y) = \exp \left\{ { - \left[ {1 - 0.0075\mathop {\left( {\frac{y - 820.87}{{41.41}}} \right)}\nolimits^{{\frac{1}{0.0075}}} } \right]} \right\}$$
(14)

where x was Pe, and y was ETc. u and v expressed the marginal distributions of the copula function.

Fig. 4
figure 4

The data series of Pe and ETc in the Sichuan Hilly Area of China from 1961 to 2017

Table 3 Five commonly used univariable distributions and the results of hypothesis testing

3.2 Dependence relationships of measured P e and ETc

3.2.1 Contemporaneous dependence relationship (Joint distribution of measured data)

Five common copula functions, Gumbel-Hougaard, Frank, Clayton, Gaussian, and Student t copula, were applied to analyze the joint probability distribution between measured Pe and ETc which was defined as the contemporaneous dependence relationship in the study. On the basis of the data series for Pe and ETc from 1961 to 2017, the Kendall’s rank coefficient τ was estimated to be − 0.0103. In three Archimedean copula functions, only the Frank copula reflected the negative correlation between variables and was suitable to fit the joint probability distribution of measured Pe and ETc by passing the K–S test with the D value of 0.0858. The parameters of Gaussian and Student t copula were calculated using the maximum likelihood method, and both passed the K–S test with the values of 0.0851 and 0.0853, respectively. As noted in Table 4, the Frank copula had lower OLS and AIC values, and the theoretical probabilities were proximate to empirical probabilities with higher R2 of 0.9843. So the Frank copula was selected as the optimal copula function to reflect the correlation between measured Pe and ETc, and the function was expressed as:

$$C(u,v) = \frac{1}{0.0925}\ln \left[ {1 + \frac{{(\mathop e\nolimits^{0.0925u} - 1)(\mathop e\nolimits^{0.0925v} - 1)}}{{\mathop e\nolimits^{0.0925} - 1}}} \right]$$
(15)

where u and v were the marginal distributions of Pe and ETc, respectively.

Table 4 The goodness-of-fit evaluation of the joint probability distribution for the measured data series

3.2.2 Temporal dependence relationship

The temporal dependence relationship can reflect the time scale characteristics of hydrologic variables. In here, the Markov process was introduced, by which the continuous transfer core of the probability relationship between adjacent years was achieved. Subsequently, combing with the copula function, a one-dimensional copula-based Markov process was constructed to express the temporal dependence relationship of the Pe or ETc series, respectively (detailed in Sect. 2.2.4). Hence, the measured Pe or ETc data series were divided into two time periods: from 1961 to 2016 and from 1962 to 2017. The marginal distribution functions were still the Normal distribution for Pe series and the Generalized Extreme Value distribution for ETc series, which were selected in Sect. 3.1. Assuming that u3 and u4 were the marginal distributions of the Pe time series during 1961–2016 and 1962–2017, respectively, v3 and v4 were the marginal distributions of the ETc time series during 1961–2016 and 1962–2017, respectively.

Because the values of Kendall’s rank coefficient τ were − 0.0109 and − 0.0097 for the ETc time series and Pe time series (Table 5), the three copula functions (Frank, Gaussian, and Student t) were used to fit the temporal dependence relationship of the ETc or Pe series. The critical value of the K–S test is 0.1834 when n = 55 and α = 0.05. From Table 5, the copula functions of the ETc or Pe series all passed the K–S test. The lower OLS and AIC values as well as the higher R2 imply a better distribution fit, thus using the Gaussian copula as the joint distribution for Pe and ETc in two time periods was optimal with OLS, AIC and R2 of 0.0365, − 362.2211 and 0.9839 for Pe, respectively, 0.0429, − 344.3210 and 0.9825 for ETc, respectively. The copula functions were presented as:

$$\mathop C\nolimits_{x} (\mathop u\nolimits_{3} ,\mathop u\nolimits_{4} ) = \frac{1}{0.9998}\exp \left\{ {\frac{{ - [\mathop {(\mathop \Phi \nolimits^{ - 1} (\mathop u\nolimits_{3} ))}\nolimits^{2} + 0.037\mathop \Phi \nolimits^{ - 1} (\mathop u\nolimits_{3} )\mathop \Phi \nolimits^{ - 1} (\mathop u\nolimits_{4} ){ + }\mathop {\mathop \Phi \nolimits^{ - 1} (\mathop u\nolimits_{4} )}\nolimits^{2} ]}}{1.9993}{ + }\frac{{\mathop {(\mathop \Phi \nolimits^{ - 1} (\mathop u\nolimits_{3} ))}\nolimits^{2} { + }\mathop {\mathop \Phi \nolimits^{ - 1} (\mathop u\nolimits_{4} )}\nolimits^{2} }}{2}} \right\}$$
(16)
$$\mathop C\nolimits_{y} (\mathop v\nolimits_{3} ,\mathop v\nolimits_{4} ) = \frac{1}{0.9256}\exp \left\{ {\frac{{ - [\mathop {(\mathop \Phi \nolimits^{ - 1} (\mathop v\nolimits_{3} ))}\nolimits^{2} + 0.7568\mathop \Phi \nolimits^{ - 1} (\mathop v\nolimits_{3} )\mathop \Phi \nolimits^{ - 1} (\mathop v\nolimits_{4} ){ + }\mathop {\mathop \Phi \nolimits^{ - 1} (\mathop v\nolimits_{4} )}\nolimits^{2} ]}}{1.7136}{ + }\frac{{\mathop {(\mathop \Phi \nolimits^{ - 1} (\mathop v\nolimits_{3} ))}\nolimits^{2} { + }\mathop {\mathop \Phi \nolimits^{ - 1} (\mathop v\nolimits_{4} )}\nolimits^{2} }}{2}} \right\}$$
(17)

where Cx (u3, u4) expressed the joint probability distribution of the Pe series during 1961–2016 and 1962–2017, and Cy (v3, v4) expressed the joint probability distribution of the ETc series during 1961–2016 and 1962–2017.

Table 5 Evaluation indicators of the copula functions for measured ETc time series and Pe time series, respectively

3.3 Stochastic simulation based on historical sequence

3.3.1 Reliability demonstration of simulated P e and ETc

The water supply and demand data exhibited the nonlinear and random changes affected by the meteorological conditions, underlying surface conditions in the watershed, and human activities, etc. Relatively long measured data series were difficult to obtain, and risk analysis based on the small samples may produce some errors. To decrease the uncertainty of measured data, according to the measured Pe and ETc from 1961 to 2017 in the Sichuan Hilly Area of China, the MCMP-Copula considering the contemporaneous and temporal dependence relationships of measured data was developed to simulate 560 years Pe and ETc data.

Figure 5 plots the ETc and Pe data series based on stochastic simulation and Table 6 compares the measured and simulated data statistics. It was clear that the simulated data had identical statistical characteristics with the measured data and there was a small diversity between them.

Fig. 5
figure 5

The data series of ETc and Pe based on stochastic simulation

Table 6 The contrast between the measured values and simulated values of ETc and Pe in the Sichuan Hilly Area of China

The dependency relationships verification of simulated data was further performed to reveal the internal law of each variable and the relationships between them. Figure 6 draws the scatter plots to compare the theoretical and empirical joint distributions between measured and simulated data. The points were evenly distributed near the 45° line, which indicated the simulated Pe and ETc had the identical contemporaneous dependence relationship with the measured data. Figure 7 gives the autocorrelation function curve of measured and simulated data for Pe and ETc. The autocorrelation coefficients of measured and simulated data were equal within a certain time-lapse range, indicating that the simulated data captured the temporal dependence relationship of the measured data. Moreover, the autocorrelation coefficients of simulated data presented a more stable variation trend compared with measured data, and fluctuated around 0.

Fig. 6
figure 6

The mutual correlation verification of measured data and simulated data for a theoretical joint distribution and b empirical joint distribution

Fig. 7
figure 7

The autocorrelation verification of measured data and simulated data for a Pe and b ETc

In general, the simulated data which primely maintained the same statistical characteristics, contemporaneous and temporal dependence relationships as the measured data could be used to assess the natural water shortages risk.

3.3.2 The joint probability distribution of simulated P e and ETc

The five common copula functions were also applied to analyze the joint probability distribution of the simulated Pe and ETc. The Gumbel–Hougaard and Clayton copula in the Archimedean copula family were still unfit for estimating the joint probability because of the negative correlation between simulated Pe and ETc. The D values for the Frank, Gaussian and Student t copula were all less than the critical value for the K–S test with 0.0575 (when n = 560 and α = 0.05), showing that they all passed the K–S test (Table 7). According to the OLS, AIC, and R2 values form Table 7, the Gaussian and Student t copula were superior to the Frank copula. The OLS and R2values for Gaussian and Student t copula were equal, but the AIC of Gaussian copula was lower than the Student t copula. As a result, the Gaussian copula was the optimal joint probability distribution for simulated Pe and ETc, and the function could be written as:

$$C(U,V) = \frac{1}{0.9949}\exp \left\{ {\frac{{ - [\mathop {(\mathop F\nolimits^{ - 1} (U))}\nolimits^{2} + 0.201\mathop F\nolimits^{ - 1} (U)\mathop F\nolimits^{ - 1} (V) + \mathop {\mathop F\nolimits^{ - 1} (V)}\nolimits^{2} ]}}{1.9798} + \frac{{\mathop {(\mathop F\nolimits^{ - 1} (U))}\nolimits^{2} + \mathop {\mathop F\nolimits^{ - 1} (V)}\nolimits^{2} }}{2}} \right\}$$
(18)

where U and V were the marginal distribution functions of simulated Pe and ETc, respectively. Φ−1(·) was the inverse function of the standard normal distribution.

Table 7 The goodness-of-fit evaluation of the joint probability distribution for the simulated data series

3.4 Agricultural water resources scarcity risk analysis

Utilizing the joint probability distributions for measured and simulated data determined above, the natural agricultural water resources scarcity risk was analyzed and the effect of MCMP-Copula model on the results of risk analysis was revealed. On the basis of pf = 37.5% and Pk = 62.5%, the Pe and ETc could be divided into three states: when p (Pe or ETc) < 37.5%, Pe or ETc was high; when 37.5% < p (Pe or ETc) < 62.5%, Pe or ETc was normal; and when p (Pe or ETc) > 62.5%, Pe or ETc was poor. Table 8 shows the seven frequencies of Pe and ETc for measured and simulated data. Comparing the Pe and ETc values under different states for simulated or measured data, when Pe was high and ETc was normal or poor, there was no water shortages in the Sichuan Hilly Area, but in other situations, water shortages were likely. This also suggested that the water supply and demand usually was in an uncoordinated state under natural precipitation conditions. The agricultural water shortages only existed when Pe ≤ x and ETc ≥ y, therefore, the joint probability and return period were calculated under this condition (Fig. 8). As shown in Fig. 8, with an increase in ETc and a decrease in Pe, the risk probability gradually reduced and the return period gradually increased, indicating an extreme water shortage rarely occurred in the Sichuan Hilly Area of China. Moreover, comparing the results of risk assessment of water demand and supply for measured and simulated data, the risk probabilities were slightly greater and the return periods were shorter for simulated data.

Table 8 High and poor states division of Pe and ETc
Fig. 8
figure 8

Risk assessment of water demand and supply about a measured data and b simulated data, respectively (i.e., the 3D maps of joint probability distribution, contour maps of joint probability distribution and contour maps of return period for X ≤x and Y ≥y). Notes X referred to the Pe, Y referred to the ETc; P (·) presented the joint probability of X ≤x and Y ≥y, T (·) presented the return period of X ≤x and Y ≥y. The symbol “*” showed the state that Pe was poor and ETc was high; The symbol “ + ” showed the situation that irrigation was required

From Table 8, we can see that regardless of the measured and simulated Pe and ETc, when p (Pe) > 25% and p (ETc) < 62.5%, the Pe values (about 820 mm) was just less than ETc values (about 821 mm), so irrigation was needed in the Sichuan Hilly Area of China. The irrigation probability was 48.10% and the corresponding return period was 2.08 years for simulated data, whereas 47.08% and 2.12 years for measured data (Fig. 8). When Pe was poor (p (Pe) > 62.5%) and ETc was high (p (ETc) < 37.5%), it's more prone to water shortages in the Sichuan Hilly Area, therefore, the joint probability and return period were further analyzed in this scenario. The probability of Pe poor and ETc high was 15.51% and the corresponding return period was 6.45 years for simulated data, whereas the values were 14.32% and 6.98 years for measured data (Fig. 8). Therefore, the water shortages risk analysis based on the MCMP-Copula model was conservative and reliable in the Sichuan Hilly Area of China under natural precipitation conditions.

4 Discussion

4.1 Selection of the copula functions

The copula function can flexibly reveal the correlation among the random variables, and the optimal copula function selection is the key for the risk analysis of multivariable hydrology. The existing literature commonly selected three Archimedean copula functions to estimate the risk probability of natural water supply and demand. However, only Frank copula could describe the joint probability of water supply and demand as a result of the negative correlation of two-dimensional random variables (Ding et al. 2011; Zhang et al. 2017a, b, 2019). In our study, in order to achieve a proper copula method, five copula functions, including three Archimedean copulas (Gumbel-Hougaard, Frank and Clayton copula) and two elliptic copulas (Gaussian and Student t copula), were tested to describe the dependence between Pe and ETc. For the measured data, the optimal joint probability distribution was still Frank copula, but there was little discrepancy among the Frank, Gaussian, and Student t copula functions. For the simulated data, the three appropriate copula functions followed the order of Gaussian > Student t > Frank. Obviously, two added elliptic copula functions compensated for the deficiency that referred to the restriction of the range and structure types of variables dependence, and there was significant correlation between the empirical copula and theoretical copula. The differences in the optimal copula function for the measured and simulated data may be caused by the small sample error. This suggests that using the elliptic copula family to fit the joint probability distribution of natural water supply and demand was a reasonable choice.

4.2 Advantages of the stochastic simulation model

The simulated data series generated on the basis of the MCMP-Copula model had the same statistic features and dependence relationships as the measured data, indicating the application of the stochastic simulation model to analyze the risk of natural water supply and demand was reasonable. Focusing on the lower Pe and higher ETc that was more prone to water shortages, the changing trend of the dependence between Pe and ETc for simulated data was consistent with the measured data. There was a higher encounter probability and lower return period for the simulated data series compared with the measured data series, which revealed that the risk assessment based on the MCMP-Copula model was more conservative. In short, the MCMP-Copula model considered the contemporaneous and temporal dependence relationships of random variables and accurately calculated the risk probability of natural water supply and demand imbalances. The model also indicates the actual correlation between Pe and ETc, which is an effective measure to solve the water shortage problems.

When p (Pe) > 25% and p (ETc) < 62.5%, the Pe value was just less than ETc and irrigation was needed for crops growth in the Sichuan Hilly Area of China. The result showed the water supply and demand usually was in an uncoordinated state in the Sichuan Hilly Area. Zhang et al. (2017a) researched the risk probability of P and ET0 in the Luhun irrigation district, and reported a mismatch between water supply and demand that exceeded what was reported in our study. According to the risk assessment chart, the water shortages will most likely occur in the Sichuan Hilly Area, but the extreme water scarcity is just not going to happen because the risk probability decreased and the return period increased following increasing ETc and decreasing Pe. Due to the relatively abundant precipitation in the Sichuan Hilly Area, there is barely frequent and persistent water resources shortage.

Under the background of the global climate change, the climate and agricultural resources in Southwest China have also changed. There was a severe drought from winter 2009 to spring 2010 in Southwest China which was caused by abnormal high temperature and a persistent lack of precipitation (Feng et al. 2013; Yan et al. 2013). Liu and Zhang (2013) put forward that the ET0 began to increase from 1993 based on the data from 80 meteorological stations in Southwest China. Jiang et al. (2019) also found that ET0 and temperature during the growing season revealed a significant increasing trend from 1961 to 2016 in Southwest China, yet relative humidity, wind speed and sunshine duration hours were significant declining, which infers the crops will consume more water. Therefore, the stochastic simulation model (MCMP-Copula), which reflected the transformation laws and statistic features of the measured Pe and ETc, increased the risk assessment probability of natural water resources supply and demand and was an effective and reliable way to cope with the standing climate variation.

4.3 Limitations and future studies

In the study, the stochastic simulation model (MCMP-Copula) achieved accurate and conservative results of risk assessment under natural water supply and demand, whereas in reality, the water supply also includes the artificial irrigation. Thus, utilizing three-dimensional copulas to construct the joint probability distribution model of water resources supply (precipitation and irrigation) and demand (crop water requirements) is of great significance for the formulation of irrigation schedule and the optimal allocation of regional water resources. In addition, research on conditional joint probability as well as conditional return period can further perform the risk analysis of concrete situations for water shortages, such as the corresponding value of water supply (precipitation or artificial irrigation) when crop water demand is high state, and its application can provide a practical guidance for water resources allocation. In our study, the wheat-rice was considered as a typical cropping system to analyze the water shortages risk, while single cropping pattern always is not economic in practice. Therefore, to make full use of precipitation as well as optimize industrial structure and agriculture distribution, more crops should be attracted attention, and finally achieve the objective of risk reduction for water shortages. Due to the seasonal drought in Southwest China, the risk of natural water shortages during different growth periods of each crop needs to be assessed, and then the optimal allocation of water resources can be achieved by some engineering measures.

5 Conclusions

In the study, a stochastic simulation model (MCMP-Copula) which comprehensively considered the contemporaneous and temporal dependence relationships of measured ETc and Pe series by Monte Carlo, Copula and Markov process was presented to perform natural water resources risk analysis in the Sichuan Hilly Area of China. Taking account of the full growth period of wheat-rice from October 9th of the first year (wheat sowing) to September 10th of the next year (rice harvest) over 1961–2017 as the study period, the conclusions were as follows:

  1. (1)

    On the basis of the measured data series of Pe and ETc, the marginal distributions were determined, and the optimal copula functions were selected. For the contemporaneous dependence relationship of measured Pe and ETc, the Frank copula was the best one with the parameter θ of − 0.0925, which also presented the optimal joint probability distribution of measured Pe and ETc. The Gaussian copula was the optimal selection for the temporal dependence relationship of measured Pe or ETc series with the parameter ρ of 0.0185 and 0.3784, respectively.

  2. (2)

    The statistical characteristics and dependence relationships of the simulated Pe and ETc data series generated based on the MCMP-Copula model were identical with those of the measured data, which indicated the simulated data based on the MCMP-Copula were reliable. The Gaussian copula was identified as the optimal joint probability distribution for the simulated Pe and ETc with the R2 of 0.9986.

  3. (3)

    According to the joint probability analysis of Pe ≤ x and ETc  ≥  y, the conclusion can be drawn that there could not be an extreme water shortage in the Sichuan Hilly Area of China. The irrigation probability and return period were 48.10% and 2.08 years for simulated data, whereas 47.08% and 2.12 years for measured data. The encounter probability and return period of Pe poor-ETc high for simulated data were 15.51% and 6.45 years, whereas 14.32% and 6.98 years for measured data. These results indicated that the water resources risk analysis of simulated data was conservative, which could provide a more secure and reliable theoretical guidance for the risk managers.