Abstract
The ability to predict the risk of water shortage is critical, so it is important to develop methods for estimating the parameters of statistical models when insufficient data are available. Based on the maximum entropy principle, this paper proposes an alternative method of estimating the parameters of a logistic regression model when only small samples are available. The new method requires only a small amount of data on risk factors, whereas maximum likelihood estimation requires a large quantity of data on both risk and risk factors. In addition, the paper applies a new formula for normalized information flow to select important risk factors. Information flow is a physical notion logically associated with causality, and it can be used to quantify the cause–effect relation between dynamic events. Five experiments based on predictions of water shortage risk in the Beijing–Tianjin–Tangshan region are performed to validate the new method with different small sample sizes. The results show that the new method is generally reliable and performs much better than maximum likelihood estimation when only small samples are used. Specifically, an improvement of between 87.9% and 95.3% is observed when the number of samples is greater than 15 and less than 30. The new method still generates an acceptable result with only 10 samples, whereas maximum likelihood estimation is unreliable in such situations.
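For reference, the maximum likelihood baseline that the new method is compared against can be sketched as a logistic regression fitted by Newton-Raphson (iteratively reweighted least squares). This is a minimal pure-Python sketch, not the authors' code. The function names and toy data are hypothetical, and the proposed maximum entropy method is not reproduced here. The sketch also illustrates one well-known way maximum likelihood estimation breaks down with very small samples. When a small sample happens to be perfectly separated by a risk factor, no finite maximum likelihood estimate exists, and the iterations diverge.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ArithmeticError("information matrix is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic_mle(xs, ys, max_iter=50, tol=1e-10):
    """Maximum likelihood fit of P(y=1 | x) = sigmoid(b0 + b1*x1 + ...) by Newton-Raphson.

    xs is a list of feature rows and ys a list of 0/1 labels. Returns (beta, converged).
    """
    rows = [[1.0] + list(x) for x in xs]  # prepend an intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Score X^T (y - p) and information X^T W X with W = diag(p * (1 - p))
        score = [sum((y - pi) * r[j] for r, y, pi in zip(rows, ys, p)) for j in range(k)]
        info = [[sum(pi * (1.0 - pi) * r[i] * r[j] for r, pi in zip(rows, p))
                 for j in range(k)] for i in range(k)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, True
    return beta, False


# Overlapping classes: a finite MLE exists and Newton-Raphson converges.
xs = [[1], [2], [3], [4], [5], [6], [7], [8]]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
beta, ok = fit_logistic_mle(xs, ys)
print(beta, ok)

# A tiny, perfectly separated sample: no finite MLE exists, so the iterations diverge.
try:
    beta_sep, ok_sep = fit_logistic_mle([[1], [2], [3], [4]], [0, 0, 1, 1])
except ArithmeticError:
    ok_sep = False
print(ok_sep)
```

On the overlapping data, the fit converges and the score equations are satisfied. On the four-point separated sample, the function either stops at `max_iter` without converging or raises `ArithmeticError` once the information matrix becomes numerically singular.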
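The information flow used to screen risk factors can be illustrated with Liang's maximum likelihood estimator for two time series under a linear stochastic model (Liang 2014, Phys Rev E 90:052150). This is combined with the normalization of Liang (2015, Phys Rev E 92:022126) as we read it. This is a sketch under stated assumptions: the paper's own "new formula" may differ in detail, a time step of 1 is assumed, the function names are hypothetical, and the data are simulated rather than water-shortage records.

```python
import random
from statistics import fmean


def _cov(a, b):
    """Sample covariance (divisor n - 1) of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def information_flow(x1, x2, dt=1.0):
    """Estimate the information flow T_{2->1} from series x2 to series x1
    (Liang 2014) and a normalized version tau_{2->1} (after Liang 2015)."""
    if len(x1) != len(x2) or len(x1) < 4:
        raise ValueError("need two equal-length series with at least 4 points")
    # Euler forward difference of x1, aligned with x1[:-1] and x2[:-1]
    d1 = [(b - a) / dt for a, b in zip(x1[:-1], x1[1:])]
    y1, y2 = x1[:-1], x2[:-1]
    c11, c22, c12 = _cov(y1, y1), _cov(y2, y2), _cov(y1, y2)
    c1d1, c2d1, cd1d1 = _cov(y1, d1), _cov(y2, d1), _cov(d1, d1)
    det = c11 * c22 - c12 ** 2
    # Least-squares coefficients of the linear model dx1/dt = f1 + a11*x1 + a12*x2
    a11 = (c22 * c1d1 - c12 * c2d1) / det
    a12 = (c11 * c2d1 - c12 * c1d1) / det
    # T_{2->1} = (C11 C12 C2,d1 - C12^2 C1,d1) / (C11^2 C22 - C11 C12^2)
    t21 = (c11 * c12 * c2d1 - c12 ** 2 * c1d1) / (c11 ** 2 * c22 - c11 * c12 ** 2)
    # Normalizer: |T| + |self-evolution term a11| + |noise term|
    resid_var = (cd1d1 + a11 ** 2 * c11 + a12 ** 2 * c22 + 2 * a11 * a12 * c12
                 - 2 * a11 * c1d1 - 2 * a12 * c2d1)
    noise = dt * resid_var / (2 * c11)
    z = abs(t21) + abs(a11) + abs(noise)
    return t21, t21 / z


# Synthetic check: x2 is an AR(1) process that drives x1; x1 does not feed back into x2.
rng = random.Random(42)
x1, x2 = [0.0], [0.0]
for _ in range(4999):
    x1.append(0.5 * x1[-1] + 0.8 * x2[-1] + rng.gauss(0.0, 0.3))
    x2.append(0.7 * x2[-1] + rng.gauss(0.0, 1.0))

t21, tau21 = information_flow(x1, x2)  # flow from x2 to x1
t12, tau12 = information_flow(x2, x1)  # flow from x1 to x2
print(f"T(2->1) = {t21:.3f}, tau = {tau21:.3f}")
print(f"T(1->2) = {t12:.3f}, tau = {tau12:.3f}")
```

In this simulation, the flow from the driver `x2` to `x1` is clearly nonzero, while the reverse flow is close to zero. A candidate risk factor would be screened in the same way against the risk series. How the paper ranks or thresholds these values is not stated in the abstract.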
Acknowledgements
This study was supported by the National Natural Science Foundation of China (Grant Nos. 51609254, 51879010 and 51479003).
About this article
Cite this article
Qian, L., Wang, H., Bai, C. et al. A New Parameter Estimation Method for a Logistic Regression Model of Water Shortage Risk in the Case of Small Sample Numbers. Math Geosci 52, 929–944 (2020). https://doi.org/10.1007/s11004-019-09824-6
DOI: https://doi.org/10.1007/s11004-019-09824-6