Abstract
The reliability and accuracy of risk assessments of extreme hydro-meteorological events depend heavily on the quality of the historical rainfall time series data. Missing values in such a time series, however, degrade data quality. This paper therefore proposes a multiple-imputation algorithm for treating missing data that does not require information from adjoining monitoring stations. The proposed imputation algorithms are based on the M-component probabilistic principal component analysis model and an expectation-maximisation algorithm (MPPCA-EM). To evaluate the effectiveness of the MPPCA-EM imputation algorithms, six historical daily rainfall time series recorded at six monitoring stations were used. These stations are located in the coastal and inland regions of the East-Coast Economic Region (ECER), Malaysia. The results show that, for missing historical daily rainfall data recorded at coastal monitoring stations, the 2-component probabilistic principal component analysis model with the expectation-maximisation algorithm (2PPCA-EM) was superior to the single- and multiple-imputation algorithms proposed in previous studies. In contrast, the single-imputation algorithms proposed in previous studies were superior to the MPPCA-EM imputation algorithms for missing historical daily rainfall data recorded at inland monitoring stations.
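To make the idea of imputing from a single station's own record more concrete, the following is a minimal sketch in Python, not the algorithm evaluated in the paper. It assumes the series has been arranged into a matrix with missing values stored as NaN (the rows-by-columns layout, the function name `ppca_impute`, and the parameters `q`, `n_iter` and `tol` are all hypothetical). It alternates between a closed-form probabilistic PCA fit on the completed matrix and refilling the missing entries from the fitted model. This is a simplified, single-imputation iteration; a full EM treatment would condition only on the observed entries, and the multiple-imputation step is not shown.

```python
import numpy as np


def ppca_impute(X, q=2, n_iter=200, tol=1e-6):
    """Fill NaNs in X (n_samples x n_features) using a q-component PPCA model.

    Simplified sketch: requires q < n_features and at least one observed
    value in every column.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        centered = filled - mu

        # Closed-form maximum-likelihood PPCA fit on the completed matrix.
        cov = centered.T @ centered / centered.shape[0]
        eigval, eigvec = np.linalg.eigh(cov)          # ascending order
        order = np.argsort(eigval)[::-1]
        eigval, eigvec = eigval[order], eigvec[:, order]
        sigma2 = max(eigval[q:].mean(), 1e-12)        # noise variance
        W = eigvec[:, :q] * np.sqrt(np.maximum(eigval[:q] - sigma2, 0.0))

        # Posterior mean of the latent variables, then reconstruction.
        M = W.T @ W + sigma2 * np.eye(q)
        Z = np.linalg.solve(M, W.T @ centered.T).T
        recon = Z @ W.T + mu

        # Observed entries are never altered; only missing ones are refilled.
        new_filled = np.where(missing, recon, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break

    return filled
```

Calling `ppca_impute(X, q=2)` returns a copy of `X` with the NaN entries replaced and the observed entries left untouched. The sketch does not enforce non-negativity, so for rainfall totals any negative imputed values would need separate handling.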
Change history
08 April 2024
The denotation of Equation 16 has been corrected.
27 March 2024
A Correction to this paper has been published: https://doi.org/10.1007/s13143-024-00363-7
Acknowledgements
The authors would like to thank the Department of Irrigation and Drainage (DID), Malaysia, for providing the historical rainfall time series data used in this study. The authors also thank the Ministry of Education Malaysia and Universiti Malaysia Pahang (UMP) for providing the FRGS grant RDU190134, the flagship research grant RDU150393, and the internal research grant RDU1703184 to conduct this study.
Additional information
Responsible Editor: Edvin Aldrian.
About this article
Cite this article
Chuan, Z.L., Deni, S.M., Fam, SF. et al. The Effectiveness of a Probabilistic Principal Component Analysis Model and Expectation Maximisation Algorithm in Treating Missing Daily Rainfall Data. Asia-Pacific J Atmos Sci 56, 119–129 (2020). https://doi.org/10.1007/s13143-019-00135-8
DOI: https://doi.org/10.1007/s13143-019-00135-8