The Effectiveness of a Probabilistic Principal Component Analysis Model and Expectation Maximisation Algorithm in Treating Missing Daily Rainfall Data

  • Original Article
  • Asia-Pacific Journal of Atmospheric Sciences

A Publisher Correction to this article was published on 27 March 2024

This article has been updated

Abstract

The reliability and accuracy of a risk assessment of extreme hydro-meteorological events depend heavily on the quality of the historical rainfall time series data. Missing values in such a time series can lower the quality of the data. This paper therefore proposes a multiple-imputation algorithm for treating missing data without requiring information from adjoining monitoring stations. The proposed imputation algorithms are based on the M-component probabilistic principal component analysis model and an expectation-maximisation algorithm (MPPCA-EM). To evaluate the effectiveness of the MPPCA-EM imputation algorithm, six historical daily rainfall time series recorded at six monitoring stations were used. These stations are located in the coastal and inland regions of the East-Coast Economic Region (ECER), Malaysia. The results show that, for missing historical daily rainfall data recorded at coastal monitoring stations, the 2-component probabilistic principal component analysis model and expectation-maximisation algorithm (2PPCA-EM) was superior to the single- and multiple-imputation algorithms proposed in previous studies. In contrast, the single-imputation algorithms proposed in previous studies were superior to the MPPCA-EM imputation algorithms for missing historical daily rainfall data recorded at inland monitoring stations.
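To make the idea of PPCA-based imputation concrete, the following is a minimal sketch of a q-component probabilistic PCA model fitted by EM on data with missing entries. It is not the MPPCA-EM procedure evaluated in the article: it produces a single imputation (the posterior-mean reconstruction) rather than multiple imputations, and the function name `ppca_em_impute`, its parameters, and the samples-by-features layout of the input array are assumptions made for illustration. How the daily series of one station is arranged into such an array is not stated in the abstract.

```python
import numpy as np

def ppca_em_impute(X, q=2, n_iter=100, seed=0):
    """Fill NaNs in X (n_samples x n_features) using a q-component PPCA
    model x = W z + mu + noise, fitted by EM on the observed entries only.
    Assumes every column has at least one observed value."""
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)                       # True where a value is observed
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mu = np.nanmean(X, axis=0)               # start from the column means
    W = rng.normal(size=(D, q))              # random initial loadings
    sigma2 = 1.0                             # initial noise variance
    Z = np.zeros((N, q))                     # posterior means of the latents
    S = np.zeros((N, q, q))                  # posterior covariances
    for _ in range(n_iter):
        # E-step: posterior of each latent vector given that row's observed values
        for n in range(N):
            o = obs[n]
            Wo = W[o]
            Minv = np.linalg.inv(Wo.T @ Wo + sigma2 * np.eye(q))
            Z[n] = Minv @ Wo.T @ (X[n, o] - mu[o])
            S[n] = sigma2 * Minv
        # M-step: update mean, loadings and noise variance feature by feature
        ZZ = Z[:, :, None] * Z[:, None, :] + S        # E[z z^T] for every row
        for j in range(D):
            r = obs[:, j]                             # rows where feature j is observed
            mu[j] = np.mean(X[r, j] - Z[r] @ W[j])
            A = ZZ[r].sum(axis=0)
            b = Z[r].T @ (X[r, j] - mu[j])
            W[j] = np.linalg.solve(A, b)
        resid = np.where(obs, X - Z @ W.T - mu, 0.0)  # residuals on observed cells
        extra = np.einsum('dq,nqr,dr->nd', W, S, W)   # w_d^T S_n w_d
        sigma2 = (np.sum(resid ** 2) + np.sum(extra[obs])) / obs.sum()
    X_hat = Z @ W.T + mu                     # posterior-mean reconstruction
    return np.where(obs, X, X_hat)           # keep observed values untouched

# Usage on synthetic data (not rainfall records): hide 10% of the entries
rng = np.random.default_rng(1)
X_full = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6)) + 5.0
X_full += 0.1 * rng.normal(size=X_full.shape)
mask = rng.random(X_full.shape) < 0.1
X_miss = np.where(mask, np.nan, X_full)
X_imp = ppca_em_impute(X_miss, q=2)
```

Observed values are returned unchanged and only the missing cells are replaced. A multiple-imputation variant would draw the missing cells from the fitted predictive distribution several times instead of using the posterior mean once; that step is left out of this sketch.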

Acknowledgements

The authors would like to thank the Department of Irrigation and Drainage (DID), Malaysia, for providing the historical rainfall time series data used in this study. This appreciation is also extended to the Ministry of Education Malaysia and Universiti Malaysia Pahang (UMP) for providing the FRGS grant RDU190134, the flagship research grant RDU150393, and the internal research grant RDU1703184 to conduct this study.

Author information

Corresponding author

Correspondence to Zun Liang Chuan.

Additional information

Responsible Editor: Edvin Aldrian.

About this article

Cite this article

Chuan, Z.L., Deni, S.M., Fam, SF. et al. The Effectiveness of a Probabilistic Principal Component Analysis Model and Expectation Maximisation Algorithm in Treating Missing Daily Rainfall Data. Asia-Pacific J Atmos Sci 56, 119–129 (2020). https://doi.org/10.1007/s13143-019-00135-8

  • DOI: https://doi.org/10.1007/s13143-019-00135-8
