A novel Hybrid Wavelet-Locally Weighted Linear Regression (W-LWLR) Model for Electrical Conductivity (EC) Prediction in Surface Water

https://doi.org/10.1016/j.jconhyd.2020.103641

Highlights

  • Development of a hybrid wavelet-locally weighted linear regression (W-LWLR) to forecast the monthly EC levels of rivers.

  • Comparing the W-LWLR model with the W-SVR, W-ARIMA, standalone LWLR, SVR and ARIMA techniques.

  • Significant accuracy improvement in the LWLR method when coupled with the DWT for predicting the EC water quality parameter.

Abstract

Rivers are among the most common and vital sources of water and play a fundamental role in ecological systems and human life. Water quality assessment is a major element of water resources management, and accurate prediction of water quality is essential for better management of rivers. Electrical conductivity (EC) is one of the most important water quality parameters for predicting the salinity and mineralization of water. The present study introduces a novel hybrid wavelet-locally weighted linear regression (W-LWLR) method to predict the monthly EC of the Sefidrud River in Iran. A total of 240 monthly discharge (Q) and EC samples, covering a period of 20 years, were collected. The data were divided into two frequency components at two decomposition levels using the mother wavelet Bior 6.8. To compare the performance of various methods, the standalone LWLR, support vector regression (SVR), wavelet support vector regression (W-SVR), autoregressive integrated moving average (ARIMA), wavelet ARIMA (W-ARIMA), multivariate linear regression (MLR), and wavelet MLR (W-MLR) were also used. The discrete wavelet transform (DWT) was coupled with the LWLR, SVR, and ARIMA to create the W-LWLR, W-SVR, and W-ARIMA methods for predicting EC. The comparisons demonstrated that the W-LWLR was more accurate and efficient than the LWLR, SVR, W-SVR, ARIMA, and W-ARIMA methods. The correlation coefficient (R) values were 0.973, 0.95, 0.565, 0.473, 0.425, and 0.917 for the W-LWLR, W-SVR, LWLR, SVR, ARIMA, and W-ARIMA methods, respectively. Further, the root mean square error (RMSE) of the W-LWLR was 89.78, while the corresponding values for the W-SVR, LWLR, SVR, ARIMA, W-ARIMA, MLR, and W-MLR were 123.50, 319.95, 341.20, 350.153, 155.292, 351.774, and 157.856, respectively. The overall comparison metrics and error analysis demonstrated the superiority of the newly proposed W-LWLR method for water quality prediction.
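The two comparison metrics reported in the abstract, R and RMSE, can be sketched as follows. This is a minimal illustration using the standard textbook definitions (Pearson correlation and root mean square error); the function names are hypothetical and the excerpt does not show the authors' own implementation.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def correlation_r(observed, predicted):
    """Pearson correlation coefficient R between observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("R is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)
```

A lower RMSE and an R closer to 1 both indicate a better fit, which is the sense in which the abstract ranks the W-LWLR first.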

Introduction

Surface water resources, such as rivers, streams, lakes, and reservoirs, are vitally important sources of water for drinking, agricultural irrigation, mining, and industrial purposes. Hence, given the scarcity of fresh surface water resources, water quality monitoring and control are the most important strategies for obtaining the information that supports decision making and knowledge about water contamination and its spatial and temporal variations. Electrical conductivity (EC) based salinity is one of the most significant water quality parameters for determining the suitability of water for drinking and irrigation purposes (Kumarasamy et al., 2014). EC is normally measured in microsiemens per centimeter (μS/cm) (Heydari et al., 2013). Since EC is dominated by total dissolved solids (TDS) and is directly related to dissolved ionic solutes such as sodium (Na⁺), chloride (Cl⁻), magnesium (Mg²⁺), sulfate (SO₄²⁻), and calcium (Ca²⁺) in water, it can be an indicator of pollutants in surface water. The increased ionic composition of water has a significant influence on plant growth and can reduce the quality of drinking water.

In surface water quality classifications, EC is a main measure of the salinity hazard for irrigation and drinking water. The U.S. Department of Agriculture (Wilcox, 1948) and the World Health Organization (2008) classify surface water quality based on the EC-sodium concentration of water and on EC, respectively. The EC of freshwater usually varies between 0 and 1500 μS/cm, while the EC of sea water can be as high as 50,000 μS/cm. According to the Wilcox EC-based classification for irrigation water, 0–750 μS/cm, 750–2000 μS/cm, and >2000 μS/cm are classified as fine, allowable, and unacceptable, respectively (Wilcox, 1948). An EC level higher than 10,000 μS/cm is not suitable for human consumption or irrigation. The maximum permissible EC value recommended by the World Health Organization (WHO) for drinking water is 1400 μS/cm (WHO, 1993).
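The EC-based irrigation classes quoted above can be expressed as a simple threshold lookup. The sketch below assumes the thresholds are in μS/cm (the unit used for EC elsewhere in this section) and that a value falling exactly on a boundary belongs to the better class; neither the function name nor the boundary handling comes from the source.

```python
def classify_irrigation_ec(ec_us_cm: float) -> str:
    """Map an EC reading (assumed to be in uS/cm) to the three classes
    quoted in the text: fine, allowable, unacceptable."""
    if ec_us_cm < 0:
        raise ValueError("EC cannot be negative")
    if ec_us_cm <= 750:       # 0-750: fine (boundary handling is an assumption)
        return "fine"
    if ec_us_cm <= 2000:      # 750-2000: allowable
        return "allowable"
    return "unacceptable"     # > 2000
```

For example, a hypothetical monthly reading of 1200 μS/cm would fall in the allowable class.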

Nowadays, prediction of water quality parameters such as TDS, EC and turbidity in water is among the challenging issues in water resources management. Multivariate statistics and time series analysis have been widely used for water quality prediction. The commonly-used methods include moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and multiple linear regression (MLR) (Çamdevýren et al., 2005; Civelekoglu et al., 2007). However, the aforementioned traditional models may not be able to provide proper predictions due to the lack of reliable instruments to collect observation data for a given timeframe, the complexity of effective factors in forecasting, and the inability to capture non-stationarity and nonlinearity of the water quality parameters (Deng et al., 2015).

In recent decades, artificial intelligence (AI) methods, which are able to overcome the drawbacks of the traditional methods, have been successfully utilized in various aspects of environmental engineering such as water quality prediction. The artificial neural network (ANN) and the adaptive neuro-fuzzy inference system (ANFIS) techniques have been widely employed to predict nocturnal dynamics of dissolved oxygen (DO) in aquatic systems (Karakaya et al., 2013); to forecast TDS, EC, and turbidity in rivers (Najah et al., 2009); to estimate DO and specific conductance as two water quality parameters in rivers (Heydari et al., 2013); to assess EC, sodium adsorption ratio (SAR), and total hardness (TH) in rivers (Azad et al., 2018); and to predict the chemical concentrations in rivers (Mahmoodabadi and Arshad, 2018). In recent years, a hybrid of support vector regression (SVR) and the shuffled frog leaping algorithm (SFLA) was employed to forecast eight water quality parameters including Na⁺, K⁺, Mg²⁺, SO₄²⁻, Cl⁻, pH, EC, and TDS (Mahmoudi et al., 2016). A deep learning predictive model was developed to model the DO levels of a reservoir (Banerjee et al., 2019). The least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS), and M5 model tree (M5Tree) were used to assess free ammonia (AMM), total Kjeldahl nitrogen (TKN), water temperature (WT), total coliform (TC), fecal coliform (FC), and pH (Kisi and Parmar, 2016). A comparison of the overall results indicated that the MARS and LSSVM models were more accurate than the other methods. More recently, artificial neural networks (ANNs) have been widely applied for surface water quality modeling.
For instance, various studies have been conducted on DO concentration modeling using different AI-based models. Two nonlinear predictive models, namely the modified response surface method (MRSM) and the multilayer perceptron neural network (MLPNN), have been developed to model the daily dissolved oxygen concentration. Data-driven methods (ANN, ANFIS, SVM, and ARIMA) and three different ensemble techniques, i.e., simple average ensemble (SAE), weighted average ensemble (WAE), and neural network ensemble (NNE), have been applied to single- and multi-step-ahead forecasting of DO in river water. A model based on the polynomial chaos expansion approach has been proposed to assess the DO of water. Further, some researchers have focused on different applicable aspects of ANN methods to forecast water quality indexes.

Recently, hybrids of the discrete wavelet transform (DWT) and robust artificial intelligence methods have been used to estimate water quality parameters based on limited time series data. In particular, hybrid wavelet-artificial neural network (WANN) techniques have been applied to predict the EC of river water (Ravansalar and Rajaee, 2015). Specifically, the original time series of monthly EC and discharge (Q) values were decomposed using the DWT and then coupled with the ANN model. The results indicated that the WANN model enhanced the modeling accuracy. Montaseri et al. (2018) utilized wavelet-ANFIS, wavelet-GEP, and other hybrid AI models to forecast TDS based on EC, Na⁺, and Cl⁻ under various climatic conditions. Rajaee et al. (2018) applied wavelet-multiple linear regression (WMLR) and WANN to predict daily pH levels from the original time series of pH and discharge (Q). Barzegar et al. (2016) used ANN, ANFIS, wavelet-ANN, and wavelet-ANFIS to assess water salinity levels based on different subsets of monthly Ca²⁺, Mg²⁺, Na⁺, SO₄²⁻, and Cl⁻ in rivers. Barzegar et al. (2018) simulated multi-step-ahead EC with a hybrid wavelet-extreme learning machine (WA-ELM) model and compared it with an adaptive neuro-fuzzy inference system (ANFIS).
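The decomposition step that these hybrid models share can be illustrated with a small two-level DWT. The sketch below deliberately swaps in the Haar wavelet so that it runs without a wavelet library; the study itself reports using the Bior 6.8 mother wavelet, whose longer filters are not reproduced here. The function names are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) < 2 or len(signal) % 2:
        raise ValueError("signal length must be even and at least 2")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, level=2):
    """Multi-level decomposition: the approximation is split again at each level.

    Returns the coarsest approximation and a list of detail series,
    where details[0] is level 1 (the highest-frequency component).
    """
    approx = list(signal)
    details = []
    for _ in range(level):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return approx, details
```

In a wavelet-coupled model, the resulting approximation and detail sub-series (rather than the raw series) would be supplied as inputs to the regression or network model.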

The aim of this study is to investigate the effect of coupling the DWT with the LWLR method on the prediction of monthly EC levels in the Sefidrud River, Iran, based on multiple hydrological data (including EC and discharge). To the best of our knowledge, the hybrid wavelet-locally weighted linear regression (W-LWLR) method has not previously been used to forecast water quality parameters. Also, few studies have used the LWLR as a data-driven method (Ahmadianfar et al., 2019; Jamei and Ahmadianfar, 2019). In this research, the W-LWLR, W-SVR, W-ARIMA, and W-MLR models are developed to assess monthly EC levels, and the results are compared with those from the original LWLR, SVR, ARIMA, and MLR models.
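For readers unfamiliar with locally weighted linear regression, the following is a minimal sketch of the general technique: for each query point, a separate weighted least-squares line is fitted, with training points weighted by their closeness to the query. The Gaussian kernel, the bandwidth parameter `tau`, and all function names are assumptions made for illustration; this excerpt does not state which kernel or settings the authors used.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger bandwidth")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lwlr_predict(x_query, X, y, tau=1.0):
    """Predict y at x_query with a Gaussian-kernel locally weighted linear fit."""
    # Weight each training row by its squared distance to the query point.
    w = [
        math.exp(-sum((a - q) ** 2 for a, q in zip(row, x_query)) / (2.0 * tau ** 2))
        for row in X
    ]
    Z = [[1.0] + list(row) for row in X]  # prepend intercept column
    p = len(Z[0])
    n = len(Z)
    # Weighted normal equations: (Z^T W Z) theta = Z^T W y
    lhs = [[sum(w[k] * Z[k][i] * Z[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    rhs = [sum(w[k] * Z[k][i] * y[k] for k in range(n)) for i in range(p)]
    theta = solve_linear(lhs, rhs)
    return theta[0] + sum(t * q for t, q in zip(theta[1:], x_query))
```

A small `tau` makes the fit follow nearby points closely, while a large `tau` approaches ordinary linear regression; because the fit is recomputed per query, the method can track nonlinear behaviour with a purely linear local model.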

Support vector machine

The SVM is a machine learning method based on statistical learning theory and the structural risk minimization principle, and was first introduced by Cortes and Vapnik (Vapnik and Cortes, 1995). SVM has been widely applied to classification and regression, and it usually outperforms the traditional methods used in previous studies (Huang et al., 2002; Sarzaeim et al., 2017). The SVR is the application of SVM to regression. Various types of kernel function such as exponential radial

Results and discussion

Tables 3–6 show the statistical metrics obtained for the LWLR, SVR, ARIMA, and MLR models, respectively. As indicated in these tables, the best input combination for the LWLR is combination 3, which contains EC(t−1), EC(t−2), EC(t−3), Q(t), Q(t−1), and Q(t−2), with an average rank of 2.335 (2 and 2.67 for the training and testing phases, respectively). The best input combination for the SVR is combination 3, with an average rank of 3 (2 and 4 for the training and testing phases, respectively). For the ARIMA model, the
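The input combination and ranking arithmetic described in this snippet can be sketched as follows. The code assembles the six inputs named for combination 3 (three lagged EC values plus the current and two lagged discharge values) with the current EC as the target, and averages the training and testing ranks. The function names are hypothetical, and the target definition is an assumption inferred from the forecasting setup rather than stated in this excerpt.

```python
def build_lagged_inputs(ec, q):
    """Build rows [EC(t-1), EC(t-2), EC(t-3), Q(t), Q(t-1), Q(t-2)] with target EC(t).

    The first three time steps are dropped because their lags are unavailable.
    """
    if len(ec) != len(q):
        raise ValueError("EC and Q series must have the same length")
    rows, targets = [], []
    for t in range(3, len(ec)):
        rows.append([ec[t - 1], ec[t - 2], ec[t - 3], q[t], q[t - 1], q[t - 2]])
        targets.append(ec[t])
    return rows, targets


def average_rank(train_rank, test_rank):
    """Mean of the training-phase and testing-phase ranks."""
    return (train_rank + test_rank) / 2.0
```

With the ranks quoted above, `average_rank(2, 2.67)` reproduces the reported 2.335 for the LWLR, and `average_rank(2, 4)` gives 3 for the SVR.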

Conclusions

In this research, the LWLR model was developed for the first time to predict water quality parameters. In particular, the LWLR was coupled with the discrete wavelet transform to forecast the monthly EC levels of surface water. To evaluate their performance, the LWLR and W-LWLR models were compared with the SVR, W-SVR, ARIMA, W-ARIMA, MLR, and W-MLR models. In the model development, 240 observed monthly river discharge and water EC sample data from the Astane Station were

Funding

This study was funded by the Vice-Chancellor for Research and Technology, Shohadaye Hoveizeh University of Technology (project code: IR-Civ-SHHUT97-200-3).

Declaration of Competing Interest

The authors confirm that there is no conflict of interest.

References (51)

  • V. Nourani et al. Applications of hybrid wavelet–artificial intelligence models in hydrology: a review. J. Hydrol. (2014)

  • M. Ravansalar et al. A wavelet–linear genetic programming model for sodium (Na+) concentration forecasting in rivers. J. Hydrol. (2016)

  • Y.-F. Sang. A review on the applications of wavelet transform in hydrology time series analysis. Atmos. Res. (2013)

  • H. Wang et al. Multiple linear regression modeling for compositional data. Neurocomputing (2013)

  • J. Wang et al. Locally weighted linear regression for cross-lingual valence-arousal prediction of affective words. Neurocomputing (2016)

  • I. Ahmadianfar et al. Prediction of local scour around circular piles under waves using a novel artificial intelligence approach. Mar. Georesour. Geotechnol. (2019)

  • C.G. Atkeson et al. Locally Weighted Learning for Control (1997)

  • A. Azad. Prediction of water quality parameters using ANFIS optimized by intelligence algorithms (case study: Gorganrood River). KSCE J. Civ. Eng. (2018)

  • R. Barzegar et al. Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran. Stoch. Env. Res. Risk A. (2016)

  • R. Barzegar et al. Multi-step water quality forecasting using a boosting ensemble multi-wavelet extreme learning machine model. Stoch. Env. Res. Risk A. (2018)

  • K. Budu. Comparison of wavelet-based ANN and regression models for reservoir inflow forecasting. J. Hydrol. Eng. (2013)

  • T. Chai et al. Root mean square error (RMSE) or mean absolute error (MAE)?–arguments against avoiding RMSE in the literature. Geosci. Model Dev. (2014)

  • G. Civelekoglu et al. Prediction of bromate formation using multi-linear regression and artificial neural networks. Ozone Sci. Eng. (2007)

  • D.Ö. Faruk. A hybrid neural network and ARIMA model for water quality time series prediction. Eng. Appl. Artif. Intell. (2010)

  • M. Heydari et al. Development of a neural network technique for prediction of water quality parameters in the Delaware River, Pennsylvania. Middle-East J. Sci. Res. (2013)