Elsevier

Environmental Pollution

Volume 269, 15 January 2021, 116166

Monitoring urban black-odorous water by using hyperspectral data and machine learning

https://doi.org/10.1016/j.envpol.2020.116166

Highlights

  • Hyperspectral bands provide unique information concerning the severity of black-odorous water.

  • Factorial data mining synthesizes water quality parameters into reduced dimensions.

  • Reduced dimensions establish correlation between water quality and spectral data.

  • Machine learning based feature extraction selects critical spectral bands.

  • Reduced dimensions and critical spectral bands reveal condition of water quality.

Abstract

Economic development, population growth, industrialization, and urbanization have dramatically accelerated the deterioration of urban water quality, thereby endangering human life and health. However, few efficient methods and techniques exist to monitor urban black and odorous water (BOW) pollution. Our research aims at identifying primary indicators of urban BOW through their spectral characteristics and differentiation. This research combined ground in-situ water quality data with ground hyperspectral data collected from the main urban BOWs in Guangzhou, China, and integrated factorial data mining and machine learning techniques to investigate how to monitor urban BOW. Eight key water quality parameters at 52 sample sites were used to retrieve three latent dimensions of urban BOW quality by factorial data mining. The synchronously measured hyperspectral bands, along with the band combinations, were examined by a machine learning technique, Lasso regression, to identify the most correlated bands and band combinations. Over these, three multiple regression models were fitted against the three latent water quality indicators to determine which spectral bands were highly sensitive to the three dimensions of urban BOW pollution. The findings revealed that many sensitive bands were concentrated in the higher hyperspectral band ranges, which supported the unique contribution of hyperspectral data to monitoring water quality. In addition, this integrated data mining and machine learning approach overcame the limitations of conventional band selection, which focuses on a limited number of band ratios, band differences, and reflectance bands in the lower range of the infrared region. The outcome also indicated that the integration of dimensionality reduction with feature selection shows good potential for monitoring urban BOW. This new analysis framework can be used in urban BOW monitoring and provides scientific data for policymakers to monitor it.

Introduction

The volume of industrial wastewater and urban domestic sewage discharged into human living spaces and the natural environment has increased with economic and industrial development and urban growth. Among the various types of pollution, water pollution is one of the most serious threats to socioeconomic sustainability because of the hazard it poses to human and environmental health (Song et al., 2019). In developing countries experiencing rapid urbanization, urban water pollution is a leading concern due to inadequate wastewater treatment facilities and regulations (Water Pollution Worries in Developing World, 2020). For instance, in developing countries such as Nepal, India, and Bangladesh, enormous amounts of pollutants discharged in urban areas cause critical river pollution in urban vicinities (Karn et al., 2001). The main cause of surface and groundwater pollution is the discharge of industrial and urban residential wastewater into rivers or underground aquifers (Song et al., 2017). Polluted water is often dark or black in color and gives off malodorous gases, and is thus known as black odorous water (BOW) (Wang et al., 2014).

Attention to urban BOW can be traced back to the mid-1800s and the “Great Stink” problem reported in the Thames River, United Kingdom (Mann, 2016). Since then, urban BOW phenomena have been observed and studied globally. The population of Moscow, Russia, more than doubled from the 1970s to the 1980s. The discharge of wastewater into rivers posed a huge challenge and left urban areas malodorous and unsanitary (Martin, 2008). Sado-Inamura and Fukushi (2018) quantitatively analyzed the odor of urban rivers in Japan by developing human olfaction-based odor scores. During the last two decades, as populations rose rapidly and urbanized areas expanded swiftly, the frequency and severity of BOW phenomena intensified dramatically in urban rivers. China is no exception. The first incident of urban BOW in China was reported in the Huangpu River in the early 1980s (Gu et al., 1983). In recent years, BOW has been reported in most of the major rivers in China on a quarterly and annual basis (Wang and Yang, 2016). There is an urgent need to address the concerns of BOW pollution (Yu et al., 2020).

Urban BOW phenomena are caused by different physical and chemical processes and reflect unique physical and chemical characteristics (Liu et al., 2003; Gaafar et al., 2020). However, there are no unified evaluation methods or standards for detecting urban BOW phenomena, because the pollution sources are complex and occur in disparate regions (Liang et al., 2017; Chen et al., 2018). In most water quality studies, researchers determine the health status of a river by using a water quality index (Hasan et al., 2015). Common water quality indicators include dissolved oxygen (DO), chemical oxygen demand (COD), ammonia nitrogen (NH3-N), total nitrogen (TN), total phosphorus (TP), and total suspended solids (TSS) (Hasan et al., 2015; Wu et al., 2018; Yan et al., 2015). The Chinese Water Quality Standard identifies three nutrient-related indicators, TN, TP, and chlorophyll (Chl), as the primary causes of eutrophication (Su et al., 2017). Meanwhile, Chl is selected as a criterion indicator because it is closely related to TP (Varol, 2020). An increasing concentration of Chl causes many kinds of algae to grow in water and increases the absorption rate of water, thereby changing its spectral reflectance (Zhao et al., 2013). Furthermore, COD and biochemical oxygen demand (BOD5) are thought to relate closely to organic pollutants (Mohtar et al., 2019). In addition, TSS is a good indicator of water pollution caused by nutrients and metals and also plays a key role in physical and aesthetic degradation (Packman et al., 1999). However, few past studies have examined all of these indicators comprehensively. For instance, Ouyang et al. (2005) examined river water quality and its main pollutants by analyzing the physicochemical parameter DO, the organic parameter COD, and the major nutrient indices ammonium and TP. DO and biochemical oxygen demand (BOD) were investigated to analyze the unpleasant odor problem in urban rivers (Sado-Inamura and Fukushi, 2018). Our study includes all eight commonly analyzed water quality parameters: DO, NH3-N, TP, TN, COD, BOD5, TSS, and Chl.

Current urban BOW studies rely heavily on traditional field water sampling and testing approaches, and seldom apply newly developed remote sensing technologies to monitor BOW pollution (Shen et al., 2017). For water quality management, however, using multispectral remote sensing images and remote sensing techniques to study water pollution is a feasible and economical approach (Gan et al., 2010; Vakili and Amanollahi, 2020). In addition, researchers have studied water pollution issues and developed water quality models using various spectroradiometers. These instruments range from medium to high resolution and are often used to gather diverse spectral responses for inland water sites with differing concentrations of water quality parameters (Hestir et al., 2015; Jay et al., 2017; Su, 2017; Xu et al., 2018). Moreover, statistical methods are often used to examine relationships between in situ concentration data of water quality parameters and remote sensing reflectance data. Ekercin (2007) examined the water quality of an inland river based on high-resolution multispectral images, and accurately interpreted the water quality condition of the study area. Because hyperspectral remote sensing data have a very narrow bandwidth, which is a significant advantage, slight radiometric differences of the target can be observed and quantified. Jiao et al. (2006) investigated the in-water constitution of Chl through hyperspectral remote sensing data. Xu et al. (2018) studied chromophoric dissolved organic matter (CDOM) absorption using in-situ hyperspectral reflectance data and water samples. Shafique et al. (2003) used remotely sensed spectral bands and in-situ data to study chlorophyll and phosphorus and how they influenced the turbidity of water.

Hyperspectral reflectance data with relatively narrow bandwidths are particularly useful for detecting the severity of water pollution, including BOW pollution (Thiemann and Kaufmann, 2002). Imagery from the Chinese high-resolution remote sensing satellite Gaofen-2 (GF-2, 0.8 m) was analyzed to study urban BOW phenomena and to develop several high-accuracy retrieval algorithms (Shen et al., 2019). Another interesting study used Landsat 8 OLI and Sentinel-2 MSI imagery to identify the black lake issue by retrieving water quality indicators (Kuster et al., 2016). Its findings indicate that remote sensing is especially useful in identifying polluted black lakes. It should also be noted that recent remote sensing work on water pollution has integrated a new optimal classification method with regression analysis to improve the remote sensing inversion model of TP (Du et al., 2018). In brief, remote sensing presents a complementary approach for comprehensively studying BOW in urban environments (Shen et al., 2017).

The narrow reflectance bands are an advantage of hyperspectral remote sensing when studying urban BOW phenomena (Thiemann and Kaufmann, 2002). On the other hand, the substantially larger number of bands makes it harder to identify the relationships between reflectance bands and different water quality indicators. Band selection is commonly done by examining high spectral correlations with the variables of interest (Du and Yang, 2008). The vast data volume could produce confounded or inaccurate outcomes if correlations were computed across hundreds of bands. Traditionally, four main types of algorithms are applied to examine spectral indications of water pollution: 1) single band-ratio empirical algorithm, 2) semi-empirical baseline algorithm, 3) multiple bands linear regression algorithm, and 4) semi-analytical band-ratio algorithm (Mishra et al., 2013). For instance, Randolph et al. (2008) studied the optically active pigments chlorophyll and phycocyanin in turbid productive water based on the band ratios 709 nm/665 nm and 709 nm/620 nm, which were initially proposed by Hair et al. (1998) and Allison (1999). Xu et al. (2018) selected the band ratios of 767 nm/826 nm and 767 nm/826 nm to establish a model for clean water and turbid water, respectively. The band-ratio algorithm is widely used because it can eliminate the influence of environmental conditions and thereby improve the accuracy of the analysis. It suffers, however, from spectrally related reflectance error, which the band-difference algorithm can overcome (Lee et al., 2002). Even so, both the band-ratio and band-difference methods focus on only a very limited number of bands.
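As a minimal sketch of the band-ratio and band-difference ideas discussed above, the following Python snippet computes both indices for one spectrum. The reflectance values, the scaling factor, and the additive offset are hypothetical, and the two disturbances are simplified assumptions chosen only for illustration; they are not the error models of the cited studies.

```python
def band_ratio(spectrum, numerator_nm, denominator_nm):
    """Return reflectance at numerator_nm divided by reflectance at denominator_nm."""
    return spectrum[numerator_nm] / spectrum[denominator_nm]


def band_difference(spectrum, first_nm, second_nm):
    """Return reflectance at first_nm minus reflectance at second_nm."""
    return spectrum[first_nm] - spectrum[second_nm]


# Hypothetical unitless reflectance values keyed by wavelength in nm.
spectrum = {620: 0.030, 665: 0.025, 709: 0.040}

# Assumed disturbance 1: a uniform brightness change scales every band equally.
scaled = {nm: 1.2 * r for nm, r in spectrum.items()}
# Assumed disturbance 2: a constant additive offset shifts every band equally.
shifted = {nm: r + 0.005 for nm, r in spectrum.items()}

for label, s in [("original", spectrum), ("scaled", scaled), ("shifted", shifted)]:
    print(label,
          round(band_ratio(s, 709, 665), 4),
          round(band_difference(s, 709, 665), 4))
```

Under these assumptions, the ratio is unchanged by the uniform scaling but not by the additive offset, while the difference behaves the other way round. This is one simple way to see why the two indices are treated as complementary.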

In recent studies, researchers have analyzed water quality parameters and their relations by applying machine learning techniques (Haghiabi et al., 2018). To overcome the uncertainty of estimation models built on a single water parameter, machine learning algorithms are implemented to assess water quality with higher accuracy (Wang et al., 2017). Machine learning is a sub-field of computer science that focuses on algorithms that learn from data and, consequently, identify highly representative features to make predictions for new, unseen data (Yang et al., 2018). The goal of applying machine learning to the urban BOW issue is to overcome the limitations of conventional band selection methods, which rely heavily on empirical band combinations.
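The abstract names Lasso regression as the technique used to pick out the most relevant bands. The sketch below illustrates how an L1 penalty drives the coefficients of uninformative predictors toward zero. It is a plain coordinate-descent implementation on synthetic data, not the authors' code: the six "bands", the two informative ones (indices 1 and 4), the noise level, and the penalty `alpha=0.3` are all assumptions made for illustration, and only the sample count of 52 echoes the study.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of rows whose columns are centred; y is a centred list.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_wj
    return w


# Synthetic stand-in for reflectance predictors (hypothetical data).
rng = random.Random(0)
n_sites, n_bands = 52, 6
columns = [centre([rng.gauss(0, 1) for _ in range(n_sites)]) for _ in range(n_bands)]
X = [[columns[j][i] for j in range(n_bands)] for i in range(n_sites)]
noise = [rng.gauss(0, 0.1) for _ in range(n_sites)]
# Assumption: only bands 1 and 4 carry signal for the target indicator.
y = centre([2.0 * X[i][1] - 1.5 * X[i][4] + noise[i] for i in range(n_sites)])

weights = lasso_coordinate_descent(X, y, alpha=0.3)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("weights:", [round(wj, 3) for wj in weights])
print("selected band indices:", selected)
```

In practice a library implementation and a cross-validated penalty would normally be preferred. The hand-written version is used here only to keep the example self-contained.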

Although there is a growing literature dealing with applications of hyperspectral remote sensing and machine learning techniques for assessing water quality, no comprehensive or integrated analytical frameworks for measuring urban BOW have been reported. The purpose of this study is to develop an integrated analytical framework combining hyperspectral remotely sensed data, statistical analysis, and machine learning to identify an effective approach for examining relationships between water quality parameters and urban BOW, for selecting the spectral bands that are most sensitive to BOW, and for validating the outcome. Three rivers in Guangzhou City, China, were chosen as the study sites to test this analytical framework. The study area, data, and methods are presented in Section 2. The results are explained and discussed in Section 3. The conclusions are given in Section 4.

Section snippets

The study area

Guangzhou is the capital city and the economic, political, scientific, and cultural center of Guangdong Province. It is also at the heart of the most densely built-up metropolitan area in southern mainland China (Fig. 1). The high speed of economic and social growth came with a very severe level of environmental pollution. As one of the biggest population centers in China, the city experiences severe water pollution concerns, including provisioning its 3 million citizens with polluted drinking

Factorial data mining

The results of factorial data mining showed the eigenvalues and the percentages of variance associated with each factor and included the cumulative percentages of variance (Table 1). Eight factors were initially extracted by PCA and the first three components with high eigenvalues represented meaningful latent variables, which explained nearly 89.9% of the total variance. Factors 1, 2, and 3 explained 63.2%, 13.9%, and 12.7% of the total variance, respectively. The result of varimax rotation
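The variance shares quoted in this snippet can be tallied with a few lines of Python. The percentages are taken from the text. The eigenvalues, by contrast, are only implied values, under the assumption that the PCA was run on the correlation matrix of eight standardised parameters (so that the eigenvalues sum to 8); the snippet does not confirm this.

```python
# Percent of total variance per retained factor, as quoted in the text.
explained = [63.2, 13.9, 12.7]

# Running total of explained variance, rounded to one decimal place.
cumulative = []
running = 0.0
for share in explained:
    running += share
    cumulative.append(round(running, 1))

# Assumption: correlation-matrix PCA on eight standardised parameters,
# so each eigenvalue equals (share / 100) * 8.
n_parameters = 8
implied_eigenvalues = [round(share / 100 * n_parameters, 2) for share in explained]

print("cumulative %:", cumulative)
print("implied eigenvalues:", implied_eigenvalues)
```

The three quoted shares sum to 89.8%, which agrees with the stated "nearly 89.9%" once rounding of the individual shares is allowed for. Under the stated assumption, all three implied eigenvalues exceed 1.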

Conclusion

In summary, urban BOW waterbodies need to be strictly monitored and eliminated. The traditional methods reported in current literature are inefficient in time and cost. This research provides a new framework to monitor urban BOW. Factorial data mining effectively uses a dimension reduction tool to group urban BOW water quality indicators into a few meaningful categories, which are interpretable from water quality characteristics and highly correlated with water spectral bands. The outcome of

Credit author statement

Sarigai Sarigai: formal analysis, figure design and drawing, writing - original draft; Ji Yang: field data collection of water quality parameters, writing - review; Alicia Zhou: data mining and machine learning software and validation; Liusheng Han: field spectroradiometer data collection and processing; Yong Li: data validation, figure design and drawing, writing - review; Yichun Xie: conceptualization, methodology, writing - original draft & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was partially supported by Guangdong Innovative and Entrepreneurial Research Team Program, 2016ZT06D336, National Natural Science Foundation of China, 41976189, GDAS′ Special Project of Science and Technology Development, 2019GDASYL-0301001, Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), GML2019ZD0301, and Guangzhou Science and Technology Program, 201806010106. We thank Mr. Robert Elliott from Eastern

References (65)

  • G.F. Wang et al.

    Analysis on the formation condition of the algae-induced odorous black water agglomerate

    Saudi J. Biol. Sci.

    (2014)
  • Q. Wang et al.

    Industrial water pollution, water environment treatment, and health risks in China

    Environ. Pollut.

    (2016)
  • Z.S. Wu et al.

    Assessing river water quality using water quality index in Lake Taihu Basin, China

    Sci. Total Environ.

    (2018)
  • Y. Xie et al.

    Socio-economic driving forces of arable land conversion: a case study of Wuxian City, China

    Global Environ. Change

    (2005)
  • J. Xu et al.

    Optical models for remote sensing of chromophoric dissolved organic matter (CDOM) absorption in Poyang Lake

    ISPRS J. Photogrammetry Remote Sens.

    (2018)
  • P.D. Allison

    Multiple Regression: A Primer

    (1999)
  • H. Boyyacioglu et al.

    Application of factor analysis in the assessment of surface water quality in Buyuk Menderes river basin

    Eur. Water

    (2004)
  • D.G. Brown et al.

    Responses to climate and economic risks and opportunities across national and ecological boundaries: changing household strategies on the Mongolian plateau

    Environ. Res. Lett.

    (2013)
  • R.P. Bukata et al.

    Optical water quality model of Lake Ontario. 2: determination of chlorophyll a and suspended mineral concentrations of natural waters from submersible and low altitude optical sensors

    Appl. Optic.

    (1981)
  • J.X. Cao et al.

    A critical review of the appearance of black-odorous waterbodies in China and treatment methods

    J. Hazard Mater.

    (2020)
  • G.L. Chen et al.

    Characteristics and influencing factors of spatial differentiation of urban black-odorous waters in China

    Sustain. Open Access J.

    (2018)
  • Q.L. Cheng et al.

    The discussion on the treatment methods of urban malodorous river

    Shanghai Chem. Ind.

    (2011)
  • X.L. Ding et al.

    Analysis of absorption characteristics of urban black-odor water

    Environ. Sci.

    (2018)
  • M.İ. Doron et al.

    Estimation of light penetration, and horizontal and vertical visibility in oceanic and coastal waters from surface reflectance

    J. Geophys. Res. Oceans

    (2007)
  • Q. Du et al.

    Similarity-based unsupervised band selection for hyperspectral image analysis

    Geosci. Rem. Sens. Lett. IEEE

    (2008)
  • S. Ekercin

    Water quality retrievals from high resolution Ikonos multispectral Imagery: a case study in Istanbul, Turkey

    Water Air Soil Pollut.

    (2007)
  • M. Gaafar et al.

    A practical GIS-based hazard assessment framework for water quality in stormwater systems

    J. Clean. Prod.

    (2020)
  • T.Y. Gan et al.

    Retrieving seawater turbidity from Landsat TM data by regressions and an artificial neural network

    Int. J. Rem. Sens.

    (2010)
  • R. Garreta et al.

    Learning Scikit-Learn: Machine Learning in Python

    (2013)
  • A.F. Goetz

    Making Accurate Field Spectral Reflectance Measurements

    (2012)
  • G. Gu et al.

    Preliminary forecast of black-odor trends in Huangpu Creek

    Shanghai Environ. Sci.

    (1983)
  • A.H. Haghiabi et al.

    Water quality prediction using machine learning methods

    Water Qual. Res. J.

    (2018)
This paper has been recommended for acceptance by Eddy Y. Zeng.