A machine learning approach for deriving spectral absorption coefficients of optically active oceanic constituents
Introduction
The total light absorption coefficient, a(λ) (m−1), is one of the bulk Inherent Optical Properties (BIOPs) and describes the interaction of light with natural water and its constituents (λ is the wavelength of light). The optically active seawater constituents that alter light absorption are phytoplankton, detrital matter, colored dissolved organic matter (CDOM, or gelbstoff), and water itself. Except in extremely turbid environments, a(λ) is assumed to be the sum of the absorption contributions of these constituents, also called absorption subcomponents: absorption due to phytoplankton (a_ph(λ), m−1), CDOM (a_g(λ), m−1), detrital matter (a_d(λ), m−1) and water itself (a_w(λ), m−1) (Twardowski et al., 2018). These absorption spectra are of significant interest for understanding biogeochemical processes, aquatic ecology, and water quality in the upper ocean. The phytoplankton absorption coefficient plays a crucial role in estimating primary production and in assessing phytoplankton community composition, ecology and biogeochemical cycles (Sathyendranath, 2014). The abundance, composition and variability of CDOM are vital, owing to its contribution to organic carbon and, thereby, the global carbon budget (Dong et al., 2013). Detritus or non-algal particulate matter constitutes inorganic material that affects light penetration and the availability of light for photosynthesis (Lin et al., 2013). The absorption due to CDOM and detrital matter is often combined and modeled as a single component (Colored Detrital Matter), a_dg(λ), owing to their similar spectral characteristics, i.e., an exponential decrease in absorption with wavelength. The absorption coefficient of water, a_w(λ), is assumed to be known with 2% accuracy (Pope and Fry, 1997), and its variation with temperature and salinity has been studied (Twardowski et al., 2018).
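Written out, the additive assumption and the exponential model for the combined CDOM and detrital component take the form below. The subscripts (w, ph, g, d, dg), the reference wavelength λ0 and the spectral slope S are conventional notation assumed here for illustration; the source text does not give these expressions explicitly.

```latex
\begin{align}
a(\lambda) &= a_{w}(\lambda) + a_{ph}(\lambda) + a_{g}(\lambda) + a_{d}(\lambda), \\
a_{dg}(\lambda) &= a_{g}(\lambda) + a_{d}(\lambda)
  \approx a_{dg}(\lambda_{0})\, e^{-S(\lambda - \lambda_{0})} .
\end{align}
```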
The present state-of-the-art methods for measuring or deriving a(λ) or a_nw(λ), the total non-water absorption coefficient (a(λ) minus the absorption due to water), include flow-through systems, moorings, autonomous profilers and satellite-derived measurements (Zheng and Stramski, 2013a). These platforms can provide data at high temporal resolution, with absorption measured at multispectral and hyperspectral wavelengths. With recent developments in instruments such as the ac-s (Rhoades et al., 2004), the a-sphere (Dana and Maffione, 2006) and the Point Source Integrating Cavity Absorption Meter (PSICAM) (Röttgers et al., 2007), it is possible to obtain absorption coefficients with high accuracy.
Semi-analytical algorithms (Lee, 2006; Loisel et al., 2018; Werdell et al., 2013) are often used to derive a(λ) and the subcomponent IOPs from satellite-derived measurements of remote-sensing reflectance, R_rs(λ). Owing to the synoptic coverage of satellite data, the subcomponent IOPs can be obtained repeatedly over time to study oceanic processes at various scales. Another class of algorithms, the total absorption decomposition or partitioning models, partitions the measured or derived a(λ) or a_nw(λ) (the total non-water absorption coefficient) into the subcomponent IOPs a_ph(λ) and a_dg(λ) (Ciotti and Bricaud, 2006; Lee et al., 2002; Oubelkheir et al., 2007; Roesler et al., 1989; Zhang et al., 2015) or further into a_ph(λ), a_d(λ) and a_g(λ) (Dong et al., 2013; Lin et al., 2013; Schofield et al., 2004; Zheng et al., 2015; Zheng and Stramski, 2013a). Other methods partition the particulate absorption coefficient, a_p(λ) (m−1), into a_ph(λ) and a_d(λ) (Bricaud and Stramski, 1990; Zheng and Stramski, 2013b). However, the present study focuses only on the total absorption spectrum partitioning algorithms.
Some of these absorption partitioning models assume spectral shapes for subcomponent IOPs, require ancillary inputs like Chlorophyll-a (Chl-a) concentration or direct in situ measurements of IOPs, or were developed for a particular environment. Parameterizing the shapes of subcomponents with an empirical quadratic equation for phytoplankton, as in Lin et al. (2013), cannot reproduce the full variation in spectral shapes observed naturally in various phytoplankton species and communities. Similarly, using a single spectral shape model for an individual absorption subcomponent (Ciotti et al., 2002; Zhang et al., 2015) may not fully represent the variability in the absorption subcomponents observed in natural waters. Ancillary inputs like Chl-a concentration can be a source of additional error, either inherent in the measurement method or introduced when Chl-a is derived from remote-sensing reflectance, R_rs(λ). Regional parameterization of the shapes of absorption subcomponents limits a model's applicability at a global scale, owing to differences in the range and variability of the observed absorption subcomponents. While these methods perform well in their defined area or range of IOPs, their performance can degrade across a wide range of aquatic ecosystems (Stramski et al., 2019). By relaxing the assumptions about subcomponent IOP shapes and the requirement of ancillary variables, Zheng and Stramski (Zheng et al., 2015; Zheng and Stramski, 2013a) proposed stacked-constraints models that use various constraints on the shapes and slopes of subcomponent IOPs to find a set of possible optimal solutions from a large pool of candidate solutions. Apart from these limitations and assumptions, most of these models use an optimization procedure to find optimal values of variables such as Chl-a concentration or to construct the shapes of subcomponent IOPs. Models that use an optimization procedure or any other iterative method can therefore take a substantial amount of time when applied to a large dataset such as satellite imagery or a time series.
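To make the fixed-shape assumption concrete, the following minimal sketch partitions a non-water absorption spectrum into a phytoplankton part and a combined CDOM and detrital part by linear least squares. It is not any of the cited algorithms: the normalised phytoplankton shape `PH_SHAPE`, the slope `S_DG`, the 443 nm reference wavelength and the function name `partition_fixed_shapes` are all hypothetical choices made for illustration.

```python
import math

WAVELENGTHS = [412, 443, 490, 510, 555, 670]  # nm, six SeaWiFS visible bands

# Hypothetical normalised phytoplankton shape (illustrative values, not measured).
PH_SHAPE = [0.85, 1.00, 0.75, 0.55, 0.25, 0.55]
S_DG = 0.015  # nm^-1, assumed exponential slope of the CDOM + detritus component

# Exponential shape normalised to 1 at the assumed 443 nm reference wavelength.
DG_SHAPE = [math.exp(-S_DG * (wl - 443)) for wl in WAVELENGTHS]


def partition_fixed_shapes(a_nw):
    """Least-squares amplitudes (A_ph, A_dg) for a_nw ~ A_ph*PH_SHAPE + A_dg*DG_SHAPE."""
    spp = sum(p * p for p in PH_SHAPE)
    sdd = sum(d * d for d in DG_SHAPE)
    spd = sum(p * d for p, d in zip(PH_SHAPE, DG_SHAPE))
    spy = sum(p * y for p, y in zip(PH_SHAPE, a_nw))
    sdy = sum(d * y for d, y in zip(DG_SHAPE, a_nw))
    det = spp * sdd - spd * spd  # determinant of the 2x2 normal equations
    amp_ph = (spy * sdd - sdy * spd) / det
    amp_dg = (sdy * spp - spy * spd) / det
    return amp_ph, amp_dg


# A spectrum built with the assumed shapes is recovered exactly (up to rounding).
consistent = [0.05 * p + 0.03 * d for p, d in zip(PH_SHAPE, DG_SHAPE)]
print(partition_fixed_shapes(consistent))

# A spectrum whose true slope (0.010 nm^-1) differs from S_DG gives biased amplitudes.
mismatched = [0.05 * p + 0.03 * math.exp(-0.010 * (wl - 443))
              for p, wl in zip(PH_SHAPE, WAVELENGTHS)]
print(partition_fixed_shapes(mismatched))
```

The second call illustrates the limitation discussed above: when the natural spectrum does not follow the assumed shape, the mismatch is absorbed into the retrieved amplitudes.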
Hence, there is a need for faster algorithms that take a(λ) or a_nw(λ) as the sole input, avoid ancillary inputs, and relax the assumptions used to model the spectral shapes of subcomponent IOPs.
With these requirements, an attempt is made to use Machine Learning (ML) models to derive the subcomponent IOPs at six wavelengths in the visible range (400–700 nm) corresponding to the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), using the non-water absorption coefficient a_nw(λ) as input. ML algorithms can be trained using simulated or in situ data, and their performance can be tested or validated using other measured in situ datasets. One of the key advantages of ML algorithms is the very short computation time needed to derive subcomponent IOPs from large datasets such as satellite imagery. ML algorithms are also capable of learning complex non-linear patterns and do not require prior assumptions. In the ocean color domain, ML algorithms like Neural Networks (NNs) have been widely used for deriving IOPs and Chl-a concentration from remote-sensing reflectance, R_rs(λ) (Chen et al., 2015; 2014; D'Alimonte et al., 2012; Doerffer and Schiller, 2007; Hieronymi et al., 2017; Ioannou et al., 2013a; 2011; Jamet et al., 2012; Schiller and Doerffer, 1999; Tanaka et al., 2004). NNs have also been used for atmospheric correction (Brockmann et al., 2016), deriving ocean salinity (Gueye et al., 2014) and ocean color data reconstruction (Krasnopolsky et al., 2016). Other ML approaches, such as ensemble models using Random Forest (Chen et al., 2019) and Extremely Randomized Trees (Park et al., 2019), linear regression models, Self-Organizing Maps, k-Nearest Neighbors (Keller et al., 2018a, 2018b) and Support Vector Machine (SVM) based regression (Hu et al., 2020), have been used for various purposes in water quality modelling. Hence, this study attempts to identify the best ML algorithms by comparing different ML approaches that use a_nw(λ) as input to derive subcomponent IOPs at the six SeaWiFS wavelengths.
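The idea of training on simulated spectra and then partitioning a new spectrum without iteration can be sketched with a small k-nearest-neighbours regressor in pure Python. This is a toy illustration, not one of the models evaluated in the study: the two phytoplankton shapes, the amplitude and slope ranges, the choice of k-NN, and the decision to predict the phytoplankton fraction of the non-water absorption (so that the two outputs always sum to the input) are all assumptions made here.

```python
import math
import random

WAVELENGTHS = [412, 443, 490, 510, 555, 670]  # nm, six SeaWiFS visible bands

# Two hypothetical normalised phytoplankton shapes (illustrative values only);
# mixing them gives the synthetic data some spectral-shape variability.
PH_SHAPE_A = [0.85, 1.00, 0.75, 0.55, 0.25, 0.55]
PH_SHAPE_B = [0.95, 1.00, 0.60, 0.40, 0.15, 0.45]


def simulate_spectrum(rng):
    """Return one synthetic sample (a_nw, a_ph, a_dg), each a list in m^-1."""
    amp_ph = 10 ** rng.uniform(-3.0, -0.5)  # assumed a_ph(443) range
    amp_dg = 10 ** rng.uniform(-3.0, 0.0)   # assumed a_dg(443) range
    slope = rng.uniform(0.010, 0.020)       # assumed exponential slope, nm^-1
    mix = rng.random()
    a_ph = [amp_ph * (mix * a + (1.0 - mix) * b)
            for a, b in zip(PH_SHAPE_A, PH_SHAPE_B)]
    a_dg = [amp_dg * math.exp(-slope * (wl - 443)) for wl in WAVELENGTHS]
    a_nw = [p + d for p, d in zip(a_ph, a_dg)]
    return a_nw, a_ph, a_dg


def build_training(n_samples, seed=0):
    """Training table of (log10 a_nw features, a_ph/a_nw fractions)."""
    rng = random.Random(seed)
    table = []
    for _ in range(n_samples):
        a_nw, a_ph, _ = simulate_spectrum(rng)
        features = [math.log10(v) for v in a_nw]
        fractions = [p / v for p, v in zip(a_ph, a_nw)]
        table.append((features, fractions))
    return table


def knn_partition(a_nw, training, k=5):
    """Split a_nw into (a_ph, a_dg) using the k nearest training spectra."""
    query = [math.log10(v) for v in a_nw]
    nearest = sorted(training, key=lambda row: math.dist(query, row[0]))[:k]
    fractions = [sum(row[1][i] for row in nearest) / len(nearest)
                 for i in range(len(a_nw))]
    a_ph = [f * v for f, v in zip(fractions, a_nw)]
    a_dg = [v - p for v, p in zip(a_nw, a_ph)]
    return a_ph, a_dg


training = build_training(2000, seed=0)
a_nw, a_ph_true, _ = simulate_spectrum(random.Random(1))
a_ph_est, a_dg_est = knn_partition(a_nw, training)
print("a_ph(443) true vs estimated:", a_ph_true[1], a_ph_est[1])
```

Once the training table is built, each prediction is a single lookup rather than an optimization run, which is the property that makes trained models attractive for large datasets.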
ML algorithms trained and validated in this study can be integrated into real-time continuous instruments like profilers, flow-through systems, moorings and gliders to obtain subcomponent IOP information.
The ML methods are trained using a simulated dataset covering a wide range of IOPs corresponding to the optical variability observed in natural aquatic environments. Two publicly available datasets encompassing a wide range of optical properties, with measured non-water absorption a_nw(λ) and the associated subcomponent IOPs, are used to test the performance of the trained ML models. Various univariate statistics and a quantitative statistical methodology (Brewin et al., 2015; Dorji and Fearns, 2016) are used to compare the performance of the ML models. As the ML models are trained and tested using various datasets, they can be further tested on data collected over waters with different optical properties and ultimately be used with satellite imagery.
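As an illustration of the kind of univariate statistics used in such comparisons, the sketch below computes the correlation coefficient, root-mean-square error and bias between measured and derived absorption values. Working in log10 space and the function name `log10_stats` are assumptions made here; the excerpt does not state which statistics or transformations the study applies.

```python
import math


def log10_stats(measured, predicted):
    """Pearson r, RMSE and bias of log10(predicted) against log10(measured)."""
    x = [math.log10(v) for v in measured]
    y = [math.log10(v) for v in predicted]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    rmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)) / n)
    bias = mean_y - mean_x  # positive when the model overestimates on average
    return {"r": r, "rmse": rmse, "bias": bias}


# Hypothetical values: a model that overestimates every sample by a factor of 2.
measured = [0.01, 0.02, 0.05, 0.10]
predicted = [2.0 * v for v in measured]
print(log10_stats(measured, predicted))
```

A constant multiplicative error leaves the correlation at 1 but shows up in the bias and RMSE, which is why several statistics are usually reported together.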
Section snippets
Simulated data for training ML models
To train various ML models, a simulated dataset consisting of the non-water absorption a_nw(λ) and the subcomponent absorption coefficients at six wavelengths (412, 443, 490, 510, 555 and 670 nm) corresponding to SeaWiFS is generated following the methodology in Ioannou et al. (2013b, 2011). Briefly, the dataset consists of 9000 spectra of a_nw(λ) and the subcomponent IOPs generated with Chl-a concentration, CDOM absorption at 412 nm (a_g(412)) and non-algal particulate concentration ranging from 0.02 to 70 mg/m3, 0.001–6 m−1 and 0.02–50 mg/m3, respectively. Pure water spectral absorption
Comparison of ML models in deriving absorption by phytoplankton
The results of a quantitative inter-comparison of ML models in deriving phytoplankton absorption, a_ph(λ), from a_nw(λ) of the GBID dataset are presented in Figs. 2 and 3. The results indicate that all models captured the variability in the in situ data at all wavelengths (r > 0.9), except at 555 nm (0.8 < r < 0.9) (Fig. 2). The values of all the ML models are lower at 555 nm and increased at 670 nm. In agreement with the values, the and values are higher at 555 nm and lower at 670 nm. All the linear regression methods
Discussion
The ML-based models neither require ancillary inputs nor assume shapes in modelling the absorption subcomponents. When absorption subcomponents are derived from a large dataset like satellite imagery, the existing absorption partitioning algorithms can require more computation time, as they run an optimization scheme one or more times. On the other hand, the trained ML models are computationally fast and hence are useful for processing satellite imagery. The performance of
Conclusions
The present study compared seventeen ML models falling into six different approaches and two existing absorption partitioning models for deriving subcomponent IOPs from the total non-water absorption coefficient, a_nw(λ). The performance of the ML models is evaluated using the IOPs from two publicly available in situ datasets acquired over various aquatic environments covering a wide range of optical properties. A quantitative statistical methodology is used to rank the ML models according to their performance. Among the ML models,
Availability of data and materials
More information about the data contributors and the quality control of all parameters for the GBID and CCRR datasets is available at https://doi.pangaea.de/10.1594/PANGAEA.854832/ and https://www.earth-syst-sci-data.net/7/319/2015/, respectively.
Computer code availability statement
The ERT and EBG models developed in the present study, along with the sample datasets, are publicly accessible at https://github.com/kollurusrinivas1/Spectral_IOP_Decomp.
Names of the codes to run: Run_CCRR.m, Run_GBID.m, Run_CCRR_noOrig.m (in the GitHub repository).
License: GNU General Public License v3.0.
Programming language: MATLAB.
Author statement
Conceptualization, S.K. and S.S.G.; Methodology, S.K.; Software, S.K.; Validation, S.K. and S.S.G.; Resources, S.S.G. and A.B.I.; Data Curation, S.K.; Writing-Original Draft Preparation, S.K.; Writing-Review and Editing, S.S.G. and A.B.I.; Supervision, S.S.G. and A.B.I.; Project Administration, S.S.G. and A.B.I.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors thank all the organizations and researchers involved in collecting and compiling the GBID and CCRR datasets and making them publicly available.
References (74)
- et al. (2015). The Ocean Colour Climate Change Initiative: III. A round-robin comparison on in-water bio-optical algorithms. Remote Sens. Environ.
- et al. (2014). Remote sensing of absorption and scattering coefficient using neural network model: development, validation, and application. Remote Sens. Environ.
- et al. (2019). Improving ocean color data coverage through machine learning. Remote Sens. Environ.
- et al. (2012). Performance and applicability of bio-optical algorithms in different European seas. Remote Sens. Environ.
- et al. (2013). An algorithm to retrieve absorption coefficient of chromophoric dissolved organic matter from ocean color. Remote Sens. Environ.
- (2014). WASI-2D: a software tool for regionally optimized analysis of imaging spectrometer data from deep and shallow waters. Comput. Geosci.
- (2004). The water color simulator WASI: an integrating software tool for analysis and simulation of optical in situ spectra. Comput. Geosci.
- et al. (2012). BOMBER: a tool for estimating water quality and bottom properties from remote sensing images. Comput. Geosci.
- et al. (2014). Neural approach to inverting complex system: application to ocean salinity profile estimation from surface parameters. Comput. Geosci.
- et al. (2013). Deriving ocean color products using neural networks. Remote Sens. Environ.