Retrieval of cloud top properties from advanced geostationary satellite imager measurements based on machine learning algorithms
Introduction
Clouds play an essential role in the global weather and climate systems, and their properties influence the radiation budget at the top of the atmosphere and at the surface (Baker, 1997; Sassen et al., 2007). The cloud top height (CTH, or cloud top pressure, CTP) is of particular importance for determining longwave radiation at the surface and for aviation safety (Holz et al., 2008). In general, CTH can be derived from passive satellite multichannel imaging measurements, often using infrared (IR) window (IRW), CO2-slicing, and one-dimensional variational (1DVAR) methods (Heidinger and Pavolonis, 2009; Li et al., 2001; Menzel et al., 2008) based on the physical properties of clouds. Some conventional spaceborne imagers, such as AVHRR (Advanced Very High Resolution Radiometer), HIRS (High-resolution Infrared Radiation Sounder), MODIS (Moderate Resolution Imaging Spectroradiometer), and VIIRS (Visible Infrared Imaging Radiometer Suite), have produced CTH climate data records (CDR) (Baum et al., 2012b) that help us further understand the Earth's climate system. Because most CTH retrieval methods for passive sensors involve a radiative transfer model (RTM), and an RTM in cloudy skies usually has large uncertainties (Li et al., 2017; Li et al., 2013), the retrieved CTHs have limited accuracy for optically thin or broken clouds (Baum et al., 2012a). Compared with passive sensors, the spaceborne lidar CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) onboard the CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations) satellite uses laser return signals to retrieve CTH globally with higher accuracy, but with limited spatial coverage (nadir only) and limited temporal resolution (Li et al., 2018; Liu et al., 2019a; Winker et al., 2010). The CTHs derived from CALIPSO measurements are therefore treated as truth when validating the corresponding CTH products from passive sensors (Holz et al., 2008).
Previous studies have also pointed out significant biases or underestimations in the CTH products from passive sensor measurements (Baum et al., 2012a; Holz et al., 2008; Weisz et al., 2007). These biases or underestimations are mainly attributed to CALIOP's high sensitivity to high, optically thin cirrus (Holz et al., 2008), which passive sensors are likely to miss. In addition, there is an intrinsic difference between the CTHs from lidar and IR measurements: lidar captures the very top of the cloud, whereas IR measurements sense the cloud emission from a level at an optical depth of about one.
In recent years, the Advanced Himawari Imager (AHI) (Husi et al., 2019) onboard Himawari-8/-9 (H8/9), the Advanced Baseline Imager (ABI) (Schmit et al., 2005; Schmit et al., 2009) onboard the new generation of the Geostationary Operational Environmental Satellite (GOES)-R series, and the Advanced Geostationary Radiation Imager (AGRI) onboard Fengyun-4A (FY-4A) (Yang et al., 2017) have been successfully launched into geostationary (GEO) orbit. These imagers provide measurements with high temporal resolution (full disk coverage every 10 to 15 min, and more frequent regional coverage) and high spatial resolution (0.5–2 km at nadir for AHI and ABI, 0.5–4 km for AGRI) in 16 spectral bands (14 for AGRI) (Min et al., 2017). Combining spatially and temporally collocated CALIPSO and GEO imager (i.e., H8/9 AHI) measurements therefore offers a good opportunity to establish CTH retrievals with both high spatial and high temporal resolution.
In addition, advanced machine learning (ML) techniques, such as K-nearest-neighbor (KNN), random forests (RF), support vector machines (SVM), artificial neural networks (ANN), and deep learning (DL, a complex kind of ANN algorithm), offer a possible solution to some non-linear problems in remote sensing and the geosciences (Kühnlein et al., 2014a; Kühnlein et al., 2014b; Min et al., 2019). A previous study (Håkansson et al., 2018) trained a neural network algorithm to retrieve CTP and CTH for several passive sensors in polar orbit. Here, advanced ML techniques (details in Appendix B) are used to build a connection between CALIPSO and GEO imager CTH determinations.
With 16 spectral bands (of which 10 are infrared) viewing the same cloud system, H8/AHI is able to depict cloud top properties with IR measurements during both day and night. The primary goal of this investigation is to derive CTHs from H8/AHI measurements day and night without using an RTM in clear or cloudy skies. A spatially and temporally collocated AHI, CALIPSO, and numerical weather prediction (NWP) model dataset is used to train a statistical model based on ML methods. This statistical, or prediction, model is then applied to the H8/AHI IR band measurements to derive CTH products with high temporal and spatial resolution over the full Earth disk. Independent validation tests are conducted to compare the CTH results from the traditional 1DVAR and ML-based methods. This study also addresses the following questions related to the new ML-based CTH retrieval approach: (1) How can CTH be derived from combined H8/AHI radiances and CALIOP measurements? (2) Does the ML-based algorithm improve CTH over other algorithms such as 1DVAR?
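The train-then-apply workflow described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the study's actual code: the training samples are synthetic, the toy relation between brightness temperature and CTH is invented, and the helper name `retrieve_cth` is hypothetical. A scikit-learn gradient boosting regressor stands in for whichever ML model is finally selected.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for collocated training samples (assumption):
# 10 IR brightness temperatures (K) per sample and a lidar-like CTH (km).
n_samples, n_bands = 2000, 10
bt_train = rng.uniform(200.0, 300.0, size=(n_samples, n_bands))
# Toy relation (assumption): a colder window-like band means a higher cloud top.
cth_train = (300.0 - bt_train[:, 6]) / 6.5 + rng.normal(0.0, 0.3, n_samples)

# Fit the statistical (prediction) model on the collocated samples.
model = GradientBoostingRegressor(n_estimators=100, random_state=0)
model.fit(bt_train, cth_train)


def retrieve_cth(bt_image, fitted_model):
    """Apply a fitted regressor to a (bands, ny, nx) image.

    Pixels with any non-finite band value are returned as NaN.
    """
    n_img_bands, ny, nx = bt_image.shape
    features = bt_image.reshape(n_img_bands, -1).T  # (n_pixels, n_bands)
    valid = np.all(np.isfinite(features), axis=1)
    cth = np.full(ny * nx, np.nan)
    if valid.any():
        cth[valid] = fitted_model.predict(features[valid])
    return cth.reshape(ny, nx)


# A tiny synthetic "image" with one missing pixel.
bt_image = rng.uniform(200.0, 300.0, size=(n_bands, 4, 5))
bt_image[:, 0, 0] = np.nan
cth_map = retrieve_cth(bt_image, model)
print(cth_map.shape)
```

Flattening the image to a pixel-by-band table lets the same fitted model serve both the collocated training samples and every pixel of a full-disk scene, which is why no RTM call is needed at retrieval time.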
Section 2 briefly introduces the H8/AHI, CALIPSO, and Global Forecast System (GFS) NWP data. Section 3 describes the traditional physical (TRA) cloud-top pressure (CTP, which can be converted to CTH) algorithm, along with its validation for H8/AHI data. Section 4 introduces four classical ML algorithms, then selects and evaluates the optimal statistical model for CTH retrieval. Section 5 shows the CTH results from the new ML-based algorithm and a joint algorithm, and discusses the factors that may affect the ML-based method. Finally, Section 6 provides a summary. In addition, two appendices at the end of this study give further details of the TRA and the four ML algorithms.
Data
The new-generation Japanese geostationary meteorological satellite Himawari-8 was successfully launched into geosynchronous orbit on October 7, 2014, introducing the new AHI. Located at 140.7° E, AHI provides 16 bands of full disk Earth-viewing imagery in visible (VIS, 4 bands), near-infrared (NIR, 2 bands), and infrared (IR, 10 bands) bands (central wavelengths from 0.47 to 13.4 μm) every 10 min, with horizontal resolutions of 0.5 km (VIS, 1 band), 1.0 km (VIS, 2 bands), and 2.0 km (NIR/IR, 13 bands) (
Algorithm description
The classical CO2-slicing algorithm (Menzel et al., 2008) for Aqua/Terra MODIS Collection 6 is used to retrieve operational cloud top properties. It uses several 13 and 14 μm spectral bands for ice cloud retrieval, and the IR-window approach (IRW) based on the 11 μm band for water cloud retrieval, along with a latitude-dependent lapse rate for low clouds over ocean. However, it is not possible to apply or adapt this algorithm to the current or new-generation GEO satellite imagers; some important 13
Four ML algorithms
In this investigation, we primarily use four classical ML algorithms to train a cloud top properties prediction model: K-Nearest-Neighbor (KNN) (Altman, 1992; Coomans and Massart, 1982), Support Vector Machine (SVM) (Cao, 2003; Drucker et al., 1997), Random Forest (RF) (Breiman, 2001), and Gradient Boosting Decision Tree (GBDT) (Friedman, 2002). Compared with the H8/AHI CTH algorithms mentioned in Section 3 (Heidinger and Pavolonis, 2009), an ML-based prediction retrieval algorithm
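As a rough illustration of how the four algorithm families named above might be compared, the sketch below fits scikit-learn regressors on a synthetic dataset and reports the mean absolute error on a held-out split. Everything about the data (predictors, the toy CTH relation, sample size) and every hyperparameter is an assumption made for illustration; none of it is taken from the study, and the scores say nothing about the real retrieval problem.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Synthetic predictors (assumption): 10 brightness-temperature-like columns (K).
X = rng.uniform(200.0, 300.0, size=(1500, 10))
# Toy target (assumption): CTH in km from two columns plus noise.
y = (300.0 - X[:, 6]) / 6.5 + 0.02 * (X[:, 8] - X[:, 6]) + rng.normal(0.0, 0.3, 1500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# KNN and SVM are distance/kernel based, so their inputs are standardized first.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingRegressor(n_estimators=200, random_state=0),
}

scores = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, estimator.predict(X_test))
    print(f"{name}: MAE = {scores[name]:.2f} km")
```

Holding out a test split that the models never see during fitting is the minimum needed for a fair comparison; in practice the hyperparameters of each algorithm would also be tuned before choosing among them.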
Results and discussions
After the optimal prediction model is determined, we develop a CTH retrieval program for H8/AHI data following the procedure in Fig. 4. Fig. 7 shows the validation of the H8/AHI CTH from the TRA and ML (based on the optimal GBDT model with first guess) algorithms against CALIPSO data for the testing dataset (mentioned in Fig. 3). The sub-figures in the last column of Fig. 7 show the layered MAE, MBE, and STD of CTH for three different datasets at an interval of 1 km for the TRA and ML algorithms. From
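The layered statistics mentioned above can be computed along the following lines. This is a hedged sketch: it assumes that MAE, MBE, and STD denote the mean absolute error, the mean bias error, and the standard deviation of the retrieved-minus-reference difference, and that samples are binned by the reference (lidar) height. The function name `layered_errors`, the 18 km upper limit, and the sample values are hypothetical.

```python
import numpy as np


def layered_errors(cth_ref, cth_ret, bin_width_km=1.0, max_km=18.0):
    """Return MAE, MBE, and STD of (retrieved - reference) per height layer.

    Samples are binned by the reference height; empty layers are skipped.
    """
    cth_ref = np.asarray(cth_ref, dtype=float)
    cth_ret = np.asarray(cth_ret, dtype=float)
    diff = cth_ret - cth_ref
    edges = np.arange(0.0, max_km + bin_width_km, bin_width_km)
    layer_index = np.digitize(cth_ref, edges) - 1
    rows = []
    for i in range(len(edges) - 1):
        d = diff[layer_index == i]
        if d.size == 0:
            continue
        rows.append(
            {
                "layer_km": (float(edges[i]), float(edges[i + 1])),
                "n": int(d.size),
                "MAE": float(np.mean(np.abs(d))),
                "MBE": float(np.mean(d)),
                "STD": float(np.std(d)),
            }
        )
    return rows


# Made-up example: two low-cloud samples and two mid-level samples.
reference = [0.5, 0.7, 5.2, 5.8]
retrieved = [1.0, 0.2, 4.2, 4.8]
for row in layered_errors(reference, retrieved):
    print(row)
```

In this made-up example the lowest layer has offsetting errors, so its MBE is close to zero while its MAE is not, which shows why both statistics are usually reported together.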
Summary
The objective of this study is to investigate a new approach for improving CTH estimation through the combined use of passive and active remote sensing measurements. The AHI radiance measurements from the first satellite of the new generation of the Japanese GEO series and the official CALIPSO cloud products (Version 4.1) are collocated spatially and temporally to develop statistical CTH retrieval methods based on advanced machine learning techniques, and the retrieved products are validated with
Author contributions
Min Min: Conceptualization, Methodology, Software, Investigation, Resources, Validation, Writing-Original draft preparation. Jun Li: Conceptualization, Methodology, Supervision, Software, Investigation, Data curation, Writing-Original draft preparation. Fu Wang: Software, Visualization, Investigation. Zijing Liu: Validation, Software. W. Paul Menzel: Writing-Reviewing and Editing, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors would like to acknowledge NASA, JMA, and NOAA for freely providing the MODIS (https://ladsweb.modaps.eosdis.nasa.gov/search), CALIPSO (https://subset.larc.nasa.gov/calipso/login.php), Himawari-8 (ftp.ptree.jaxa.jp), and GFS NWP (ftp://nomads.ncdc.noaa.gov/GFS/Grid4) data online. Special thanks go to the GOES-R Algorithm Working Group for guiding the TRA algorithm applications. The authors also sincerely appreciate the powerful computing tools developed by the Python and scikit-learn
References (54)
Support vector machines experts for time series forecasting. Neurocomputing (2003)
et al. Alternative k-nearest neighbour rules in supervised pattern recognition: part 1. k-Nearest neighbour classification by using alternative voting rules. Anal. Chim. Acta (1982)
et al. Improving the accuracy of rainfall rates from optical satellite sensors with machine learning — a random forests-based approach applied to MSG SEVIRI. Remote Sens. Environ. (2014)
et al. Long-term variation of cloud droplet number concentrations from space-based Lidar. Remote Sens. Environ. (2018)
et al. On the influence of cloud fraction diurnal cycle and sub-grid cloud optical thickness variability on all-sky direct aerosol radiative forcing. J. Quant. Spectrosc. Radiat. Transf. (2014)
et al. Support vector machines in remote sensing: a review. ISPRS J. Photogramm. Remote Sens. (2011)
et al. Extinction effects of atmospheric compositions on return signals of space-based lidar from numerical simulation. J. Quant. Spectrosc. Radiat. Transf. (2018)
An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. (1992)
Cloud microphysics and climate. Science (1997)
et al. MODIS cloud top property refinements for Collection 6. J. Appl. Meteorol. Climatol. (2012)