Deep learning based quantitative property-consequence relationship (QPCR) models for toxic dispersion prediction

doi:10.1016/j.psep.2021.06.019

Process Safety and Environmental Protection

Volume 152, August 2021, Pages 352-360

https://doi.org/10.1016/j.psep.2021.06.019

Abstract

It is crucial for emergency responders to make quick and accurate predictions of toxic chemical dispersions, which can lead to massive injuries and casualties. In this study, a toxic dispersion database is constructed from PHAST simulations; it consists of 30,022 toxic release scenarios of 19 chemicals. A quantitative consequence prediction model is then developed based on this database to efficiently and accurately predict dispersion downwind distances. Random forest, gradient boosting, and deep neural network algorithms are implemented and compared to find the best-performing method for the model construction. The deep neural network is found to have the highest accuracy, with a test set R² higher than 0.994 and an RMSE less than 0.1 for all key dispersion ranges. The developed toxic dispersion prediction models can be used to quickly generate toxic dispersion range estimations for any toxic chemical at much lower computational cost.

Introduction

Incidental release and dispersion of toxic chemicals may lead to serious short-term and long-term consequences for people as well as the environment. The Bhopal disaster of December 3rd, 1984 killed at least 3800 people immediately, and morbidity and premature death skyrocketed afterward (Broughton, 2005). Although laws and regulations on the storage and use of toxic chemicals have become more stringent, and emergency response and process safety have developed significantly in the past 30 years, toxic release incidents continue to happen. In 2020, a styrene monomer release in Visakhapatnam, India caused 15 deaths and left more than 1000 people sick after exposure, which shows that the control and prevention of toxic release and dispersion still need more attention (Feng et al., 2021). Unlike other safety-related accidents such as fires and explosions, whose impact is instantaneous, a toxic release can also cause significant long-term effects. Toxic chemicals can pollute the soil and water bodies in the vicinity, which can increase rates of cancer and deformity, kill animals and plants, and deposit toxic substances in the soil, causing even longer-lasting adverse effects on the environment.

In order to prevent and control the dispersion of toxic chemicals at an early stage, an instant and accurate prediction of the dispersion ranges is crucial for emergency response planning and consequence analysis. Currently, there are three types of toxic dispersion prediction and calculation methods: empirical models, computational fluid dynamics (CFD) models, and integral models. Among empirical models, the most widely used are the Pasquill-Gifford and Britter-McQuaid models, which can provide rapid predictions of the downwind plume distances using pre-derived equations and computation graphs (McQuaid, 1982). However, the Pasquill-Gifford dispersion model is only applicable to neutrally buoyant dispersions of gases, while the Britter-McQuaid model can only be applied to dense gas dispersion; its calculation procedure is complex, and it cannot account for the influence of obstacles (Crowl and Louvar, 2019).

CFD simulation tools for toxic dispersion modeling are capable of capturing the influence of surface roughness and of obstacles, since the geometry can be constructed to match the actual case. CFD has been well studied for different toxic dispersion scenarios (Carboni et al., 2021; Joshi et al., 2016; Pontiggia et al., 2009; Scargiali et al., 2005; Shen et al., 2020; Tauseef et al., 2011; Wang et al., 2020; Zhang and Chen, 2010). However, the geometry and boundary condition setup is extremely time-consuming and the calculation is extremely resource-demanding, which makes CFD poorly suited to instant prediction and emergency response. Integral prediction methods such as HEGADIS, NCAR, and DRIFT, which overcome the disadvantages of the empirical and CFD methods, can provide relatively accurate toxic dispersion predictions with much lower computational resources (Gant et al., 2018). However, integral methods also have limitations: they can only simulate free-field dispersion, and the simulation is limited to the toxic chemicals in the built-in database (Jiao et al., 2020c).

PHAST is one of the most popular process hazard analysis software packages. Its dispersion prediction tool uses the UDM (Unified Dispersion Model) to simulate the complete course of incident scenarios from initial leakage to far-field dispersion, and it can also model rainout and subsequent vaporization. The PHAST UDM module has also been widely validated against toxic release experimental results for both buoyant and heavy gases, with very high prediction accuracy (Gerbec et al., 2017; Witlox et al., 2018). Pandya et al. (2012) also conducted a sensitivity analysis of PHAST’s toxic dispersion prediction, which shows its capability for accurate toxic chemical dispersion prediction.

Machine learning and deep learning have been widely applied in chemical health and safety research in recent years, especially in consequence modeling (Jiao et al., 2020a). Ma and Zhang (2016) combined the classical Gaussian dispersion model with the support vector machine (SVM) algorithm to develop a point source prediction model for contaminant dispersion, which shows satisfactory results. However, the Gaussian model has proved less accurate than CFD and integral methods, which makes a dataset built on it less convincing as a basis for machine learning. Wang et al. (2018) compared two different machine learning methods for gas dispersion using Project Prairie Grass experimental data, showing the practicality of applying machine learning algorithms to hazardous gas dispersion simulation. Qian et al. (2019) used the long short-term memory (LSTM) network to develop a toxic gas dispersion model based on the same dataset. Ni et al. (2020) further compared the performance of empirical, CFD, and machine learning models in predicting the Prairie Grass field experiment data, which shows that convolutional neural networks perform better than the CFD and empirical methods. However, these studies rely on limited experimental data, which makes the models less universally applicable to different leak scenarios.

For machine learning and deep learning based consequence prediction models, one of the major challenges is the availability of data, since the accuracy of the model largely depends on the database size (Ji et al., 2021a). As mentioned above, experimental data are limited due to the cost and hazardous nature of the experiments. Most experiments also cover only a limited set of leak scenarios, which narrows the applicability of the resulting models. Therefore, a database created by a CFD or integral model that has been widely validated is also a viable choice for machine learning and deep learning based model construction. Among those, the PHAST UDM module is the most prevalent one for dispersion consequence data generation. Wang et al. (2015) used a PHAST simulated database and sensor data to validate a neural network-based chlorine dispersion model, which shows the effectiveness of using PHAST simulation in dispersion prediction. Sun et al. (2019) used artificial neural networks to construct prediction models for the fire radiation distances of jet fires and early and late pool fires using PHAST simulation data, with very satisfactory accuracy. Jiao et al. (2020c) used PHAST generated data for flammable chemical dispersion prediction, which also shows promising results.

Another challenge for the toxic chemical dispersion prediction model is to have a suitable framework that embeds all scenario-specific and chemical-specific parameters for final model development. Furthermore, the relationship between these parameters and the dispersion ranges is highly non-linear, and the interaction mechanism remains unknown. This makes machine learning and deep learning algorithms more suitable for toxic dispersion prediction model development.

The quantitative property consequence relationship (QPCR) method was first proposed by Jiao et al. (2020c). It was originally inspired by the quantitative structure-property relationship (QSPR) analysis method, which uses structural attributes as descriptors to build mathematical relationships between the property of interest and structures at a quantum chemistry level. QSPR is a well-developed methodology that has been widely implemented for hazardous property prediction (Jiao et al., 2020d; Ji et al., 2021b; Jiao et al., 2019, 2020a; Wang et al., 2017). QPCR combines the advantages of the QSPR method with the characteristics of dispersion modeling: it uses the scenario properties and chemical properties as descriptors, that is, as the independent variables for dispersion range prediction. The QPCR method can thus bridge the gap between microscale leak scenarios and chemical properties on one side and macroscale dispersion consequences on the other, which addresses the toxic dispersion prediction model development problem.
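The descriptor idea described above can be illustrated with a minimal sketch that joins scenario properties and chemical properties into one input row. The class names, field names, and numeric values below are hypothetical and chosen only for illustration; the descriptors actually used in the paper may differ.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ScenarioDescriptors:
    # Hypothetical scenario-specific inputs; the paper's actual list may differ.
    release_rate_kg_s: float
    wind_speed_m_s: float
    release_height_m: float


@dataclass(frozen=True)
class ChemicalDescriptors:
    # Hypothetical chemical-specific inputs; the paper's actual list may differ.
    molecular_weight_g_mol: float
    boiling_point_k: float


def build_feature_vector(scenario, chemical):
    """Concatenate scenario and chemical descriptors into one flat input row."""
    return list(astuple(scenario)) + list(astuple(chemical))


# Illustrative values only, not taken from the paper's database.
row = build_feature_vector(
    ScenarioDescriptors(release_rate_kg_s=2.5, wind_speed_m_s=3.0, release_height_m=1.0),
    ChemicalDescriptors(molecular_weight_g_mol=70.9, boiling_point_k=239.1),
)
```

Each such row would be paired with the simulated dispersion ranges as regression targets, so one model can cover several chemicals instead of one model per chemical.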

In this study, the toxic dispersion database was constructed using PHAST UDM toxic dispersion simulations, which were conducted for 450 different leak scenarios of 19 common toxic chemicals in the chemical industries. The three key toxic dispersion parameters, which are maximum downwind distance, minimum downwind distance, and maximum vapor cloud width, were obtained from the simulation. The constructed database has a total of 30,022 data points, which is the largest among all published work.

Furthermore, both machine learning (random forest and XGBoost) and deep learning (deep neural network) algorithms are implemented for the toxic dispersion QPCR model development. Their performance is compared and discussed to find the algorithm with the best performance for further model development. Finally, the selected algorithm is trained on the constructed database to construct the final toxic dispersion QPCR model.
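As a rough illustration of how such a comparison can be scored, the sketch below computes the two metrics reported in the abstract, R² and RMSE, and ranks candidate models by R². The model names and numbers are made up for illustration; this is not the authors' evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Return (name, r2, rmse) tuples sorted from highest to lowest R²."""
    scored = [
        (name, r_squared(y_true, y_pred), rmse(y_true, y_pred))
        for name, y_pred in predictions.items()
    ]
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Made-up test-set targets and predictions, for illustration only.
y_test = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [1.0, 2.1, 3.0, 4.0],
}
ranking = rank_models(y_test, predictions)
```

Here `ranking[0]` holds the name and scores of the model with the highest test-set R². In practice the metrics would be computed on a held-out test split for each dispersion range separately.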

Toxic dispersion database

Before toxic dispersion QPCR model development, a comprehensive database of toxic dispersion consequences covering different leak scenarios of different toxic chemicals is necessary for a credible prediction model. In this study, a toxic chemical dispersion consequence database was constructed using PHAST UDM simulation. The leak condition parameters consist of several components: source condition (release material, location, quantity, etc.), weather condition (wind speed, atmospheric

Data analysis and preprocessing

Before the construction of toxic dispersion QPCR models, the toxic chemical dispersion database needs to be examined to ensure the viability of machine learning and deep learning model development. The distribution and statistical properties also need to be investigated to determine whether data transformation is necessary beforehand. The scatter plots and histograms of the toxic dispersion database are shown in Fig. 2, where toxic dispersion distance data of different chemicals are distinguished by color.
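One common way to check whether a transformation is warranted is to compare the skewness of a variable before and after a candidate transform. The sketch below does this with a log10 transform on made-up distances; the choice of log10 is an assumption for illustration, since the preview text does not state which transformation, if any, the authors applied.

```python
import math


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log10_transform(values):
    """Apply log10 to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 transform requires strictly positive values")
    return [math.log10(v) for v in values]


# Made-up downwind distances (m) spanning several orders of magnitude.
distances_m = [12.0, 35.0, 80.0, 150.0, 400.0, 1200.0, 9000.0]
raw_skew = skewness(distances_m)
log_skew = skewness(log10_transform(distances_m))
```

For this right-skewed sample, `log_skew` is much closer to zero than `raw_skew`, which is the kind of evidence that would support transforming the target before model training.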

Conclusions

In this study, a comprehensive toxic chemical dispersion consequence prediction model is successfully constructed using more than 30,000 dispersion simulations with different leak scenarios using PHAST. Different machine learning (random forest, gradient boosting) and deep learning (deep neural networks) algorithms are compared and implemented to find the optimal method for the final model development. The result shows that the structure-optimized deep neural network has the highest prediction

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.