QSPR study of the Henry’s law constant for heterogeneous compounds

https://doi.org/10.1016/j.cherd.2019.12.009

Highlights

  • A QSPR model for Henry’s law constant of 530 heterogeneous compounds is established.

  • Only non-conformational molecular descriptors are used to develop the QSPR model.

  • The results are comparable with those obtained through the HENRYWIN program.

Abstract

We present a Quantitative Structure-Property Relationships (QSPR) study of the Henry’s law constant of 530 heterogeneous compounds, including pesticides, solvents, aromatic hydrocarbons and persistent pollutants. The multivariable linear regression models are established with the Replacement Method (RM) technique, by searching for the best 1–8 molecular descriptors among 26,795 available non-conformational structural variables. These descriptors are derived from different freely available software packages, such as PaDEL, Mold2, DataWarrior, QuBiLs-MAS and CORAL. The present results are compared with the estimations provided by the HENRYWIN module of EPI Suite, and serve as a tool for predicting the Henry’s law constant of related chemical structures.

Introduction

One of the first concepts that we learn in science is Henry's law. It relates the equilibrium liquid- and vapor-phase concentrations of a solute in dilute solutions at moderate pressures and constant temperature. Under this law, the amount of a given gas dissolved in a given volume of liquid to form an ideal mixture is directly proportional to the partial pressure of that gas in equilibrium with that liquid. The proportionality constant is denoted as the Henry's law constant (KH [L atm mol−1]) (Altschuh et al., 1999; Brennan et al., 1998) and represents a partition coefficient with important applications in both Chemical and Environmental Engineering. The study of mass transport between aquatic ecosystems and the atmosphere requires accurate knowledge of KH. For example, precise Henry's law constant values of pollutants are essential for the correct design of air-stripping processes used in groundwater and industrial wastewater treatment for removal of volatile and semi-volatile contaminants (Camarillo et al., 2016). It is known that organic compounds having KH values higher than 10−3 atm m3 mol−1 can feasibly be removed from water by dissolved air/gas flotation. Knowledge of KH is also important for determining the fate of many organic pollutants in the environment, since it is an indicator of the compound’s volatility (Hodzic et al., 2014). Henry’s law also has important applications in food processing, because KH data underpin the understanding of gas transfer in food matrices, which contributes to the development of additives and packaging that extend the shelf-life of products (Chaix et al., 2014).
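
The two unit conventions quoted above (L atm mol−1 and atm m3 mol−1) differ only by a factor of 1000, and KH can also be expressed as a dimensionless air-water partition coefficient by dividing by RT. The following is a minimal sketch of these conversions, assuming KH is defined as partial pressure over aqueous concentration; the function names and the 25 °C default are illustrative choices, not taken from the paper.

```python
# Gas constant in L atm mol^-1 K^-1 (standard value).
R_L_ATM = 0.082057

def kh_l_atm_to_m3_atm(kh_l_atm: float) -> float:
    """Convert KH from L atm mol^-1 to atm m^3 mol^-1 (1 m^3 = 1000 L)."""
    return kh_l_atm / 1000.0

def dimensionless_kaw(kh_m3_atm: float, temp_k: float = 298.15) -> float:
    """Dimensionless air-water partition coefficient, K_aw = KH / (R T).

    Assumes KH = p / C_water, with KH given in atm m^3 mol^-1.
    """
    r_m3_atm = R_L_ATM / 1000.0  # gas constant in m^3 atm mol^-1 K^-1
    return kh_m3_atm / (r_m3_atm * temp_k)

def above_removal_threshold(kh_m3_atm: float, threshold: float = 1e-3) -> bool:
    """True if KH exceeds the 10^-3 atm m^3 mol^-1 value quoted in the text."""
    return kh_m3_atm > threshold

if __name__ == "__main__":
    kh = kh_l_atm_to_m3_atm(5.0)  # hypothetical compound with KH = 5 L atm mol^-1
    print(kh, dimensionless_kaw(kh), above_removal_threshold(kh))
```

The threshold check only restates the rule of thumb given in the paragraph; it is not a design criterion in itself.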

In recent years, there has been much work measuring and tabulating the KH parameter. For example, Sander collected 17,350 KH values for 4632 species, based on 689 references (Sander, 2015). The experimental measurements are reported using different techniques, including headspace gas chromatography (Kieckbusch and King, 1979), modified headspace techniques (Chai and Zhu, 1998), phase ratio variation (Ettre et al., 1993), the differential headspace method (Chai and Zhu, 2003), and dilutor techniques (Richon, 2011). However, accurate KH values are still not available for many compounds. Instrumental problems and the detection limits for low concentrations of hydrophobic compounds, among other factors, make the experimental determination of KH values difficult and expensive (Staudinger and Roberts, 1996). Consequently, theoretical methods for the reliable prediction of this important parameter are essential for diverse compound types.

In recent years, Quantitative Structure-Property Relationships (QSPR) models have become a modern, inexpensive, and rapid tool for predicting the physical, biological or chemical properties of compounds (Austin et al., 2016). In fact, the European Chemicals Bureau promotes the use of QSPR models under the European REACH Regulation (EC) No 1907/2006 (Worth et al., 2007). The QSPR theory is based on the premise that a property of a chemical compound can be determined solely by its molecular structure. The structure is quantified through a set of adequate molecular descriptors, which are numerical quantities carrying specific information about relevant constitutional, topological, geometrical, hydrophobic, and electronic characteristics of the investigated compounds (Diudea, 2001; Garro Martinez et al., 2008; Katritzky and Gordeeva, 1993; Todeschini and Consonni, 2009; Tosso et al., 2017). The statistical correlation of an experimental property with an appropriate set of molecular descriptors results in a mathematical model that can be used to predict values of the property for new compounds and to identify practical relationships and trends.

Although many QSPR analyses have been developed for the prediction of KH in homologous sets of compounds (Duchowicz et al., 2008; Gharagheizi et al., 2010, 2012; Goodarzi et al., 2010; Modarresi et al., 2005, 2007; O’Loughlin and English, 2015; Razdan et al., 2017), more general studies modeling the KH parameter of unrelated organic compounds are limited. A QSPR study by Modarresi et al. in the air-water system for 189 aliphatic hydrocarbons finds a linear relationship between the logarithm of KH and the standard Gibbs free energy of solvation (Modarresi et al., 2005). With the aid of three-dimensional (3D) molecular descriptors, including a novel descriptor for the molecular cavity shape factor, the proposed model is valuable for polar and charged solutes. A three-descriptor artificial neural network (ANN) model leads to a correlation coefficient (R) of 0.90 and a root mean square error (RMS) of 0.22. The same authors, using a larger set of 940 organic compounds, propose six QSPR models. In that study, the molecular structures are optimized using the PM3 method. They report that a radial basis function neural network model of ten parameters presents the best performance, with R ranging from 0.88 to 0.98, depending on the set, and an RMS of 0.56 (Modarresi et al., 2007).
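
The linear relationship between log KH and the solvation free energy mentioned above can be motivated by a standard thermodynamic identity. The sketch below assumes a standard state of equal molar concentration in both phases and KH defined as partial pressure over aqueous concentration; the symbols K_aw, C_air and C_water are introduced here for illustration and are not taken from the cited study.

```latex
K_{aw} = \frac{C_{\mathrm{air}}}{C_{\mathrm{water}}} = \frac{K_H}{RT},
\qquad
\Delta G^{\circ}_{\mathrm{solv}} = RT \ln K_{aw}
\;\;\Longrightarrow\;\;
\log_{10} K_H = \frac{\Delta G^{\circ}_{\mathrm{solv}}}{2.303\,RT} + \log_{10}(RT)
```

At fixed temperature the second term is a constant, so under these assumptions log KH varies linearly with the solvation free energy.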

Using the Replacement Method for 150 aliphatic hydrocarbons, Duchowicz et al. (2008) design a Multivariable Linear Regression (MLR) QSPR model of seven parameters leading to R = 0.996 and a standard deviation (S) of 0.065. Later on, in a QSPR work on 96 organic pesticides, they find that the best performing model uses the Levenberg–Marquardt with Bayesian regularization weighting function (R = 0.74–0.79 and RMS = 0.93–1.29) (Goodarzi et al., 2010). In another study (Gharagheizi et al., 2010), a complex ANN model based on 107 functional groups is applied to a set of 1940 organic compounds, obtaining R = 0.990 and a low RMS of 0.1. Subsequently, they develop a simpler model of eight 3D parameters, obtaining R = 0.98 and an RMS of 0.285 (Gharagheizi et al., 2012). O’Loughlin and English, working with several hundred organic compounds in water, find that MLR models perform better for class-specific sets of compounds, while ANN models have more predictive accuracy in general sets of compounds (O’Loughlin and English, 2015). Recently, the KH values for six families of persistent organic pollutants are modeled using the group-contribution method based on scaled-particle theory. The reported R and RMS values are 0.89 and 0.22 for the fitting set, and 0.88 and 0.27 for the validation set (Razdan et al., 2017).
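
The studies above are compared through the correlation coefficient (R) and the root mean square error (RMS). As a rough illustration of how these two statistics are usually computed from experimental and predicted log KH values, the following sketch uses only the Python standard library. The function names and the sample values are hypothetical, and individual papers may normalize differently (for example, the standard deviation S typically divides by the degrees of freedom rather than by the number of compounds).

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def rms_error(observed, predicted):
    """Root mean square error, dividing by the number of compounds N."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

if __name__ == "__main__":
    # Hypothetical log KH values, for illustration only.
    experimental = [-3.0, -2.0, -1.0, 0.0]
    calculated = [-2.5, -2.0, -1.5, 0.0]
    print(pearson_r(experimental, calculated), rms_error(experimental, calculated))
```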

In this work, we search for a QSPR model that accurately predicts the Henry’s law constant of 530 structurally diverse compounds of chemical relevance. The goal is to obtain reliable models that do not require knowledge of the conformational characteristics of the molecules to predict KH. Therefore, only conformation-independent molecular descriptors are considered. Finally, the best QSPR relationship found here is applied to predict the KH parameter of 809 structures not considered in the model development, and the results are compared with the values predicted by the HENRYWIN module of the EPI Suite freeware (US EPA, 2016).

Materials and methods

The 530 experimental Henry’s law constant values at 25 °C ([atm m3 mol−1]) are extracted from the HENRYWIN module of EPI Suite; the set includes pesticides, solvents, aromatic hydrocarbons and persistent pollutants. The logarithm of the KH values ranges over the interval (−12.99, 1.30). Specific details of the studied compounds are provided in the Supplementary Material (Table 1S).

The molecules are generated in both SMILES notation and bi-dimensional structures drawn with the free Discovery Studio

Results and discussion

The general methodology applied is first to verify the predictive ability of the molecular descriptors through the training and validation sets, and then to evaluate the models for the experimental KH data in the test set. The best linear regressions of 1–8 molecular descriptors are presented in Table 1.
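
To give a feel for how a best subset of d descriptors can be searched in a large pool without enumerating every combination, the sketch below implements a simplified replacement-style search: starting from a random subset, each descriptor position is replaced in turn by whichever candidate most lowers the training RMS, until no replacement helps. This is an illustration under stated assumptions, not the authors' implementation; the published Replacement Method uses additional rules for choosing which descriptor to replace. The function names, the use of NumPy, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_rms(X, y, cols):
    """Least-squares fit of y on the chosen columns plus an intercept.

    Returns the training RMS of the residuals.
    """
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

def replacement_search(X, y, d, seed=0, max_sweeps=20):
    """Simplified replacement-style search for d descriptors (columns of X)."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    cols = [int(c) for c in rng.choice(n_desc, size=d, replace=False)]
    best = fit_rms(X, y, cols)
    for _ in range(max_sweeps):
        improved = False
        for pos in range(d):              # position in the subset to replace
            for cand in range(n_desc):    # try every descriptor in the pool
                if cand in cols:
                    continue
                trial = cols.copy()
                trial[pos] = cand
                rms = fit_rms(X, y, trial)
                if rms < best - 1e-12:    # keep only strict improvements
                    best, cols, improved = rms, trial, True
        if not improved:                  # a full sweep changed nothing
            break
    return sorted(cols), best

if __name__ == "__main__":
    # Synthetic example: the response depends on descriptors 3 and 7 only.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 12))
    y = 0.5 + 2.0 * X[:, 3] - 1.5 * X[:, 7]
    print(replacement_search(X, y, d=2))
```

Each sweep costs on the order of d times the pool size in regressions, which is far cheaper than an exhaustive search over all subsets of a pool with tens of thousands of descriptors. The result can depend on the random starting subset, so in practice such a search would be repeated from several starting points.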

The best-performing model is the one with 7 descriptors (highlighted in bold in Table 1), since RMStrain does not improve significantly when an additional descriptor is included: log KH =

Conclusions

The present work establishes a QSPR model based on conformation-independent descriptors for predicting the Henry’s law constant of a diverse set of compounds. Although the dataset is made up of very heterogeneous chemical structures, the predictions from the proposed linear equation prove to be acceptable when compared with those obtained with the HENRYWIN program. For the specific property studied (KH), it is not possible to improve the predictive quality of the model by including

Conflicts of interest

None.

Authors’ contributions

All authors contributed equally to this work. All authors read and approved the final manuscript.

Acknowledgments

We thank the National Research Council of Argentina (CONICET) for financial support through the PIP11220130100311 project, and the Ministerio de Ciencia, Tecnología e Innovación Productiva for the electronic library facilities. PRD, SEF, and DEB are members of the scientific researcher career of CONICET.

References (50)

  • H. Modarresi et al.

    QSPR model of Henry’s law constant for a diverse set of organic chemicals based on genetic algorithm-radial basis function network approach

    Chemosphere

    (2007)
  • D.R. O’Loughlin et al.

    Prediction of Henry’s Law Constants via group-specific quantitative structure property relationships

    Chemosphere

    (2015)
  • K. Roy et al.

    Be aware of error measures. Further studies on validation of predictive QSAR models

    Chemometr. Intell. Lab. Syst.

    (2016)
  • A. Toropova et al.

    CORAL: QSAR modeling of toxicity of organic chemicals towards Daphnia magna

    Chemometr. Intell. Lab. Syst.

    (2012)
  • R.D. Tosso et al.

    The electronic density obtained from a QTAIM analysis used as molecular descriptor. A study performed in a new series of DHFR inhibitors

    J. Mol. Struct.

    (2017)
  • Dassault Systèmes BIOVIA

    Discovery Studio Modeling Environment

    (2017)
  • E. Chaix et al.

    Oxygen and carbon dioxide solubility and diffusivity in solid food matrices: a review of past and current knowledge

    Compr. Rev. Food Sci. Food Saf.

    (2014)
  • N. Chirico et al.

    Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection

    J. Chem. Inf. Model.

    (2012)
  • M.V. Diudea

    QSPR/QSAR Studies by Molecular Descriptors

    (2001)
  • P. Duchowicz

    Linear regression QSAR models for polo-like kinase-1 inhibitors

    Cells

    (2018)
  • P.R. Duchowicz et al.

    Alternative algorithm for the search of an optimal set of descriptors in QSAR-QSPR studies

    MATCH Commun. Math. Comput. Chem

    (2006)
  • L. Eriksson et al.

    Methods for reliability and uncertainty assessment and for applicability evaluations of classification-and regression-based QSARs

    Environ. Health Perspect.

    (2003)
  • L. Ettre et al.

    Determination of gas-liquid partition coefficients by automatic equilibrium headspace-gas chromatography utilizing the phase ratio variation method

    Chromatographia

    (1993)
  • D. Gadaleta et al.

    Applicability domain for QSAR models: where theory meets reality

    Int. J. Quant. Struct.-Prop. Relat.

    (2016)
  • J. Garro Martinez et al.

    An exploratory study to investigate possible simple descriptors in order to predict relative activity of antiepileptic enaminones

    J. Phys. Org. Chem.

    (2008)