Prediction and design of cyclodextrin inclusion complexes formation via machine learning-based strategies

doi:10.1016/j.ces.2022.117946

Chemical Engineering Science

Volume 261, 2 November 2022, 117946

https://doi.org/10.1016/j.ces.2022.117946

Highlights

• Establishing three machine-learning models for predicting cyclodextrin inclusion complexes.
• Establishing a recall-first strategy to avoid missed predictions.
• Establishing a precision-first strategy to reduce the number of experimental verifications.
• Finding three new inclusion complexes of cyclodextrin with prednisolone, 9-fluorenone, and saccharin.

Abstract

This study reports a machine-learning (ML) method for developing multi-purpose prediction strategies for the formation of cyclodextrin inclusion complexes (ICs) in aqueous solutions. A balanced dataset of pharmaceutically relevant molecules was constructed using experimental verification. Three ML models (artificial neural network, support vector machine, and logistic regression) were established and optimized to predict IC formation. To provide more reliable approaches for different prediction requirements, ML-based linear, recall-first, and precision-first strategies were further established on top of the ML models to maximize recall or precision. The recall-first strategy identified all positive samples, so that no IC-forming compound was missed in the prediction, and the precision-first strategy accurately identified positive samples to reduce the number of validation experiments. ML-based prediction strategies for IC formation were established here for the first time and showed high accuracy and reliability. These strategies offer more efficient and lower-cost solutions for IC screening.

Introduction

Inclusion complexes (ICs) are crystalline materials that form spontaneously through non-covalent intermolecular interactions between guest and host molecules in solution (Putseys et al., 2010). Like cocrystals and salts, ICs are a type of multicomponent crystalline solid stabilized by hydrogen bonds and by hydrophobic and van der Waals interactions (Biedermann and Schneider, 2016, Park et al., 2022). This supramolecular strategy is one of the most widely used methods in environmental engineering, food and drug delivery, supramolecular chemistry, and agriculture (Morin-Crini et al., 2021, Peixiao et al., 2018). Cyclodextrins (CDs) have a hollow ring structure with a stable hydrophobic cavity and an external hydrophilic surface (Loftsson and Duchene, 2007). Therefore, they can be combined with drugs or other small hydrophobic organic molecules to improve the water solubility and stability of those molecules (Brewster and Loftsson, 2007). In addition, CDs can modify the physicochemical behavior of guest molecules, for example by masking bitterness, controlling the release rate, decreasing volatility, and enabling targeted therapy (Assadpour and Mahdi Jafari, 2019, Szejtli and Szente, 2005, Tian et al., 2021, Topuz and Uyar, 2019, Zhao et al., 2019a). CD-ICs are classified as “new-improved drugs” with superior efficacy and are in line with the global trend toward environmentally friendly new-drug research and development (Hu et al., 2014, Topuz et al., 2021). Moreover, the wide industrial applications of CD-ICs, such as dissolving organic pollutants in water and soil and improving the bioavailability and sustainability of food formulations, also show that they are favorable tools offering high efficiency, green fabrication, and low energy consumption (Del Valle, 2004, Landy et al., 2012, Rezaei et al., 2019).

CD-ICs exhibit considerable potential for the development of new drugs and functional materials; however, researchers lack efficient screening approaches because the formation mechanism and the complex interactions in ICs are poorly understood (Gao et al., 2020, Zhang et al., 2018). Trial-and-error experiments can be used for screening, but they are inefficient and have low success rates in the discovery of ICs (Dhoot et al., 2019, Ouyang, 2015). Therefore, it is essential to establish robust screening strategies for the prediction and directional design of CD-IC formulations. Machine-learning (ML) algorithms are powerful tools for identifying patterns and making decisions using minimal background information about a particular problem. They have been successfully applied in multiple areas, particularly drug discovery and materials science (Song et al., 2021, Sousa et al., 2021, Zhao et al., 2021). Recently, researchers have attempted to apply a single ML model or statistical approach to determine the critical factors of CD-IC formation (Butler et al., 2018, Di et al., 2020, Stephenson et al., 2019, Zhao et al., 2019b). However, the lack of experimental verification of the positive (CD-ICs can form) and negative (CD-ICs cannot form) datasets, together with imbalanced samples, makes model training and hyperparameter tuning difficult. Moreover, a single ML model has limited ability to accurately identify the most suitable CD-ICs. Therefore, developing prediction models based on multiple ML methods and using verified, balanced datasets can provide more accurate results, reduce wasteful experimental screening, and improve the efficiency of early-stage drug development or of expanding capacities for expensive compounds (Zhang et al., 2017, Zhang et al., 2021).

Researchers in drug discovery and materials science are accustomed to selecting the ML model that performs best during testing. It is relatively easy to identify the best algorithm using one or several evaluation metrics on the testing set. However, as previously discussed, relying on only a single parameter reduces the comprehensibility of the model and leads to poor generalization performance. Attempts are being made to optimize ML models in several respects to adapt them to practical applications, including applying new methodologies to address limitations (Vriza et al., 2020) and selecting descriptors based on an in-depth understanding of specific behaviors to improve prediction performance (Devogelaer et al., 2020, Wang et al., 2020b, Ma et al., 2021). However, it remains challenging to efficiently select an ML model that is widely applicable to multiple needs, because it is difficult to optimize all evaluation metrics, such as accuracy, precision, and recall, using a single ML model. In addition, using only one evaluation metric leads to classification errors for samples near the decision boundary, which must still be supplemented by experimental verification to avoid prediction errors (Pereira, 2020, Wang et al., 2020a). Therefore, it is necessary to construct ML-based design strategies that correspond to the different requirements of actual application scenarios.
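The three metrics named above follow directly from the counts of true and false positives and negatives. The sketch below is illustrative only: the helper name `classification_metrics` and the toy labels are assumptions, not code or data from the study. Label 1 denotes an IC-forming sample and label 0 a non-forming one.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of the samples predicted positive, how many are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the truly positive samples, how many were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels (hypothetical): three positives and three negatives.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case one positive is missed and one negative is wrongly accepted, so precision and recall are both 2/3 while accuracy is 4/6. A model tuned to raise one of these values will often lower another, which is the trade-off the paragraph above describes.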

Here, β-CD was chosen as the host molecule because of its high stability and low price. The positive and negative datasets adopted in the models and strategies each comprised 100 experimentally validated samples, which were selected based on their high commercial value and application potential. Three supervised ML algorithms, namely artificial neural networks (ANNs), support vector machines (SVMs), and logistic regression (LR), were explored for predicting CD-IC formation by pharmaceutically relevant molecules in aqueous solution. Three strategies using a combination of the ANN, SVM, and LR models were developed and applied to satisfy the various prediction requirements. Additionally, five compounds (isonicotinamide, levulinic acid, prednisolone, 9-fluorenone, and saccharin) were used as a validation set to verify the accuracy and reliability of the ML-based screening strategies. Based on the prediction results, three CD-ICs, those of prednisolone, 9-fluorenone, and saccharin in the validation set, were successfully prepared experimentally for the first time.
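One simple way to combine three classifiers so that either recall or precision is favored is to merge their binary predictions with a logical OR or a logical AND. The sketch below illustrates that idea only; it is an assumption made for illustration and is not confirmed as the combination rule used in this work. The function names and the toy prediction vectors are hypothetical.

```python
def recall_first(*model_predictions):
    """Label a sample positive if ANY model predicts positive (logical OR)."""
    return [int(any(votes)) for votes in zip(*model_predictions)]


def precision_first(*model_predictions):
    """Label a sample positive only if ALL models predict positive (logical AND)."""
    return [int(all(votes)) for votes in zip(*model_predictions)]


# Hypothetical 0/1 predictions of three models for five candidate guests.
ann_pred = [1, 1, 0, 0, 1]
svm_pred = [1, 0, 0, 1, 1]
lr_pred = [1, 0, 0, 0, 0]

print(recall_first(ann_pred, svm_pred, lr_pred))     # [1, 1, 0, 1, 1]
print(precision_first(ann_pred, svm_pred, lr_pred))  # [1, 0, 0, 0, 0]
```

With OR, a compound is dropped only if every model rejects it, which tends to raise recall at the cost of more validation experiments. With AND, a compound is kept only if every model accepts it, which tends to raise precision at the cost of missed positives.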

Section snippets

Methodology

Three successive steps (construction, judgment, and validation) were performed, as illustrated in Fig. 1. A large number of compounds were screened and experimentally corrected, and classified into positive IC (recorded as label 1) and negative non-IC (recorded as label 0) groups. The experiments were conducted using a cooling crystallization process to identify the new CD-ICs. Two hundred drugs or relevant molecules with actual or potential pharmaceutical applications were selected to

Construction and assessment of the ML Models: ANN, SVM, and LR

A slight change in the model structure or hyperparameters of the same ML algorithm may lead to significant differences in classification performance. The prediction ability of the ANN model was determined by factors such as the number of nodes, network structure, and epochs. The results of a comprehensive test of the influence of the network structure on the prediction ability are presented in Table S2. The optimization result indicated that the selected ANN structure,

Conclusion and outlook

Here, we first developed three ML models, namely ANN, SVM and LR, for predicting the formation of the CD-ICs, and the three ML-based strategies that were based on different prediction purposes demonstrated high efficiency and accuracy. A total of 200 compounds were screened in cooling crystallization experiments to collect the balanced positive (forming CD-ICs) and negative datasets of the compound samples. The samples were quantitatively described according to the structural description and

CRediT authorship contribution statement

Yiming Ma: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Yue Niu: Data curation, Formal analysis, Validation, Writing – original draft, Writing – review & editing, Validation, Investigation. Huaiyu Yang: . Jiayu Dai: Formal analysis, Writing – original draft. Jiawei Lin: Validation, Investigation. Huiqi Wang: Validation, Investigation. Songgu Wu: . Qiuxiang Yin: Writing – review &

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Y. Ma acknowledges the support of the China Scholarship Council. The authors are grateful for the financial support of the National Natural Science Foundation of China (No. NNSFC 22111530115) and the Tianjin Municipal Natural Science Foundation (No. 21JCYBJC00600).

References (50)

O. Aleem et al. Effect of beta-cyclodextrin and hydroxypropyl beta-cyclodextrin complexation on physicochemical properties and antimicrobial activity of cefdinir. J. Pharm. Biomed. Anal. (2008)
M.E. Brewster et al. Cyclodextrins as pharmaceutical solubilizers. Adv. Drug Deliv. Rev. (2007)
E.M.M. Del Valle. Cyclodextrins and their uses: a review. Process Biochem. (2004)
C.D. Jayaweera et al. Multi-objective dynamic optimization of seeded suspension polymerization process. Chemical Engineering Journal 426 (2021)
J. Lee et al. Understanding the effect of specialization on hospital performance through knowledge-guided machine learning. Comput. Chem. Eng. (2019)
T. Loftsson et al. Cyclodextrins and their pharmaceutical applications. Int. J. Pharm. (2007)
J. Park et al. Size compatibility and concentration dependent supramolecular host-guest interactions at interfaces. Nat. Commun. (2022)
J.A. Putseys et al. Amylose-inclusion complexes: Formation, identity and physico-chemical properties. J. Cereal Sci. (2010)
A. Rezaei et al. Nanoencapsulation of hydrophobic and low-soluble food bioactive compounds within different nanocarriers. Food Hydrocolloids (2019)
J. Szejtli et al. Elimination of bitter, disgusting tastes of drugs and foods by cyclodextrins. Eur. J. Pharm. Biopharm. (2005)
M.K. Zhang et al. Enhanced solubility and antimicrobial activity of alamethicin in aqueous solution by complexation with gamma-cyclodextrin. J. Funct. Foods (2018)
L. Zhang et al. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discovery Today (2017)
Q. Zhao et al. Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques. Acta Pharm. Sin. B (2019)
E. Assadpour et al. A systematic review on nanoencapsulation of food bioactive ingredients and nutraceuticals by various nanocarriers. Crit. Rev. Food Sci. Nutr. (2019)
Y. Bian et al. Prediction of Orthosteric and Allosteric Regulations on Cannabinoid Receptors Using Supervised Machine Learning Classifiers. Mol. Pharm. (2019)
F. Biedermann et al. Experimental Binding Energies in Supramolecular Complexes. Chem. Rev. (2016)
K.T. Butler et al. Machine learning for molecular and materials science. Nature (2018)
J.-J. Devogelaer et al. Co-crystal Prediction by Artificial Neural Networks. Angew. Chem. Int. Ed. Engl. (2020)
A.S. Dhoot et al. Design of Experiments in Pharmaceutical Development. Pharm. Chem. J. (2019)
P.W. Di et al. In silico prediction of binding capacity and interaction forces of organic compounds with alpha- and beta-cyclodextrins. J. Mol. Liq. (2020)
Molecular Operating Environment (MOE), Version 2019.0102; Chemical Computing Group ULC: Montreal,...
S. Gao et al. Encapsulation of thiabendazole in hydroxypropyl-beta-cyclodextrin nanofibers via polymer-free electrospinning and its characterization. Pest Manag. Sci. (2020)
J. Hu et al. Plasma-Induced Grafting of Cyclodextrin onto Multiwall Carbon Nanotube/Iron Oxides for Adsorbent Application. J. Phys. Chem. B (2010)
Q.-D. Hu et al. Cyclodextrin-Based Host-Guest Supramolecular Nanoparticles for Delivery: From Design to Applications. Acc. Chem. Res. (2014)
M. Kshirsagar et al. Techniques for transferring host-pathogen protein interactions knowledge to new tasks. Front. Microbiol. (2015)


¹: These authors contributed equally to this work.
