Prediction and design of cyclodextrin inclusion complexes formation via machine learning-based strategies
Introduction
Inclusion complexes (ICs) are crystalline materials that form spontaneously through non-covalent intermolecular interactions between guest and host molecules in solution (Putseys et al., 2010). Like cocrystals and salts, ICs are multicomponent crystalline solids stabilized by hydrogen bonding, hydrophobic, and van der Waals interactions (Biedermann and Schneider, 2016, Park et al., 2022). This supramolecular strategy is one of the most widely used methods in environmental engineering, food and drug delivery, supramolecular chemistry, and agriculture (Morin-Crini et al., 2021, Peixiao et al., 2018). Cyclodextrins (CDs) have a hollow ring structure with a stable hydrophobic cavity and a hydrophilic outer surface (Loftsson and Duchene, 2007). Therefore, they can host drugs or other small hydrophobic organic molecules to improve their water solubility and stability (Brewster and Loftsson, 2007). In addition, CDs can modify the properties of guest molecules, for example by masking bitterness, controlling the release rate, decreasing volatility, and enabling targeted therapy (Assadpour and Mahdi Jafari, 2019, Szejtli and Szente, 2005, Tian et al., 2021, Topuz and Uyar, 2019, Zhao et al., 2019a). CD-ICs are classified as “new-improved drugs” with superior efficacy and are in line with the global trend toward environmentally friendly new-drug research and development (Hu et al., 2014, Topuz et al., 2021). Moreover, the wide industrial applications of CD-ICs, such as dissolving organic pollutants in water and soil and improving the bioavailability and sustainability of food formulations, show that they are efficient, green, and low-energy tools (Del Valle, 2004, Landy et al., 2012, Rezaei et al., 2019).
CD-ICs exhibit considerable potential for the development of new drugs and functional materials; however, efficient screening approaches are still lacking because the mechanisms and complex interactions in ICs are poorly understood (Gao et al., 2020, Zhang et al., 2018). Trial-and-error experiments can serve as a screening approach, but they are inefficient and have low success rates in the discovery of ICs (Dhoot et al., 2019, Ouyang, 2015). Therefore, robust screening strategies are needed for the prediction and directional design of CD-IC formulations. Machine-learning (ML) algorithms are powerful tools for identifying patterns and making decisions with minimal background information about a particular problem. They have been successfully applied in multiple areas, particularly drug discovery and materials science (Song et al., 2021, Sousa et al., 2021, Zhao et al., 2021). Recently, researchers have applied a single ML model or statistical approach to determine the critical factors of CD-IC formation (Butler et al., 2018, Di et al., 2020, Stephenson et al., 2019, Zhao et al., 2019b). However, the lack of experimentally verified positive (CD-ICs can form) and negative (CD-ICs cannot form) datasets, together with imbalanced samples, makes model training and hyperparameter tuning difficult. Moreover, a single ML model has limited ability to accurately identify the most suitable CD-ICs. Therefore, prediction models based on multiple ML methods and trained on verified, balanced datasets can provide more accurate results, reduce wasteful experimental screening, and improve the efficiency of early-stage drug development or of expanding capacities for expensive compounds (Zhang et al., 2017, Zhang et al., 2021).
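To illustrate why imbalanced samples complicate training and evaluation, the following minimal sketch scores a trivial classifier that always predicts the larger class. The 20/180 class counts are hypothetical and are not data from this work; only the 100/100 case mirrors the balanced datasets described here. The function name `majority_baseline` is an illustrative assumption.

```python
def majority_baseline(n_pos: int, n_neg: int) -> dict:
    """Score a classifier that always predicts the larger class.

    Label 1 = IC forms, label 0 = no IC forms (the labels used in this work).
    """
    predict_positive = n_pos >= n_neg  # ties are resolved toward label 1
    if predict_positive:
        tp, fn, fp, tn = n_pos, 0, n_neg, 0
    else:
        tp, fn, fp, tn = 0, n_pos, 0, n_neg
    accuracy = (tp + tn) / (n_pos + n_neg)
    recall = tp / (tp + fn) if tp + fn else 0.0        # share of true IC formers found
    specificity = tn / (tn + fp) if tn + fp else 0.0   # share of non-formers rejected
    return {
        "accuracy": accuracy,
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical imbalanced set: 20 IC formers, 180 non-formers
print(majority_baseline(20, 180))
# Balanced set of the size used in this work: 100 positive, 100 negative
print(majority_baseline(100, 100))
```

On the hypothetical 20/180 split, this do-nothing classifier reaches an accuracy of 0.9 while finding none of the IC formers (recall 0.0), and its balanced accuracy is 0.5 in both cases. With balanced data, plain accuracy no longer rewards ignoring the minority class.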
In drug discovery and materials science, researchers typically select the ML model that performs best during testing. Identifying the best algorithm with one or several evaluation metrics on the test set is relatively easy. However, as discussed above, relying on a single metric reduces the comprehensibility of the model and leads to poor generalization performance. Efforts are being made to optimize ML models from several aspects for practical applications, including applying new methodologies to address existing limitations (Vriza et al., 2020) and selecting descriptors based on an in-depth understanding of specific behaviors to improve prediction performance (Devogelaer et al., 2020, Wang et al., 2020b, Yiming Ma et al., 2021). However, efficiently selecting a single ML model that meets multiple needs remains challenging because it is difficult for one model to optimize all evaluation metrics, such as accuracy, precision, and recall, simultaneously. In addition, relying on one evaluation metric leads to misclassification of samples near the decision boundary, so predictions must still be supplemented by experimental verification to avoid errors (Pereira, 2020, Wang et al., 2020a). Therefore, ML-based design strategies should be constructed according to the different requirements of actual application scenarios.
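The trade-off between precision and recall for samples near the decision boundary can be made concrete with a small sketch. The ten predicted probabilities and labels below are hypothetical, not results from this work. The helper names `confusion_at_threshold` and `metrics` are illustrative assumptions. The sketch counts the confusion matrix at several decision thresholds and derives accuracy, precision, and recall from it.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FP, FN, TN when label 1 (IC forms) is predicted for score >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical predicted probabilities of IC formation for ten compounds
scores = [0.95, 0.90, 0.80, 0.65, 0.55, 0.48, 0.40, 0.30, 0.20, 0.05]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

for t in (0.3, 0.5, 0.7):
    acc, prec, rec = metrics(*confusion_at_threshold(scores, labels, t))
    print(f"threshold={t}: accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

With these hypothetical scores, a threshold of 0.7 gives a precision of 1.0 but a recall of only 0.6. A threshold of 0.3 recovers every IC former (recall 1.0) at a precision of 0.625. The two samples scored 0.65 and 0.48 sit near the boundary and decide which metric wins.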
Here, β-CD was selected as the host molecule because of its high stability and low price. The positive and negative datasets used in the models and strategies each comprised 100 experimentally validated samples, selected based on their high commercial value and application potential. Three supervised ML algorithms, namely artificial neural networks (ANNs), support vector machines (SVMs), and logistic regression (LR), were explored for predicting CD-IC formation of pharmaceutically relevant molecules in aqueous solution. Three strategies combining the ANN, SVM, and LR models were developed and applied to meet different prediction requirements. Additionally, five compounds, isonicotinamide, levulinic acid, prednisolone, 9-fluorenone, and saccharin, were used as a validation set to verify the accuracy and reliability of the ML-based screening strategies. Based on the prediction results, the CD-ICs of three validation compounds, prednisolone, 9-fluorenone, and saccharin, were experimentally prepared for the first time.
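This passage does not specify how the three strategies combine the ANN, SVM, and LR outputs. One common way to shift a combined prediction toward precision or toward recall is to change the voting rule. The sketch below assumes three hypothetical rules: `unanimous`, `majority`, and `any`. Both these rules and the `combine` helper are illustrative assumptions, not the strategies developed in this work. The candidate names and votes are placeholders.

```python
from typing import Dict


def combine(votes: Dict[str, int], rule: str) -> int:
    """Combine binary votes (1 = IC predicted, 0 = no IC) from several classifiers.

    "unanimous": 1 only if every model votes 1 (conservative, favors precision)
    "majority":  1 if more than half of the models vote 1
    "any":       1 if at least one model votes 1 (permissive, favors recall)
    """
    n_yes = sum(votes.values())
    n_models = len(votes)
    if rule == "unanimous":
        return int(n_yes == n_models)
    if rule == "majority":
        return int(n_yes * 2 > n_models)
    if rule == "any":
        return int(n_yes >= 1)
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical votes of the three models for three candidate guest molecules
candidates = {
    "compound_A": {"ANN": 1, "SVM": 1, "LR": 1},
    "compound_B": {"ANN": 1, "SVM": 0, "LR": 1},
    "compound_C": {"ANN": 0, "SVM": 0, "LR": 1},
}

for name, votes in candidates.items():
    decisions = {rule: combine(votes, rule) for rule in ("unanimous", "majority", "any")}
    print(name, decisions)
```

Under these assumptions, compound_A is accepted by every rule. compound_B is accepted only by the majority and any rules, and compound_C only by the any rule. A conservative rule suits expensive experimental follow-up, and a permissive rule suits broad early screening.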
Methodology
Three successive steps, namely construction, judgment, and validation, were performed, as illustrated in Fig. 1. A large number of compounds were screened, experimentally verified, and classified into positive IC (label 1) and negative non-IC (label 0) groups. The experiments used a cooling crystallization process to identify new CD-ICs. Two hundred drugs or relevant molecules with actual or potential pharmaceutical applications were selected to
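This section does not state how the 200 labelled compounds were divided for training and testing. Under an assumed 80/20 split, a stratified split keeps the 1:1 balance of IC formers (label 1) and non-formers (label 0) in both subsets. The function `stratified_split`, the split ratio, and the placeholder compound names are hypothetical.

```python
import random
from typing import List, Tuple


def stratified_split(samples: List[Tuple[str, int]], test_fraction: float, seed: int = 0):
    """Split (compound, label) pairs so that each label keeps the same share in both sets.

    label 1 = IC formed in the cooling crystallization test, label 0 = no IC.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({y for _, y in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)                         # randomize within each class only
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder names; 100 positive and 100 negative samples as in this work
data = [(f"pos_{i}", 1) for i in range(100)] + [(f"neg_{i}", 0) for i in range(100)]
train, test = stratified_split(data, test_fraction=0.2)
print(len(train), len(test))
print("positives in test set:", sum(y for _, y in test))
```

Shuffling within each class, rather than over the whole list, guarantees that the test set holds 20 positives and 20 negatives.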
Construction and assessment of the ML models: ANN, SVM, and LR
A slight change in the model structure or hyperparameters of the same ML algorithm may lead to significant differences in classification performance. The prediction ability of the ANN model depends on factors such as the number of nodes, the network structure, and the number of epochs. The results of a comprehensive test of the influence of network structure on prediction ability are presented in Table S2. The optimization result indicated that the selected ANN structure,
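How structure and epochs affect validation accuracy can be explored with a simple grid search. The following dependency-free sketch trains a one-hidden-layer network (tanh hidden units, sigmoid output, per-sample gradient descent on the log-loss) on hypothetical two-descriptor toy data. The network, learning rate, grid values, and data are all assumptions for illustration, not the ANN or descriptors used in this work. In practice, a library model such as scikit-learn's `MLPClassifier` would be used. Its `hidden_layer_sizes` and `max_iter` parameters play the roles of structure and epochs.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_mlp(X, y, n_hidden, epochs, lr=0.1, seed=0):
    """Train a one-hidden-layer network (tanh hidden units, sigmoid output)
    by per-sample gradient descent on the log-loss; returns a predict function."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0

    def hidden(x):
        return [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                for j in range(n_hidden)]

    for _ in range(epochs):  # one epoch = one pass over the training set
        for x, t in zip(X, y):
            h = hidden(x)
            p = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
            dz = p - t  # gradient of log-loss w.r.t. the output pre-activation
            for j in range(n_hidden):
                dh = dz * W2[j] * (1.0 - h[j] ** 2)  # backprop through tanh, before W2[j] changes
                W2[j] -= lr * dz * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dz

    def predict(x):
        z = sum(w * hj for w, hj in zip(W2, hidden(x))) + b2
        return 1 if z >= 0 else 0  # sigmoid(z) >= 0.5 exactly when z >= 0

    return predict


def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


# Hypothetical toy data: two descriptors with an XOR-like class boundary (not the paper's data)
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(120)]
y = [1 if a * b > 0 else 0 for a, b in X]
X_train, y_train, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

results = {}
for n_hidden in (1, 2, 4, 8):   # candidate structures (hidden nodes)
    for epochs in (20, 100):    # candidate training lengths
        model = train_mlp(X_train, y_train, n_hidden, epochs)
        results[(n_hidden, epochs)] = accuracy(model, X_val, y_val)
        print(f"hidden={n_hidden} epochs={epochs} val_acc={results[(n_hidden, epochs)]:.2f}")

best = max(results, key=results.get)
print("selected (hidden nodes, epochs):", best)
```

With a single hidden node, the output is a monotone function of one linear projection of the inputs. The decision boundary is therefore a straight line, which cannot separate XOR-like classes. Wider hidden layers remove this limitation, which is why even small structural changes can shift classification performance. The exact accuracies depend on the seeds and are not reported here.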
Conclusion and outlook
Here, we first developed three ML models, namely ANN, SVM, and LR, for predicting CD-IC formation, and three ML-based strategies designed for different prediction purposes demonstrated high efficiency and accuracy. A total of 200 compounds were screened in cooling crystallization experiments to collect balanced positive (forming CD-ICs) and negative (not forming CD-ICs) datasets. The samples were quantitatively described according to the structural description and
CRediT authorship contribution statement
Yiming Ma: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Yue Niu: Data curation, Formal analysis, Validation, Writing – original draft, Writing – review & editing, Investigation. Huaiyu Yang: . Jiayu Dai: Formal analysis, Writing – original draft. Jiawei Lin: Validation, Investigation. Huiqi Wang: Validation, Investigation. Songgu Wu: . Qiuxiang Yin: Writing – review &
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Y. Ma acknowledges the support of the China Scholarship Council. The authors are grateful for the financial support of the National Natural Science Foundation of China (No. NNSFC 22111530115) and the Tianjin Municipal Natural Science Foundation (No. 21JCYBJC00600).
References (50)
- Effect of beta-cyclodextrin and hydroxypropyl beta-cyclodextrin complexation on physicochemical properties and antimicrobial activity of cefdinir. J. Pharm. Biomed. Anal. (2008)
- Cyclodextrins as pharmaceutical solubilizers. Adv Drug Deliv Rev (2007)
- Cyclodextrins and their uses: a review. Process Biochem. (2004)
- Multi-objective dynamic optimization of seeded suspension polymerization process. Chemical Engineering Journal 426 (2021)
- Understanding the effect of specialization on hospital performance through knowledge-guided machine learning. Comput. Chem. Eng. (2019)
- Cyclodextrins and their pharmaceutical applications. Int J Pharm (2007)
- Size compatibility and concentration dependent supramolecular host-guest interactions at interfaces. Nat Commun (2022)
- Amylose-inclusion complexes: Formation, identity and physico-chemical properties. J. Cereal Sci. (2010)
- Nanoencapsulation of hydrophobic and low-soluble food bioactive compounds within different nanocarriers. Food Hydrocolloids (2019)
- Elimination of bitter, disgusting tastes of drugs and foods by cyclodextrins. Eur. J. Pharm. Biopharm. (2005)
- Enhanced solubility and antimicrobial activity of alamethicin in aqueous solution by complexation with gamma-cyclodextrin. J. Funct. Foods
- From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discovery Today
- Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques. Acta Pharm Sin B
- A systematic review on nanoencapsulation of food bioactive ingredients and nutraceuticals by various nanocarriers. Crit. Rev. Food Sci. Nutr.
- Prediction of Orthosteric and Allosteric Regulations on Cannabinoid Receptors Using Supervised Machine Learning Classifiers. Mol Pharm
- Experimental Binding Energies in Supramolecular Complexes. Chem Rev
- Machine learning for molecular and materials science. Nature
- Co-crystal Prediction by Artificial Neural Networks. Angew Chem Int Ed Engl
- Design of Experiments in Pharmaceutical Development. Pharm. Chem. J.
- In silico prediction of binding capacity and interaction forces of organic compounds with alpha- and beta-cyclodextrins. J. Mol. Liq.
- Encapsulation of thiabendazole in hydroxypropyl-beta-cyclodextrin nanofibers via polymer-free electrospinning and its characterization. Pest Manag. Sci.
- Plasma-Induced Grafting of Cyclodextrin onto Multiwall Carbon Nanotube/Iron Oxides for Adsorbent Application. J. Phys. Chem. B
- Cyclodextrin-Based Host-Guest Supramolecular Nanoparticles for Delivery: From Design to Applications. Acc. Chem. Res.
- Techniques for transferring host-pathogen protein interactions knowledge to new tasks. Front Microbiol
¹ These authors contributed equally to this work.