Prediction and design of cyclodextrin inclusion complexes formation via machine learning-based strategies

https://doi.org/10.1016/j.ces.2022.117946

Highlights

  • Establishing three machine-learning models for predicting the formation of cyclodextrin inclusion complexes.

  • Establishing the recall-first strategy to avoid missing positive samples in the prediction.

  • Establishing the precision-first strategy to reduce the number of experimental verifications.

  • Finding three new inclusion complexes of cyclodextrin with prednisolone, 9-fluorenone, and saccharin.

Abstract

This study reports a machine-learning (ML) method for developing multi-purpose prediction strategies for the formation of cyclodextrin inclusion complexes (ICs) in aqueous solutions. A balanced dataset of pharmaceutically relevant molecules was constructed using experimental verification. Three ML models (artificial neural network, support vector machine, and logistic regression) were established and optimized to predict IC formation. To provide more reliable approaches for different prediction requirements, ML-based linear, recall-first, and precision-first strategies were further established on top of the ML models to maximize the recall or precision values. The proposed recall-first strategy identified all positive samples, so that no IC-forming compound was missed in the prediction, and the precision-first strategy accurately identified positive samples to reduce the number of validation experiments. ML-based prediction strategies for IC formation were established here for the first time and showed high accuracy and reliability. These strategies offer more efficient and lower-cost solutions for IC screening.

Introduction

Inclusion complexes (ICs) are crystalline materials that form spontaneously through non-covalent intermolecular interactions between guest and host molecules in solution (Putseys et al., 2010). Like cocrystals and salts, ICs are a type of multicomponent crystalline solid stabilized by hydrogen bonds and by hydrophobic and van der Waals interactions (Biedermann and Schneider, 2016, Park et al., 2022). This supramolecular strategy is one of the most widely used methods in the fields of environmental engineering, food and drug delivery, supramolecular chemistry, and agriculture (Morin-Crini et al., 2021, Peixiao et al., 2018). Cyclodextrins (CDs) have a hollow ring structure with a stable hydrophobic cavity and an external hydrophilic surface (Loftsson and Duchene, 2007). Therefore, they can be combined with drugs or other small hydrophobic organic molecules to improve the water solubility and stability of those molecules (Brewster and Loftsson, 2007). In addition, CDs can modify the physicochemical properties and behavior of guest molecules, for example by masking bitterness, controlling the release rate, decreasing the volatility of the compound, and enabling targeted therapy (Assadpour and Mahdi Jafari, 2019, Szejtli and Szente, 2005, Tian et al., 2021, Topuz and Uyar, 2019, Zhao et al., 2019a). CD-ICs are classified as “new-improved drugs” with superior efficacy and conform to the global trend of environmentally friendly new-drug research and development (Hu et al., 2014, Topuz et al., 2021). Moreover, the wide industrial applications of CD-ICs, such as dissolving organic pollutants in water and soil and improving the bioavailability and sustainability of food formulations, also show that they are favorable tools offering high efficiency, green fabrication, and low energy consumption (Del Valle, 2004, Landy et al., 2012, Rezaei et al., 2019).

CD-ICs exhibit considerable potential for the development of new drugs and functional materials; however, no efficient screening approach is available to researchers, owing to the poor understanding of the formation mechanism and the complex interactions in ICs (Gao et al., 2020, Zhang et al., 2018). Trial-and-error experiments can be used as a screening approach; however, they are inefficient and have low success rates in the discovery of ICs (Dhoot et al., 2019, Ouyang, 2015). Therefore, it is essential to establish robust screening strategies for the prediction and directional design of CD-IC formulations. Machine-learning (ML) algorithms are powerful tools for identifying patterns and making decisions using minimal background information about a particular problem. They have been successfully applied in multiple areas, particularly drug discovery and materials science (Song et al., 2021, Sousa et al., 2021, Zhao et al., 2021). Recently, researchers have attempted to apply a single ML model or statistical approach to determine the critical factors of CD-IC formation (Butler et al., 2018, Di et al., 2020, Stephenson et al., 2019, Zhao et al., 2019b). However, the lack of experimental verification of the positive (CD-ICs can form) and negative (CD-ICs cannot form) datasets, together with imbalanced samples, makes model training and hyperparameter tuning difficult. Moreover, a single ML model is limited in its ability to accurately identify the most suitable CD-ICs. Therefore, developing prediction models based on multiple ML methods and using verified and balanced datasets can provide more accurate results, reduce wasteful experimental screening, and improve the efficiency of early-stage drug development or of expanding capacities for expensive compounds (Zhang et al., 2017, Zhang et al., 2021).

Researchers in drug discovery and materials science are accustomed to selecting the ML model that shows the best performance during testing. It is relatively easy to identify the best algorithm via one or several evaluation metrics on the testing set. However, as previously discussed, the use of only a single parameter reduces the comprehensibility of the model and leads to poor generalization performance. Attempts are being made to optimize ML models from several aspects to adapt them to practical applications, including applying new methodologies to address limitations (Vriza et al., 2020) and selecting descriptors based on an in-depth understanding of specific behaviors to improve the prediction performance (Devogelaer et al., 2020, Wang et al., 2020b, Yiming Ma et al., 2021). However, it remains challenging to efficiently select an ML model that is widely applicable to multiple needs, because it is difficult to optimize all evaluation metrics, such as accuracy, precision, and recall, using a single ML model. In addition, using only one evaluation metric leads to errors in the classification of samples near the decision boundary, which must still be supplemented by several experimental verifications to avoid errors in prediction (Pereira, 2020, Wang et al., 2020a). Therefore, it is necessary to construct corresponding ML-based design strategies according to the different requirements of the actual application scenarios.
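For reference, the three evaluation metrics named above have the following standard definitions for a binary classifier. The symbols are the usual confusion-matrix counts and are introduced here only for illustration: TP and TN are correctly predicted positive (CD-IC forms) and negative samples, FP is a negative sample predicted as positive, and FN is a positive sample predicted as negative.

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```

Recall reaches 1 only when no positive sample is missed (FN = 0), and precision reaches 1 only when every predicted positive is a true positive (FP = 0). Lowering FN usually raises FP and vice versa, which is why a single model rarely maximizes both at once.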

Here, β-CD was chosen as the host molecule because of its high stability and low price. The positive and negative datasets adopted in the models and strategies each comprised 100 experimentally validated samples, which were selected based on their high commercial value and application potential. Three supervised ML algorithms, artificial neural networks (ANNs), support vector machines (SVMs), and logistic regression (LR), were explored for the CD-IC prediction of pharmaceutically relevant molecules in an aqueous solution. Three strategies using a combination of ANN, SVM, and LR models were developed and applied to satisfy the various prediction requirements. Additionally, five compounds, isonicotinamide, levulinic acid, prednisolone, 9-fluorenone, and saccharin, were used as a validation set to verify the accuracy and reliability of the ML-based screening strategies. Based on the prediction results, three CD-ICs of prednisolone, 9-fluorenone, and saccharin in the validation set were successfully prepared experimentally for the first time.
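This excerpt does not show how the three models are combined in each strategy, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes that a recall-first combination labels a compound positive when any model predicts positive (union vote) and that a precision-first combination requires all models to agree (intersection vote). The function names, labels, and model outputs are hypothetical.

```python
from typing import List, Tuple


def combine_recall_first(preds: List[List[int]]) -> List[int]:
    """Assumed union vote: positive if ANY model predicts 1."""
    return [int(any(votes)) for votes in zip(*preds)]


def combine_precision_first(preds: List[List[int]]) -> List[int]:
    """Assumed intersection vote: positive only if ALL models predict 1."""
    return [int(all(votes)) for votes in zip(*preds)]


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (precision, recall) computed from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels (1 = IC forms, 0 = no IC) and made-up model outputs.
y_true = [1, 1, 1, 0, 0, 0]
ann = [1, 1, 0, 1, 0, 0]
svm = [1, 0, 1, 0, 0, 0]
lr = [1, 1, 1, 0, 1, 0]

recall_first = combine_recall_first([ann, svm, lr])
precision_first = combine_precision_first([ann, svm, lr])

print("recall-first:   ", recall_first, precision_recall(y_true, recall_first))
print("precision-first:", precision_first, precision_recall(y_true, precision_first))
```

On this toy data the union vote recovers every positive sample at the cost of two false positives, while the intersection vote returns no false positives but misses two positives. That trade-off is the reason for choosing a strategy according to whether missed complexes or wasted validation experiments are more costly.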

Section snippets

Methodology

Three successive steps (construction, judgment, and validation) were performed, as illustrated in Fig. 1. A large number of compounds were screened, experimentally verified, and classified into positive IC (recorded as label 1) and negative non-IC (recorded as label 0) groups. The experiments were conducted using a cooling crystallization process to identify the new CD-ICs. Two hundred drugs or relevant molecules with actual or potential pharmaceutical applications were selected to

Construction and assessment of the ML Models: ANN, SVM, and LR

A slight change in the model structure or hyperparameters of the same ML algorithm may lead to significant differences in the classification performance of the resulting model. The prediction ability of the ANN model was determined by factors such as the number of nodes, network structure, and epochs. The results of a comprehensive test of the influence of the network structure on the prediction ability are presented in Table S2. The optimization result indicated that the selected ANN structure,

Conclusion and outlook

Here, we developed, for the first time, three ML models (ANN, SVM, and LR) for predicting the formation of CD-ICs, and the three ML-based strategies, each designed for a different prediction purpose, demonstrated high efficiency and accuracy. A total of 200 compounds were screened in cooling crystallization experiments to collect balanced positive (forming CD-ICs) and negative datasets of compound samples. The samples were quantitatively described according to the structural description and

CRediT authorship contribution statement

Yiming Ma: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Yue Niu: Data curation, Formal analysis, Validation, Investigation, Writing – original draft, Writing – review & editing. Huaiyu Yang: . Jiayu Dai: Formal analysis, Writing – original draft. Jiawei Lin: Validation, Investigation. Huiqi Wang: Validation, Investigation. Songgu Wu: . Qiuxiang Yin: Writing – review &

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Y. Ma acknowledges the support of the China Scholarship Council. The authors are grateful for the financial support of the National Natural Science Foundation of China (No. NNSFC 22111530115) and the Tianjin Municipal Natural Science Foundation (No. 21JCYBJC00600).

References (50)

  • M.K. Zhang et al.

    Enhanced solubility and antimicrobial activity of alamethicin in aqueous solution by complexation with gamma-cyclodextrin

    J. Funct. Foods

    (2018)
  • L. Zhang et al.

    From machine learning to deep learning: progress in machine intelligence for rational drug discovery

    Drug Discovery Today

    (2017)
  • Q. Zhao et al.

    Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques

    Acta Pharm Sin B

    (2019)
  • E. Assadpour et al.

    A systematic review on nanoencapsulation of food bioactive ingredients and nutraceuticals by various nanocarriers

    Crit. Rev. Food Sci. Nutr.

    (2019)
  • Y. Bian et al.

    Prediction of Orthosteric and Allosteric Regulations on Cannabinoid Receptors Using Supervised Machine Learning Classifiers

    Mol Pharm

    (2019)
  • F. Biedermann et al.

    Experimental Binding Energies in Supramolecular Complexes

    Chem Rev

    (2016)
  • K.T. Butler et al.

    Machine learning for molecular and materials science

    Nature

    (2018)
  • J.-J. Devogelaer et al.

    Co-crystal Prediction by Artificial Neural Networks

    Angew Chem Int Ed Engl

    (2020)
  • A.S. Dhoot et al.

    Design of Experiments in Pharmaceutical Development

    Pharm. Chem. J.

    (2019)
  • P.W. Di et al.

    In silico prediction of binding capacity and interaction forces of organic compounds with alpha- and beta-cyclodextrins

    J. Mol. Liq.

    (2020)
  • Molecular Operating Environment (MOE), Version 2019.0102; Chemical Computing Group ULC: Montreal,...
  • S. Gao et al.

    Encapsulation of thiabendazole in hydroxypropyl-beta-cyclodextrin nanofibers via polymer-free electrospinning and its characterization

    Pest Manag. Sci.

    (2020)
  • J. Hu et al.

    Plasma-Induced Grafting of Cyclodextrin onto Multiwall Carbon Nanotube/Iron Oxides for Adsorbent Application

    J. Phys. Chem. B

    (2010)
  • Q.-D. Hu et al.

    Cyclodextrin-Based Host-Guest Supramolecular Nanoparticles for Delivery: From Design to Applications

    Acc. Chem. Res.

    (2014)
  • M. Kshirsagar et al.

    Techniques for transferring host-pathogen protein interactions knowledge to new tasks

    Front Microbiol

    (2015)
    1

    These authors contributed equally to this work.
