Abstract
Schistosomiasis is a neglected tropical disease caused by helminths of the genus Schistosoma. Despite its high morbidity and socio-economic burden, only a handful of therapeutics are available, with praziquantel being the main drug. Praziquantel is an old drug, registered for human use in 1982, and has since been used in mass chemotherapy campaigns, which risks the development of resistance; new drugs with different mechanisms of action are therefore needed. This review examines how machine learning (ML) can aid the prediction of novel antischistosomal molecules in this era of big data. It first discusses the challenges of drug discovery in schistosomiasis. It then explains big data and its characteristics, and examines some open databases from which large biochemical datasets on schistosomiasis can be obtained for ML model development. The concepts of artificial intelligence, ML, and deep learning are introduced, and their applications to drug discovery in schistosomiasis are explored. The use of binary classification to predict antischistosomal compounds is discussed, together with algorithms that have been applied to this task, including random forest and naive Bayes. In addition, this review proposes deep learning algorithms (deep neural networks) as novel approaches for predicting antischistosomal molecules via binary classification. The review concludes that databases specifically designed to house bioactivity data on antischistosomal molecules, enriched with functional genomic datasets and ontologies, are urgently needed for developing predictive ML models.
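To make the binary classification setting concrete, the sketch below shows its two basic steps: continuous bioactivity values are turned into active/inactive labels, and a naive Bayes classifier is trained on binary molecular features. It is a minimal illustration under stated assumptions, not the method used in any study cited in this review. The 10 µM IC50 cut-off, the 4-bit "fingerprints", and the toy IC50 values are all hypothetical; a real workflow would compute fingerprints (e.g. ECFP) with a cheminformatics toolkit from compounds retrieved from databases such as ChEMBL or PubChem, and would typically use a library implementation of the classifier.

```python
import math
from collections import Counter

# Hypothetical cut-off (assumption): IC50 <= 10 µM is labeled active (1), otherwise inactive (0).
ACTIVITY_THRESHOLD_UM = 10.0


def label_compounds(ic50_um, threshold=ACTIVITY_THRESHOLD_UM):
    """Turn continuous IC50 values (µM) into binary activity labels."""
    return [1 if value <= threshold else 0 for value in ic50_um]


class BernoulliNaiveBayes:
    """Naive Bayes over binary features (e.g. fingerprint bits) with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        n = len(y)
        self.classes = sorted(set(y))
        counts = Counter(y)
        self.log_prior = {c: math.log(counts[c] / n) for c in self.classes}
        # P(bit j = 1 | class c), smoothed so no probability is exactly 0 or 1
        self.p_bit = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.p_bit[c] = [
                (sum(row[j] for row in rows) + self.alpha) / (len(rows) + 2 * self.alpha)
                for j in range(len(X[0]))
            ]
        return self

    def predict(self, x):
        """Return the class with the highest (unnormalized) log posterior."""
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for bit, p in zip(x, self.p_bit[c]):
                # Bits are treated as conditionally independent given the class
                score += math.log(p) if bit else math.log(1.0 - p)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data: 4-bit "fingerprints" and their measured IC50 values (µM)
X_train = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
           [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
ic50_um = [0.5, 2.0, 8.0, 50.0, 120.0, 300.0]
y_train = label_compounds(ic50_um)  # [1, 1, 1, 0, 0, 0]

model = BernoulliNaiveBayes().fit(X_train, y_train)
print(model.predict([1, 1, 0, 0]), model.predict([0, 0, 1, 1]))  # prints "1 0"
```

The choice of threshold directly determines the class balance, so in practice it is worth checking how many actives and inactives a given cut-off produces before training.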
Graphic abstract
The graphic abstract illustrates the application of machine learning techniques to the discovery of novel antischistosomal small molecules via binary classification in the era of big data.
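Deep neural networks, which the review proposes for this binary classification task, map a molecular feature vector through stacked nonlinear layers to a probability of activity. The sketch below is a minimal, hypothetical illustration using a single tanh hidden layer and a sigmoid output trained with binary cross-entropy by plain gradient descent; a deep network stacks more hidden layers, and practical models would use a deep learning framework, real fingerprints, and held-out validation data. The toy fingerprints, labels, layer size, learning rate, and epoch count are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyFeedforwardNet:
    """Feedforward network: input -> tanh hidden layer -> sigmoid output, P(active)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so the sketch is reproducible
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def train_epoch(self, X, y, lr=0.2):
        """One pass of per-sample gradient descent on binary cross-entropy; returns mean loss."""
        total = 0.0
        for x, t in zip(X, y):
            h, p = self.forward(x)
            total -= t * math.log(p + 1e-12) + (1 - t) * math.log(1 - p + 1e-12)
            d_out = p - t  # gradient of the loss w.r.t. the output pre-activation
            for j, hj in enumerate(h):
                # Backpropagate through tanh, using w2[j] before it is updated
                d_hidden = d_out * self.w2[j] * (1.0 - hj * hj)
                self.w2[j] -= lr * d_out * hj
                for i, xi in enumerate(x):
                    self.W1[j][i] -= lr * d_hidden * xi
                self.b1[j] -= lr * d_hidden
            self.b2 -= lr * d_out
        return total / len(X)

    def predict(self, x, threshold=0.5):
        return 1 if self.forward(x)[1] >= threshold else 0


# Hypothetical toy data: 4-bit "fingerprints" with binary activity labels (1 = active)
X_train = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],
           [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
y_train = [1, 1, 1, 0, 0, 0]

net = TinyFeedforwardNet(n_in=4, n_hidden=8)
losses = [net.train_epoch(X_train, y_train) for _ in range(300)]
print(f"mean loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print([net.predict(x) for x in X_train])
```

With so few samples a network like this simply memorizes the training set; the point of the sketch is the forward pass and backpropagation mechanics, not predictive performance.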
Funding
The work was not funded.
Author information
Contributions
SKK conceptualized the review. SKK and KAM co-wrote the first draft with contributions from WAM, EB and MDW. All authors read and accepted the final draft for submission.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Kwofie, S.K., Agyenkwa-Mawuli, K., Broni, E. et al. Prediction of antischistosomal small molecules using machine learning in the era of big data. Mol Divers 26, 1597–1607 (2022). https://doi.org/10.1007/s11030-021-10288-2
DOI: https://doi.org/10.1007/s11030-021-10288-2