-
Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures J. Cheminfom. (IF 8.6) Pub Date : 2024-03-14 Anna Carbery, Martin Buttenschoen, Rachael Skyner, Frank von Delft, Charlotte M. Deane
Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand the performance on novel protein targets where experimental
-
Advancing material property prediction: using physics-informed machine learning models for viscosity J. Cheminfom. (IF 8.6) Pub Date : 2024-03-14 Alex K. Chew, Matthew Sender, Zachary Kaplan, Anand Chandrasekaran, Jackson Chief Elk, Andrea R. Browning, H. Shaun Kwak, Mathew D. Halls, Mohammad Atif Faiz Afzal
In materials science, accurately computing properties like viscosity, melting point, and glass transition temperatures solely through physics-based models is challenging. Data-driven machine learning (ML) also poses challenges in constructing ML models, especially in the material science domain where data is limited. To address this, we integrate physics-informed descriptors from molecular dynamics
-
A new workflow for the effective curation of membrane permeability data from open ADME information J. Cheminfom. (IF 8.6) Pub Date : 2024-03-14 Tsuyoshi Esaki, Tomoki Yonezawa, Kazuyoshi Ikeda
Membrane permeability is an in vitro parameter that represents the apparent permeability (Papp) of a compound, and is a key absorption, distribution, metabolism, and excretion parameter in drug development. Although the Caco-2 cell lines are the most used cell lines to measure Papp, other cell lines, such as the Madin-Darby Canine Kidney (MDCK), LLC-Pig Kidney 1 (LLC-PK1), and Ralph Russ Canine Kidney
-
Automated molecular structure segmentation from documents using ChemSAM J. Cheminfom. (IF 8.6) Pub Date : 2024-03-12 Bowen Tang, Zhangming Niu, Xiaofeng Wang, Junjie Huang, Chao Ma, Jing Peng, Yinghui Jiang, Ruiquan Ge, Hongyu Hu, Luhao Lin, Guang Yang
Chemical structure segmentation constitutes a pivotal task in cheminformatics, involving the extraction and abstraction of structural information of chemical compounds from text-based sources, including patents and scientific articles. This study introduces a deep learning approach to chemical structure segmentation, employing a Vision Transformer (ViT) to discern the structural patterns of chemical
-
Systematic analysis, aggregation and visualisation of interaction fingerprints for molecular dynamics simulation data J. Cheminfom. (IF 8.6) Pub Date : 2024-03-12 Sabrina Jaeger-Honz, Karsten Klein, Falk Schreiber
Computational methods such as molecular docking or molecular dynamics (MD) simulations have been developed to simulate and explore the interactions between biomolecules. However, the interactions obtained using these methods are difficult to analyse and evaluate. Interaction fingerprints (IFPs) have been proposed to derive interactions from static 3D coordinates and transform them into 1D bit vectors
-
Prediction of compound-target interaction using several artificial intelligence algorithms and comparison with a consensus-based strategy J. Cheminfom. (IF 8.6) Pub Date : 2024-03-07 Karina Jimenes-Vargas, Alejandro Pazos, Cristian R. Munteanu, Yunierkis Perez-Castillo, Eduardo Tejera
For understanding a chemical compound’s mechanism of action and its side effects, as well as for drug discovery, it is crucial to predict its possible protein targets. This study examines 15 developed target-centric models (TCM) employing different molecular descriptions and machine learning algorithms. They were contrasted with 17 third-party models implemented as web tools (WTCM). In both sets of
-
Small molecule autoencoders: architecture engineering to optimize latent space utility and sustainability J. Cheminfom. (IF 8.6) Pub Date : 2024-03-05 Marie Oestreich, Iva Ewert, Matthias Becker
Autoencoders are frequently used to embed molecules for training of downstream deep learning models. However, evaluation of the chemical information quality in the latent spaces is lacking and the model architectures are often arbitrarily chosen. Unoptimized architectures may not only negatively affect latent space quality but also increase energy consumption during training, making the models unsustainable
-
Improving chemical reaction yield prediction using pre-trained graph neural networks J. Cheminfom. (IF 8.6) Pub Date : 2024-03-01 Jongmin Han, Youngchun Kwon, Youn-Suk Choi, Seokho Kang
Graph neural networks (GNNs) have proven to be effective in the prediction of chemical reaction yields. However, their performance tends to deteriorate when they are trained using an insufficient training dataset in terms of quantity or diversity. A promising solution to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we investigate the effectiveness of
-
Correction: DecoyFinder, a tool for finding decoy molecules J. Cheminfom. (IF 8.6) Pub Date : 2024-02-28 Adrià Cereto-Massagué, S. Garcia-Vallvé, G. Pujadas
Correction: Journal of Cheminformatics 2012, 4(Suppl 1):P2 https://doi.org/10.1186/1758-2946-4-s1-p2 From 7th German Conference on Chemoinformatics: 25 CIC-Workshop Goslar, Germany. 6–8 November 2011 Following publication of the original article [1], we have been informed that the first name and last name of the author Adrià Cereto Massagué have been switched. The incorrect name is: Cereto Massagué Adrià
-
Preventing lipophilic aggregation in cosolvent molecular dynamics simulations with hydrophobic probes using Plumed Automatic Restraining Tool (PART) J. Cheminfom. (IF 8.6) Pub Date : 2024-02-27 Olivier Beyens, Hans De Winter
Cosolvent molecular dynamics (MD) simulations are molecular dynamics simulations used to identify preferable locations of small organic fragments on a protein target. Most cosolvent molecular dynamics workflows make use of only water-soluble fragments, as hydrophobic fragments would cause lipophilic aggregation. To date the two approaches that allow usage of hydrophobic cosolvent molecules are to use
-
Prediction of chemical reaction yields with large-scale multi-view pre-training J. Cheminfom. (IF 8.6) Pub Date : 2024-02-25 Runhan Shi, Gufeng Yu, Xiaohong Huo, Yang Yang
Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined
-
“DompeKeys”: a set of novel substructure-based descriptors for efficient chemical space mapping, development and structural interpretation of machine learning models, and indexing of large databases J. Cheminfom. (IF 8.6) Pub Date : 2024-02-23 Candida Manelfi, Valerio Tazzari, Filippo Lunghini, Carmen Cerchia, Anna Fava, Alessandro Pedretti, Pieter F. W. Stouten, Giulio Vistoli, Andrea Rosario Beccari
The conversion of chemical structures into computer-readable descriptors, able to capture key structural aspects, is of pivotal importance in the field of cheminformatics and computer-aided drug design. Molecular fingerprints represent a widely employed class of descriptors; however, their generation process is time-consuming for large databases and is susceptible to bias. Therefore, descriptors able
-
Reinvent 4: Modern AI–driven generative molecule design J. Cheminfom. (IF 8.6) Pub Date : 2024-02-21 Hannes H. Loeffler, Jiazhen He, Alessandro Tibo, Jon Paul Janet, Alexey Voronov, Lewis H. Mervin, Ola Engkvist
REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates
-
Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling J. Cheminfom. (IF 8.6) Pub Date : 2024-02-20 Kamel Mansouri, José T. Moreira-Filho, Charles N. Lowe, Nathaniel Charest, Todd Martin, Valery Tkachenko, Richard Judson, Mike Conway, Nicole C. Kleinstreuer, Antony J. Williams
The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, the common concern is the quality of both the chemical structure information and associated experimental data. This is especially true when those data are collected from multiple sources as chemical substance
-
POSEIDON: Peptidic Objects SEquence-based Interaction with cellular DOmaiNs: a new database and predictor J. Cheminfom. (IF 8.6) Pub Date : 2024-02-16 António J. Preto, Ana B. Caniceiro, Francisco Duarte, Hugo Fernandes, Lino Ferreira, Joana Mourão, Irina S. Moreira
Cell-penetrating peptides (CPPs) are short chains of amino acids that have shown remarkable potential to cross the cell membrane and deliver coupled therapeutic cargoes into cells. Designing and testing different CPPs to target specific cells or tissues is crucial to ensure high delivery efficiency and reduced toxicity. However, in vivo/in vitro testing of various CPPs can be both time-consuming and
-
Simultaneously improving accuracy and computational cost under parametric constraints in materials property prediction tasks J. Cheminfom. (IF 8.6) Pub Date : 2024-02-16 Vishu Gupta, Youjia Li, Alec Peltekian, Muhammed Nur Talha Kilic, Wei-keng Liao, Alok Choudhary, Ankit Agrawal
Modern data mining techniques using machine learning (ML) and deep learning (DL) algorithms have been shown to excel in the regression-based task of materials property prediction using various materials representations. In an attempt to improve the predictive performance of the deep neural network model, researchers have tried to add more layers as well as develop new architectural components to create
-
Ontologies4Cat: investigating the landscape of ontologies for catalysis research data management J. Cheminfom. (IF 8.6) Pub Date : 2024-02-07 Alexander S. Behr, Hendrik Borgelt, Norbert Kockmann
As scientific digitization advances, it is imperative to ensure that data is Findable, Accessible, Interoperable, and Reusable (FAIR) and machine-processable. Ontologies play a vital role in enhancing data FAIRness by explicitly representing knowledge in a machine-understandable format. Research data in catalysis research is often complex and diverse, necessitating a correspondingly broad collection
-
AdductHunter: identifying protein-metal complex adducts in mass spectra J. Cheminfom. (IF 8.6) Pub Date : 2024-02-06 Derek Long, Liam Eade, Matthew P. Sullivan, Katharina Dost, Samuel M. Meier-Menches, David C. Goldstone, Christian G. Hartinger, Jörg S. Wicker, Katerina Taškova
Mass spectrometry (MS) is an analytical technique for molecule identification that can be used for investigating protein-metal complex interactions. Once the MS data is collected, the mass spectra are usually interpreted manually to identify the adducts formed as a result of the interactions between proteins and metal-based species. However, with increasing resolution, dataset size, and species complexity
-
DLM-DTI: a dual language model for the prediction of drug-target interaction with hint-based learning J. Cheminfom. (IF 8.6) Pub Date : 2024-02-01 Jonghyun Lee, Dae Won Jun, Ildae Song, Yun Kim
The drug discovery process is demanding and time-consuming, and machine learning-based research is increasingly proposed to enhance efficiency. A significant challenge in this field is predicting whether a drug molecule’s structure will interact with a target protein. A recent study attempted to address this challenge by utilizing an encoder that leverages prior knowledge of molecular and protein structures
-
Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors J. Cheminfom. (IF 8.6) Pub Date : 2024-01-30 Jiangxia Wu, Yihao Chen, Jingxing Wu, Duancheng Zhao, Jindi Huang, MuJie Lin, Ling Wang
Conventional machine learning (ML) and deep learning (DL) play a key role in the selectivity prediction of kinase inhibitors. A number of models based on available datasets can be used to predict the kinase profile of compounds, but there is still controversy about the advantages and disadvantages of ML and DL for such tasks. In this study, we constructed a comprehensive benchmark dataset of kinase
-
CRAFT: a web-integrated cavity prediction tool based on flow transfer algorithm J. Cheminfom. (IF 8.6) Pub Date : 2024-01-30 Anuj Gahlawat, Anjali Singh, Hardeep Sandhu, Prabha Garg
Numerous computational methods, including evolutionary-based, energy-based, and geometrical-based methods, are utilized to identify cavities inside proteins. Cavity information aids protein function annotation, drug design, poly-pharmacology, and allosteric site investigation. This article introduces the “flow transfer algorithm” for rapid and effective identification of diverse protein cavities through
-
Enhancing chemical synthesis: a two-stage deep neural network for predicting feasible reaction conditions J. Cheminfom. (IF 8.6) Pub Date : 2024-01-24 Lung-Yi Chen, Yi-Pei Li
In the field of chemical synthesis planning, the accurate recommendation of reaction conditions is essential for achieving successful outcomes. This work introduces an innovative deep learning approach designed to address the complex task of predicting appropriate reagents, solvents, and reaction temperatures for chemical reactions. Our proposed methodology combines a multi-label classification model
-
Decrypting orphan GPCR drug discovery via multitask learning J. Cheminfom. (IF 8.6) Pub Date : 2024-01-23 Wei-Cheng Huang, Wei-Ting Lin, Ming-Shiu Hung, Jinq-Chyi Lee, Chun-Wei Tung
The drug discovery of G protein-coupled receptors (GPCRs) superfamily using computational models is often limited by the availability of protein three-dimensional (3D) structures and chemicals with experimentally measured bioactivities. Orphan GPCRs without known ligands further complicate the process. To enable drug discovery for human orphan GPCRs, multitask models were proposed for predicting half
-
MATEO: intermolecular α-amidoalkylation theoretical enantioselectivity optimization. Online tool for selection and design of chiral catalysts and products J. Cheminfom. (IF 8.6) Pub Date : 2024-01-23 Paula Carracedo-Reboredo, Eider Aranzamendi, Shan He, Sonia Arrasate, Cristian R. Munteanu, Carlos Fernandez-Lozano, Nuria Sotomayor, Esther Lete, Humberto González-Díaz
The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products. In this context, Chiral Phosphoric Acid (CPA) catalysts are versatile catalysts for this type of reaction. The selection and design of new CPA catalysts for different enantioselective reactions has a dual interest because new CPA catalysts (tools) and
-
IDSL_MINT: a deep learning framework to predict molecular fingerprints from mass spectra J. Cheminfom. (IF 8.6) Pub Date : 2024-01-18 Sadjad Fakouri Baygi, Dinesh Kumar Barupal
The majority of tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies lack any annotation. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics—Mass INTerpreter (IDSL_MINT) can translate MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT allows users to leverage the power of the transformer model for mass spectrometry
-
Are new ideas harder to find? A note on incremental research and Journal of Cheminformatics’ Scientific Contribution Statement J. Cheminfom. (IF 8.6) Pub Date : 2024-01-15 Barbara Zdrazil, Rajarshi Guha, Karina Martinez-Mayorga, Nina Jeliazkova
In the field of cheminformatics, technological advancements in recent times include, e.g., the way chemical information is being represented for large scale screening and de novo drug design. Especially, chemical language models originating from natural language processing offer new opportunities for molecular design [1]. However, for science in general and compared to past decades, recent paucity
-
BioisoIdentifier: an online free tool to investigate local structural replacements from PDB J. Cheminfom. (IF 8.6) Pub Date : 2024-01-13 Tinghao Zhang, Shaohua Sun, Runzhou Wang, Ting Li, Bicheng Gan, Yuezhou Zhang
Within the realm of contemporary medicinal chemistry, bioisosteres are empirically used to enhance potency and selectivity and to improve the absorption, distribution, metabolism, excretion and toxicity profiles of drug candidates. It is believed that bioisosteric know-how may help bypass granted patents or generate novel intellectual property for commercialization. Besides synthetic expertise, the drug discovery
-
Cobdock: an accurate and practical machine learning-based consensus blind docking method J. Cheminfom. (IF 8.6) Pub Date : 2024-01-11 Sadettin Y. Ugurlu, David McDonald, Huangshu Lei, Alan M. Jones, Shu Li, Henry Y. Tong, Mark S. Butler, Shan He
Probing the surface of proteins to predict the binding site and binding affinity for a given small molecule is a critical but challenging task in drug discovery. Blind docking addresses this issue by performing docking on binding regions randomly sampled from the entire protein surface. However, compared with local docking, blind docking is less accurate and reliable because the docking space is too
-
DBPP-Predictor: a novel strategy for prediction of chemical drug-likeness based on property profiles J. Cheminfom. (IF 8.6) Pub Date : 2024-01-05 Yaxin Gu, Yimeng Wang, Keyun Zhu, Weihua Li, Guixia Liu, Yun Tang
Evaluation of chemical drug-likeness is essential for the discovery of high-quality drug candidates while avoiding unwarranted biological and clinical trial costs. A high-quality drug candidate should have promising drug-like properties, including pharmacological activity, suitable physicochemical and ADMET properties. Hence, in silico prediction of chemical drug-likeness has been proposed while being
-
Relative molecule self-attention transformer J. Cheminfom. (IF 8.6) Pub Date : 2024-01-03 Łukasz Maziarka, Dawid Majchrowski, Tomasz Danel, Piotr Gaiński, Jacek Tabor, Igor Podolak, Paweł Morkisz, Stanisław Jastrzębski
The prediction of molecular properties is a crucial aspect in drug discovery that can save a lot of money and time during the drug design process. The use of machine learning methods to predict molecular properties has become increasingly popular in recent years. Despite advancements in the field, several challenges remain that need to be addressed, like finding an optimal pre-training procedure to
-
Structure-based, deep-learning models for protein-ligand binding affinity prediction J. Cheminfom. (IF 8.6) Pub Date : 2024-01-03 Debby D. Wang, Wenhui Wu, Ran Wang
The launch of the AlphaFold series has brought deep-learning techniques into molecular structural science. As another crucial problem, structure-based prediction of protein-ligand binding affinity urgently calls for advanced computational techniques. Is deep learning ready to decode this problem? Here we review mainstream structure-based, deep-learning approaches for this problem, focusing on molecular
-
InterDILI: interpretable prediction of drug-induced liver injury through permutation feature importance and attention mechanism J. Cheminfom. (IF 8.6) Pub Date : 2024-01-03 Soyeon Lee, Sunyong Yoo
Safety is one of the important factors constraining the distribution of clinical drugs on the market. Drug-induced liver injury (DILI) is the leading cause of safety problems produced by drug side effects. Therefore, the DILI risk of approved drugs and potential drug candidates should be assessed. Currently, in vivo and in vitro methods are used to test DILI risk, but both methods are labor-intensive
-
Applying atomistic neural networks to bias conformer ensembles towards bioactive-like conformations J. Cheminfom. (IF 8.6) Pub Date : 2023-12-21 Benoit Baillif, Jason Cole, Ilenia Giangreco, Patrick McCabe, Andreas Bender
Identifying bioactive conformations of small molecules is an essential process for virtual screening applications relying on three-dimensional structure such as molecular docking. For most small molecules, conformer generators retrieve at least one bioactive-like conformation, with an atomic root-mean-square deviation (ARMSD) lower than 1 Å, among the set of low-energy conformers generated. However
-
A workflow for deriving chemical entities from crystallographic data and its application to the Crystallography Open Database J. Cheminfom. (IF 8.6) Pub Date : 2023-12-19 Antanas Vaitkus, Andrius Merkys, Thomas Sander, Miguel Quirós, Paul A. Thiessen, Evan E. Bolton, Saulius Gražulis
Knowledge about the 3-dimensional structure, orientation and interaction of chemical compounds is important in many areas of science and technology. X-ray crystallography is one of the experimental techniques capable of providing a large amount of structural information for a given compound, and it is widely used for characterisation of organic and metal-organic molecules. The method provides precise
-
Ilm-NMR-P31: an open-access 31P nuclear magnetic resonance database and data-driven prediction of 31P NMR shifts J. Cheminfom. (IF 8.6) Pub Date : 2023-12-18 Jasmin Hack, Moritz Jordan, Alina Schmitt, Melissa Raru, Hannes Sönke Zorn, Alex Seyfarth, Isabel Eulenberger, Robert Geitner
This publication introduces a novel open-access 31P Nuclear Magnetic Resonance (NMR) shift database. With 14,250 entries encompassing 13,730 distinct molecules from 3,648 references, this database offers a comprehensive repository of organic and inorganic compounds. Emphasizing single-phosphorus atom compounds, the database facilitates data mining and machine learning endeavors, particularly in signal
-
Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets J. Cheminfom. (IF 8.6) Pub Date : 2023-12-18 Maria H. Rasmussen, Chenru Duan, Heather J. Kulik, Jan H. Jensen
With the increasingly important role of machine learning (ML) models in chemical research, the need to attach a level of confidence to model predictions naturally arises. Several methods for obtaining uncertainty estimates have been proposed in recent years, but consensus on their evaluation has yet to be established, and different studies on uncertainties generally use different
-
AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent data J. Cheminfom. (IF 8.6) Pub Date : 2023-12-13 Yugo Shimizu, Masateru Ohta, Shoichi Ishida, Kei Terayama, Masanori Osawa, Teruki Honma, Kazuyoshi Ikeda
Developing compounds with novel structures is important for the production of new drugs. From an intellectual property perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. Recent advances in artificial intelligence (AI) have made it possible to generate a large number of compounds. However, confirming the patent
-
SIMPD: an algorithm for generating simulated time splits for validating machine learning approaches J. Cheminfom. (IF 8.6) Pub Date : 2023-12-11 Gregory A. Landrum, Maximilian Beckers, Jessica Lanini, Nadine Schneider, Nikolaus Stiefl, Sereina Riniker
Time-split cross-validation is broadly recognized as the gold standard for validating predictive models intended for use in medicinal chemistry projects. Unfortunately this type of data is not broadly available outside of large pharmaceutical research organizations. Here we introduce the SIMPD (simulated medicinal chemistry project data) algorithm to split public data sets into training and test sets
-
HybridGCN for protein solubility prediction with adaptive weighting of multiple features J. Cheminfom. (IF 8.6) Pub Date : 2023-12-08 Long Chen, Rining Wu, Feixiang Zhou, Huifeng Zhang, Jian K. Liu
The solubility of proteins stands as a pivotal factor in the realm of pharmaceutical research and production. Addressing the imperative to enhance production efficiency and curtail experimental costs, the demand arises for computational models adept at accurately predicting solubility based on provided datasets. Prior investigations have leveraged deep learning models and feature engineering techniques
-
PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank J. Cheminfom. (IF 8.6) Pub Date : 2023-12-02 Ibrahim Roshan Kunnakkattu, Preeti Choudhary, Lukas Pravda, Nurul Nadzirin, Oliver S. Smart, Qi Yuan, Stephen Anyango, Sreenath Nair, Mihaly Varadi, Sameer Velankar
While the Protein Data Bank (PDB) contains a wealth of structural information on ligands bound to macromolecules, their analysis can be challenging due to the large amount and diversity of data. Here, we present PDBe CCDUtils, a versatile toolkit for processing and analysing small molecules from the PDB in PDBx/mmCIF format. PDBe CCDUtils provides streamlined access to all the metadata for small molecules
-
Towards a partial order graph for interactive pharmacophore exploration: extraction of pharmacophores activity delta J. Cheminfom. (IF 8.6) Pub Date : 2023-11-29 Etienne Lehembre, Johanna Giovannini, Damien Geslin, Alban Lepailleur, Jean-Luc Lamotte, David Auber, Abdelkader Ouali, Bruno Cremilleux, Albrecht Zimmermann, Bertrand Cuissart, Ronan Bureau
This paper presents a novel approach called Pharmacophore Activity Delta for extracting outstanding pharmacophores from a chemogenomic dataset, with a specific focus on a kinase target known as BCR-ABL. The method involves constructing a Hasse diagram, referred to as the pharmacophore network, by utilizing the subgraph partial order as an initial step, leading to the identification of pharmacophores
-
EMNPD: a comprehensive endophytic microorganism natural products database for prompt the discovery of new bioactive substances J. Cheminfom. (IF 8.6) Pub Date : 2023-11-28 Hong-Quan Xu, Huan Xiao, Jin-Hui Bu, Yan-Feng Hong, Yu-Hong Liu, Zi-Yue Tao, Shu-Fan Ding, Yi-Tong Xia, E Wu, Zhen Yan, Wei Zhang, Gong-Xing Chen, Feng Zhu, Lin Tao
The discovery and utilization of natural products derived from endophytic microorganisms have garnered significant attention in pharmaceutical research. While remarkable progress has been made in this field each year, the absence of dedicated open-access databases for endophytic microorganism natural products research is evident. To address the increasing demand for mining and sharing of data resources
-
NMR shift prediction from small data quantities J. Cheminfom. (IF 8.6) Pub Date : 2023-11-27 Herman Rull, Markus Fischer, Stefan Kuhn
Prediction of chemical shift in NMR using machine learning methods is typically done with the maximum amount of data available to achieve the best results. In some cases, such large amounts of data are not available, e.g. for heteronuclei. We demonstrate a novel machine learning model that is able to achieve better results than other models for relevant datasets with comparatively low amounts of data
-
An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification J. Cheminfom. (IF 8.6) Pub Date : 2023-11-23 Daniel Probst
Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways and the challenge of finding more sustainable enzyme-catalysed alternatives to traditional organic reactions are just two examples of tasks that require an association between reaction and enzyme. However, given
-
On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data J. Cheminfom. (IF 8.6) Pub Date : 2023-11-21 Koichi Handa, Morgan C. Thomas, Michiharu Kageyama, Takeshi Iijima, Andreas Bender
While a multitude of deep generative models have recently emerged there exists no best practice for their practically relevant validation. On the one hand, novel de novo-generated molecules cannot be refuted by retrospective validation (so that this type of validation is biased); but on the other hand prospective validation is expensive and then often biased by the human selection process. In this
-
YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications J. Cheminfom. (IF 8.6) Pub Date : 2023-11-20 Chong Zhou, Wei Liu, Xiyue Song, Mengling Yang, Xiaowang Peng
In chemistry-related disciplines, a vast repository of molecular structural data has been documented in scientific publications but remains inaccessible to computational analyses owing to its non-machine-readable format. Optical chemical structure recognition (OCSR) addresses this gap by converting images of chemical molecular structures into a format accessible to computers and convenient for storage
-
BBB-PEP-prediction: improved computational model for identification of blood–brain barrier peptides using blending position relative composition specific features and ensemble modeling J. Cheminfom. (IF 8.6) Pub Date : 2023-11-18 Ansar Naseem, Fahad Alturise, Tamim Alkhalifah, Yaser Daanial Khan
Blood–brain barrier peptides (BBPs) have the potential to facilitate the delivery of drugs to the brain, opening up new avenues for the development of treatments targeting diseases of the central nervous system (CNS). The main obstacle in treating CNS disorders is the formidable task of getting pharmaceutical agents across the blood–brain barrier (BBB). Nearly 98% of small molecule-based drugs and nearly 100%
-
HD_BPMDS: a curated binary pattern multitarget dataset of Huntington’s disease–targeting agents J. Cheminfom. (IF 8.6) Pub Date : 2023-11-17 Sven Marcel Stefan, Jens Pahnke, Vigneshwaran Namasivayam
The discovery of both distinctive lead molecules and novel drug targets is a great challenge in drug discovery, particularly for orphan diseases. Huntington’s disease (HD) is an orphan, neurodegenerative disease whose pathology is well described. However, its pathophysiological background and molecular mechanisms are poorly understood. To date, only 2 drugs have been approved
-
Correction: MOFGalaxyNet: a social network analysis for predicting guest accessibility in metal–organic frameworks utilizing graph convolutional networks J. Cheminfom. (IF 8.6) Pub Date : 2023-11-15 Mehrdad Jalali, A. D. Dinga Wonanke, Christof Wöll
Correction: Journal of Cheminformatics (2023) 15:94 https://doi.org/10.1186/s13321-023-00764-2 Following publication of the original article [1], we have been informed that Fig. 4 is missing, and instead of Fig. 4, Fig. 3 has been repeated as Fig. 4. The original article [1] has been corrected. Jalali M, Wonanke ADD, Wöll C (2023) MOFGalaxyNet: a social network analysis for predicting guest accessibility
-
Exploring the known chemical space of the plant kingdom: insights into taxonomic patterns, knowledge gaps, and bioactive regions J. Cheminfom. (IF 8.6) Pub Date : 2023-11-10 Daniel Domingo-Fernández, Yojana Gadiya, Sarah Mubeen, David Healey, Bryan H. Norman, Viswa Colluru
Plants are one of the primary sources of natural products for drug development. However, despite centuries of research, only a limited region of the phytochemical space has been studied. To understand the scope of what is explored versus unexplored in the phytochemical space, we begin by reconstructing the known chemical space of the plant kingdom, mapping the distribution of secondary metabolites
-
Continuous symmetry and chirality measures: approximate algorithms for large molecular structures J. Cheminfom. (IF 8.6) Pub Date : 2023-11-09 Gil Alon, Yuval Ben-Haim, Inbal Tuvi-Arad
Quantifying imperfect symmetry of molecules can help explore the sources, roles and extent of structural distortion. Based on the established methodology of continuous symmetry and chirality measures, we develop a set of three-dimensional molecular descriptors to estimate distortion of large structures. These three-dimensional geometrical descriptors quantify the gap between the desirable symmetry
-
Evaluating uncertainty-based active learning for accelerating the generalization of molecular property prediction J. Cheminfom. (IF 8.6) Pub Date : 2023-11-08 Tianzhixi Yin, Gihan Panapitiya, Elizabeth D. Coda, Emily G. Saldanha
Deep learning models have proven to be a powerful tool for predicting molecular properties in applications including drug design and the development of energy storage materials. However, to learn accurate and robust structure–property mappings, these models require large amounts of data, which can be a challenge to collect given the time- and resource-intensive nature of experimental
-
Determining the parent and associated fragment formulae in mass spectrometry via the parent subformula graph J. Cheminfom. (IF 8.6) Pub Date : 2023-11-07 Sean Li, Björn Bohman, Gavin R. Flematti, Dylan Jayatilaka
Identifying the molecular formula and fragmentation reactions of an unknown compound from its mass spectrum is crucial in areas such as natural product chemistry and metabolomics. We propose a method for identifying the correct candidate formula of an unidentified natural product from its mass spectrum. The method involves scoring the plausibility of parent candidate formulae based on a parent subformula
-
DeepSA: a deep-learning driven predictor of compound synthesis accessibility J. Cheminfom. (IF 8.6) Pub Date : 2023-11-02 Shihang Wang, Lin Wang, Fenglei Li, Fang Bai
With the continuous development of artificial intelligence technology, more and more computational models for generating new molecules are being developed. However, we are often confronted with the question of whether these compounds are easy or difficult to synthesize, that is, their synthetic accessibility. In this study, a deep-learning-based computational model called DeepSA was
-
EasyDock: customizable and scalable docking tool J. Cheminfom. (IF 8.6) Pub Date : 2023-11-01 Guzel Minibaeva, Aleksandra Ivanova, Pavel Polishchuk
Docking of large compound collections has become an important procedure for discovering new chemical entities. Screening of large sets of compounds may also occur in de novo design projects guided by molecular docking. To facilitate these processes, there is a need for automated tools capable of efficiently docking a large number of molecules using multiple computational nodes within a reasonable timeframe
-
DeepDelta: predicting ADMET improvements of molecular derivatives with deep learning J. Cheminfom. (IF 8.6) Pub Date : 2023-10-27 Zachary Fralish, Ashley Chen, Paul Skaluba, Daniel Reker
Established molecular machine learning models process individual molecules as inputs to predict their biological, chemical, or physical properties. However, such algorithms require large datasets and have not been optimized to predict property differences between molecules, limiting their ability to learn from smaller datasets and to directly compare the anticipated properties of two molecules. Many
-
Art driven by visual representations of chemical space J. Cheminfom. (IF 8.6) Pub Date : 2023-10-21 Daniela Gaytán-Hernández, Ana L. Chávez-Hernández, Edgar López-López, Jazmín Miranda-Salas, Fernanda I. Saldívar-González, José L. Medina-Franco
Science and art have been connected for centuries. With the development of new computational methods, new scientific disciplines have emerged, such as computational chemistry, and related fields, such as cheminformatics. Cheminformatics is grounded in the chemical space concept: a multi-descriptor space in which chemical structures are described. In several practical applications, visual representations
-
Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models J. Cheminfom. (IF 8.6) Pub Date : 2023-10-18 Arash Tayyebi, Ali S Alshami, Zeinab Rabiei, Xue Yu, Nadhem Ismail, Musabbir Jahan Talukder, Jason Power
A reliable and practical determination of a chemical species’ solubility in water continues to be examined using empirical observations and exhaustive experimental studies alone. Predictions of chemical solubility in water using data-driven algorithms can allow us to create a rationally designed, efficient, and cost-effective tool for next-generation materials and chemical formulations. We present
-
Cheminformatics Microservice: unifying access to open cheminformatics toolkits J. Cheminfom. (IF 8.6) Pub Date : 2023-10-16 Venkata Chandrasekhar, Nisha Sharma, Jonas Schaub, Christoph Steinbeck, Kohulan Rajan
In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the Cheminformatics Microservice. This open-source solution
-
Pmf-cpi: assessing drug selectivity with a pretrained multi-functional model for compound–protein interactions J. Cheminfom. (IF 8.6) Pub Date : 2023-10-14 Nan Song, Ruihan Dong, Yuqian Pu, Ercheng Wang, Junhai Xu, Fei Guo
Compound–protein interactions (CPI) play significant roles in drug development. To avoid side effects, it is also crucial to evaluate drug selectivity when binding to different targets. However, most selectivity prediction models are constructed for specific targets with limited data. In this study, we present a pretrained multi-functional model for compound–protein interaction prediction (PMF-CPI)