-
Towards a partial order graph for interactive pharmacophore exploration: extraction of pharmacophores activity delta J. Cheminform. (IF 8.6) Pub Date : 2023-11-29 Etienne Lehembre, Johanna Giovannini, Damien Geslin, Alban Lepailleur, Jean-Luc Lamotte, David Auber, Abdelkader Ouali, Bruno Cremilleux, Albrecht Zimmermann, Bertrand Cuissart, Ronan Bureau
This paper presents a novel approach called Pharmacophore Activity Delta for extracting outstanding pharmacophores from a chemogenomic dataset, with a specific focus on a kinase target known as BCR-ABL. The method involves constructing a Hasse diagram, referred to as the pharmacophore network, by utilizing the subgraph partial order as an initial step, leading to the identification of pharmacophores
-
EMNPD: a comprehensive endophytic microorganism natural products database for prompt the discovery of new bioactive substances J. Cheminform. (IF 8.6) Pub Date : 2023-11-28 Hong-Quan Xu, Huan Xiao, Jin-Hui Bu, Yan-Feng Hong, Yu-Hong Liu, Zi-Yue Tao, Shu-Fan Ding, Yi-Tong Xia, E Wu, Zhen Yan, Wei Zhang, Gong-Xing Chen, Feng Zhu, Lin Tao
The discovery and utilization of natural products derived from endophytic microorganisms have garnered significant attention in pharmaceutical research. While remarkable progress has been made in this field each year, the absence of dedicated open-access databases for endophytic microorganism natural products research is evident. To address the increasing demand for mining and sharing of data resources
-
NMR shift prediction from small data quantities J. Cheminform. (IF 8.6) Pub Date : 2023-11-27 Herman Rull, Markus Fischer, Stefan Kuhn
Prediction of chemical shift in NMR using machine learning methods is typically done with the maximum amount of data available to achieve the best results. In some cases, such large amounts of data are not available, e.g. for heteronuclei. We demonstrate a novel machine learning model that is able to achieve better results than other models for relevant datasets with comparatively low amounts of data
-
An explainability framework for deep learning on chemical reactions exemplified by enzyme-catalysed reaction classification J. Cheminform. (IF 8.6) Pub Date : 2023-11-23 Daniel Probst
Assigning or proposing a catalysing enzyme given a chemical or biochemical reaction is of great interest to life sciences and chemistry alike. The exploration and design of metabolic pathways and the challenge of finding more sustainable enzyme-catalysed alternatives to traditional organic reactions are just two examples of tasks that require an association between reaction and enzyme. However, given
-
On the difficulty of validating molecular generative models realistically: a case study on public and proprietary data J. Cheminform. (IF 8.6) Pub Date : 2023-11-21 Koichi Handa, Morgan C. Thomas, Michiharu Kageyama, Takeshi Iijima, Andreas Bender
While a multitude of deep generative models have recently emerged there exists no best practice for their practically relevant validation. On the one hand, novel de novo-generated molecules cannot be refuted by retrospective validation (so that this type of validation is biased); but on the other hand prospective validation is expensive and then often biased by the human selection process. In this
-
YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications J. Cheminform. (IF 8.6) Pub Date : 2023-11-20 Chong Zhou, Wei Liu, Xiyue Song, Mengling Yang, Xiaowang Peng
In chemistry-related disciplines, a vast repository of molecular structural data has been documented in scientific publications but remains inaccessible to computational analyses owing to its non-machine-readable format. Optical chemical structure recognition (OCSR) addresses this gap by converting images of chemical molecular structures into a format accessible to computers and convenient for storage
-
BBB-PEP-prediction: improved computational model for identification of blood–brain barrier peptides using blending position relative composition specific features and ensemble modeling J. Cheminform. (IF 8.6) Pub Date : 2023-11-18 Ansar Naseem, Fahad Alturise, Tamim Alkhalifah, Yaser Daanial Khan
BBPs have the potential to facilitate the delivery of drugs to the brain, opening up new avenues for the development of treatments targeting diseases of the central nervous system (CNS). The obstacle faced in central nervous system disorders stems from the formidable task of traversing the blood–brain barrier (BBB) for pharmaceutical agents. Nearly 98% of small molecule-based drugs and nearly 100%
-
HD_BPMDS: a curated binary pattern multitarget dataset of Huntington’s disease–targeting agents J. Cheminform. (IF 8.6) Pub Date : 2023-11-17 Sven Marcel Stefan, Jens Pahnke, Vigneshwaran Namasivayam
The discovery of both distinctive lead molecules and novel drug targets is a great challenge in drug discovery, which particularly accounts for orphan diseases. Huntington’s disease (HD) is an orphan, neurodegenerative disease of which the pathology is well-described. However, its pathophysiological background and molecular mechanisms are poorly understood. To date, only 2 drugs have been approved
-
Correction: MOFGalaxyNet: a social network analysis for predicting guest accessibility in metal–organic frameworks utilizing graph convolutional networks J. Cheminform. (IF 8.6) Pub Date : 2023-11-15 Mehrdad Jalali, A. D. Dinga Wonanke, Christof Wöll
Correction: Journal of Cheminformatics (2023) 15:94 https://doi.org/10.1186/s13321-023-00764-2 Following publication of the original article [1], we have been informed that Fig. 4 is missing, and instead of Fig. 4, Fig. 3 has been repeated as Fig. 4. The original article [1] has been corrected. Jalali M, Wonanke ADD, Wöll C (2023) MOFGalaxyNet: a social network analysis for predicting guest accessibility
-
Exploring the known chemical space of the plant kingdom: insights into taxonomic patterns, knowledge gaps, and bioactive regions J. Cheminform. (IF 8.6) Pub Date : 2023-11-10 Daniel Domingo-Fernández, Yojana Gadiya, Sarah Mubeen, David Healey, Bryan H. Norman, Viswa Colluru
Plants are one of the primary sources of natural products for drug development. However, despite centuries of research, only a limited region of the phytochemical space has been studied. To understand the scope of what is explored versus unexplored in the phytochemical space, we begin by reconstructing the known chemical space of the plant kingdom, mapping the distribution of secondary metabolites
-
Continuous symmetry and chirality measures: approximate algorithms for large molecular structures J. Cheminform. (IF 8.6) Pub Date : 2023-11-09 Gil Alon, Yuval Ben-Haim, Inbal Tuvi-Arad
Quantifying imperfect symmetry of molecules can help explore the sources, roles and extent of structural distortion. Based on the established methodology of continuous symmetry and chirality measures, we develop a set of three-dimensional molecular descriptors to estimate distortion of large structures. These three-dimensional geometrical descriptors quantify the gap between the desirable symmetry
-
Evaluating uncertainty-based active learning for accelerating the generalization of molecular property prediction J. Cheminform. (IF 8.6) Pub Date : 2023-11-08 Tianzhixi Yin, Gihan Panapitiya, Elizabeth D. Coda, Emily G. Saldanha
Deep learning models have proven to be a powerful tool for the prediction of molecular properties for applications including drug design and the development of energy storage materials. However, in order to learn accurate and robust structure–property mappings, these models require large amounts of data which can be a challenge to collect given the time and resource-intensive nature of experimental
-
Determining the parent and associated fragment formulae in mass spectrometry via the parent subformula graph J. Cheminform. (IF 8.6) Pub Date : 2023-11-07 Sean Li, Björn Bohman, Gavin R. Flematti, Dylan Jayatilaka
Identifying the molecular formula and fragmentation reactions of an unknown compound from its mass spectrum is crucial in areas such as natural product chemistry and metabolomics. We propose a method for identifying the correct candidate formula of an unidentified natural product from its mass spectrum. The method involves scoring the plausibility of parent candidate formulae based on a parent subformula
-
DeepSA: a deep-learning driven predictor of compound synthesis accessibility J. Cheminform. (IF 8.6) Pub Date : 2023-11-02 Shihang Wang, Lin Wang, Fenglei Li, Fang Bai
With the continuous development of artificial intelligence technology, more and more computational models for generating new molecules are being developed. However, we are often confronted with the question of whether these compounds are easy or difficult to synthesize, which refers to synthetic accessibility of compounds. In this study, a deep learning based computational model called DeepSA, was
-
EasyDock: customizable and scalable docking tool J. Cheminform. (IF 8.6) Pub Date : 2023-11-01 Guzel Minibaeva, Aleksandra Ivanova, Pavel Polishchuk
Docking of large compound collections becomes an important procedure to discover new chemical entities. Screening of large sets of compounds may also occur in de novo design projects guided by molecular docking. To facilitate these processes, there is a need for automated tools capable of efficiently docking a large number of molecules using multiple computational nodes within a reasonable timeframe
-
DeepDelta: predicting ADMET improvements of molecular derivatives with deep learning J. Cheminform. (IF 8.6) Pub Date : 2023-10-27 Zachary Fralish, Ashley Chen, Paul Skaluba, Daniel Reker
Established molecular machine learning models process individual molecules as inputs to predict their biological, chemical, or physical properties. However, such algorithms require large datasets and have not been optimized to predict property differences between molecules, limiting their ability to learn from smaller datasets and to directly compare the anticipated properties of two molecules. Many
-
Art driven by visual representations of chemical space J. Cheminform. (IF 8.6) Pub Date : 2023-10-21 Daniela Gaytán-Hernández, Ana L. Chávez-Hernández, Edgar López-López, Jazmín Miranda-Salas, Fernanda I. Saldívar-González, José L. Medina-Franco
Science and art have been connected for centuries. With the development of new computational methods, new scientific disciplines have emerged, such as computational chemistry, and related fields, such as cheminformatics. Chemoinformatics is grounded on the chemical space concept: a multi-descriptor space in which chemical structures are described. In several practical applications, visual representations
-
Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models J. Cheminform. (IF 8.6) Pub Date : 2023-10-18 Arash Tayyebi, Ali S Alshami, Zeinab Rabiei, Xue Yu, Nadhem Ismail, Musabbir Jahan Talukder, Jason Power
A reliable and practical determination of a chemical species’ solubility in water continues to be examined using empirical observations and exhaustive experimental studies alone. Predictions of chemical solubility in water using data-driven algorithms can allow us to create a rationally designed, efficient, and cost-effective tool for next-generation materials and chemical formulations. We present
-
Cheminformatics Microservice: unifying access to open cheminformatics toolkits J. Cheminform. (IF 8.6) Pub Date : 2023-10-16 Venkata Chandrasekhar, Nisha Sharma, Jonas Schaub, Christoph Steinbeck, Kohulan Rajan
In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the Cheminformatics Microservice. This open-source solution
-
Pmf-cpi: assessing drug selectivity with a pretrained multi-functional model for compound–protein interactions J. Cheminform. (IF 8.6) Pub Date : 2023-10-14 Nan Song, Ruihan Dong, Yuqian Pu, Ercheng Wang, Junhai Xu, Fei Guo
Compound–protein interactions (CPI) play significant roles in drug development. To avoid side effects, it is also crucial to evaluate drug selectivity when binding to different targets. However, most selectivity prediction models are constructed for specific targets with limited data. In this study, we present a pretrained multi-functional model for compound–protein interaction prediction (PMF-CPI)
-
Analysis of metabolites in human gut: illuminating the design of gut-targeted drugs J. Cheminform. (IF 8.6) Pub Date : 2023-10-13 Alberto Gil-Pichardo, Andrés Sánchez-Ruiz, Gonzalo Colmenarejo
Gut-targeted drugs provide a new drug modality besides that of oral, systemic molecules, that could tap into the growing knowledge of gut metabolites of bacterial or host origin and their involvement in biological processes and health through their interaction with gut targets (bacterial or host, too). Understanding the properties of gut metabolites can provide guidance for the design of gut-targeted
-
Bloom filters for molecules J. Cheminform. (IF 8.6) Pub Date : 2023-10-12 Jorge Medina, Andrew D. White
Ultra-large chemical libraries are reaching 10s to 100s of billions of molecules. A challenge for these libraries is to efficiently check if a proposed molecule is present. Here we propose and study Bloom filters for testing if a molecule is present in a set using either string or fingerprint representations. Bloom filters are small enough to hold billions of molecules in just a few GB of memory and
-
MOFGalaxyNet: a social network analysis for predicting guest accessibility in metal–organic frameworks utilizing graph convolutional networks J. Cheminform. (IF 8.6) Pub Date : 2023-10-11 Mehrdad Jalali, A. D. Dinga Wonanke, Christof Wöll
Metal–organic frameworks (MOFs), are porous crystalline structures comprising of metal ions or clusters intricately linked with organic entities, displaying topological diversity and effortless chemical flexibility. These characteristics render them apt for multifarious applications such as adsorption, separation, sensing, and catalysis. Predominantly, the distinctive properties and prospective utility
-
Paths to cheminformatics: Q&A with Ann M. Richard J. Cheminform. (IF 8.6) Pub Date : 2023-10-05 Ann M. Richard
Recently we described [1] an initiative to put a spotlight on diversity within the cheminformatics community. As part of that we initiated a series of interviews, and this article continues that series. Prior to her retirement in March of this year, Ann M. Richard, PhD, worked as a Research Chemist at the U.S. Environmental Protection Agency (EPA) for the entirety of her 36-year professional career
-
Paths to cheminformatics: Q&A with Nathaniel Charest J. Cheminform. (IF 8.6) Pub Date : 2023-10-05 Nathaniel Charest
Recently we described [1] an initiative to put a spotlight on diversity within the cheminformatics community. As part of that we initiated a series of interviews, and this article is part of that series. Nathaniel Charest, Ph.D. (Chemistry), is a Chemist at the U.S. Environmental Protection Agency, where he develops in-silico and theoretical techniques to advance toxicological analysis and non-targeted
-
ScaffoldGVAE: scaffold generation and hopping of drug molecules via a variational autoencoder based on multi-view graph neural networks J. Cheminform. (IF 8.6) Pub Date : 2023-10-04 Chao Hu, Song Li, Chenxing Yang, Jun Chen, Yi Xiong, Guisheng Fan, Hao Liu, Liang Hong
In recent years, drug design has been revolutionized by the application of deep learning techniques, and molecule generation is a crucial aspect of this transformation. However, most of the current deep learning approaches do not explicitly consider and apply scaffold hopping strategy when performing molecular generation. In this work, we propose ScaffoldGVAE, a variational autoencoder based on multi-view
-
Reliable and accurate prediction of basic pKa values in nitrogen compounds: the pKa shift in supramolecular systems as a case study J. Cheminform. (IF 8.6) Pub Date : 2023-09-28 Jackson J. Alcázar, Alessandra C. Misad Saide, Paola R. Campodónico
This article presents a quantitative structure–activity relationship (QSAR) approach for predicting the acid dissociation constant (pKa) of nitrogenous compounds, including those within supramolecular complexes based on cucurbiturils. The model combines low-cost quantum mechanical calculations with QSAR methodology and linear regressions to achieve accurate predictions for a broad range of nitrogen-containing
-
A molecule perturbation software library and its application to study the effects of molecular design constraints J. Cheminform. (IF 8.6) Pub Date : 2023-09-26 Alan Kerstjens, Hans De Winter
Computational molecular design can yield chemically unreasonable compounds when performed carelessly. A popular strategy to mitigate this risk is mimicking reference chemistry. This is commonly achieved by restricting the way in which molecules are constructed or modified. While it is well established that such an approach helps in designing chemically appealing molecules, concerns about these restrictions
-
Probabilistic generative transformer language models for generative design of molecules J. Cheminform. (IF 8.6) Pub Date : 2023-09-25 Lai Wei, Nihang Fu, Yuqi Song, Qian Wang, Jianjun Hu
Self-supervised neural language models have recently found wide applications in the generative design of organic molecules and protein sequences as well as representation learning for downstream structure classification and functional prediction. However, most of the existing deep learning models for molecule design usually require a big dataset and have a black-box architecture, which makes it difficult
-
Mass-Suite: a novel open-source python package for high-resolution mass spectrometry data analysis J. Cheminform. (IF 8.6) Pub Date : 2023-09-23 Ximin Hu, Derek Mar, Nozomi Suzuki, Bowei Zhang, Katherine T. Peter, David A. C. Beck, Edward P. Kolodziej
Mass-Suite (MSS) is a Python-based, open-source software package designed to analyze high-resolution mass spectrometry (HRMS)-based non-targeted analysis (NTA) data, particularly for water quality assessment and other environmental applications. MSS provides flexible, user-defined workflows for HRMS data processing and analysis, including both basic functions (e.g., feature extraction, data reduction
-
Iterative machine learning-based chemical similarity search to identify novel chemical inhibitors J. Cheminform. (IF 8.6) Pub Date : 2023-09-23 Prasannavenkatesh Durai, Sue Jung Lee, Jae Wook Lee, Cheol-Ho Pan, Keunwan Park
Machine learning-based chemical screening has made substantial progress in recent years. However, these predictions often have low accuracy and high uncertainty when identifying new active chemical scaffolds. Hence, a high proportion of retrieved compounds are not structurally novel. In this study, we proposed a strategy to address this issue by iteratively optimizing an evolutionary chemical binding
-
Novel multi-objective affinity approach allows to identify pH-specific μ-opioid receptor agonists J. Cheminform. (IF 8.6) Pub Date : 2023-09-19 Christopher Secker, Konstantin Fackeldey, Marcus Weber, Sourav Ray, Christoph Gorgulla, Christof Schütte
Opioids are essential pharmaceuticals due to their analgesic properties; however, lethal side effects, addiction, and opioid tolerance are extremely challenging. The development of novel molecules targeting the μ-opioid receptor (MOR) in inflamed, but not in healthy tissue, could significantly reduce these unwanted effects. Finding such novel molecules can be achieved by maximizing the binding
-
Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding J. Cheminform. (IF 8.6) Pub Date : 2023-09-19 Thomas E. Hadfield, Jack Scantlebury, Charlotte M. Deane
Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach
-
Integrating synthetic accessibility with AI-based generative drug design J. Cheminform. (IF 8.6) Pub Date : 2023-09-19 Maud Parrot, Hamza Tajmouati, Vinicius Barros Ribeiro da Silva, Brian Ross Atwood, Robin Fourcade, Yann Gaston-Mathé, Nicolas Do Huu, Quentin Perron
Generative models are frequently used for de novo design in drug discovery projects to propose new molecules. However, the question of whether or not the generated molecules can be synthesized is not systematically taken into account during generation, even though being able to synthesize the generated molecules is a fundamental requirement for such methods to be useful in practice. Methods have been
-
School of cheminformatics in Latin America J. Cheminform. (IF 8.6) Pub Date : 2023-09-19 Karla Gonzalez-Ponce, Carolina Horta Andrade, Fiona Hunter, Johannes Kirchmair, Karina Martinez-Mayorga, José L. Medina-Franco, Matthias Rarey, Alexander Tropsha, Alexandre Varnek, Barbara Zdrazil
We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24–25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and academics from seventy-nine countries registered for
-
Extended study on atomic featurization in graph neural networks for molecular property prediction J. Cheminform. (IF 8.6) Pub Date : 2023-09-19 Agnieszka Wojtuch, Tomasz Danel, Sabina Podlewska, Łukasz Maziarka
Graph neural networks have recently become a standard method for analyzing chemical compounds. In the field of molecular property prediction, the emphasis is now on designing new model architectures, and the importance of atom featurization is oftentimes belittled. When contrasting two graph neural networks, the use of different representations possibly leads to incorrect attribution of the results
-
rMSIfragment: improving MALDI-MSI lipidomics through automated in-source fragment annotation J. Cheminform. (IF 8.6) Pub Date : 2023-09-15 Gerard Baquer, Lluc Sementé, Pere Ràfols, Lucía Martín-Saiz, Christoph Bookmeyer, José A. Fernández, Xavier Correig, María García-Altares
Matrix-Assisted Laser Desorption Ionization Mass Spectrometry Imaging (MALDI-MSI) spatially resolves the chemical composition of tissues. Lipids are of particular interest, as they influence important biological processes in health and disease. However, the identification of lipids in MALDI-MSI remains a challenge due to the lack of chromatographic separation or untargeted tandem mass spectrometry
-
pyPept: a python library to generate atomistic 2D and 3D representations of peptides J. Cheminform. (IF 8.6) Pub Date : 2023-09-12 Rodrigo Ochoa, J. B. Brown, Thomas Fox
We present pyPept, a set of executables and underlying python-language classes to easily create, manipulate, and analyze peptide molecules using the FASTA, HELM, or recently-developed BILN notations. The framework enables the analysis of both pure proteinogenic peptides as well as those with non-natural amino acids, including support to assemble a customizable monomer library, without requiring programming
-
Patch seriation to visualize data and model parameters J. Cheminform. (IF 8.6) Pub Date : 2023-09-09 Rita Lasfar, Gergely Tóth
We developed a new seriation merit function for enhancing the visual information of data matrices. A local similarity matrix is calculated, where the average similarity of neighbouring objects is calculated in a limited variable space and a global function is constructed to maximize the local similarities and cluster them into patches by simple row and column ordering. The method identifies data clusters
-
LOGICS: Learning optimal generative distribution for designing de novo chemical structures J. Cheminform. (IF 8.6) Pub Date : 2023-09-07 Bongsung Bae, Haelee Bae, Hojung Nam
In recent years, the field of computational drug design has made significant strides in the development of artificial intelligence (AI) models for the generation of de novo chemical compounds with desired properties and biological activities, such as enhanced binding affinity to target proteins. These high-affinity compounds have the potential to be developed into more potent therapeutics for a broad
-
LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP J. Cheminform. (IF 8.6) Pub Date : 2023-09-05 Yitian Wang, Jiacheng Xiong, Fu Xiao, Wei Zhang, Kaiyang Cheng, Jingxin Rao, Buying Niu, Xiaochu Tong, Ning Qu, Runze Zhang, Dingyan Wang, Kaixian Chen, Xutong Li, Mingyue Zheng
Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and
-
Similarity-based pairing improves efficiency of siamese neural networks for regression tasks and uncertainty quantification J. Cheminform. (IF 8.6) Pub Date : 2023-08-30 Yumeng Zhang, Janosch Menke, Jiazhen He, Eva Nittinger, Christian Tyrchan, Oliver Koch, Hongtao Zhao
Siamese networks, representing a novel class of neural networks, consist of two identical subnetworks sharing weights but receiving different inputs. Here we present a similarity-based pairing method for generating compound pairs to train Siamese neural networks for regression tasks. In comparison with the conventional exhaustive pairing, it reduces the algorithm complexity from O(n²) to O(n). It also
-
3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors J. Cheminform. (IF 8.6) Pub Date : 2023-08-28 Marina Gorostiola González, Remco L. van den Broek, Thomas G. M. Braun, Magdalini Chatzopoulou, Willem Jespers, Adriaan P. IJzerman, Laura H. Heitman, Gerard J. P. van Westen
Proteochemometric (PCM) modelling is a powerful computational drug discovery tool used in bioactivity prediction of potential drug candidates relying on both chemical and protein information. In PCM features are computed to describe small molecules and proteins, which directly impact the quality of the predictive models. State-of-the-art protein descriptors, however, are calculated from the protein
-
Practical guidelines for the use of gradient boosting for molecular property prediction J. Cheminform. (IF 8.6) Pub Date : 2023-08-28 Davide Boldini, Francesca Grisoni, Daniel Kuhn, Lukas Friedrich, Stephan A. Sieber
Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure–activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of
-
A deep learning framework for accurate reaction prediction and its application on high-throughput experimentation data J. Cheminform. (IF 8.6) Pub Date : 2023-08-11 Baiqing Li, Shimin Su, Chan Zhu, Jie Lin, Xinyue Hu, Lebin Su, Zhunzhun Yu, Kuangbiao Liao, Hongming Chen
In recent years, it has been seen that artificial intelligence (AI) starts to bring revolutionary changes to chemical synthesis. However, the lack of suitable ways of representing chemical reactions and the scarceness of reaction data has limited the wider application of AI to reaction prediction. Here, we introduce a novel reaction representation, GraphRXN, for reaction prediction. It utilizes a universal
-
DeepSAT: Learning Molecular Structures from Nuclear Magnetic Resonance Data J. Cheminform. (IF 8.6) Pub Date : 2023-08-07 Hyun Woo Kim, Chen Zhang, Raphael Reher, Mingxun Wang, Kelsey L. Alexander, Louis-Félix Nothias, Yoo Kyong Han, Hyeji Shin, Ki Yong Lee, Kyu Hyeong Lee, Myeong Ji Kim, Pieter C. Dorrestein, William H. Gerwick, Garrison W. Cottrell
The identification of molecular structure is essential for understanding chemical diversity and for developing drug leads from small molecules. Nevertheless, the structure elucidation of small molecules by Nuclear Magnetic Resonance (NMR) experiments is often a long and non-trivial process that relies on years of training. To achieve this process efficiently, several spectral databases have been established
-
Correction: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization J. Cheminform. (IF 8.6) Pub Date : 2023-07-31 Umit V. Ucak, Islambek Ashyrmamatov, Juyong Lee
Correction: Journal of Cheminformatics (2023) 15:55 https://doi.org/10.1186/s13321-023-00725-9 Following publication of the original article [1], the authors requested to correct the funding number NRF-2019M3E5D4066898 to NRF-2022M3E5F3081268. Funding This work was supported by the National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (Nos. NRF-2022M3E5F3081268,
-
PyL3dMD: Python LAMMPS 3D molecular descriptors package J. Cheminform. (IF 8.6) Pub Date : 2023-07-28 Pawan Panwar, Quanpeng Yang, Ashlie Martini
Molecular descriptors characterize the biological, physical, and chemical properties of molecules and have long been used for understanding molecular interactions and facilitating materials design. Some of the most robust descriptors are derived from geometrical representations of molecules, called 3-dimensional (3D) descriptors. When calculated from molecular dynamics (MD) simulation trajectories
-
Correction: Reconstruction of lossless molecular representations from fingerprints J. Cheminform. (IF 8.6) Pub Date : 2023-07-26 Umit V. Ucak, Islambek Ashyrmamatov, Juyong Lee
Correction: Journal of Cheminformatics (2023) 15:26 https://doi.org/10.1186/s13321-023-00693-0 Following publication of the original article [1], the authors requested to correct the funding number NRF-2019M3E5D4066898 to NRF-2022M3E5F3081268. Funding This work was supported by National Research Foundation of Korea (NRF) Grants funded by the Korean government (MSIT) (Nos. NRF-2022M3E5F3081268, NRF-2022R1C1C1005080
-
Explaining compound activity predictions with a substructure-aware loss for graph neural networks J. Cheminform. (IF 8.6) Pub Date : 2023-07-25 Kenza Amara, Raquel Rodríguez-Pérez, José Jiménez-Luna
Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far displayed low performance for popular deep learning algorithms such as
-
The BinDiscover database: a biology-focused meta-analysis tool for 156,000 GC–TOF MS metabolome samples J. Cheminform. (IF 8.6) Pub Date : 2023-07-20 Parker Ladd Bremer, Gert Wohlgemuth, Oliver Fiehn
Metabolomics by gas chromatography/mass spectrometry (GC/MS) provides a standardized and reliable platform for understanding small molecule biology. Since 2005, the West Coast Metabolomics Center at the University of California at Davis has collated GC/MS metabolomics data from over 156,000 samples and 2000 studies into the standardized BinBase database. We believe that the observations from these
-
Force field-inspired transformer network assisted crystal density prediction for energetic materials J. Cheminfom. (IF 8.6) Pub Date : 2023-07-19 Jun-Xuan Jin, Gao-Peng Ren, Jianjian Hu, Yingzhe Liu, Yunhu Gao, Ke-Jun Wu, Yuchen He
Machine learning has great potential in predicting chemical information with greater precision than traditional methods. Graph neural networks (GNNs) have become increasingly popular in recent years, as they can automatically learn the features of the molecule from the graph, significantly reducing the time needed to find and build molecular descriptors. However, the application of machine learning
-
PINNED: identifying characteristics of druggable human proteins using an interpretable neural network J. Cheminfom. (IF 8.6) Pub Date : 2023-07-19 Michael Cunningham, Danielle Pins, Zoltán Dezső, Maricel Torrent, Aparna Vasanthakumar, Abhishek Pandey
The identification of human proteins that are amenable to pharmacologic modulation without significant off-target effects remains an important unsolved challenge. Computational methods have been devised to identify features which distinguish between “druggable” and “undruggable” proteins, finding that protein sequence, tissue and cellular localization, biological role, and position in the protein–protein
-
TB-IECS: an accurate machine learning-based scoring function for virtual screening J. Cheminfom. (IF 8.6) Pub Date : 2023-07-04 Xujun Zhang, Chao Shen, Dejun Jiang, Jintu Zhang, Qing Ye, Lei Xu, Tingjun Hou, Peichen Pan, Yu Kang
Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Due to the high computational cost in the process of feature generation, the numbers of descriptors used in MLSFs and the characterization of protein–ligand interactions are always limited, which may affect the overall accuracy and efficiency. Here
-
Improving reproducibility and reusability in the Journal of Cheminformatics J. Cheminfom. (IF 8.6) Pub Date : 2023-06-30 Charles Tapley Hoyt, Barbara Zdrazil, Rajarshi Guha, Nina Jeliazkova, Karina Martinez-Mayorga, Eva Nittinger
Reproducibility is essential to independently determine the veracity and integrity of scholarly work. Researchers in the life and natural sciences continue to struggle to achieve reproducibility due to many factors and facets of modern research including methodologies [1], data sharing [2], problematic incentive structures [3], widespread systematic issues with peer review [4,5,6], and others. Together
-
A comparison of approaches to accessing existing biological and chemical relational databases via SPARQL J. Cheminfom. (IF 8.6) Pub Date : 2023-06-20 Jakub Galgonek, Jiří Vondrášek
Current biological and chemical research is increasingly dependent on the reusability of previously acquired data, which typically come from various sources. Consequently, there is a growing need for database systems and databases stored in them to be interoperable with each other. One of the possible solutions to address this issue is to use systems based on Semantic Web technologies, namely on the
-
ProfhEX: AI-based platform for small molecules liability profiling J. Cheminfom. (IF 8.6) Pub Date : 2023-06-09 Filippo Lunghini, Anna Fava, Vincenzo Pisapia, Francesco Sacco, Daniela Iaconis, Andrea Rosario Beccari
Off-target drug interactions are a major reason for candidate failure in the drug discovery process. Anticipating a drug’s potential adverse effects in the early stages is necessary to minimize health risks to patients, animal testing, and economic costs. With the constantly increasing size of virtual screening libraries, AI-driven methods can be exploited as first-tier screening tools to provide liability
-
Adaptive language model training for molecular design J. Cheminfom. (IF 8.6) Pub Date : 2023-06-08 Andrew E. Blanchard, Debsindhu Bhowmik, Zachary Fox, John Gounley, Jens Glaser, Belinda S. Akpa, Stephan Irle
The vast size of chemical space necessitates computational approaches to automate and accelerate the design of molecular sequences to guide experimental efforts for drug discovery. Genetic algorithms provide a useful framework to incrementally generate molecules by applying mutations to known chemical structures. Recently, masked language models have been applied to automate the mutation process by
-
RetroRanker: leveraging reaction changes to improve retrosynthesis prediction through re-ranking J. Cheminfom. (IF 8.6) Pub Date : 2023-06-08 Junren Li, Lei Fang, Jian-Guang Lou
Retrosynthesis is an important task in organic chemistry. Recently, numerous data-driven approaches have achieved promising results in this task. However, in practice, these data-driven methods might lead to sub-optimal outcomes by making predictions based on the training data distribution, a phenomenon we refer to as frequency bias. For example, in template-based approaches, low-ranked predictions are
-
Tora3D: an autoregressive torsion angle prediction model for molecular 3D conformation generation J. Cheminfom. (IF 8.6) Pub Date : 2023-06-07 Zimei Zhang, Gang Wang, Rui Li, Lin Ni, RunZe Zhang, Kaiyang Cheng, Qun Ren, Xiangtai Kong, Shengkun Ni, Xiaochu Tong, Li Luo, Dingyan Wang, Xiaojie Lu, Mingyue Zheng, Xutong Li
Three-dimensional (3D) conformations of a small molecule profoundly affect its binding to the target of interest, the resulting biological effects, and its disposition in living organisms, but it is challenging to accurately characterize the conformational ensemble experimentally. Here, we propose an autoregressive torsion angle prediction model, Tora3D, for molecular 3D conformer generation. Rather