-
CardioGenAI: a machine learning-based framework for re-engineering drugs for reduced hERG liability J. Cheminfom. (IF 7.1) Pub Date : 2025-03-05 Gregory W. Kyro, Matthew T. Martin, Eric D. Watt, Victor S. Batista
The link between in vitro hERG ion channel inhibition and subsequent in vivo QT interval prolongation, a critical risk factor for the development of arrhythmias such as Torsade de Pointes, is so well established that in vitro hERG activity alone is often sufficient to end the development of an otherwise promising drug candidate. It is therefore of tremendous interest to develop advanced methods for
-
Achieving well-informed decision-making in drug discovery: a comprehensive calibration study using neural network-based structure-activity models J. Cheminfom. (IF 7.1) Pub Date : 2025-03-05 Hannah Rosa Friesacher, Ola Engkvist, Lewis Mervin, Yves Moreau, Adam Arany
In the drug discovery process, where experiments can be costly and time-consuming, computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents. Estimating the uncertainty inherent in these neural network predictions provides valuable information that facilitates optimal decision-making when risk assessment is crucial. However
-
GNINA 1.3: the next increment in molecular docking with deep learning J. Cheminfom. (IF 7.1) Pub Date : 2025-03-02 Andrew T. McNutt, Yanjing Li, Rocco Meli, Rishal Aggarwal, David Ryan Koes
Computer-aided drug design has the potential to significantly reduce the astronomical costs of drug development, and molecular docking plays a prominent role in this process. Molecular docking is an in silico technique that predicts the bound 3D conformations of two molecules, a necessary step for other structure-based methods. Here, we describe version 1.3 of the open-source molecular docking software
-
Syn-MolOpt: a synthesis planning-driven molecular optimization method using data-derived functional reaction templates J. Cheminfom. (IF 7.1) Pub Date : 2025-03-02 Xiaodan Yin, Xiaorui Wang, Zhenxing Wu, Qin Li, Yu Kang, Yafeng Deng, Pei Luo, Huanxiang Liu, Guqin Shi, Zheng Wang, Xiaojun Yao, Chang-Yu Hsieh, Tingjun Hou
Molecular optimization is a crucial step in drug development, involving structural modifications to improve the desired properties of drug candidates. Although many deep-learning-based molecular optimization algorithms have been proposed and may perform well on benchmarks, they usually do not pay sufficient attention to the synthesizability of molecules, resulting in optimized compounds difficult to
-
Improving route development using convergent retrosynthesis planning J. Cheminfom. (IF 7.1) Pub Date : 2025-02-27 Paula Torren-Peraire, Jonas Verhoeven, Dorota Herman, Hugo Ceulemans, Igor V. Tetko, Jörg K. Wegner
Retrosynthesis consists of recursively breaking down a target molecule to produce a synthesis route composed of readily accessible building blocks. In recent years, computer-aided synthesis planning methods have allowed a greater exploration of potential synthesis routes, combining state-of-the-art machine-learning methods with chemical knowledge. However, these methods are generally developed to produce
-
Pretraining graph transformers with atom-in-a-molecule quantum properties for improved ADMET modeling J. Cheminfom. (IF 7.1) Pub Date : 2025-02-27 Alessio Fallani, Ramil Nugmanov, Jose Arjona-Medina, Jörg Kurt Wegner, Alexandre Tkatchenko, Kostiantyn Chernichenko
We evaluate the impact of pretraining Graph Transformer architectures on atom-level quantum-mechanical features for the modeling of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drug-like compounds. We compare this pretraining strategy with two others: one based on molecular quantum properties (specifically the HOMO-LUMO gap) and one using a self-supervised atom
-
Infrared spectrum analysis of organic molecules with neural networks using standard reference data sets in combination with real-world data J. Cheminfom. (IF 7.1) Pub Date : 2025-02-26 Dev Punjabi, Yu-Chieh Huang, Laura Holzhauer, Pierre Tremouilhac, Pascal Friederich, Nicole Jung, Stefan Bräse
In this study, we propose a neural network-based approach to analyze IR spectra and detect the presence of functional groups. Our neural network architecture is based on the concept of learning split representations. We demonstrate that our method achieves favorable validation performance using the NIST dataset. Furthermore, by incorporating additional data from the open-access research data repository
-
DrugDiff: small molecule diffusion model with flexible guidance towards molecular properties J. Cheminfom. (IF 7.1) Pub Date : 2025-02-25 Marie Oestreich, Erinc Merdivan, Michael Lee, Joachim L. Schultze, Marie Piraud, Matthias Becker
With the cost/yield-ratio of drug development becoming increasingly unfavourable, recent work has explored machine learning to accelerate early stages of the development process. Given the current success of deep generative models across domains, we here investigated their application to the property-based proposal of new small molecules for drug development. Specifically, we trained a latent diffusion
-
kMoL: an open-source machine and federated learning library for drug discovery J. Cheminfom. (IF 7.1) Pub Date : 2025-02-25 Romeo Cozac, Haris Hasic, Jun Jin Choong, Vincent Richard, Loic Beheshti, Cyrille Froehlich, Takuto Koyama, Shigeyuki Matsumoto, Ryosuke Kojima, Hiroaki Iwata, Aki Hasegawa, Takao Otsuka, Yasushi Okuno
Machine learning is quickly becoming integral to drug discovery pipelines, particularly quantitative structure-activity relationship (QSAR) and absorption, distribution, metabolism, and excretion (ADME) tasks. Graph Convolutional Network (GCN) models have proven especially promising due to their inherent ability to model molecular structures using graph-based representations. However, maximizing the
-
Predictive modeling of biodegradation pathways using transformer architectures J. Cheminfom. (IF 7.1) Pub Date : 2025-02-17 Liam Brydon, Kunyang Zhang, Gillian Dobbie, Katerina Taškova, Jörg Simon Wicker
In recent years, the integration of machine learning techniques into chemical reaction product prediction has opened new avenues for understanding and predicting the behaviour of chemical substances. The necessity for such predictive methods stems from the growing regulatory and social awareness of the environmental consequences associated with the persistence and accumulation of chemical residues
-
ROASMI: accelerating small molecule identification by repurposing retention data J. Cheminfom. (IF 7.1) Pub Date : 2025-02-14 Fang-Yuan Sun, Ying-Hao Yin, Hui-Jun Liu, Lu-Na Shen, Xiu-Lin Kang, Gui-Zhong Xin, Li-Fang Liu, Jia-Yi Zheng
The limited replicability of retention data hinders its application in untargeted metabolomics for small molecule identification. While retention order models hold promise in addressing this issue, their predictive reliability is limited by uncertain generalizability. Here, we present the ROASMI model, which enables reliable prediction of retention order within a well-defined application domain by
-
FluoBase: a fluorinated agents database J. Cheminfom. (IF 7.1) Pub Date : 2025-02-11 Rafal Mulka, Dan Su, Wen-Shuo Huang, Li Zhang, Huaihai Huang, Xiaoyu Lai, Yao Li, Xiao-Song Xue
Organofluorine compounds, owing to their unique physicochemical properties, play an increasingly crucial role in fields such as medicine, pesticides, and advanced materials. Fluorinated reagents are indispensable for developing efficient synthetic methods for organofluorine compounds and serve as the cornerstone of organofluorine chemistry. Equally important are fluorinated functional molecules, which
-
Barlow Twins deep neural network for advanced 1D drug–target interaction prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-02-05 Maximilian G. Schuh, Davide Boldini, Annkathrin I. Bohne, Stephan A. Sieber
Accurate prediction of drug–target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature-extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive
-
Positional embeddings and zero-shot learning using BERT for molecular-property prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-02-05 Medard Edmund Mswahili, JunHa Hwang, Jagath C. Rajapakse, Kyuri Jo, Young-Seob Jeong
Recently, advancements in cheminformatics such as representation learning for chemical structures, deep learning (DL) for property prediction, data-driven discovery, and optimization of chemical data handling, have led to increased demands for handling chemical simplified molecular input line entry system (SMILES) data, particularly in text analysis tasks. These advancements have driven the need to
-
Improving drug repositioning with negative data labeling using large language models J. Cheminfom. (IF 7.1) Pub Date : 2025-02-04 Milan Picard, Mickael Leclercq, Antoine Bodein, Marie Pier Scott-Boyer, Olivier Perin, Arnaud Droit
Drug repositioning offers numerous advantages, such as faster development timelines, reduced costs, and lower failure rates in drug development. Supervised machine learning is commonly used to score drug candidates but is hindered by the lack of reliable negative data (drugs that fail due to inefficacy or toxicity), which is difficult to access, lowering prediction accuracy and generalization.
-
PretoxTM: a text mining system for extracting treatment-related findings from preclinical toxicology reports J. Cheminfom. (IF 7.1) Pub Date : 2025-02-03 Javier Corvi, Nicolás Díaz-Roussel, José M. Fernández, Francesco Ronzano, Emilio Centeno, Pablo Accuosto, Celine Ibrahim, Shoji Asakura, Frank Bringezu, Mirjam Fröhlicher, Annika Kreuchwig, Yoko Nogami, Jeong Rih, Raul Rodriguez-Esteban, Nicolas Sajot, Joerg Wichard, Heng-Yi Michael Wu, Philip Drew, Thomas Steger-Hartmann, Alfonso Valencia, Laura I. Furlong, Salvador Capella-Gutierrez
Over the last few decades the pharmaceutical industry has generated a vast corpus of knowledge on the safety and efficacy of drugs. Much of this information is contained in toxicology reports, which summarise the results of animal studies designed to analyse the effects of the tested compound, including unintended pharmacological and toxic effects, known as treatment-related findings. Despite the potential
-
MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data J. Cheminfom. (IF 7.1) Pub Date : 2025-01-31 Katarzyna Arturi, Eliza J. Harris, Lilian Gasser, Beate I. Escher, Georg Braun, Robin Bosshard, Juliane Hollender
MLinvitroTox is an automated Python pipeline developed for high-throughput hazard-driven prioritization of toxicologically relevant signals detected in complex environmental samples through high-resolution tandem mass spectrometry (HRMS/MS). MLinvitroTox is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints from chemical structures and
-
APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions J. Cheminfom. (IF 7.1) Pub Date : 2025-01-31 Eva Viesi, Ugo Perricone, Patrick Aloy, Rosalba Giugno
More sophisticated representations of compounds attempt to incorporate not only information on the structure and physicochemical properties of molecules, but also knowledge about their biological traits, leading to the so-called bioactivity profile. The bioactive profiling of air pollutants is challenging and crucial, as their biological activity and toxicological effects have not been deeply investigated
-
AiGPro: a multi-tasks model for profiling of GPCRs for agonist and antagonist J. Cheminfom. (IF 7.1) Pub Date : 2025-01-29 Rahul Brahma, Sunghyun Moon, Jae-Min Shin, Kwang-Hwi Cho
G protein-coupled receptors (GPCRs) play vital roles in various physiological processes, making them attractive drug discovery targets. Meanwhile, deep learning techniques have revolutionized drug discovery by facilitating efficient tools for expediting the identification and optimization of ligands. However, existing models for the GPCRs often focus on single-target or a small subset of GPCRs or employ
-
hERGAT: predicting hERG blockers using graph attention mechanism through atom- and molecule-level interaction analyses J. Cheminfom. (IF 7.1) Pub Date : 2025-01-28 Dohyeon Lee, Sunyong Yoo
The human ether-a-go-go-related gene (hERG) channel plays a critical role in the electrical activity of the heart, and its blockers can cause serious cardiotoxic effects. Thus, screening for hERG channel blockers is a crucial step in the drug development process. Many in silico models have been developed to predict hERG blockers, which can efficiently save time and resources. However, previous methods
-
The algebraic extended atom-type graph-based model for precise ligand–receptor binding affinity prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-01-22 Farjana Tasnim Mukta, Md Masud Rana, Avery Meyer, Sally Ellingson, Duc D. Nguyen
Accurate prediction of ligand-receptor binding affinity is crucial in structure-based drug design, significantly impacting the development of effective drugs. Recent advances in machine learning (ML)–based scoring functions have improved these predictions, yet challenges remain in modeling complex molecular interactions. This study introduces the AGL-EAT-Score, a scoring function that integrates extended
-
StreamChol: a web-based application for predicting cholestasis J. Cheminfom. (IF 7.1) Pub Date : 2025-01-21 Pablo Rodríguez-Belenguer, Emilio Soria-Olivas, Manuel Pastor
This article introduces StreamChol, a software for developing and applying mechanistic models to predict cholestasis. StreamChol is a Streamlit application, usable as a desktop application or web-accessible software when installed on a server using a docker container. StreamChol allows a seamless integration of pharmacokinetic analyses with Machine Learning models. This integration not only enables
-
Matched pairs demonstrate robustness against inter-assay variability J. Cheminfom. (IF 7.1) Pub Date : 2025-01-20 Jochem Nelen, Horacio Pérez-Sánchez, Hans De Winter, Dries Van Rompaey
Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences
-
One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening J. Cheminfom. (IF 7.1) Pub Date : 2025-01-16 James Wellnitz, Sankalp Jain, Joshua E. Hochuli, Travis Maxfield, Eugene N. Muratov, Alexander Tropsha, Alexey V. Zakharov
Traditional best practices for quantitative structure activity relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study explores the value of the conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we
-
Chemical space as a unifying theme for chemistry J. Cheminfom. (IF 7.1) Pub Date : 2025-01-16 Jean-Louis Reymond
Chemistry has diversified from a basic understanding of the elements to studying millions of highly diverse molecules and materials, which together are conceptualized as the chemical space. A map of this chemical space where distances represent similarities between compounds can represent the mutual relationships between different subfields of chemistry and help the discipline to be viewed and understood
-
Context-dependent similarity analysis of analogue series for structure–activity relationship transfer based on a concept from natural language processing J. Cheminfom. (IF 7.1) Pub Date : 2025-01-15 Atsushi Yoshimori, Jürgen Bajorath
Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure–activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically
-
Fragmenstein: predicting protein–ligand structures of compounds derived from known crystallographic fragment hits using a strict conserved-binding–based methodology J. Cheminfom. (IF 7.1) Pub Date : 2025-01-13 Matteo P. Ferla, Rubén Sánchez-García, Rachael E. Skyner, Stefan Gahbauer, Jenny C. Taylor, Frank von Delft, Brian D. Marsden, Charlotte M. Deane
Current strategies centred on either merging or linking initial hits from fragment-based drug design (FBDD) crystallographic screens generally do not fully leverage 3D structural information. We show that an algorithmic approach (Fragmenstein) that ‘stitches’ the ligand atoms from this structural information together can provide more accurate and reliable predictions for protein–ligand complex conformation
-
ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-01-10 Dong Wang, Jieyu Jin, Guqin Shi, Jingxiao Bao, Zheng Wang, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou
The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. While the Caco-2 cell assay is considered safe and cost-effective, it is also time-consuming. Therefore, computational models that achieve high accuracies in predicting Caco-2 permeability are crucial
-
CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions J. Cheminfom. (IF 7.1) Pub Date : 2025-01-07 Zishuo Zeng, Jin Guo, Jiao Jin, Xiaozhou Luo
Predicting EC numbers for chemical reactions enables efficient enzymatic annotations for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction’s EC), a novel framework leveraging contrastive learning, pre-trained language model-based
-
Prediction of Pt, Ir, Ru, and Rh complexes light absorption in the therapeutic window for phototherapy using machine learning J. Cheminfom. (IF 7.1) Pub Date : 2025-01-05 V. Vigna, T. F. G. G. Cova, A. A. C. C. Pais, E. Sicilia
Effective light-based cancer treatments, such as photodynamic therapy (PDT) and photoactivated chemotherapy (PACT), rely on compounds that are activated by light efficiently, and absorb within the therapeutic window (600–850 nm). Traditional prediction methods for these light absorption properties, including Time-Dependent Density Functional Theory (TDDFT), are often computationally intensive and time-consuming
-
DeepTGIN: a novel hybrid multimodal approach using transformers and graph isomorphism networks for protein-ligand binding affinity prediction J. Cheminfom. (IF 7.1) Pub Date : 2024-12-29 Guishen Wang, Hangchen Zhang, Mengting Shao, Yuncong Feng, Chen Cao, Xiaowen Hu
Predicting protein-ligand binding affinity is essential for understanding protein-ligand interactions and advancing drug discovery. Recent research has demonstrated the advantages of sequence-based models and graph-based models. In this study, we present a novel hybrid multimodal approach, DeepTGIN, which integrates transformers and graph isomorphism networks to predict protein-ligand binding affinity
-
STOUT V2.0: SMILES to IUPAC name conversion using transformer models J. Cheminfom. (IF 7.1) Pub Date : 2024-12-27 Kohulan Rajan, Achim Zielesny, Christoph Steinbeck
Naming chemical compounds systematically is a complex task governed by a set of rules established by the International Union of Pure and Applied Chemistry (IUPAC). These rules are universal and widely accepted by chemists worldwide, but their complexity makes it challenging for individuals to consistently apply them accurately. A translation method can be employed to address this challenge. Accurate
-
Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals J. Cheminfom. (IF 7.1) Pub Date : 2024-12-26 Domenico Gadaleta, Eva Serrano-Candelas, Rita Ortega-Vallbona, Erika Colombo, Marina Garcia de Lomana, Giada Biava, Pablo Aparicio-Sánchez, Alessandra Roncaglioni, Rafael Gozalbes, Emilio Benfenati
Ensuring the safety of chemicals for environmental and human health involves assessing physicochemical (PC) and toxicokinetic (TK) properties, which are crucial for absorption, distribution, metabolism, excretion, and toxicity (ADMET). Computational methods play a vital role in predicting these properties, given the current trends in reducing experimental approaches, especially those that involve animal
-
Correction: StreaMD: the toolkit for high-throughput molecular dynamics simulations J. Cheminfom. (IF 7.1) Pub Date : 2024-12-23 Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
Correction: Journal of Cheminformatics (2024) 16:123 https://doi.org/10.1186/s13321-024-00918-w Following publication of the original article [1], the authors identified that section Availability and requirements is missing. Availability and requirements Project name: StreaMD GitHub: https://github.com/ci-lab-cz/streamd Operating system(s): Linux Programming language: Python 3 Other requirements: GROMACS
-
AttenhERG: a reliable and interpretable graph neural network framework for predicting hERG channel blockers J. Cheminfom. (IF 7.1) Pub Date : 2024-12-23 Tianbiao Yang, Xiaoyu Ding, Elizabeth McMichael, Frank W. Pun, Alex Aliper, Feng Ren, Alex Zhavoronkov, Xiao Ding
Cardiotoxicity, particularly drug-induced arrhythmias, poses a significant challenge in drug development, highlighting the importance of early-stage prediction of human ether-a-go-go-related gene (hERG) toxicity. hERG encodes the pore-forming subunit of the cardiac potassium channel. Traditional methods are both costly and time-intensive, necessitating the development of computational approaches. In
-
Interface-aware molecular generative framework for protein–protein interaction modulators J. Cheminfom. (IF 7.1) Pub Date : 2024-12-20 Jianmin Wang, Jiashun Mao, Chunyan Li, Hongxin Xiang, Xun Wang, Shuang Wang, Zixu Wang, Yangyang Chen, Yuquan Li, Kyoung Tai No, Tao Song, Xiangxiang Zeng
Protein–protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit distinct physicochemical properties compared to traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs
-
MolNexTR: a generalized deep learning model for molecular image recognition J. Cheminfom. (IF 7.1) Pub Date : 2024-12-18 Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao
In the field of chemical structure recognition, the task of converting molecular images into machine-readable data formats such as SMILES strings stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that collaborates to fuse the strengths of
-
FlavorMiner: a machine learning platform for extracting molecular flavor profiles from structural data J. Cheminfom. (IF 7.1) Pub Date : 2024-12-10 Fabio Herrera-Rocha, Miguel Fernández-Niño, Jorge Duitama, Mónica P. Cala, María José Chica, Ludger A. Wessjohann, Mehdi D. Davari, Andrés Fernando González Barrios
Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up
-
Be aware of overfitting by hyperparameter optimization! J. Cheminfom. (IF 7.1) Pub Date : 2024-12-09 Igor V. Tetko, Ruud van Deursen, Guillaume Godin
Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each
-
Human-in-the-loop active learning for goal-oriented molecule generation J. Cheminfom. (IF 7.1) Pub Date : 2024-12-09 Yasmine Nahal, Janosch Menke, Julien Martinelli, Markus Heinonen, Mikhail Kabeshov, Jon Paul Janet, Eva Nittinger, Ola Engkvist, Samuel Kaski
Machine learning (ML) systems have enabled the modelling of quantitative structure–property relationships (QSPR) and structure–activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical
-
CSearch: chemical space search via virtual synthesis and global optimization J. Cheminfom. (IF 7.1) Pub Date : 2024-12-05 Hakjean Kim, Seongok Ryu, Nuri Jung, Jinsol Yang, Chaok Seok
The two key components of computational molecular design are virtually generating molecules and predicting the properties of these generated molecules. This study focuses on an effective method for molecular generation through virtual synthesis and global optimization of a given objective function. Using a pre-trained graph neural network (GNN) objective function to approximate the docking energies
-
Deepmol: an automated machine and deep learning framework for computational chemistry J. Cheminfom. (IF 7.1) Pub Date : 2024-12-05 João Correia, João Capela, Miguel Rocha
The domain of computational chemistry has experienced a significant evolution due to the introduction of Machine Learning (ML) technologies. Despite its potential to revolutionize the field, researchers are often encumbered by obstacles, such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the necessity for adaptive feature engineering, and the assurance
-
Sort & Slice: a simple and superior alternative to hash-based folding for extended-connectivity fingerprints J. Cheminfom. (IF 7.1) Pub Date : 2024-12-03 Markus Dablander, Thierry Hanser, Renaud Lambiotte, Garrett M. Morris
Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of graph pooling methods. In contrast, sets of detected
-
cidalsDB: an AI-empowered platform for anti-pathogen therapeutics research J. Cheminfom. (IF 7.1) Pub Date : 2024-11-28 Emna Harigua-Souiai, Ons Masmoudi, Samer Makni, Rafeh Oualha, Yosser Z. Abdelkrim, Sara Hamdi, Oussama Souiai, Ikram Guizani
Computer-aided drug discovery (CADD) is nurtured by recent advances in big data analytics and Artificial Intelligence (AI) towards enhanced drug discovery (DD) outcomes. In this context, reliable datasets are of utmost importance. We herein present CidalsDB, a novel web server for AI-assisted DD against infectious pathogens, namely Leishmania parasites and Coronaviruses. We performed a literature search
-
Group graph: a molecular graph representation with enhanced performance, efficiency and interpretability J. Cheminfom. (IF 7.1) Pub Date : 2024-11-28 Piao-Yang Cao, Yang He, Ming-Yang Cui, Xiao-Min Zhang, Qingye Zhang, Hong-Yu Zhang
The exploration of chemical space holds promise for developing influential chemical entities. Molecular representations, which reflect features of molecular structure in silico, assist in navigating chemical space appropriately. Unlike atom-level molecular representations, such as SMILES and atom graph, which can sometimes lead to confusing interpretations about chemical substructures, substructure-level
-
GT-NMR: a novel graph transformer-based approach for accurate prediction of NMR chemical shifts J. Cheminfom. (IF 7.1) Pub Date : 2024-11-26 Haochen Chen, Tao Liang, Kai Tan, Anan Wu, Xin Lu
In this work, inspired by the graph transformer, we presented an improved protocol, termed GT-NMR, which integrates 2D molecular graph representation with Transformer architecture, for accurate yet efficient prediction of NMR chemical shifts. The effectiveness of the GT-NMR was thoroughly examined with the standard nmrshiftdb2 dataset, 37 natural products and structural elucidation of 11 pairs of natural
-
Suitability of large language models for extraction of high-quality chemical reaction dataset from patent literature J. Cheminfom. (IF 7.1) Pub Date : 2024-11-26 Sarveswara Rao Vangala, Sowmya Ramaswamy Krishnan, Navneet Bung, Dhandapani Nandagopal, Gomathi Ramasamy, Satyam Kumar, Sridharan Sankaran, Rajgopal Srinivasan, Arijit Roy
With the advent of artificial intelligence (AI), it is now possible to design diverse and novel molecules from previously unexplored chemical space. However, a challenge for chemists is the synthesis of such molecules. Recently, there have been attempts to develop AI models for retrosynthesis prediction, which rely on the availability of a high-quality training dataset. In this work, we explore the
-
Molecular identification via molecular fingerprint extraction from atomic force microscopy images J. Cheminfom. (IF 7.1) Pub Date : 2024-11-25 Manuel González Lastre, Pablo Pou, Miguel Wiche, Daniel Ebeling, Andre Schirmeisen, Rubén Pérez
Non-Contact Atomic Force Microscopy with CO-functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with totally unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR-AFM images, leading to molecular
-
A systematic review of deep learning chemical language models in recent era J. Cheminfom. (IF 7.1) Pub Date : 2024-11-18 Hector Flores-Hernandez, Emmanuel Martinez-Ledesma
Discovering new chemical compounds with specific properties can provide advantages for fields that rely on materials for their development, although this task comes at a high cost in terms of complexity and resources. Since the beginning of the data age, deep learning techniques have revolutionized the process of designing molecules by analyzing and learning from representations of molecular data,
-
QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Helle W. van den Maagdenberg, Martin Šícho, David Alencar Araripe, Sohvi Luukkonen, Linde Schoenmaker, Michiel Jespers, Olivier J. M. Béquignon, Marina Gorostiola González, Remco L. van den Broek, Andrius Bernatavicius, J. G. Coen van Hasselt, Piet H. van der Graaf, Gerard J. P. van Westen
Building reliable and robust quantitative structure–property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing and evaluating different algorithms and methodologies can be arduous. Finally, the last hurdle that researchers face is to ensure the reproducibility of
-
Accelerated hit identification with target evaluation, deep learning and automated labs: prospective validation in IRAK1 J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Gintautas Kamuntavičius, Alvaro Prat, Tanya Paquet, Orestis Bastas, Hisham Abdel Aty, Qing Sun, Carsten B. Andersen, John Harman, Marc E. Siladi, Daniel R. Rines, Sarah J. L. Flatters, Roy Tal, Povilas Norvaišas
Target identification and hit identification can be transformed through the application of biomedical knowledge analysis, AI-driven virtual screening and robotic cloud lab systems. However, there are few prospective studies that evaluate the efficacy of such integrated approaches. We synergistically integrate our in-house-developed target evaluation (SpectraView) and deep-learning-driven virtual screening
-
Comparative evaluation of methods for the prediction of protein–ligand binding sites J. Cheminfom. (IF 7.1) Pub Date : 2024-11-11 Javier S. Utgés, Geoffrey J. Barton
The accurate identification of protein–ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades with over 50 methods developed and a change of paradigm from geometry-based to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years
-
Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning J. Cheminfom. (IF 7.1) Pub Date : 2024-11-06 Jue Wang, Yufan Liu, Boxue Tian
Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high-accuracy predictions of small molecule binding sites that can accommodate proteins without
-
Milestones in chemoinformatics: global view of the field J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Jürgen Bajorath
Over the past ~25 years, chemoinformatics has evolved as a scientific discipline, with a strong foundation in pharmaceutical research and scientific roots that can be traced back to the late 1950s. It covers a wide methodological spectrum and is perhaps best positioned in the greater context of chemical information science. Herein, the chemoinformatics discipline is delineated, characteristic (and
-
StreaMD: the toolkit for high-throughput molecular dynamics simulations J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
Molecular dynamics simulations serve as a prevalent approach for investigating the dynamic behaviour of proteins and protein–ligand complexes. Due to its versatility and speed, GROMACS stands out as a commonly utilized software platform for executing molecular dynamics simulations. However, its effective utilization requires substantial expertise in configuring, executing, and interpreting molecular
-
Quantitative structure–activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Domenico Gadaleta, Marina Garcia de Lomana, Eva Serrano-Candelas, Rita Ortega-Vallbona, Rafael Gozalbes, Alessandra Roncaglioni, Emilio Benfenati
The adverse outcome pathway (AOP) concept has gained attention as a way to explore the mechanism of chemical toxicity. In this study, quantitative structure–activity relationship (QSAR) models were developed to predict compound activity toward protein targets relevant to molecular initiating events (MIE) upstream of organ-specific toxicities, namely liver steatosis, cholestasis, nephrotoxicity, neural
-
Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Yiyu Hong, Junsu Ha, Jaemin Sim, Chae Jo Lim, Kwang-Seok Oh, Ramakrishnan Chandrasekaran, Bomin Kim, Jieun Choi, Junsu Ko, Woong-Hee Shin, Juyong Lee
We introduce an advanced model for predicting protein–ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein–ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules
-
Searching chemical databases in the pre-history of cheminformatics J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Peter Willett
This article highlights research from the last century that has provided the basis for the searching techniques that are used in present-day cheminformatics systems, and thus provides an acknowledgement of the contributions made by early pioneers in the field.
-
GTransCYPs: an improved graph transformer neural network with attention pooling for reliably predicting CYP450 inhibitors J. Cheminfom. (IF 7.1) Pub Date : 2024-10-29 Candra Zonyfar, Soualihou Ngnamsie Njimbouom, Sophia Mosalla, Jeong-Dong Kim
State-of-the-art medical studies have shown that predicting CYP450 enzyme inhibitors is beneficial in the early stage of drug discovery. However, developing accurate machine learning (ML)-based in silico methods for predicting CYP450 inhibitors remains challenging. Here, we introduce GTransCYPs, an improved graph neural network (GNN) with a transformer mechanism for predicting CYP450 inhibitors. This model significantly
-
A comprehensive comparison of deep learning-based compound-target interaction prediction models to unveil guiding design principles J. Cheminfom. (IF 7.1) Pub Date : 2024-10-28 Sina Abdollahi, Darius P. Schaub, Madalena Barroso, Nora C. Laubach, Wiebke Hutwelker, Ulf Panzer, Søren W. Gersting, Stefan Bonn
The evaluation of compound-target interactions (CTIs) is at the heart of drug discovery efforts. Given the substantial time and monetary costs of classical experimental screening, significant efforts have been dedicated to developing deep learning-based models that can accurately predict CTIs. A comprehensive comparison of these models on a large, curated CTI dataset is, however, still lacking. Here,