-
FluoBase: a fluorinated agents database J. Cheminfom. (IF 7.1) Pub Date : 2025-02-11 Rafal Mulka, Dan Su, Wen-Shuo Huang, Li Zhang, Huaihai Huang, Xiaoyu Lai, Yao Li, Xiao-Song Xue
Organofluorine compounds, owing to their unique physicochemical properties, play an increasingly crucial role in fields such as medicine, pesticides, and advanced materials. Fluorinated reagents are indispensable for developing efficient synthetic methods for organofluorine compounds and serve as the cornerstone of organofluorine chemistry. Equally important are fluorinated functional molecules, which
-
Barlow Twins deep neural network for advanced 1D drug–target interaction prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-02-05 Maximilian G. Schuh, Davide Boldini, Annkathrin I. Bohne, Stephan A. Sieber
Accurate prediction of drug–target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature-extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive
-
Positional embeddings and zero-shot learning using BERT for molecular-property prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-02-05 Medard Edmund Mswahili, JunHa Hwang, Jagath C. Rajapakse, Kyuri Jo, Young-Seob Jeong
Recently, advancements in cheminformatics such as representation learning for chemical structures, deep learning (DL) for property prediction, data-driven discovery, and optimization of chemical data handling, have led to increased demands for handling chemical simplified molecular input line entry system (SMILES) data, particularly in text analysis tasks. These advancements have driven the need to
-
Improving drug repositioning with negative data labeling using large language models J. Cheminfom. (IF 7.1) Pub Date : 2025-02-04 Milan Picard, Mickael Leclercq, Antoine Bodein, Marie Pier Scott-Boyer, Olivier Perin, Arnaud Droit
Drug repositioning offers numerous advantages, such as faster development timelines, reduced costs, and lower failure rates in drug development. Supervised machine learning is commonly used to score drug candidates but is hindered by the lack of reliable negative data (drugs that fail due to inefficacy or toxicity), which is difficult to access, lowering prediction accuracy and generalization.
-
PretoxTM: a text mining system for extracting treatment-related findings from preclinical toxicology reports J. Cheminfom. (IF 7.1) Pub Date : 2025-02-03 Javier Corvi, Nicolás Díaz-Roussel, José M. Fernández, Francesco Ronzano, Emilio Centeno, Pablo Accuosto, Celine Ibrahim, Shoji Asakura, Frank Bringezu, Mirjam Fröhlicher, Annika Kreuchwig, Yoko Nogami, Jeong Rih, Raul Rodriguez-Esteban, Nicolas Sajot, Joerg Wichard, Heng-Yi Michael Wu, Philip Drew, Thomas Steger-Hartmann, Alfonso Valencia, Laura I. Furlong, Salvador Capella-Gutierrez
Over the last few decades the pharmaceutical industry has generated a vast corpus of knowledge on the safety and efficacy of drugs. Much of this information is contained in toxicology reports, which summarise the results of animal studies designed to analyse the effects of the tested compound, including unintended pharmacological and toxic effects, known as treatment-related findings. Despite the potential
-
MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data J. Cheminfom. (IF 7.1) Pub Date : 2025-01-31 Katarzyna Arturi, Eliza J. Harris, Lilian Gasser, Beate I. Escher, Georg Braun, Robin Bosshard, Juliane Hollender
MLinvitroTox is an automated Python pipeline developed for high-throughput hazard-driven prioritization of toxicologically relevant signals detected in complex environmental samples through high-resolution tandem mass spectrometry (HRMS/MS). MLinvitroTox is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints from chemical structures and
-
APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions J. Cheminfom. (IF 7.1) Pub Date : 2025-01-31 Eva Viesi, Ugo Perricone, Patrick Aloy, Rosalba Giugno
More sophisticated representations of compounds attempt to incorporate not only information on the structure and physicochemical properties of molecules, but also knowledge about their biological traits, leading to the so-called bioactivity profile. The bioactive profiling of air pollutants is challenging and crucial, as their biological activity and toxicological effects have not been deeply investigated
-
AiGPro: a multi-tasks model for profiling of GPCRs for agonist and antagonist J. Cheminfom. (IF 7.1) Pub Date : 2025-01-29 Rahul Brahma, Sunghyun Moon, Jae-Min Shin, Kwang-Hwi Cho
G protein-coupled receptors (GPCRs) play vital roles in various physiological processes, making them attractive drug discovery targets. Meanwhile, deep learning techniques have revolutionized drug discovery by facilitating efficient tools for expediting the identification and optimization of ligands. However, existing models for the GPCRs often focus on single-target or a small subset of GPCRs or employ
-
hERGAT: predicting hERG blockers using graph attention mechanism through atom- and molecule-level interaction analyses J. Cheminfom. (IF 7.1) Pub Date : 2025-01-28 Dohyeon Lee, Sunyong Yoo
The human ether-a-go-go-related gene (hERG) channel plays a critical role in the electrical activity of the heart, and its blockers can cause serious cardiotoxic effects. Thus, screening for hERG channel blockers is a crucial step in the drug development process. Many in silico models have been developed to predict hERG blockers, which can efficiently save time and resources. However, previous methods
-
The algebraic extended atom-type graph-based model for precise ligand–receptor binding affinity prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-01-22 Farjana Tasnim Mukta, Md Masud Rana, Avery Meyer, Sally Ellingson, Duc D. Nguyen
Accurate prediction of ligand-receptor binding affinity is crucial in structure-based drug design, significantly impacting the development of effective drugs. Recent advances in machine learning (ML)–based scoring functions have improved these predictions, yet challenges remain in modeling complex molecular interactions. This study introduces the AGL-EAT-Score, a scoring function that integrates extended
-
StreamChol: a web-based application for predicting cholestasis J. Cheminfom. (IF 7.1) Pub Date : 2025-01-21 Pablo Rodríguez-Belenguer, Emilio Soria-Olivas, Manuel Pastor
This article introduces StreamChol, a software for developing and applying mechanistic models to predict cholestasis. StreamChol is a Streamlit application, usable as a desktop application or web-accessible software when installed on a server using a docker container. StreamChol allows a seamless integration of pharmacokinetic analyses with Machine Learning models. This integration not only enables
-
Matched pairs demonstrate robustness against inter-assay variability J. Cheminfom. (IF 7.1) Pub Date : 2025-01-20 Jochem Nelen, Horacio Pérez-Sánchez, Hans De Winter, Dries Van Rompaey
Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences
-
One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening J. Cheminfom. (IF 7.1) Pub Date : 2025-01-16 James Wellnitz, Sankalp Jain, Joshua E. Hochuli, Travis Maxfield, Eugene N. Muratov, Alexander Tropsha, Alexey V. Zakharov
Traditional best practices for quantitative structure activity relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study explores the value of the conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we
-
Chemical space as a unifying theme for chemistry J. Cheminfom. (IF 7.1) Pub Date : 2025-01-16 Jean-Louis Reymond
Chemistry has diversified from a basic understanding of the elements to studying millions of highly diverse molecules and materials, which together are conceptualized as the chemical space. A map of this chemical space where distances represent similarities between compounds can represent the mutual relationships between different subfields of chemistry and help the discipline to be viewed and understood
-
Context-dependent similarity analysis of analogue series for structure–activity relationship transfer based on a concept from natural language processing J. Cheminfom. (IF 7.1) Pub Date : 2025-01-15 Atsushi Yoshimori, Jürgen Bajorath
Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure–activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically
-
Fragmenstein: predicting protein–ligand structures of compounds derived from known crystallographic fragment hits using a strict conserved-binding–based methodology J. Cheminfom. (IF 7.1) Pub Date : 2025-01-13 Matteo P. Ferla, Rubén Sánchez-García, Rachael E. Skyner, Stefan Gahbauer, Jenny C. Taylor, Frank von Delft, Brian D. Marsden, Charlotte M. Deane
Current strategies centred on either merging or linking initial hits from fragment-based drug design (FBDD) crystallographic screens generally do not fully leverage 3D structural information. We show that an algorithmic approach (Fragmenstein) that ‘stitches’ the ligand atoms from this structural information together can provide more accurate and reliable predictions for protein–ligand complex conformation
-
ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction J. Cheminfom. (IF 7.1) Pub Date : 2025-01-10 Dong Wang, Jieyu Jin, Guqin Shi, Jingxiao Bao, Zheng Wang, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou
The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. While the Caco-2 cell assay is considered safe and cost-effective, it is also time-consuming. Therefore, computational models that achieve high accuracies in predicting Caco-2 permeability are crucial
-
CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions J. Cheminfom. (IF 7.1) Pub Date : 2025-01-07 Zishuo Zeng, Jin Guo, Jiao Jin, Xiaozhou Luo
Predicting EC numbers for chemical reactions enables efficient enzymatic annotations for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction’s EC), a novel framework leveraging contrastive learning, pre-trained language model-based
-
Prediction of Pt, Ir, Ru, and Rh complexes light absorption in the therapeutic window for phototherapy using machine learning J. Cheminfom. (IF 7.1) Pub Date : 2025-01-05 V. Vigna, T. F. G. G. Cova, A. A. C. C. Pais, E. Sicilia
Effective light-based cancer treatments, such as photodynamic therapy (PDT) and photoactivated chemotherapy (PACT), rely on compounds that are efficiently activated by light and absorb within the therapeutic window (600–850 nm). Traditional prediction methods for these light absorption properties, including Time-Dependent Density Functional Theory (TDDFT), are often computationally intensive and time-consuming
-
DeepTGIN: a novel hybrid multimodal approach using transformers and graph isomorphism networks for protein-ligand binding affinity prediction J. Cheminfom. (IF 7.1) Pub Date : 2024-12-29 Guishen Wang, Hangchen Zhang, Mengting Shao, Yuncong Feng, Chen Cao, Xiaowen Hu
Predicting protein-ligand binding affinity is essential for understanding protein-ligand interactions and advancing drug discovery. Recent research has demonstrated the advantages of sequence-based models and graph-based models. In this study, we present a novel hybrid multimodal approach, DeepTGIN, which integrates transformers and graph isomorphism networks to predict protein-ligand binding affinity
-
STOUT V2.0: SMILES to IUPAC name conversion using transformer models J. Cheminfom. (IF 7.1) Pub Date : 2024-12-27 Kohulan Rajan, Achim Zielesny, Christoph Steinbeck
Naming chemical compounds systematically is a complex task governed by a set of rules established by the International Union of Pure and Applied Chemistry (IUPAC). These rules are universal and widely accepted by chemists worldwide, but their complexity makes it challenging for individuals to consistently apply them accurately. A translation method can be employed to address this challenge. Accurate
-
Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals J. Cheminfom. (IF 7.1) Pub Date : 2024-12-26 Domenico Gadaleta, Eva Serrano-Candelas, Rita Ortega-Vallbona, Erika Colombo, Marina Garcia de Lomana, Giada Biava, Pablo Aparicio-Sánchez, Alessandra Roncaglioni, Rafael Gozalbes, Emilio Benfenati
Ensuring the safety of chemicals for environmental and human health involves assessing physicochemical (PC) and toxicokinetic (TK) properties, which are crucial for absorption, distribution, metabolism, excretion, and toxicity (ADMET). Computational methods play a vital role in predicting these properties, given the current trends in reducing experimental approaches, especially those that involve animal
-
Correction: StreaMD: the toolkit for high-throughput molecular dynamics simulations J. Cheminfom. (IF 7.1) Pub Date : 2024-12-23 Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
Correction: Journal of Cheminformatics (2024) 16:123 https://doi.org/10.1186/s13321-024-00918-w Following publication of the original article [1], the authors identified that section Availability and requirements is missing. Availability and requirements Project name: StreaMD GitHub: https://github.com/ci-lab-cz/streamd Operating system(s): Linux Programming language: Python 3 Other requirements: GROMACS
-
AttenhERG: a reliable and interpretable graph neural network framework for predicting hERG channel blockers J. Cheminfom. (IF 7.1) Pub Date : 2024-12-23 Tianbiao Yang, Xiaoyu Ding, Elizabeth McMichael, Frank W. Pun, Alex Aliper, Feng Ren, Alex Zhavoronkov, Xiao Ding
Cardiotoxicity, particularly drug-induced arrhythmias, poses a significant challenge in drug development, highlighting the importance of early-stage prediction of human ether-a-go-go-related gene (hERG) toxicity. hERG encodes the pore-forming subunit of the cardiac potassium channel. Traditional methods are both costly and time-intensive, necessitating the development of computational approaches. In
-
Interface-aware molecular generative framework for protein–protein interaction modulators J. Cheminfom. (IF 7.1) Pub Date : 2024-12-20 Jianmin Wang, Jiashun Mao, Chunyan Li, Hongxin Xiang, Xun Wang, Shuang Wang, Zixu Wang, Yangyang Chen, Yuquan Li, Kyoung Tai No, Tao Song, Xiangxiang Zeng
Protein–protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit distinct physicochemical properties compared to traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs
-
MolNexTR: a generalized deep learning model for molecular image recognition J. Cheminfom. (IF 7.1) Pub Date : 2024-12-18 Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao
In the field of chemical structure recognition, the task of converting molecular images into machine-readable data formats such as SMILES string stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that collaborates to fuse the strengths of
-
FlavorMiner: a machine learning platform for extracting molecular flavor profiles from structural data J. Cheminfom. (IF 7.1) Pub Date : 2024-12-10 Fabio Herrera-Rocha, Miguel Fernández-Niño, Jorge Duitama, Mónica P. Cala, María José Chica, Ludger A. Wessjohann, Mehdi D. Davari, Andrés Fernando González Barrios
Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up
-
Be aware of overfitting by hyperparameter optimization! J. Cheminfom. (IF 7.1) Pub Date : 2024-12-09 Igor V. Tetko, Ruud van Deursen, Guillaume Godin
Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each
-
Human-in-the-loop active learning for goal-oriented molecule generation J. Cheminfom. (IF 7.1) Pub Date : 2024-12-09 Yasmine Nahal, Janosch Menke, Julien Martinelli, Markus Heinonen, Mikhail Kabeshov, Jon Paul Janet, Eva Nittinger, Ola Engkvist, Samuel Kaski
Machine learning (ML) systems have enabled the modelling of quantitative structure–property relationships (QSPR) and structure-activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical
-
CSearch: chemical space search via virtual synthesis and global optimization J. Cheminfom. (IF 7.1) Pub Date : 2024-12-05 Hakjean Kim, Seongok Ryu, Nuri Jung, Jinsol Yang, Chaok Seok
The two key components of computational molecular design are virtually generating molecules and predicting the properties of these generated molecules. This study focuses on an effective method for molecular generation through virtual synthesis and global optimization of a given objective function. Using a pre-trained graph neural network (GNN) objective function to approximate the docking energies
-
Deepmol: an automated machine and deep learning framework for computational chemistry J. Cheminfom. (IF 7.1) Pub Date : 2024-12-05 João Correia, João Capela, Miguel Rocha
The domain of computational chemistry has experienced a significant evolution due to the introduction of Machine Learning (ML) technologies. Despite its potential to revolutionize the field, researchers are often encumbered by obstacles, such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the necessity for adaptive feature engineering, and the assurance
-
Sort & Slice: a simple and superior alternative to hash-based folding for extended-connectivity fingerprints J. Cheminfom. (IF 7.1) Pub Date : 2024-12-03 Markus Dablander, Thierry Hanser, Renaud Lambiotte, Garrett M. Morris
Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of graph pooling methods. In contrast, sets of detected
-
cidalsDB: an AI-empowered platform for anti-pathogen therapeutics research J. Cheminfom. (IF 7.1) Pub Date : 2024-11-28 Emna Harigua-Souiai, Ons Masmoudi, Samer Makni, Rafeh Oualha, Yosser Z. Abdelkrim, Sara Hamdi, Oussama Souiai, Ikram Guizani
Computer-aided drug discovery (CADD) is nurtured by recent advances in big data analytics and Artificial Intelligence (AI) towards enhanced drug discovery (DD) outcomes. In this context, reliable datasets are of utmost importance. We herein present CidalsDB, a novel web server for AI-assisted DD against infectious pathogens, namely Leishmania parasites and Coronaviruses. We performed a literature search
-
Group graph: a molecular graph representation with enhanced performance, efficiency and interpretability J. Cheminfom. (IF 7.1) Pub Date : 2024-11-28 Piao-Yang Cao, Yang He, Ming-Yang Cui, Xiao-Min Zhang, Qingye Zhang, Hong-Yu Zhang
The exploration of chemical space holds promise for developing influential chemical entities. Molecular representations, which reflect features of molecular structure in silico, assist in navigating chemical space appropriately. Unlike atom-level molecular representations, such as SMILES and atom graph, which can sometimes lead to confusing interpretations about chemical substructures, substructure-level
-
GT-NMR: a novel graph transformer-based approach for accurate prediction of NMR chemical shifts J. Cheminfom. (IF 7.1) Pub Date : 2024-11-26 Haochen Chen, Tao Liang, Kai Tan, Anan Wu, Xin Lu
In this work, inspired by the graph transformer, we presented an improved protocol, termed GT-NMR, which integrates 2D molecular graph representation with Transformer architecture, for accurate yet efficient prediction of NMR chemical shifts. The effectiveness of the GT-NMR was thoroughly examined with the standard nmrshiftdb2 dataset, 37 natural products and structural elucidation of 11 pairs of natural
-
Suitability of large language models for extraction of high-quality chemical reaction dataset from patent literature J. Cheminfom. (IF 7.1) Pub Date : 2024-11-26 Sarveswara Rao Vangala, Sowmya Ramaswamy Krishnan, Navneet Bung, Dhandapani Nandagopal, Gomathi Ramasamy, Satyam Kumar, Sridharan Sankaran, Rajgopal Srinivasan, Arijit Roy
With the advent of artificial intelligence (AI), it is now possible to design diverse and novel molecules from previously unexplored chemical space. However, a challenge for chemists is the synthesis of such molecules. Recently, there have been attempts to develop AI models for retrosynthesis prediction, which rely on the availability of a high-quality training dataset. In this work, we explore the
-
Molecular identification via molecular fingerprint extraction from atomic force microscopy images J. Cheminfom. (IF 7.1) Pub Date : 2024-11-25 Manuel González Lastre, Pablo Pou, Miguel Wiche, Daniel Ebeling, Andre Schirmeisen, Rubén Pérez
Non-Contact Atomic Force Microscopy with CO-functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with totally unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR-AFM images, leading to molecular
-
A systematic review of deep learning chemical language models in recent era J. Cheminfom. (IF 7.1) Pub Date : 2024-11-18 Hector Flores-Hernandez, Emmanuel Martinez-Ledesma
Discovering new chemical compounds with specific properties can provide advantages for fields that rely on materials for their development, although this task comes at a high cost in terms of complexity and resources. Since the beginning of the data age, deep learning techniques have revolutionized the process of designing molecules by analyzing and learning from representations of molecular data,
-
QSPRpred: a Flexible Open-Source Quantitative Structure-Property Relationship Modelling Tool J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Helle W. van den Maagdenberg, Martin Šícho, David Alencar Araripe, Sohvi Luukkonen, Linde Schoenmaker, Michiel Jespers, Olivier J. M. Béquignon, Marina Gorostiola González, Remco L. van den Broek, Andrius Bernatavicius, J. G. Coen van Hasselt, Piet H. van der Graaf, Gerard J. P. van Westen
Building reliable and robust quantitative structure–property relationship (QSPR) models is a challenging task. First, the experimental data needs to be obtained, analyzed and curated. Second, the number of available methods is continuously growing and evaluating different algorithms and methodologies can be arduous. Finally, the last hurdle that researchers face is to ensure the reproducibility of
-
Accelerated hit identification with target evaluation, deep learning and automated labs: prospective validation in IRAK1 J. Cheminfom. (IF 7.1) Pub Date : 2024-11-14 Gintautas Kamuntavičius, Alvaro Prat, Tanya Paquet, Orestis Bastas, Hisham Abdel Aty, Qing Sun, Carsten B. Andersen, John Harman, Marc E. Siladi, Daniel R. Rines, Sarah J. L. Flatters, Roy Tal, Povilas Norvaišas
Target identification and hit identification can be transformed through the application of biomedical knowledge analysis, AI-driven virtual screening and robotic cloud lab systems. However, there are few prospective studies that evaluate the efficacy of such integrated approaches. We synergistically integrate our in-house-developed target evaluation (SpectraView) and deep-learning-driven virtual screening
-
Comparative evaluation of methods for the prediction of protein–ligand binding sites J. Cheminfom. (IF 7.1) Pub Date : 2024-11-11 Javier S. Utgés, Geoffrey J. Barton
The accurate identification of protein–ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades with over 50 methods developed and a change of paradigm from geometry-based to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years
-
Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning J. Cheminfom. (IF 7.1) Pub Date : 2024-11-06 Jue Wang, Yufan Liu, Boxue Tian
Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high accuracy predictions of small molecule binding sites that can accommodate proteins without
-
Milestones in chemoinformatics: global view of the field J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Jürgen Bajorath
Over the past ~25 years, chemoinformatics has evolved as a scientific discipline, with a strong foundation in pharmaceutical research and scientific roots that can be traced back to the late 1950s. It covers a wide methodological spectrum and is perhaps best positioned in the greater context of chemical information science. Herein, the chemoinformatics discipline is delineated, characteristic (and
-
StreaMD: the toolkit for high-throughput molecular dynamics simulations J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
Molecular dynamics simulations serve as a prevalent approach for investigating the dynamic behaviour of proteins and protein–ligand complexes. Due to its versatility and speed, GROMACS stands out as a commonly utilized software platform for executing molecular dynamics simulations. However, its effective utilization requires substantial expertise in configuring, executing, and interpreting molecular
-
Quantitative structure–activity relationships of chemical bioactivity toward proteins associated with molecular initiating events of organ-specific toxicity J. Cheminfom. (IF 7.1) Pub Date : 2024-11-05 Domenico Gadaleta, Marina Garcia de Lomana, Eva Serrano-Candelas, Rita Ortega-Vallbona, Rafael Gozalbes, Alessandra Roncaglioni, Emilio Benfenati
The adverse outcome pathway (AOP) concept has gained attention as a way to explore the mechanism of chemical toxicity. In this study, quantitative structure–activity relationship (QSAR) models were developed to predict compound activity toward protein targets relevant to molecular initiating events (MIE) upstream of organ-specific toxicities, namely liver steatosis, cholestasis, nephrotoxicity, neural
-
Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Yiyu Hong, Junsu Ha, Jaemin Sim, Chae Jo Lim, Kwang-Seok Oh, Ramakrishnan Chandrasekaran, Bomin Kim, Jieun Choi, Junsu Ko, Woong-Hee Shin, Juyong Lee
We introduce an advanced model for predicting protein–ligand interactions. Our approach combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein–ligand binding prediction often fall short in practical virtual screening scenarios, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules
-
Searching chemical databases in the pre-history of cheminformatics J. Cheminfom. (IF 7.1) Pub Date : 2024-11-04 Peter Willett
This article highlights research from the last century that has provided the basis for the searching techniques that are used in present-day cheminformatics systems, and thus provides an acknowledgement of the contributions made by early pioneers in the field.
-
GTransCYPs: an improved graph transformer neural network with attention pooling for reliably predicting CYP450 inhibitors J. Cheminfom. (IF 7.1) Pub Date : 2024-10-29 Candra Zonyfar, Soualihou Ngnamsie Njimbouom, Sophia Mosalla, Jeong-Dong Kim
State-of-the-art medical studies proved that predicting CYP450 enzyme inhibitors is beneficial in the early stage of drug discovery. However, developing accurate machine learning (ML)-based in silico methods for predicting CYP450 inhibitors remains challenging. Here, we introduce GTransCYPs, an improved graph neural network (GNN) with a transformer mechanism for predicting CYP450 inhibitors. This model significantly
-
A comprehensive comparison of deep learning-based compound-target interaction prediction models to unveil guiding design principles J. Cheminfom. (IF 7.1) Pub Date : 2024-10-28 Sina Abdollahi, Darius P. Schaub, Madalena Barroso, Nora C. Laubach, Wiebke Hutwelker, Ulf Panzer, Søren W. Gersting, Stefan Bonn
The evaluation of compound-target interactions (CTIs) is at the heart of drug discovery efforts. Given the substantial time and monetary costs of classical experimental screening, significant efforts have been dedicated to develop deep learning-based models that can accurately predict CTIs. A comprehensive comparison of these models on a large, curated CTI dataset is, however, still lacking. Here,
-
Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning J. Cheminfom. (IF 7.1) Pub Date : 2024-10-28 Zeqing Bao, Gary Tom, Austin Cheng, Jeffrey Watchorn, Alán Aspuru-Guzik, Christine Allen
Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict drug solubility as an alternative approach. However, the majority of existing ML research has focused on the predictions of
-
MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model J. Cheminfom. (IF 7.1) Pub Date : 2024-10-23 Sadettin Y. Ugurlu, David McDonald, Shan He
A crucial mechanism for controlling the actions of proteins is allostery. Allosteric modulators have the potential to provide many benefits compared to orthosteric ligands, such as increased selectivity and saturability of their effect. The identification of new allosteric sites presents prospects for the creation of innovative medications and enhances our comprehension of fundamental biological mechanisms
-
Graph neural processes for molecules: an evaluation on docking scores and strategies to improve generalization J. Cheminfom. (IF 7.1) Pub Date : 2024-10-23 Miguel García-Ortegón, Srijit Seal, Carl Rasmussen, Andreas Bender, Sergio Bacallado
Neural processes (NPs) are models for meta-learning which output uncertainty estimates. So far, most studies of NPs have focused on low-dimensional datasets of highly-correlated tasks. While these homogeneous datasets are useful for benchmarking, they may not be representative of realistic transfer learning. In particular, applications in scientific research may prove especially challenging due to
-
Large-scale annotation of biochemically relevant pockets and tunnels in cognate enzyme–ligand complexes J. Cheminfom. (IF 7.1) Pub Date : 2024-10-15 O. Vavra, J. Tyzack, F. Haddadi, J. Stourac, J. Damborsky, S. Mazurenko, J. M. Thornton, D. Bednar
Tunnels in enzymes with buried active sites are key structural features allowing the entry of substrates and the release of products, thus contributing to the catalytic efficiency. Targeting the bottlenecks of protein tunnels is also a powerful protein engineering strategy. However, the identification of functional tunnels in multiple protein structures is a non-trivial task that can only be addressed
-
Insights into predicting small molecule retention times in liquid chromatography using deep learning J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Yuting Liu, Akiyasu C. Yoshizawa, Yiwei Ling, Shujiro Okuda
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in
-
Data mining of PubChem bioassay records reveals diverse OXPHOS inhibitory chemotypes as potential therapeutic agents against ovarian cancer J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Sejal Sharma, Liping Feng, Nicha Boonpattrawong, Arvinder Kapur, Lisa Barroilhet, Manish S. Patankar, Spencer S. Ericksen
Focused screening on target-prioritized compound sets can be an efficient alternative to high throughput screening (HTS). For most biomolecular targets, compound prioritization models depend on prior screening data or a target structure. For phenotypic or multi-protein pathway targets, it may not be clear which public assay records provide relevant data. The question also arises as to whether data
-
Bitter peptide prediction using graph neural networks J. Cheminfom. (IF 7.1) Pub Date : 2024-10-07 Prashant Srivastava, Alexandra Steuer, Francesco Ferri, Alessandro Nicoli, Kristian Schultz, Saptarshi Bej, Antonella Di Pizio, Olaf Wolkenhauer
Bitter taste is an unpleasant taste modality that affects food consumption. Bitter peptides are generated during enzymatic processes that produce functional, bioactive protein hydrolysates or during the aging process of fermented products such as cheese, soybean protein, and wine. Understanding the underlying peptide sequences responsible for bitter taste can pave the way for more efficient identification
-
A multi-view feature representation for predicting drugs combination synergy based on ensemble and multi-task attention models J. Cheminfom. (IF 7.1) Pub Date : 2024-09-27 Samar Monem, Aboul Ella Hassanien, Alaa H. Abdel-Hamid
This paper proposes a novel multi-view ensemble predictor model that is designed to address the challenge of determining synergistic drug combinations by predicting both the synergy score values and synergy class label of drug combinations with cancer cell lines. The proposed methodology involves representing drug features through four distinct views: Simplified Molecular-Input Line-Entry System
-
Combining graph neural networks and transformers for few-shot nuclear receptor binding activity prediction J. Cheminfom. (IF 7.1) Pub Date : 2024-09-27 Luis H. M. Torres, Joel P. Arrais, Bernardete Ribeiro
Nuclear receptors (NRs) play a crucial role as biological targets in drug discovery. However, determining which compounds can act as endocrine disruptors and modulate the function of NRs with a reduced amount of candidate drugs is a challenging task. Moreover, the computational methods for NR-binding activity prediction mostly focus on a single receptor at a time, which may limit their effectiveness
-
Computer-aided pattern scoring (C@PS): a novel cheminformatic workflow to predict ligands with rare modes-of-action J. Cheminfom. (IF 7.1) Pub Date : 2024-09-23 Sven Marcel Stefan, Katja Stefan, Vigneshwaran Namasivayam
The identification, establishment, and exploration of potential pharmacological drug targets are major steps of the drug development pipeline. Target validation requires diverse chemical tools that come with a spectrum of functionality, e.g., inhibitors, activators, and other modulators. Particularly tools with rare modes-of-action allow for a proper kinetic and functional characterization of the targets-of-interest
-
EC-Conf: A ultra-fast diffusion model for molecular conformation generation with equivariant consistency J. Cheminfom. (IF 7.1) Pub Date : 2024-09-03 Zhiguang Fan, Yuedong Yang, Mingyuan Xu, Hongming Chen
Despite recent advancement in 3D molecule conformation generation driven by diffusion models, its high computational cost in iterative diffusion/denoising process limits its application. Here, an equivariant consistency model (EC-Conf) was proposed as a fast diffusion method for low-energy conformation generation. In EC-Conf, a modified SE(3)-equivariant transformer model was directly used to encode