-
Clc-db: an open-source online database of chiral ligands and catalysts J. Cheminform. (IF 7.1) Pub Date: 2025-04-03
Gufeng Yu, Kaiwen Yu, Xi Wang, Chenxi Zhang, Yicong Luo, Xiaohong Huo, Yang Yang
The design and optimization of chiral ligands and catalysts are fundamental to advancing asymmetric catalysis, a critical area in organic chemistry with wide-ranging impacts across scientific disciplines. Traditional experimental approaches, while essential, are often hindered by their slow pace and complexity. Recent advancements have demonstrated that computational methods, particularly machine learning
-
The evolution of open science in cheminformatics: a journey from closed systems to collaborative innovation J. Cheminform. (IF 7.1) Pub Date: 2025-04-03
Christoph Steinbeck
Cheminformatics has significantly transformed over the past four decades, evolving from a field dominated by proprietary systems to one increasingly embracing open science principles. In its early years, cheminformatics was characterised by commercial software and restricted data access, limiting collaboration and reproducibility. The advent of open-source software in the late 1990s and early 2000s
-
Correction: APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions J. Cheminform. (IF 7.1) Pub Date: 2025-04-01
Eva Viesi, Ugo Perricone, Patrick Aloy, Rosalba Giugno
Correction: Journal of Cheminformatics (2025) 17:13 https://doi.org/10.1186/s13321-025-00961-1
Following publication of the original article [1], the authors identified the following errors: The incorrect Acknowledgements is: The authors would like to thank the ‘National Biodiversity Future Center’ (identification code CN00000033, CUP B73C21001300006) on ‘Biodiversity’, financed under the National
-
Predictive modeling of visible-light azo-photoswitches’ properties using structural features J. Cheminform. (IF 7.1) Pub Date: 2025-04-01
Said Byadi, P. K. Hashim, Pavel Sidorov
In this manuscript we present the strategy for modeling photoswitch properties (maximum absorption wavelength and thermal half-life of photoisomers) of visible-light azo-photoswitches using structural data. We compile a comprehensive data set from literature sources and perform a rigorous benchmark to select the best feature type and modeling approach. The fragment counts have demonstrated the best
-
Generate what you can make: achieving in-house synthesizability with readily available resources in de novo drug design J. Cheminform. (IF 7.1) Pub Date: 2025-03-28
Alan Kai Hassen, Martin Šícho, Yorick J. van Aalst, Mirjam C. W. Huizenga, Darcy N. R. Reynolds, Sohvi Luukkonen, Andrius Bernatavicius, Djork-Arné Clevert, Antonius P. A. Janssen, Gerard J. P. van Westen, Mike Preuss
Computer-Aided Synthesis Planning (CASP) and CASP-based approximated synthesizability scores have rarely been used as generation objectives in Computer-Aided Drug Design despite facilitating the in-silico generation of synthesizable molecules. However, these synthesizability approaches are disconnected from the reality of small laboratory drug design, where building block resources are limited, thus
-
Three pillars for ensuring public access and integrity of chemical databases powering cheminformatics J. Cheminform. (IF 7.1) Pub Date: 2025-03-28
Antony J. Williams, Ann M. Richard
Since the inception of the Internet, public databases disseminating chemistry data to the community have proliferated and helped to support and encourage a burgeoning interest in cheminformatics. This has been supported by a shift in open science, exemplified by Open Data, Open Source, and Open Standards (ODOSOS) for chemistry [1], as well as by the increasing sophistication and availability of free
-
Protecting your skin: a highly accurate LSTM network integrating conjoint features for predicting chemical-induced skin irritation J. Cheminform. (IF 7.1) Pub Date: 2025-03-27
Huynh Anh Duy, Tarapong Srisongkram
Skin irritation is a significant adverse effect associated with chemicals and drug substances. Quantitative structure-activity relationship (QSAR) is an alternative method bypassing in vivo assay for filling data gaps in chemical risk assessment. In this study, we developed QSAR models based on recurrent neural networks (RNNs) to classify skin irritation caused by chemical compounds. We utilized chemical
-
Publishing neural networks in drug discovery might compromise training data privacy J. Cheminform. (IF 7.1) Pub Date: 2025-03-26
Fabian P. Krüger, Johan Östman, Lewis Mervin, Igor V. Tetko, Ola Engkvist
This study investigates the risks of exposing confidential chemical structures when machine learning models trained on these structures are made publicly available. We use membership inference attacks, a common method to assess privacy that is largely unexplored in the context of drug discovery, to examine neural networks for molecular property prediction in a black-box setting. Our results reveal
-
A unified approach to inferring chemical compounds with the desired aqueous solubility J. Cheminform. (IF 7.1) Pub Date: 2025-03-26
Muniba Batool, Naveed Ahmed Azam, Jianshen Zhu, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu
Aqueous solubility (AS) is a key physiochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming (MILP). Selected descriptors based on a forward stepwise
-
Large language models open new way of AI-assisted molecule design for chemists J. Cheminform. (IF 7.1) Pub Date: 2025-03-24
Shoichi Ishida, Tomohiro Sato, Teruki Honma, Kei Terayama
Recent advancements in artificial intelligence (AI)-based molecular design methodologies have offered synthetic chemists new ways to design functional molecules with their desired properties. While various AI-based molecule generators have significantly advanced toward practical applications, their effective use still requires specialized knowledge and skills concerning AI techniques. Here, we develop
-
An interpretable deep geometric learning model to predict the effects of mutations on protein–protein interactions using large-scale protein language model J. Cheminform. (IF 7.1) Pub Date: 2025-03-21
Caiya Zhang, Yan Sun, Pingzhao Hu
Protein–protein interactions (PPIs) are central to the mechanisms of signaling pathways and immune responses, which can help us understand disease etiology. Therefore, there is a significant need for efficient and rapid automated approaches to predict changes in PPIs. In recent years, there has been a significant increase in applying deep learning techniques to predict changes in binding affinity between
-
Anticipating protein evolution with successor sequence predictor J. Cheminform. (IF 7.1) Pub Date: 2025-03-21
Rayyan Tariq Khan, Pavel Kohout, Milos Musil, Monika Rosinska, Jiri Damborsky, Stanislav Mazurenko, David Bednar
The quest to predict and understand protein evolution has been hindered by limitations on both the theoretical and the experimental fronts. Most existing theoretical models of evolution are descriptive, rather than predictive, leaving the final modifications in the hands of researchers. Existing experimental techniques to help probe the evolutionary sequence space of proteins, such as directed evolution
-
The specification game: rethinking the evaluation of drug response prediction for precision oncology J. Cheminform. (IF 7.1) Pub Date: 2025-03-14
Francesco Codicè, Corrado Pancotti, Cesare Rollo, Yves Moreau, Piero Fariselli, Daniele Raimondi
Precision oncology plays a pivotal role in contemporary healthcare, aiming to optimize treatments for each patient based on their unique characteristics. This objective has spurred the emergence of various cancer cell line drug response datasets, driven by the need to facilitate pre-clinical studies by exploring the impact of multi-omics data on drug response. Despite the proliferation of machine learning
-
Fifteen years of ChEMBL and its role in cheminformatics and drug discovery J. Cheminform. (IF 7.1) Pub Date: 2025-03-10
Barbara Zdrazil
In October 2024 we celebrated the 15th anniversary of the first launch of ChEMBL, Europe’s most impactful, open-access drug discovery database, hosted by EMBL’s European Bioinformatics Institute (EMBL-EBI). This is a good moment to reflect on ChEMBL’s history, the role that ChEMBL plays in Cheminformatics and Drug Discovery as well as innovations accelerated using data extracted from it. The review
-
Accelerating the inference of string generation-based chemical reaction models for industrial applications J. Cheminform. (IF 7.1) Pub Date: 2025-03-10
Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arné Clevert
Transformer-based, template-free SMILES-to-SMILES translation models for reaction prediction and single-step retrosynthesis are of interest to computer-aided synthesis planning systems, as they offer state-of-the-art accuracy. However, their slow inference speed limits their practical utility in such applications. To address this challenge, we propose speculative decoding with a simple chemically specific
-
CardioGenAI: a machine learning-based framework for re-engineering drugs for reduced hERG liability J. Cheminform. (IF 7.1) Pub Date: 2025-03-05
Gregory W. Kyro, Matthew T. Martin, Eric D. Watt, Victor S. Batista
The link between in vitro hERG ion channel inhibition and subsequent in vivo QT interval prolongation, a critical risk factor for the development of arrhythmias such as Torsade de Pointes, is so well established that in vitro hERG activity alone is often sufficient to end the development of an otherwise promising drug candidate. It is therefore of tremendous interest to develop advanced methods for
-
Achieving well-informed decision-making in drug discovery: a comprehensive calibration study using neural network-based structure-activity models J. Cheminform. (IF 7.1) Pub Date: 2025-03-05
Hannah Rosa Friesacher, Ola Engkvist, Lewis Mervin, Yves Moreau, Adam Arany
In the drug discovery process, where experiments can be costly and time-consuming, computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents. Estimating the uncertainty inherent in these neural network predictions provides valuable information that facilitates optimal decision-making when risk assessment is crucial. However
-
GNINA 1.3: the next increment in molecular docking with deep learning J. Cheminform. (IF 7.1) Pub Date: 2025-03-02
Andrew T. McNutt, Yanjing Li, Rocco Meli, Rishal Aggarwal, David Ryan Koes
Computer-aided drug design has the potential to significantly reduce the astronomical costs of drug development, and molecular docking plays a prominent role in this process. Molecular docking is an in silico technique that predicts the bound 3D conformations of two molecules, a necessary step for other structure-based methods. Here, we describe version 1.3 of the open-source molecular docking software
-
Syn-MolOpt: a synthesis planning-driven molecular optimization method using data-derived functional reaction templates J. Cheminform. (IF 7.1) Pub Date: 2025-03-02
Xiaodan Yin, Xiaorui Wang, Zhenxing Wu, Qin Li, Yu Kang, Yafeng Deng, Pei Luo, Huanxiang Liu, Guqin Shi, Zheng Wang, Xiaojun Yao, Chang-Yu Hsieh, Tingjun Hou
Molecular optimization is a crucial step in drug development, involving structural modifications to improve the desired properties of drug candidates. Although many deep-learning-based molecular optimization algorithms have been proposed and may perform well on benchmarks, they usually do not pay sufficient attention to the synthesizability of molecules, resulting in optimized compounds difficult to
-
Improving route development using convergent retrosynthesis planning J. Cheminform. (IF 7.1) Pub Date: 2025-02-27
Paula Torren-Peraire, Jonas Verhoeven, Dorota Herman, Hugo Ceulemans, Igor V. Tetko, Jörg K. Wegner
Retrosynthesis consists of recursively breaking down a target molecule to produce a synthesis route composed of readily accessible building blocks. In recent years, computer-aided synthesis planning methods have allowed a greater exploration of potential synthesis routes, combining state-of-the-art machine-learning methods with chemical knowledge. However, these methods are generally developed to produce
-
Pretraining graph transformers with atom-in-a-molecule quantum properties for improved ADMET modeling J. Cheminform. (IF 7.1) Pub Date: 2025-02-27
Alessio Fallani, Ramil Nugmanov, Jose Arjona-Medina, Jörg Kurt Wegner, Alexandre Tkatchenko, Kostiantyn Chernichenko
We evaluate the impact of pretraining Graph Transformer architectures on atom-level quantum-mechanical features for the modeling of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drug-like compounds. We compare this pretraining strategy with two others: one based on molecular quantum properties (specifically the HOMO-LUMO gap) and one using a self-supervised atom
-
Infrared spectrum analysis of organic molecules with neural networks using standard reference data sets in combination with real-world data J. Cheminform. (IF 7.1) Pub Date: 2025-02-26
Dev Punjabi, Yu-Chieh Huang, Laura Holzhauer, Pierre Tremouilhac, Pascal Friederich, Nicole Jung, Stefan Bräse
In this study, we propose a neural network-based approach to analyze IR spectra and detect the presence of functional groups. Our neural network architecture is based on the concept of learning split representations. We demonstrate that our method achieves favorable validation performance using the NIST dataset. Furthermore, by incorporating additional data from the open-access research data repository
-
DrugDiff: small molecule diffusion model with flexible guidance towards molecular properties J. Cheminform. (IF 7.1) Pub Date: 2025-02-25
Marie Oestreich, Erinc Merdivan, Michael Lee, Joachim L. Schultze, Marie Piraud, Matthias Becker
With the cost/yield-ratio of drug development becoming increasingly unfavourable, recent work has explored machine learning to accelerate early stages of the development process. Given the current success of deep generative models across domains, we here investigated their application to the property-based proposal of new small molecules for drug development. Specifically, we trained a latent diffusion
-
kMoL: an open-source machine and federated learning library for drug discovery J. Cheminform. (IF 7.1) Pub Date: 2025-02-25
Romeo Cozac, Haris Hasic, Jun Jin Choong, Vincent Richard, Loic Beheshti, Cyrille Froehlich, Takuto Koyama, Shigeyuki Matsumoto, Ryosuke Kojima, Hiroaki Iwata, Aki Hasegawa, Takao Otsuka, Yasushi Okuno
Machine learning is quickly becoming integral to drug discovery pipelines, particularly quantitative structure-activity relationship (QSAR) and absorption, distribution, metabolism, and excretion (ADME) tasks. Graph Convolutional Network (GCN) models have proven especially promising due to their inherent ability to model molecular structures using graph-based representations. However, maximizing the
-
Predictive modeling of biodegradation pathways using transformer architectures J. Cheminform. (IF 7.1) Pub Date: 2025-02-17
Liam Brydon, Kunyang Zhang, Gillian Dobbie, Katerina Taškova, Jörg Simon Wicker
In recent years, the integration of machine learning techniques into chemical reaction product prediction has opened new avenues for understanding and predicting the behaviour of chemical substances. The necessity for such predictive methods stems from the growing regulatory and social awareness of the environmental consequences associated with the persistence and accumulation of chemical residues
-
ROASMI: accelerating small molecule identification by repurposing retention data J. Cheminform. (IF 7.1) Pub Date: 2025-02-14
Fang-Yuan Sun, Ying-Hao Yin, Hui-Jun Liu, Lu-Na Shen, Xiu-Lin Kang, Gui-Zhong Xin, Li-Fang Liu, Jia-Yi Zheng
The limited replicability of retention data hinders its application in untargeted metabolomics for small molecule identification. While retention order models hold promise in addressing this issue, their predictive reliability is limited by uncertain generalizability. Here, we present the ROASMI model, which enables reliable prediction of retention order within a well-defined application domain by
-
FluoBase: a fluorinated agents database J. Cheminform. (IF 7.1) Pub Date: 2025-02-11
Rafal Mulka, Dan Su, Wen-Shuo Huang, Li Zhang, Huaihai Huang, Xiaoyu Lai, Yao Li, Xiao-Song Xue
Organofluorine compounds, owing to their unique physicochemical properties, play an increasingly crucial role in fields such as medicine, pesticides, and advanced materials. Fluorinated reagents are indispensable for developing efficient synthetic methods for organofluorine compounds and serve as the cornerstone of organofluorine chemistry. Equally important are fluorinated functional molecules, which
-
Barlow Twins deep neural network for advanced 1D drug–target interaction prediction J. Cheminform. (IF 7.1) Pub Date: 2025-02-05
Maximilian G. Schuh, Davide Boldini, Annkathrin I. Bohne, Stephan A. Sieber
Accurate prediction of drug–target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature-extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive
-
Positional embeddings and zero-shot learning using BERT for molecular-property prediction J. Cheminform. (IF 7.1) Pub Date: 2025-02-05
Medard Edmund Mswahili, JunHa Hwang, Jagath C. Rajapakse, Kyuri Jo, Young-Seob Jeong
Recently, advancements in cheminformatics such as representation learning for chemical structures, deep learning (DL) for property prediction, data-driven discovery, and optimization of chemical data handling, have led to increased demands for handling chemical simplified molecular input line entry system (SMILES) data, particularly in text analysis tasks. These advancements have driven the need to
-
Improving drug repositioning with negative data labeling using large language models J. Cheminform. (IF 7.1) Pub Date: 2025-02-04
Milan Picard, Mickael Leclercq, Antoine Bodein, Marie Pier Scott-Boyer, Olivier Perin, Arnaud Droit
Drug repositioning offers numerous advantages, such as faster development timelines, reduced costs, and lower failure rates in drug development. Supervised machine learning is commonly used to score drug candidates but is hindered by the lack of reliable negative data (drugs that fail due to inefficacy or toxicity), which is difficult to access, lowering their prediction accuracy and generalization.
-
PretoxTM: a text mining system for extracting treatment-related findings from preclinical toxicology reports J. Cheminform. (IF 7.1) Pub Date: 2025-02-03
Javier Corvi, Nicolás Díaz-Roussel, José M. Fernández, Francesco Ronzano, Emilio Centeno, Pablo Accuosto, Celine Ibrahim, Shoji Asakura, Frank Bringezu, Mirjam Fröhlicher, Annika Kreuchwig, Yoko Nogami, Jeong Rih, Raul Rodriguez-Esteban, Nicolas Sajot, Joerg Wichard, Heng-Yi Michael Wu, Philip Drew, Thomas Steger-Hartmann, Alfonso Valencia, Laura I. Furlong, Salvador Capella-Gutierrez
Over the last few decades the pharmaceutical industry has generated a vast corpus of knowledge on the safety and efficacy of drugs. Much of this information is contained in toxicology reports, which summarise the results of animal studies designed to analyse the effects of the tested compound, including unintended pharmacological and toxic effects, known as treatment-related findings. Despite the potential
-
MLinvitroTox reloaded for high-throughput hazard-based prioritization of high-resolution mass spectrometry data J. Cheminform. (IF 7.1) Pub Date: 2025-01-31
Katarzyna Arturi, Eliza J. Harris, Lilian Gasser, Beate I. Escher, Georg Braun, Robin Bosshard, Juliane Hollender
MLinvitroTox is an automated Python pipeline developed for high-throughput hazard-driven prioritization of toxicologically relevant signals detected in complex environmental samples through high-resolution tandem mass spectrometry (HRMS/MS). MLinvitroTox is a machine learning (ML) framework comprising 490 independent XGBoost classifiers trained on molecular fingerprints from chemical structures and
-
APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions J. Cheminform. (IF 7.1) Pub Date: 2025-01-31
Eva Viesi, Ugo Perricone, Patrick Aloy, Rosalba Giugno
More sophisticated representations of compounds attempt to incorporate not only information on the structure and physicochemical properties of molecules, but also knowledge about their biological traits, leading to the so-called bioactivity profile. The bioactive profiling of air pollutants is challenging and crucial, as their biological activity and toxicological effects have not been deeply investigated
-
AiGPro: a multi-tasks model for profiling of GPCRs for agonist and antagonist J. Cheminform. (IF 7.1) Pub Date: 2025-01-29
Rahul Brahma, Sunghyun Moon, Jae-Min Shin, Kwang-Hwi Cho
G protein-coupled receptors (GPCRs) play vital roles in various physiological processes, making them attractive drug discovery targets. Meanwhile, deep learning techniques have revolutionized drug discovery by facilitating efficient tools for expediting the identification and optimization of ligands. However, existing models for the GPCRs often focus on single-target or a small subset of GPCRs or employ
-
hERGAT: predicting hERG blockers using graph attention mechanism through atom- and molecule-level interaction analyses J. Cheminform. (IF 7.1) Pub Date: 2025-01-28
Dohyeon Lee, Sunyong Yoo
The human ether-a-go-go-related gene (hERG) channel plays a critical role in the electrical activity of the heart, and its blockers can cause serious cardiotoxic effects. Thus, screening for hERG channel blockers is a crucial step in the drug development process. Many in silico models have been developed to predict hERG blockers, which can efficiently save time and resources. However, previous methods
-
The algebraic extended atom-type graph-based model for precise ligand–receptor binding affinity prediction J. Cheminform. (IF 7.1) Pub Date: 2025-01-22
Farjana Tasnim Mukta, Md Masud Rana, Avery Meyer, Sally Ellingson, Duc D. Nguyen
Accurate prediction of ligand-receptor binding affinity is crucial in structure-based drug design, significantly impacting the development of effective drugs. Recent advances in machine learning (ML)–based scoring functions have improved these predictions, yet challenges remain in modeling complex molecular interactions. This study introduces the AGL-EAT-Score, a scoring function that integrates extended
-
StreamChol: a web-based application for predicting cholestasis J. Cheminform. (IF 7.1) Pub Date: 2025-01-21
Pablo Rodríguez-Belenguer, Emilio Soria-Olivas, Manuel Pastor
This article introduces StreamChol, a software for developing and applying mechanistic models to predict cholestasis. StreamChol is a Streamlit application, usable as a desktop application or web-accessible software when installed on a server using a docker container. StreamChol allows a seamless integration of pharmacokinetic analyses with Machine Learning models. This integration not only enables
-
Matched pairs demonstrate robustness against inter-assay variability J. Cheminform. (IF 7.1) Pub Date: 2025-01-20
Jochem Nelen, Horacio Pérez-Sánchez, Hans De Winter, Dries Van Rompaey
Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences
-
One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening J. Cheminform. (IF 7.1) Pub Date: 2025-01-16
James Wellnitz, Sankalp Jain, Joshua E. Hochuli, Travis Maxfield, Eugene N. Muratov, Alexander Tropsha, Alexey V. Zakharov
Traditional best practices for quantitative structure activity relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study explores the value of the conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we
-
Chemical space as a unifying theme for chemistry J. Cheminform. (IF 7.1) Pub Date: 2025-01-16
Jean-Louis Reymond
Chemistry has diversified from a basic understanding of the elements to studying millions of highly diverse molecules and materials, which together are conceptualized as the chemical space. A map of this chemical space where distances represent similarities between compounds can represent the mutual relationships between different subfields of chemistry and help the discipline to be viewed and understood
-
Context-dependent similarity analysis of analogue series for structure–activity relationship transfer based on a concept from natural language processing J. Cheminform. (IF 7.1) Pub Date: 2025-01-15
Atsushi Yoshimori, Jürgen Bajorath
Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure–activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically
-
Fragmenstein: predicting protein–ligand structures of compounds derived from known crystallographic fragment hits using a strict conserved-binding–based methodology J. Cheminform. (IF 7.1) Pub Date: 2025-01-13
Matteo P. Ferla, Rubén Sánchez-García, Rachael E. Skyner, Stefan Gahbauer, Jenny C. Taylor, Frank von Delft, Brian D. Marsden, Charlotte M. Deane
Current strategies centred on either merging or linking initial hits from fragment-based drug design (FBDD) crystallographic screens generally do not fully leverage 3D structural information. We show that an algorithmic approach (Fragmenstein) that ‘stitches’ the ligand atoms from this structural information together can provide more accurate and reliable predictions for protein–ligand complex conformation
-
ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction J. Cheminform. (IF 7.1) Pub Date: 2025-01-10
Dong Wang, Jieyu Jin, Guqin Shi, Jingxiao Bao, Zheng Wang, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou
The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. While the Caco-2 cell assay is considered safe and cost-effective, it is also time-consuming. Therefore, computational models that achieve high accuracies in predicting Caco-2 permeability are crucial
-
CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions J. Cheminform. (IF 7.1) Pub Date: 2025-01-07
Zishuo Zeng, Jin Guo, Jiao Jin, Xiaozhou Luo
Predicting EC numbers for chemical reactions enables efficient enzymatic annotations for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction’s EC), a novel framework leveraging contrastive learning, pre-trained language model-based
-
Prediction of Pt, Ir, Ru, and Rh complexes light absorption in the therapeutic window for phototherapy using machine learning J. Cheminform. (IF 7.1) Pub Date: 2025-01-05
V. Vigna, T. F. G. G. Cova, A. A. C. C. Pais, E. Sicilia
Effective light-based cancer treatments, such as photodynamic therapy (PDT) and photoactivated chemotherapy (PACT), rely on compounds that are activated by light efficiently, and absorb within the therapeutic window (600–850 nm). Traditional prediction methods for these light absorption properties, including Time-Dependent Density Functional Theory (TDDFT), are often computationally intensive and time-consuming
-
DeepTGIN: a novel hybrid multimodal approach using transformers and graph isomorphism networks for protein-ligand binding affinity prediction J. Cheminform. (IF 7.1) Pub Date: 2024-12-29
Guishen Wang, Hangchen Zhang, Mengting Shao, Yuncong Feng, Chen Cao, Xiaowen Hu
Predicting protein-ligand binding affinity is essential for understanding protein-ligand interactions and advancing drug discovery. Recent research has demonstrated the advantages of sequence-based models and graph-based models. In this study, we present a novel hybrid multimodal approach, DeepTGIN, which integrates transformers and graph isomorphism networks to predict protein-ligand binding affinity
-
STOUT V2.0: SMILES to IUPAC name conversion using transformer models J. Cheminform. (IF 7.1) Pub Date: 2024-12-27
Kohulan Rajan, Achim Zielesny, Christoph Steinbeck
Naming chemical compounds systematically is a complex task governed by a set of rules established by the International Union of Pure and Applied Chemistry (IUPAC). These rules are universal and widely accepted by chemists worldwide, but their complexity makes it challenging for individuals to consistently apply them accurately. A translation method can be employed to address this challenge. Accurate
-
Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals J. Cheminform. (IF 7.1) Pub Date: 2024-12-26
Domenico Gadaleta, Eva Serrano-Candelas, Rita Ortega-Vallbona, Erika Colombo, Marina Garcia de Lomana, Giada Biava, Pablo Aparicio-Sánchez, Alessandra Roncaglioni, Rafael Gozalbes, Emilio Benfenati
Ensuring the safety of chemicals for environmental and human health involves assessing physicochemical (PC) and toxicokinetic (TK) properties, which are crucial for absorption, distribution, metabolism, excretion, and toxicity (ADMET). Computational methods play a vital role in predicting these properties, given the current trends in reducing experimental approaches, especially those that involve animal
-
Correction: StreaMD: the toolkit for high-throughput molecular dynamics simulations J. Cheminform. (IF 7.1) Pub Date: 2024-12-23
Aleksandra Ivanova, Olena Mokshyna, Pavel Polishchuk
Correction: Journal of Cheminformatics (2024) 16:123 https://doi.org/10.1186/s13321-024-00918-w
Following publication of the original article [1], the authors identified that the section Availability and requirements is missing.
Availability and requirements
Project name: StreaMD
GitHub: https://github.com/ci-lab-cz/streamd
Operating system(s): Linux
Programming language: Python 3
Other requirements: GROMACS
-
AttenhERG: a reliable and interpretable graph neural network framework for predicting hERG channel blockers J. Cheminform. (IF 7.1) Pub Date: 2024-12-23
Tianbiao Yang, Xiaoyu Ding, Elizabeth McMichael, Frank W. Pun, Alex Aliper, Feng Ren, Alex Zhavoronkov, Xiao Ding
Cardiotoxicity, particularly drug-induced arrhythmias, poses a significant challenge in drug development, highlighting the importance of early-stage prediction of human ether-a-go-go-related gene (hERG) toxicity. hERG encodes the pore-forming subunit of the cardiac potassium channel. Traditional methods are both costly and time-intensive, necessitating the development of computational approaches. In
-
Interface-aware molecular generative framework for protein–protein interaction modulators J. Cheminform. (IF 7.1) Pub Date : 2024-12-20
Jianmin Wang, Jiashun Mao, Chunyan Li, Hongxin Xiang, Xun Wang, Shuang Wang, Zixu Wang, Yangyang Chen, Yuquan Li, Kyoung Tai No, Tao Song, Xiangxiang Zeng
Protein–protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit distinct physicochemical properties compared to traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs
-
MolNexTR: a generalized deep learning model for molecular image recognition J. Cheminform. (IF 7.1) Pub Date : 2024-12-18
Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao
In the field of chemical structure recognition, the task of converting molecular images into machine-readable data formats such as SMILES strings stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that collaborates to fuse the strengths of
-
FlavorMiner: a machine learning platform for extracting molecular flavor profiles from structural data J. Cheminform. (IF 7.1) Pub Date : 2024-12-10
Fabio Herrera-Rocha, Miguel Fernández-Niño, Jorge Duitama, Mónica P. Cala, María José Chica, Ludger A. Wessjohann, Mehdi D. Davari, Andrés Fernando González Barrios
Flavor is the main factor driving consumer acceptance of food products. However, tracking the biochemistry of flavor is a formidable challenge due to the complexity of food composition. Current methodologies for linking individual molecules to flavor in foods and beverages are expensive and time-consuming. Predictive models based on machine learning (ML) are emerging as an alternative to speed up
-
Be aware of overfitting by hyperparameter optimization! J. Cheminform. (IF 7.1) Pub Date : 2024-12-09
Igor V. Tetko, Ruud van Deursen, Guillaume Godin
Hyperparameter optimization is very frequently employed in machine learning. However, optimizing over a large parameter space can result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each
-
Human-in-the-loop active learning for goal-oriented molecule generation J. Cheminform. (IF 7.1) Pub Date : 2024-12-09
Yasmine Nahal, Janosch Menke, Julien Martinelli, Markus Heinonen, Mikhail Kabeshov, Jon Paul Janet, Eva Nittinger, Ola Engkvist, Samuel Kaski
Machine learning (ML) systems have enabled the modelling of quantitative structure–property relationships (QSPR) and structure–activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical
-
CSearch: chemical space search via virtual synthesis and global optimization J. Cheminform. (IF 7.1) Pub Date : 2024-12-05
Hakjean Kim, Seongok Ryu, Nuri Jung, Jinsol Yang, Chaok Seok
The two key components of computational molecular design are virtually generating molecules and predicting the properties of these generated molecules. This study focuses on an effective method for molecular generation through virtual synthesis and global optimization of a given objective function. Using a pre-trained graph neural network (GNN) objective function to approximate the docking energies
-
Deepmol: an automated machine and deep learning framework for computational chemistry J. Cheminform. (IF 7.1) Pub Date : 2024-12-05
João Correia, João Capela, Miguel Rocha
The domain of computational chemistry has experienced a significant evolution due to the introduction of Machine Learning (ML) technologies. Despite its potential to revolutionize the field, researchers are often encumbered by obstacles, such as the complexity of selecting optimal algorithms, the automation of data pre-processing steps, the necessity for adaptive feature engineering, and the assurance
-
Sort & Slice: a simple and superior alternative to hash-based folding for extended-connectivity fingerprints J. Cheminform. (IF 7.1) Pub Date : 2024-12-03
Markus Dablander, Thierry Hanser, Renaud Lambiotte, Garrett M. Morris
Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of graph pooling methods. In contrast, sets of detected
-
cidalsDB: an AI-empowered platform for anti-pathogen therapeutics research J. Cheminform. (IF 7.1) Pub Date : 2024-11-28
Emna Harigua-Souiai, Ons Masmoudi, Samer Makni, Rafeh Oualha, Yosser Z. Abdelkrim, Sara Hamdi, Oussama Souiai, Ikram Guizani
Computer-aided drug discovery (CADD) is nurtured by recent advances in big data analytics and Artificial Intelligence (AI) towards enhanced drug discovery (DD) outcomes. In this context, reliable datasets are of utmost importance. We herein present CidalsDB, a novel web server for AI-assisted DD against infectious pathogens, namely Leishmania parasites and Coronaviruses. We performed a literature search
-
Group graph: a molecular graph representation with enhanced performance, efficiency and interpretability J. Cheminform. (IF 7.1) Pub Date : 2024-11-28
Piao-Yang Cao, Yang He, Ming-Yang Cui, Xiao-Min Zhang, Qingye Zhang, Hong-Yu Zhang
The exploration of chemical space holds promise for developing influential chemical entities. Molecular representations, which reflect features of molecular structure in silico, assist in navigating chemical space appropriately. Unlike atom-level molecular representations, such as SMILES and atom graphs, which can sometimes lead to confusing interpretations about chemical substructures, substructure-level