-
EC-Conf: A ultra-fast diffusion model for molecular conformation generation with equivariant consistency J. Cheminform. (IF 7.1) Pub Date : 2024-09-03 Zhiguang Fan, Yuedong Yang, Mingyuan Xu, Hongming Chen
Despite recent advancement in 3D molecule conformation generation driven by diffusion models, its high computational cost in iterative diffusion/denoising process limits its application. Here, an equivariant consistency model (EC-Conf) was proposed as a fast diffusion method for low-energy conformation generation. In EC-Conf, a modified SE(3)-equivariant transformer model was directly used to encode
-
RAIChU: automating the visualisation of natural product biosynthesis J. Cheminform. (IF 7.1) Pub Date : 2024-09-03 Barbara R. Terlouw, Friederike Biermann, Sophie P. J. M. Vromans, Elham Zamani, Eric J. N. Helfrich, Marnix H. Medema
Natural products are molecules that fulfil a range of important ecological functions. Many natural products have been exploited for pharmaceutical and agricultural applications. In contrast to many other specialised metabolites, the products of modular nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) systems can often (partially) be predicted from the DNA sequence of the biosynthetic
-
Evaluating the generalizability of graph neural networks for predicting collision cross section J. Cheminform. (IF 7.1) Pub Date : 2024-08-29 Chloe Engler Hart, António José Preto, Shaurya Chanana, David Healey, Tobias Kind, Daniel Domingo-Fernández
Ion Mobility coupled with Mass Spectrometry (IM-MS) is a promising analytical technique that enhances molecular characterization by measuring collision cross-section (CCS) values, which are indicative of the molecular size and shape. However, the effective application of CCS values in structural analysis is still constrained by the limited availability of experimental data, necessitating the development
-
BuildAMol: a versatile Python toolkit for fragment-based molecular design J. Cheminform. (IF 7.1) Pub Date : 2024-08-25 Noah Kleinschmidt, Thomas Lemmin
In recent years computational methods for molecular modeling have become a prime focus of computational biology and cheminformatics. Many dedicated systems exist for modeling specific classes of molecules such as proteins or small drug-like ligands. These are often heavily tailored toward the automated generation of molecular structures based on some meta-input by the user and are not intended for
-
Deep learning of multimodal networks with topological regularization for drug repositioning J. Cheminform. (IF 7.1) Pub Date : 2024-08-23 Yuto Ohnuki, Manato Akiyama, Yasubumi Sakakibara
Computational techniques for drug-disease prediction are essential in enhancing drug discovery and repositioning. While many methods utilize multimodal networks from various biological databases, few integrate comprehensive multi-omics data, including transcriptomes, proteomes, and metabolomes. We introduce STRGNN, a novel graph deep learning approach that predicts drug-disease relationships using
-
Automatic molecular fragmentation by evolutionary optimisation J. Cheminform. (IF 7.1) Pub Date : 2024-08-19 Fiona C. Y. Yu, Jorge L. Gálvez Vallejo, Giuseppe M. J. Barca
Molecular fragmentation is an effective suite of approaches to reduce the formal computational complexity of quantum chemistry calculations while enhancing their algorithmic parallelisability. However, the practical applicability of fragmentation techniques remains hindered by a dearth of automation and effective metrics to assess the quality of a fragmentation scheme. In this article, we present the
-
Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow J. Cheminform. (IF 7.1) Pub Date : 2024-08-16 José T. Moreira-Filho, Dhruv Ranganath, Mike Conway, Charles Schmitt, Nicole Kleinstreuer, Kamel Mansouri
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or
-
Metis: a python-based user interface to collect expert feedback for generative chemistry models J. Cheminform. (IF 7.1) Pub Date : 2024-08-14 Janosch Menke, Yasmine Nahal, Esben Jannik Bjerrum, Mikhail Kabeshov, Samuel Kaski, Ola Engkvist
One challenge that current de novo drug design models face is a disparity between the user’s expectations and the actual output of the model in practical applications. Tailoring models to better align with chemists’ implicit knowledge, expectation and preferences is key to overcoming this obstacle effectively. While interest in preference-based and human-in-the-loop machine learning in chemistry is
-
Geometric deep learning for molecular property predictions with chemical accuracy across chemical space J. Cheminform. (IF 7.1) Pub Date : 2024-08-13 Maarten R. Dobbelaere, István Lengyel, Christian V. Stevens, Kevin M. Van Geem
Chemical engineers heavily rely on precise knowledge of physicochemical properties to model chemical processes. Despite the growing popularity of deep learning, it is only rarely applied for property prediction due to data scarcity and limited accuracy for compounds in industrially-relevant areas of the chemical space. Herein, we present a geometric deep learning framework for predicting gas- and liquid-phase
-
MolCompass: multi-tool for the navigation in chemical space and visual validation of QSAR/QSPR models J. Cheminform. (IF 7.1) Pub Date : 2024-08-12 Sergey Sosnin
The exponential growth of data is challenging for humans because their ability to analyze data is limited. Especially in chemistry, there is a demand for tools that can visualize molecular datasets in a convenient graphical way. We propose a new, ready-to-use, multi-tool, and open-source framework for visualizing and navigating chemical space. This framework adheres to the low-code/no-code (LCNC) paradigm
-
Building shape-focused pharmacophore models for effective docking screening J. Cheminform. (IF 7.1) Pub Date : 2024-08-09 Paola Moyano-Gómez, Jukka V. Lehtonen, Olli T. Pentikäinen, Pekka A. Postila
The performance of molecular docking can be improved by comparing the shape similarity of the flexibly sampled poses against the target proteins’ inverted binding cavities. The effectiveness of these pseudo-ligands or negative image-based models in docking rescoring is boosted further by performing enrichment-driven optimization. Here, we introduce a novel shape-focused pharmacophore modeling algorithm
-
An automated calculation pipeline for differential pair interaction energies with molecular force fields using the Tinker Molecular Modeling Package J. Cheminform. (IF 7.1) Pub Date : 2024-08-08 Felix Bänsch, Mirco Daniel, Harald Lanig, Christoph Steinbeck, Achim Zielesny
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration sampling for various monomer–monomer distances, estimation
-
Evaluation of reinforcement learning in transformer-based molecular design J. Cheminform. (IF 7.1) Pub Date : 2024-08-08 Jiazhen He, Alessandro Tibo, Jon Paul Janet, Eva Nittinger, Christian Tyrchan, Werngard Czechtizky, Ola Engkvist
Designing compounds with a range of desirable properties is a fundamental challenge in drug discovery. In pre-clinical early drug discovery, novel compounds are often designed based on an already existing promising starting compound through structural modifications for further property optimization. Recently, transformer-based deep learning models have been explored for the task of molecular optimization
-
Hamiltonian diversity: effectively measuring molecular diversity by shortest Hamiltonian circuits J. Cheminform. (IF 7.1) Pub Date : 2024-08-07 Xiuyuan Hu, Guoqing Liu, Quanming Yao, Yang Zhao, Hao Zhang
In recent years, significant advancements have been made in molecular generation algorithms aimed at facilitating drug development, and molecular diversity holds paramount importance within the realm of molecular generation. Nonetheless, the effective quantification of molecular diversity remains an elusive challenge, as extant metrics exemplified by Richness and Internal Diversity fall short in concurrently
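The idea named in the title, scoring a set of molecules by the length of the shortest Hamiltonian circuit through them, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the toy feature sets stand in for real fingerprints, the Tanimoto distance is one possible choice of pairwise distance, and the brute-force search is only practical for a handful of molecules. It is not the authors' implementation.

```python
from itertools import permutations


def tanimoto_distance(a, b):
    """Return 1 - |intersection| / |union| for two sets of feature ids."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def shortest_hamiltonian_circuit(dist):
    """Brute-force length of the shortest closed tour visiting every node once."""
    n = len(dist)
    if n < 2:
        return 0.0
    best = float("inf")
    # Fix node 0 as the start so rotations of the same tour are not re-counted.
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        best = min(best, length)
    return best


# Hypothetical toy "fingerprints": sets of feature ids, not real molecules.
fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}, {1, 2, 3}]
dist = [[tanimoto_distance(a, b) for b in fps] for a in fps]
ham_div = shortest_hamiltonian_circuit(dist)
print(ham_div)
```

In this toy set the duplicated entry adds no length to the tour, while the dissimilar entry adds two long edges, which is the kind of behaviour a circuit-based diversity measure is meant to capture.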
-
Advancements in biotransformation pathway prediction: enhancements, datasets, and novel functionalities in enviPath J. Cheminform. (IF 7.1) Pub Date : 2024-08-06 Jasmin Hafner, Tim Lorsbach, Sebastian Schmidt, Liam Brydon, Katharina Dost, Kunyang Zhang, Kathrin Fenner, Jörg Wicker
enviPath is a widely used database and prediction system for microbial biotransformation pathways of primarily xenobiotic compounds. Data and prediction system are freely available both via a web interface and a public REST API. Since its initial release in 2016, we extended the data available in enviPath and improved the performance of the prediction system and usability of the overall system. We
-
PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications J. Cheminform. (IF 7.1) Pub Date : 2024-08-02 Yang Tan, Mingchen Li, Ziyi Zhou, Pan Tan, Huiqun Yu, Guisheng Fan, Liang Hong
Protein language models (PLMs) play a dominant role in protein representation learning. Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem with this representation method is that it simply divides the protein sequence into sequences of individual amino acids, ignoring the fact that certain residues often occur together. Therefore, it is inappropriate to view amino
-
A novel multitask learning algorithm for tasks with distinct chemical space: zebrafish toxicity prediction as an example J. Cheminform. (IF 7.1) Pub Date : 2024-08-02 Run-Hsin Lin, Pinpin Lin, Chia-Chi Wang, Chun-Wei Tung
Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets
-
Implementation of a soft grading system for chemistry in a Moodle plugin: reaction handling J. Cheminform. (IF 7.1) Pub Date : 2024-08-01 Louis Plyer, Gilles Marcou, Céline Perves, Fanny Bonachera, Alexander Varnek
Here, we present a new method for evaluating questions on chemical reactions in the context of remote education. This method can be used when binary grading is not sufficient as some tolerance may be acceptable. In order to determine a grade, the developed workflow uses the pairwise similarity assessment of two considered reactions, each encoded by a single molecular graph with the help of the Condensed
-
Transfer learning across different chemical domains: virtual screening of organic materials with deep learning models pretrained on small molecule and chemical reaction data J. Cheminform. (IF 7.1) Pub Date : 2024-07-30 Chengwei Zhang, Yushuang Zhai, Ziyang Gong, Hongliang Duan, Yuan-Bin She, Yun-Fang Yang, An Su
Machine learning is becoming a preferred method for the virtual screening of organic materials due to its cost-effectiveness over traditional computationally demanding techniques. However, the scarcity of labeled data for organic materials poses a significant challenge for training advanced machine learning models. This study showcases the potential of utilizing databases of drug-like small molecules
-
Reproducible MS/MS library cleaning pipeline in matchms J. Cheminform. (IF 7.1) Pub Date : 2024-07-29 Niek F. de Jonge, Helge Hecht, Michael Strobel, Mingxun Wang, Justin J. J. van der Hooft, Florian Huber
Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and training new machine learning algorithms. A key step in training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting
-
Hilbert-curve assisted structure embedding method J. Cheminform. (IF 7.1) Pub Date : 2024-07-29 Gergely Zahoránszky-Kőhalmi, Kanny K. Wan, Alexander G. Godfrey
Chemical space embedding methods are widely utilized in various research settings for dimensional reduction, clustering and effective visualization. The maps generated by the embedding process can provide valuable insight to medicinal chemists in terms of the relationships between structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret
-
A computational workflow for analysis of missense mutations in precision oncology J. Cheminform. (IF 7.1) Pub Date : 2024-07-29 Rayyan Tariq Khan, Petra Pokorna, Jan Stourac, Simeon Borko, Ihor Arefiev, Joan Planas-Iglesias, Adam Dobias, Gaspar Pinto, Veronika Szotkowska, Jaroslav Sterba, Ondrej Slaby, Jiri Damborsky, Stanislav Mazurenko, David Bednar
Every year, more than 19 million cancer cases are diagnosed, and this number continues to increase annually. Since standard treatment options have varying success rates for different types of cancer, understanding the biology of an individual's tumour becomes crucial, especially for cases that are difficult to treat. Personalised high-throughput profiling, using next-generation sequencing, allows for
-
Enhancing molecular property prediction with auxiliary learning and task-specific adaptation J. Cheminform. (IF 7.1) Pub Date : 2024-07-24 Vishal Dey, Xia Ning
Pretrained Graph Neural Networks have been widely adopted for various molecular property prediction tasks. Despite their ability to encode structural and relational features of molecules, traditional fine-tuning of such pretrained GNNs on the target task can lead to poor generalization. To address this, we explore the adaptation of pretrained GNNs to the target task by jointly training them with multiple
-
CACTI: an in silico chemical analysis tool through the integration of chemogenomic data and clustering analysis J. Cheminform. (IF 7.1) Pub Date : 2024-07-24 Karla P. Godinez-Macias, Elizabeth A. Winzeler
It is well-accepted that knowledge of a small molecule’s target can accelerate optimization. Although chemogenomic databases are helpful resources for predicting or finding compound interaction partners, they tend to be limited and poorly annotated. Furthermore, unlike genes, compound identifiers are often not standardized, and many synonyms may exist, especially in the biological literature, making
-
Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore J. Cheminform. (IF 7.1) Pub Date : 2024-07-23 Shuan Chen, Yousung Jung
Synthetic accessibility prediction is a task to estimate how easily a given molecule might be synthesizable in the laboratory, playing a crucial role in computer-aided molecular design. Although synthesis planning programs can determine synthesis routes, their slow processing times make them impractical for large-scale molecule screening. On the other hand, existing rapid synthesis accessibility estimation
-
Reaction rebalancing: a novel approach to curating reaction databases J. Cheminform. (IF 7.1) Pub Date : 2024-07-19 Tieu-Long Phan, Klaus Weinbauer, Thomas Gärtner, Daniel Merkle, Jakob L. Andersen, Rolf Fagerberg, Peter F. Stadler
Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the
-
piscesCSM: prediction of anticancer synergistic drug combinations J. Cheminform. (IF 7.1) Pub Date : 2024-07-19 Raghad AlJarf, Carlos H. M. Rodrigues, Yoochan Myung, Douglas E. V. Pires, David B. Ascher
While drug combination therapies are of great importance, particularly in cancer treatment, identifying novel synergistic drug combinations has been a challenging venture. Computational methods have emerged in this context as a promising tool for prioritizing drug combinations for further evaluation, though they have presented limited performance, utility, and interpretability. Here, we propose a novel
-
Ualign: pushing the limit of template-free retrosynthesis prediction with unsupervised SMILES alignment J. Cheminform. (IF 7.1) Pub Date : 2024-07-15 Kaipeng Zeng, Bo Yang, Xin Zhao, Yu Zhang, Fan Nie, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Retrosynthesis planning poses a formidable challenge in the organic chemical industry, particularly in pharmaceuticals. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task in recent years, incorporating diverse levels
-
LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification J. Cheminform. (IF 7.1) Pub Date : 2024-07-07 Ruifeng Zhou, Jing Fan, Sishu Li, Wenjie Zeng, Yilun Chen, Xiaoshan Zheng, Hongyang Chen, Jun Liao
Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information, thus overlooking global information. Moreover, it is essential for us to account for the influence of diverse protein folding structural classes. Because proteins classified differently structurally
-
Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture J. Cheminform. (IF 7.1) Pub Date : 2024-07-05 Kohulan Rajan, Henning Otto Brinkhaus, Achim Zielesny, Christoph Steinbeck
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced
-
PromptSMILES: prompting for scaffold decoration and fragment linking in chemical language models J. Cheminform. (IF 7.1) Pub Date : 2024-07-04 Morgan Thomas, Mazen Ahmad, Gary Tresadern, Gianni de Fabritiis
SMILES-based generative models are amongst the most robust and successful recent methods used to augment drug design. They are typically used for complete de novo generation, however, scaffold decoration and fragment linking applications are sometimes desirable which requires a different grammar, architecture, training dataset and therefore, re-training of a new model. In this work, we describe a simple
-
Application of machine reading comprehension techniques for named entity recognition in materials science J. Cheminform. (IF 7.1) Pub Date : 2024-07-02 Zihui Huang, Liqiang He, Yuhang Yang, Andi Li, Zhiwen Zhang, Siwei Wu, Yang Wang, Yan He, Xujie Liu
Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. A large amount of scientific literature contains rich knowledge in the field of materials science, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role as it can
-
CPSign: conformal prediction for cheminformatics modeling J. Cheminform. (IF 7.1) Pub Date : 2024-06-28 Staffan Arvidsson McShane, Ulf Norinder, Jonathan Alvarsson, Ernst Ahlberg, Lars Carlsson, Ola Spjuth
Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and
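As a rough illustration of how inductive (split) conformal prediction produces prediction intervals for regression, here is a minimal sketch in plain Python. It is not CPSign's API: the function name `conformal_interval`, the absolute-error nonconformity score, and the toy calibration values are all assumptions made for this example.

```python
import math


def conformal_interval(cal_true, cal_pred, test_pred, alpha=0.1):
    """Split-conformal interval around test_pred at nominal coverage 1 - alpha.

    cal_true / cal_pred: held-out calibration labels and model predictions.
    Uses absolute error as the nonconformity score (an illustrative choice).
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Rank of the calibration score that guarantees finite-sample coverage.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial interval is valid.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (test_pred - q, test_pred + q)


# Hypothetical calibration data (labels and predictions from some fitted model).
cal_true = [1.0, 2.0, 3.0]
cal_pred = [1.5, 2.0, 2.0]
interval = conformal_interval(cal_true, cal_pred, test_pred=10.0, alpha=0.5)
print(interval)
```

The interval width depends only on the calibration errors, so any underlying regression model can be wrapped this way as long as the calibration set is kept separate from the training data.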
-
AutoTemplate: enhancing chemical reaction datasets for machine learning applications in organic chemistry J. Cheminform. (IF 7.1) Pub Date : 2024-06-27 Lung-Yi Chen, Yi-Pei Li
This paper presents AutoTemplate, an innovative data preprocessing protocol, addressing the crucial need for high-quality chemical reaction datasets in the realm of machine learning applications in organic chemistry. Recent advances in artificial intelligence have expanded the application of machine learning in chemistry, particularly in yield prediction, retrosynthesis, and reaction condition prediction
-
Llamol: a dynamic multi-conditional generative transformer for de novo molecular design J. Cheminform. (IF 7.1) Pub Date : 2024-06-21 Niklas Dobberstein, Astrid Maass, Jan Hamaekers
Generative models have demonstrated substantial promise in Natural Language Processing (NLP) and have found application in designing molecules, as seen in General Pretrained Transformer (GPT) models. In our efforts to develop such a tool for exploring the organic chemical space in search of potentially electro-active compounds, we present Llamol, a single novel generative transformer model based on
-
Physicochemical modelling of the retention mechanism of temperature-responsive polymeric columns for HPLC through machine learning algorithms J. Cheminform. (IF 7.1) Pub Date : 2024-06-21 Elena Bandini, Rodrigo Castellano Ontiveros, Ardiana Kajtazi, Hamed Eghbali, Frédéric Lynen
Temperature-responsive liquid chromatography (TRLC) offers a promising alternative to reversed-phase liquid chromatography (RPLC) for environmentally friendly analytical techniques by utilizing pure water as a mobile phase, eliminating the need for harmful organic solvents. TRLC columns, packed with temperature-responsive polymers coupled to silica particles, exhibit a unique retention mechanism influenced
-
A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence J. Cheminform. (IF 7.1) Pub Date : 2024-06-19 Xiaofan Zheng, Yoichi Tomiura
Among the various molecular properties and their combinations, it is a costly process to obtain the desired molecular properties through theory or experiment. Using machine learning to analyze molecular structure features and to predict molecular properties is a potentially efficient alternative for accelerating the prediction of molecular properties. In this study, we analyze molecular properties
-
Stereochemically-aware bioactivity descriptors for uncharacterized chemical compounds J. Cheminform. (IF 7.1) Pub Date : 2024-06-18 Arnau Comajuncosa-Creus, Aksel Lenes, Miguel Sánchez-Palomino, Dylan Dalton, Patrick Aloy
Stereochemistry plays a fundamental role in pharmacology. Here, we systematically investigate the relationship between stereoisomerism and bioactivity on over 1 M compounds, finding that a very significant fraction (~40%) of spatial isomer pairs show, to some extent, distinct bioactivities. We then use the 3D representation of these molecules to train a collection of deep neural networks (Signaturizers3D)
-
PubChem synonym filtering process using crowdsourcing J. Cheminform. (IF 7.1) Pub Date : 2024-06-16 Sunghwan Kim, Bo Yu, Qingliang Li, Evan E. Bolton
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public chemical information resource containing more than 100 million unique chemical structures. One of the most requested tasks in PubChem and other chemical databases is to search chemicals by name (also commonly called a “chemical synonym”). PubChem performs this task by looking up chemical synonym-structure associations provided by individual depositors
-
Correction: QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning J. Cheminform. (IF 7.1) Pub Date : 2024-06-11 Zhijiang Yang, Tengxin Huang, Li Pan, Jingjing Wang, Liangliang Wang, Junjie Ding, Junhua Xiao
Correction: Journal of Cheminformatics (2024) 16:48 https://doi.org/10.1186/s13321-024-00843-y Following publication of the original article [1], the authors identified a formatting error. The phrase "…with the computational cost exceeding 107 core-hours" in the Abstract should be replaced with "…with the computational cost exceeding 10^7 core-hours". The statement "…with a total computational cost of
-
An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model J. Cheminform. (IF 7.1) Pub Date : 2024-06-07 Yufang Zhang, Jiayi Li, Shenggeng Lin, Jianwei Zhao, Yi Xiong, Dong-Qing Wei
Identification of interactions between chemical compounds and proteins is crucial for various applications, including drug discovery, target identification, network pharmacology, and elucidation of protein functions. Deep neural network-based approaches are becoming increasingly popular in efficiently identifying compound-protein interactions with high-throughput capabilities, narrowing down the scope
-
PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction J. Cheminform. (IF 7.1) Pub Date : 2024-06-07 Kandel Jeevan, Shrestha Palistha, Hilal Tayara, Kil T. Chong
Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom
-
Protein target similarity is positive predictor of in vitro antipathogenic activity: a drug repurposing strategy for Plasmodium falciparum J. Cheminform. (IF 7.1) Pub Date : 2024-05-30 Reagan M. Mogire, Silviane A. Miruka, Dennis W. Juma, Case W. McNamara, Ben Andagalu, Jeremy N. Burrows, Elodie Chenu, James Duffy, Bernhards R. Ogutu, Hoseah M. Akala
Drug discovery is an intricate and costly process. Repurposing existing drugs and active compounds offers a viable pathway to develop new therapies for various diseases. By leveraging publicly available biomedical information, it is possible to predict compounds’ activity and identify their potential targets across diverse organisms. In this study, we aimed to assess the antiplasmodial activity of
-
Identifying uncertainty in physical–chemical property estimation with IFSQSAR J. Cheminform. (IF 7.1) Pub Date : 2024-05-30 Trevor N. Brown, Alessandro Sangion, Jon A. Arnot
This study describes the development and evaluation of six new models for predicting physical–chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility (in water SW and octanol SO), vapor pressure (VP), and the octanol–water (KOW), octanol–air (KOA), and air–water (KAW) partition ratios. The models are implemented in the Iterative Fragment Selection
-
MolScore: a scoring, evaluation and benchmarking framework for generative models in de novo drug design J. Cheminform. (IF 7.1) Pub Date : 2024-05-30 Morgan Thomas, Noel M. O’Boyle, Andreas Bender, Chris De Graaf
Generative models are undergoing rapid research and application to de novo drug design. To facilitate their application and evaluation, we present MolScore. MolScore already contains many drug-design-relevant scoring functions commonly used in benchmarks, such as molecular similarity, molecular docking, predictive models, synthesizability, and more. In addition, providing performance metrics to evaluate
-
Consensus holistic virtual screening for drug discovery: a novel machine learning model approach J. Cheminform. (IF 7.1) Pub Date : 2024-05-28 Said Moshawih, Zhen Hui Bu, Hui Poh Goh, Nurolaini Kifli, Lam Hong Lee, Khang Wen Goh, Long Chiau Ming
In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study aims to present a novel pipeline that employs machine learning models that amalgamate various conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis prior to scoring using four distinct methods:
-
TransExION: a transformer based explainable similarity metric for comparing IONS in tandem mass spectrometry J. Cheminform. (IF 7.1) Pub Date : 2024-05-28 Danh Bui-Thi, Youzhong Liu, Jennifer L. Lippens, Kris Laukens, Thomas De Vijlder
Small molecule identification is a crucial task in analytical chemistry and life sciences. One of the most commonly used technologies to elucidate small molecule structures is mass spectrometry. Spectral library search of product ion spectra (MS/MS) is a popular strategy to identify or find structural analogues. This approach relies on the assumption that spectral similarity and structural similarity
-
Solvent flashcards: a visualisation tool for sustainable chemistry J. Cheminform. (IF 7.1) Pub Date : 2024-05-28 Joseph Heeley, Samuel Boobier, Jonathan D. Hirst
Selecting greener solvents during experiment design is imperative for greener chemistry. While many solvent selection guides are currently used in the pharmaceutical industry, these are often paper-based guides which can make it difficult to identify and compare specific solvents. This work presents a stand-alone version of the solvent flashcards that were developed as part of the AI4Green electronic
-
Development of scoring-assisted generative exploration (SAGE) and its application to dual inhibitor design for acetylcholinesterase and monoamine oxidase B J. Cheminform. (IF 7.1) Pub Date : 2024-05-24 Hocheol Lim
De novo molecular design is the process of searching chemical space for drug-like molecules with desired properties, and deep learning has been recognized as a promising solution. In this study, I developed an effective computational method called Scoring-Assisted Generative Exploration (SAGE) to enhance chemical diversity and property optimization through virtual synthesis simulation, the generation
-
CineMol: a programmatically accessible direct-to-SVG 3D small molecule drawer J. Cheminform. (IF 7.1) Pub Date : 2024-05-23 David Meijer, Marnix H. Medema, Justin J. J. van der Hooft
Effective visualization of small molecules is paramount in conveying concepts and results in cheminformatics. Scalable vector graphics (SVG) are preferred for creating such visualizations, as SVGs can be easily altered in post-production and exported to other formats. A wide spectrum of software applications already exist that can visualize molecules, and customize these visualizations, in many ways
-
AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application J. Cheminform. (IF 7.1) Pub Date : 2024-05-23 Lakshidaa Saigiridharan, Alan Kai Hassen, Helen Lai, Paula Torren-Peraire, Ola Engkvist, Samuel Genheden
We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include policies for filter reactions, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify the typical
-
MolPROP: Molecular Property prediction with multimodal language and graph fusion J. Cheminfom. (IF 7.1) Pub Date : 2024-05-22 Zachary A. Rollins, Alan C. Cheng, Essam Metwally
Pretrained deep learning models self-supervised on large datasets of language, image, and graph representations are often fine-tuned on downstream tasks and have demonstrated remarkable adaptability in a variety of applications including chatbots, autonomous driving, and protein folding. Additional research aims to improve performance on downstream tasks by fusing high-dimensional data representations
-
Generative design of compounds with desired potency from target protein sequences using a multimodal biochemical language model J. Cheminfom. (IF 7.1) Pub Date : 2024-05-22 Hengwei Chen, Jürgen Bajorath
Deep learning models adapted from natural language processing offer new opportunities for the prediction of active compounds via machine translation of sequential molecular data representations. For example, chemical language models are often derived for compound string transformation. Moreover, given the principal versatility of language models for translating different types of textual representations
-
InChI isotopologue and isotopomer specifications J. Cheminfom. (IF 7.1) Pub Date : 2024-05-14 Hunter N. B. Moseley, Philippe Rocca-Serra, Reza M. Salek, Masanori Arita, Emma L. Schymanski
This work presents a proposed extension to the International Union of Pure and Applied Chemistry (IUPAC) International Chemical Identifier (InChI) standard that allows the representation of isotopically-resolved chemical entities at varying levels of ambiguity in isotope location. This extension includes an improved interpretation of the current isotopic layer within the InChI standard and a new isotopologue
-
One chiral fingerprint to find them all J. Cheminfom. (IF 7.1) Pub Date : 2024-05-13 Markus Orsi, Jean-Louis Reymond
Molecular fingerprints are indispensable tools in cheminformatics. However, stereochemistry is generally not considered, which is problematic for large molecules which are almost all chiral. Herein we report MAP4C, a chiral version of our previously reported fingerprint MAP4, which lists MinHashes computed from character strings containing the SMILES of all pairs of circular substructures up to a diameter
-
Distance plus attention for binding affinity prediction J. Cheminfom. (IF 7.1) Pub Date : 2024-05-12 Julia Rahman, M. A. Hakim Newton, Mohammed Eunus Ali, Abdul Sattar
Protein-ligand binding affinity plays a pivotal role in drug development, particularly in identifying potential ligands for target disease-related proteins. Accurate affinity predictions can significantly reduce both the time and cost involved in drug development. However, highly precise affinity prediction remains a research challenge. A key to improving affinity prediction is to capture interactions
-
CIME4R: Exploring iterative, AI-guided chemical reaction optimization campaigns in their parameter space J. Cheminfom. (IF 7.1) Pub Date : 2024-05-10 Christina Humer, Rachel Nicholls, Henry Heberle, Moritz Heckmann, Michael Pühringer, Thomas Wolf, Maximilian Lübbesmeyer, Julian Heinrich, Julius Hillenbrand, Giulio Volpin, Marc Streit
Chemical reaction optimization (RO) is an iterative process that results in large, high-dimensional datasets. Current tools allow for only limited analysis and understanding of parameter spaces, making it hard for scientists to review or follow changes throughout the process. With the recent emergence of using artificial intelligence (AI) models to aid RO, another level of complexity has been added
-
Leveraging computational tools to combat malaria: assessment and development of new therapeutics J. Cheminfom. (IF 7.1) Pub Date : 2024-05-02 Nomagugu B. Ncube, Matshawandile Tukulula, Krishna G. Govender
As the world grapples with the relentless challenges posed by diseases like malaria, the advent of sophisticated computational tools has emerged as a beacon of hope in the quest for effective treatments. In this study we delve into the strategies behind computational tools encompassing virtual screening, molecular docking, artificial intelligence (AI), and machine learning (ML). We assess their effectiveness
-
From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials J. Cheminfom. (IF 7.1) Pub Date : 2024-05-01 Jeaphianne P. M. van Rijn, Marvin Martens, Ammar Ammar, Mihaela Roxana Cimpan, Valerie Fessard, Peter Hoet, Nina Jeliazkova, Sivakumar Murugadoss, Ivana Vinković Vrček, Egon L. Willighagen
Adverse Outcome Pathways (AOPs) have been proposed to facilitate mechanistic understanding of interactions of chemicals/materials with biological systems. Each AOP starts with a molecular initiating event (MIE) and possibly ends with adverse outcome(s) (AOs) via a series of key events (KEs). So far, the interaction of engineered nanomaterials (ENMs) with biomolecules, biomembranes, cells, and biological
-
QuanDB: a quantum chemical property database towards enhancing 3D molecular representation learning J. Cheminfom. (IF 7.1) Pub Date : 2024-04-29 Zhijiang Yang, Tengxin Huang, Li Pan, Jingjing Wang, Liangliang Wang, Junjie Ding, Junhua Xiao
Previous studies have shown that the three-dimensional (3D) geometric and electronic structures of molecules play a crucial role in determining their key properties and intermolecular interactions. Therefore, it is necessary to establish a quantum chemical (QC) property database containing the most stable 3D geometric conformations and electronic structures of molecules. In this study, a high-quality