-
AACFlow: An end-to-end model based on attention augmented convolutional neural network and flow-attention mechanism for identification of anticancer peptides Bioinformatics (IF 5.8) Pub Date : 2024-03-06 Shengli Zhang, Ya Zhao, Yunyun Liang
Motivation Anticancer peptides (ACPs) have natural cationic properties and can act on the anionic cell membrane of cancer cells to kill cancer cells. Therefore, ACPs have become a potential anticancer drug with good research value and prospect. Results In this paper, we propose AACFlow, an end-to-end model for identification of ACPs based on deep learning. End-to-end models have more room to automatically
-
Shu: Visualization of high dimensional biological pathways Bioinformatics (IF 5.8) Pub Date : 2024-03-06 Jorge Carrasco Muriel, Nicholas Cowie, Shannara Taylor Parkins, Marjan Mansouvar, Teddy Groves, Lars Keld Nielsen
Summary Shu is a visualization tool that integrates diverse data types into a metabolic map, with a focus on supporting multiple conditions and visualizing distributions. The goal is to provide a unified platform for handling the growing volume of multi-omics data, leveraging the metabolic maps developed by the metabolic modeling community. Additionally, shu offers a streamlined python API, based on
-
Multi-scale topology and position feature learning and relationship-aware graph reasoning for prediction of drug-related microbes Bioinformatics (IF 5.8) Pub Date : 2024-01-25 Ping Xuan, Jing Gu, Hui Cui, Shuai Wang, Toshiya Nakaguchi, Cheng Liu, Tiangang Zhang
Motivation The human microbiome may impact the effectiveness of drugs by modulating their activities and toxicities. Predicting candidate microbes for drugs can facilitate the exploration of the therapeutic effects of drugs. Most recent methods concentrate on constructing prediction models based on graph reasoning. They fail to sufficiently exploit the topology and position information, the
-
A simple refined DNA minimizer operator enables twofold faster computation Bioinformatics (IF 5.8) Pub Date : 2024-01-25 Chenxu Pan, Knut Reinert
Motivation The minimizer concept is a data structure for sequence sketching. The standard canonical minimizer selects a subset of k-mers from the given DNA sequence by comparing the forward and reverse k-mers in a window simultaneously according to a predefined selection scheme. It is widely employed by sequence analysis such as read mapping and assembly. k-mer density, k-mer repetitiveness (e.g. k-mer
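The standard canonical minimizer described in this abstract can be illustrated with a minimal Python sketch. It assumes plain lexicographic ordering as the selection scheme and leftmost tie-breaking; the function names and the window parameter `w` (number of consecutive k-mers per window) are hypothetical choices for illustration. This is not the refined operator proposed by the authors.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical_kmer(kmer):
    """Pick the smaller of a k-mer and its reverse complement (assumed lexicographic order)."""
    return min(kmer, reverse_complement(kmer))


def canonical_minimizers(seq, k, w):
    """Select one canonical k-mer per window of w consecutive k-mers.

    Returns a sorted list of (position, canonical k-mer) pairs, where
    position is the start of the selected k-mer in seq.
    """
    kmers = [canonical_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Smallest canonical k-mer in the window; ties go to the leftmost one.
        best = min(range(w), key=lambda j: (window[j], j))
        selected.add((start + best, window[best]))
    return sorted(selected)


if __name__ == "__main__":
    for pos, kmer in canonical_minimizers("ACGTTGCA", k=3, w=2):
        print(pos, kmer)
```

Because each k-mer is compared with its reverse complement before the window minimum is taken, a sequence and its reverse complement yield the same set of selected k-mer strings, which is the property that makes the sketch strand-independent.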
-
Statistical framework to determine indel length distribution Bioinformatics (IF 5.8) Pub Date : 2024-01-25 Elya Wygoda, Gil Loewenthal, Asher Moshe, Michael Alburquerque, Itay Mayrose, Tal Pupko
Motivation Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel length distribution is important for
-
HiPhase: Jointly phasing small, structural, and tandem repeat variants from HiFi sequencing Bioinformatics (IF 5.8) Pub Date : 2024-01-25 James M Holt, Christopher T Saunders, William J Rowell, Zev Kronenberg, Aaron M Wenger, Michael Eberle
Motivation In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation such as structural or tandem repeat variants. However, current phasing
-
Fragmentstein—Facilitating data reuse for cell-free DNA fragment analysis Bioinformatics (IF 5.8) Pub Date : 2024-01-15 Zsolt Balázs, Todor Gitchev, Ivna Ivankovic, Michael Krauthammer
Summary Method development for the analysis of cfDNA sequencing data is impeded by limited data sharing due to the strict control of sensitive genomic data. An existing solution for facilitating data sharing removes nucleotide-level information from raw cfDNA sequencing data, keeping alignment coordinates only. This simplified format can be publicly shared and would, theoretically, suffice for common
-
Multiomics-integrated deep language model enables in silico genome-wide detection of transcription factor binding site in unexplored biosamples Bioinformatics (IF 5.8) Pub Date : 2024-01-11 Zikun Yang, Xin Li, Lele Sheng, Ming Zhu, Xun Lan, Fei Gu
Motivation Transcription factor binding sites (TFBS) are regulatory elements that have significant impact on transcription regulation and cell fate determination. Canonical motifs, biological experiments, and computational methods have made it possible to discover TFBS. However, most existing in silico TFBS prediction models are solely-DNA-based, and are trained and utilized within the same biosample
-
Microbial Interactions from a New Perspective: Reinforcement Learning Reveals New Insights into Microbiome Evolution Bioinformatics (IF 5.8) Pub Date : 2024-01-11 Parsa Ghadermazi, Siu Hung Joshua Chan
Motivation Microbes are an essential part of all ecosystems, influencing material flow and shaping their surroundings. Metabolic modeling has been a useful tool and provided tremendous insights into microbial community metabolism. However, current methods based on flux balance analysis (FBA) usually fail to predict metabolic and regulatory strategies that lead to long-term survival and stability especially
-
MK-BMC: a Multi-Kernel framework with Boosted distance metrics for Microbiome data for Classification Bioinformatics (IF 5.8) Pub Date : 2024-01-11 Huang Xu, Tian Wang, Yuqi Miao, Min Qian, Yaning Yang, Shuang Wang
Motivation Research on the human microbiome has suggested associations with human health, opening opportunities to predict health outcomes using the microbiome. Studies have also suggested that diverse forms of taxa such as rare taxa that are evolutionally-related and abundant taxa that are evolutionally-unrelated could be associated with or predictive of a health outcome. Although prediction models were
-
Pitfalls of machine learning models for protein-protein interaction networks Bioinformatics (IF 5.8) Pub Date : 2024-01-11 Loïc Lannelongue, Michael Inouye
Motivation Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between
-
FMAlign2: a novel fast multiple nucleotide sequence alignment method for ultralong datasets Bioinformatics (IF 5.8) Pub Date : 2024-01-11 Pinglu Zhang, Huan Liu, Yanming Wei, Yixiao Zhai, Qinzhong Tian, Quan Zou
Motivation In bioinformatics, multiple sequence alignment (MSA) is a crucial task. However, conventional methods often struggle with aligning ultralong sequences. To address this issue, researchers have designed MSA methods rooted in a vertical division strategy, which segments sequence data for parallel alignment. A prime example of this approach is FMAlign, which utilizes the FM-index to extract
-
nf-core/nanostring: A pipeline for reproducible NanoString nCounter analysis Bioinformatics (IF 5.8) Pub Date : 2024-01-10 Alexander Peltzer, Christopher Mohr, Kai B Stadermann, Matthias Zwick, Ramona Schmid
Motivation The NanoString™ nCounter® technology platform is a widely used targeted quantification platform for the analysis of gene expression of up to ∼ 800 genes. Whereas the software tools by the manufacturer can perform the analysis in an interactive and GUI driven approach, there is no portable and user-friendly workflow available that can be used to perform reproducible analysis of multiple samples
-
Seq-InSite: sequence supersedes structure for protein interaction site prediction Bioinformatics (IF 5.8) Pub Date : 2024-01-10 SeyedMohsen Hosseini, G Brian Golding, Lucian Ilie
Motivation Proteins accomplish cellular functions by interacting with each other, which makes the prediction of interaction sites a fundamental problem. As experimental methods are expensive and time consuming, computational prediction of the interaction sites has been studied extensively. Structure-based programs are the most accurate, while the sequence-based ones are much more widely applicable
-
methyLImp2: faster missing value estimation for DNA methylation data Bioinformatics (IF 5.8) Pub Date : 2024-01-10 Anna Plaksienko, Pietro Di Lena, Christine Nardini, Claudia Angelini
Motivation methyLImp, a method we recently introduced for the missing value estimation of DNA methylation data, has demonstrated competitive performance in data imputation compared to the existing general-purpose approaches. However, imputation running time was considerably long and infeasible in the case of large datasets with numerous missing values. Results methyLImp2 made possible computations that
-
ViralWasm: a client-side user-friendly web application suite for viral genomics Bioinformatics (IF 5.8) Pub Date : 2024-01-09 Daniel Ji, Robert Aboukhalil, Niema Moshiri
Motivation The genomic surveillance of viral pathogens such as SARS-CoV-2 and HIV-1 has been critical to modern epidemiology and public health, but the use of sequence analysis pipelines requires computational expertise, and web-based platforms require sending potentially sensitive raw sequence data to remote servers. Results We introduce ViralWasm, a user-friendly graphical web application suite for
-
CircSI-SSL: circRNA-binding site identification based on self-supervised learning Bioinformatics (IF 5.8) Pub Date : 2024-01-05 Chao Cao, Chunyu Wang, Shuhong Yang, Quan Zou
Motivation In recent years, circular RNAs (circRNAs), a particular form of RNA with a closed-loop structure, have attracted widespread attention due to their physiological significance (they can directly bind proteins), leading to the development of numerous protein site identification algorithms. Unfortunately, these studies are supervised and require the vast majority of labeled samples in training
-
SciDataFlow: A Tool for Improving the Flow of Data through Science Bioinformatics (IF 5.8) Pub Date : 2024-01-05 Vince Buffalo
Motivation Managing data and code in open scientific research is complicated by two key problems: large datasets often cannot be stored alongside code in repository platforms like GitHub, and iterative analysis can lead to unnoticed changes to data, increasing the risk that analyses are based on older versions of data. Results SciDataFlow is a fast, concurrent command-line tool paired with a simple
-
selscan 2.0: scanning for sweeps in unphased data Bioinformatics (IF 5.8) Pub Date : 2024-01-05 Zachary A Szpiech
Summary Several popular haplotype-based statistics for identifying recent or ongoing positive selection in genomes require knowledge of haplotype phase. Here we provide an update to selscan which implements a re-definition of these statistics for use in unphased data. Availability and Implementation Source code and binaries freely available at https://github.com/szpiech/selscan, implemented in C/C
-
Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone Bioinformatics (IF 5.8) Pub Date : 2024-01-04 Lorenzo Pantolini, Gabriel Studer, Joana Pereira, Janani Durairaj, Gerardo Tauriello, Torsten Schwede
Motivation Language models are routinely used for text classification and generative tasks. Recently, the same architectures were applied to protein sequences, unlocking powerful new approaches in the bioinformatics field. Protein language models (pLMs) generate high dimensional embeddings on a per-residue level and encode a “semantic meaning” of each individual amino acid in the context of the full
-
Mosaic-PICASSO: accurate crosstalk removal for multiplex fluorescence imaging Bioinformatics (IF 5.8) Pub Date : 2024-01-04 Hu Cang, Yang Liu, Jianhua Xing
Motivation Ultra-multiplexed fluorescence imaging has revolutionized our understanding of biological systems, enabling the simultaneous visualization and quantification of multiple targets within biological specimens. A recent breakthrough in this field is PICASSO, a mutual-information-based technique capable of demixing up to 15 fluorophores without their spectra, thereby significantly simplifying
-
RPEMHC: improved prediction of MHC-peptide binding affinity by a deep learning approach based on residue-residue pair encoding Bioinformatics (IF 5.8) Pub Date : 2024-01-04 Xuejiao Wang, Tingfang Wu, Yelu Jiang, Taoning Chen, Deng Pan, Zhi Jin, Jingxin Xie, Lijun Quan, Qiang Lyu
Motivation Binding of peptides to major histocompatibility complex (MHC) molecules plays a crucial role in triggering T cell recognition mechanisms essential for immune response. Accurate prediction of MHC-peptide binding is vital for the development of cancer therapeutic vaccines. While recent deep learning-based methods have achieved significant performance in predicting MHC-peptide binding affinity
-
ExEmPLAR (Extracting, Exploring and EMbedding Pathways Leading to Actionable Research): A User-friendly Interface for Knowledge Graph Mining Bioinformatics (IF 5.8) Pub Date : 2024-01-04 Jon-Michael T Beasley, Daniel R Korn, Nyssa N Tucker, Erick T M Alves, Eugene N Muratov, Chris Bizon, Alexander Tropsha
Summary Knowledge graphs are being increasingly used in biomedical research to link large amounts of heterogenous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by requirements to understand the underlying graph structure in terms of entity types and relationships, represented as nodes
-
A fast machine learning dataloader for epigenetic tracks from BigWig files Bioinformatics (IF 5.8) Pub Date : 2024-01-04 Joren Sebastian Retel, Andreas Poehlmann, Josh Chiou, Andreas Steffen, Djork-Arné Clevert
Summary We created bigwig-loader, a data-loader for epigenetic profiles from BigWig files that decompresses and processes information for multiple intervals from multiple BigWig files in parallel. This is an access pattern needed to create training batches for typical machine learning models on epigenetics data. Using a new codec, the decompression can be done on GPU making it fast enough to create
-
HAT: de novo variant calling for highly accurate short-read and long-read sequencing data Bioinformatics (IF 5.8) Pub Date : 2024-01-02 Jeffrey K Ng, Tychele N Turner
Motivation De novo variants (DNVs) are variants that are present in offspring but not in their parents. DNVs are important both for examining mutation rates and for identifying disease-related variation. While efforts have been made to call DNVs, calling DNVs from parent-child sequenced trio data is still challenging. We developed Hare And Tortoise (HAT) as an automated DNV detection
-
M-Ionic: Prediction of metal ion binding sites from sequence using residue embeddings Bioinformatics (IF 5.8) Pub Date : 2024-01-01 Aditi Shenoy, Yogesh Kalakoti, Durai Sundar, Arne Elofsson
Motivation Understanding metal-protein interaction can provide structural and functional insights into cellular processes. As the number of protein sequences increases, developing fast yet precise computational approaches to predict and annotate metal binding sites becomes imperative. Quick and resource-efficient pre-trained protein language model (pLM) embeddings have successfully predicted binding
-
MHCSeqNet2 - Improved Peptide-Class I MHC Binding Prediction for Alleles with Low Data Bioinformatics (IF 5.8) Pub Date : 2023-12-28 Patiphan Wongklaew, Sira Sriswasdi, Ekapol Chuangsuwanich
Motivation The binding of a peptide antigen to a class I major histocompatibility complex (MHC) protein is part of a key process that lets the immune system recognize an infected cell or a cancer cell. This mechanism enabled the development of peptide-based vaccines that can activate the patient’s immune response to treat cancers. Hence, the ability to accurately predict peptide-MHC binding is an essential
-
Mining literature and pathway data to explore the relations of ketamine with neurotransmitters and gut microbiota using a knowledge-graph Bioinformatics (IF 5.8) Pub Date : 2023-12-26 Ting Liu, K Anton Feenstra, Jaap Heringa, Zhisheng Huang
Motivation Up-to-date pathway knowledge is usually presented in scientific publications for human reading, making it difficult to utilize these resources for semantic integration and computational analysis of biological pathways. We here present an approach to mining knowledge graphs by combining manual curation with automated named entity recognition and automated relation extraction. This approach
-
TEFDTA: A Transformer Encoder and Fingerprint Representation Combined Prediction Method for Bonded and Non-Bonded Drug-Target Affinities Bioinformatics (IF 5.8) Pub Date : 2023-12-23 Zongquan Li, Pengxuan Ren, Hao Yang, Jie Zheng, Fang Bai
Motivation The prediction of binding affinity between drug and target is crucial in drug discovery. However, the accuracy of current methods still needs to be improved. On the other hand, most deep learning methods focus only on the prediction of non-covalent (non-bonded) binding molecular systems, but neglect the cases of covalent binding, which has gained increasing attention in the field of drug
-
scDMV: A Zero-one Inflated Beta Mixture Model for DNA Methylation Variability with scBS-Seq Data Bioinformatics (IF 5.8) Pub Date : 2023-12-23 Yan Zhou, Ying Zhang, Minjiao Peng, Yaru Zhang, Chenghao Li, Lianjie Shu, Yaohua Hu, Jianzhong Su, Jinfeng Xu
Motivation The utilization of single-cell bisulfite sequencing (scBS-seq) methods allows for precise analysis of DNA methylation patterns at the individual cell level, enabling the identification of rare populations, revealing cell-specific epigenetic changes, and improving differential methylation analysis. Nonetheless, the presence of sparse data and an overabundance of zeros and ones, attributed
-
SOHPIE: Statistical Approach via Pseudo-Value Information and Estimation for Differential Network Analysis of Microbiome Data Bioinformatics (IF 5.8) Pub Date : 2023-12-22 Seungjun Ahn, Somnath Datta
Summary The SOHPIE R package implements a novel functionality for “multivariable” differential co-abundance network (DN, hereafter) analyses of microbiome data. It incorporates a regression approach that adjusts for additional covariates for DN analyses. This distinguishes it from previous prominent approaches in DN analyses such as MDiNE and NetCoMi which do not feature a covariate adjustment of finding
-
GeNNius: An ultrafast drug-target interaction inference method based on graph neural networks Bioinformatics (IF 5.8) Pub Date : 2023-12-22 Uxía Veleiro, Jesús de la Fuente, Guillermo Serrano, Marija Pizurica, Mikel Casals, Antonio Pineda-Lucena, Silve Vicent, Idoia Ochoa, Olivier Gevaert, Mikel Hernaez
Motivation Drug-target interaction (DTI) prediction is a relevant but challenging task in the drug repurposing field. In-silico approaches have drawn particular attention as they can reduce associated costs and time commitment of traditional methodologies. Yet, current state-of-the-art methods present several limitations: existing DTI prediction approaches are computationally expensive, thereby hindering
-
CellularPotts.jl: Simulating Multiscale Cellular Models in Julia Bioinformatics (IF 5.8) Pub Date : 2023-12-22 Robert W Gregg, Panayiotis V Benos
Summary CellularPotts.jl is a software package written in Julia to simulate biological cellular processes such as division, adhesion, and signaling. Accurately modeling and predicting these simple processes is crucial because they facilitate more complex biological phenomena related to important disease states like tumor growth, wound healing, and infection. Here we take advantage of Cellular Potts
-
Coracle—A Machine Learning Framework to Identify Bacteria Associated with Continuous Variables Bioinformatics (IF 5.8) Pub Date : 2023-12-19 Sebastian Staab, Anny Cardénas, Raquel S Peixoto, Falk Schreiber, Christian R Voolstra
Summary We present Coracle, an Artificial Intelligence (AI) framework that can identify associations between bacterial communities and continuous variables. Coracle uses an ensemble approach of prominent feature selection methods and machine learning (ML) models to identify features, i.e., bacteria, associated with a continuous variable, e.g. host thermal tolerance. The results are aggregated into
-
Scaling up Single-Cell RNA-seq Data Analysis with CellBridge Workflow Bioinformatics (IF 5.8) Pub Date : 2023-12-19 Nima Nouri, Andre H Kurlovs, Giorgio Gaglia, Emanuele de Rinaldis, Virginia Savova
Summary Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of gene expression at the individual cell level, unraveling unprecedented insights into cellular heterogeneity. However, the analysis of scRNA-seq data remains a challenging and time-consuming task, often demanding advanced computational expertise, rendering it impractical for high-volume environments and applications. We present
-
ELISL: Early-Late Integrated Synthetic Lethality Prediction in Cancer Bioinformatics (IF 5.8) Pub Date : 2023-12-19 Yasin I Tepeli, Colm Seale, Joana P Gonçalves
Motivation Anti-cancer therapies based on synthetic lethality (SL) exploit tumour vulnerabilities for treatment with reduced side effects, by targeting a gene that is jointly essential with another whose function is lost. Computational prediction is key to expedite SL screening, yet existing methods are vulnerable to prevalent selection bias in SL data and reliant on cancer or tissue type-specific
-
pyCapsid: Identifying dominant dynamics and quasi-rigid mechanical units in protein shells Bioinformatics (IF 5.8) Pub Date : 2023-12-19 Colin Brown, Anuradha Agarwal, Antoni Luque
Summary pyCapsid is a Python package developed to facilitate the characterization of the dynamics and quasi-rigid mechanical units of protein shells and other protein complexes. The package was developed in response to the rapid increase of high-resolution structures, particularly capsids of viruses, requiring multiscale biophysical analyses. Given a protein shell, pyCapsid generates the collective
-
ENTRAIN: integrating trajectory inference and gene regulatory networks with spatial data to co-localize the receptor-ligand interactions that specify cell fate Bioinformatics (IF 5.8) Pub Date : 2023-12-19 Wunna Kyaw, Ryan C Chai, Weng Hua Khoo, Leonard D Goldstein, Peter I Croucher, John M Murray, Tri Giang Phan
Motivation Cell fate is commonly studied by profiling the gene expression of single cells to infer developmental trajectories based on expression similarity, RNA velocity, or statistical mechanical properties. However, current approaches do not recover microenvironmental signals from the cellular niche that drive a differentiation trajectory. Results We resolve this with environment-aware trajectory
-
CoSIA: an R Bioconductor package for CrOss Species Investigation and Analysis Bioinformatics (IF 5.8) Pub Date : 2023-12-18 Anisha Haldar, Vishal H Oza, Nathaniel S DeVoss, Amanda D Clark, Brittany N Lasseigne
Summary High throughput sequencing technologies have enabled cross-species comparative transcriptomic studies; however, there are numerous challenges for these studies due to biological and technical factors. We developed CoSIA (Cross-Species Investigation and Analysis), a Bioconductor R package and Shiny app that provides an alternative framework for cross-species transcriptomic comparison of non-diseased
-
LncLocFormer: a Transformer-based deep learning model for multi-label lncRNA subcellular localization prediction by using localization-specific attention mechanism Bioinformatics (IF 5.8) Pub Date : 2023-12-18 Min Zeng, Yifan Wu, Yiming Li, Rui Yin, Chengqian Lu, Junwen Duan, Min Li
Motivation There is mounting evidence that the subcellular localization of lncRNAs can provide valuable insights into their biological functions. In the real world of transcriptomes, lncRNAs are usually localized in multiple subcellular localizations. Furthermore, lncRNAs have specific localization patterns for different subcellular localizations. Although several computational methods have been developed
-
Clumppling: cluster matching and permutation program with integer linear programming Bioinformatics (IF 5.8) Pub Date : 2023-12-14 Xiran Liu, Naama M Kopelman, Noah A Rosenberg
Motivation In the mixed-membership unsupervised clustering analyses commonly used in population genetics, multiple replicate data analyses can differ in their clustering solutions. Combinatorial algorithms assist in aligning clustering outputs from multiple replicates, so that clustering solutions can be interpreted and combined across replicates. Although several algorithms have been introduced, challenges
-
Benchmarking and improving the performance of variant-calling pipelines with RecallME Bioinformatics (IF 5.8) Pub Date : 2023-12-14 G Vozza, E Bonetti, G Tini, V Favalli, G Frige’, G Bucci, S De Summa, M Zanfardino, F Zapelloni, L Mazzarella
Motivation The steady increase in Whole Genome/Exome sequencing and the development of novel NGS-based gene panels require continuous testing and validation of variant calling pipelines and the detection of sequencing-related issues to be maintained up-to-date and feasible for the clinical settings. State of the art tools are reliable when used to compute standard performance metrics. However, the
-
Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment Bioinformatics (IF 5.8) Pub Date : 2023-12-14 Marcos Díaz-Gay, Raviteja Vangara, Mark Barnes, Xi Wang, S M Ashiqul Islam, Ian Vermes, Stephen Duke, Nithish Bharadhwaj Narasimman, Ting Yang, Zichen Jiang, Sarah Moody, Sergey Senkin, Paul Brennan, Michael R Stratton, Ludmil B Alexandrov
Motivation Analysis of mutational signatures is a powerful approach for understanding the mutagenic processes that have shaped the evolution of a cancer genome. To evaluate the mutational signatures operative in a cancer genome, one first needs to quantify their activities by estimating the number of mutations imprinted by each signature. Results Here we present SigProfilerAssignment, a desktop and
-
VSCode-Antimony: A Source Editor for Building, Analyzing, and Translating Antimony Models Bioinformatics (IF 5.8) Pub Date : 2023-12-14 Steve Ma, Longxuan Fan, Sai Anish Konanki, Eva Liu, John H Gennari, Lucian P Smith, Joseph L Hellerstein, Herbert M Sauro
Motivation Developing biochemical models in systems biology is a complex, knowledge-intensive activity. Some modelers (especially novices) benefit from model development tools with a graphical user interface (GUI). However, as with the development of complex software, text-based representations of models provide many benefits for advanced model development. At present, the tools for text-based model
-
IntelliGenes: A novel machine learning pipeline for biomarker discovery and predictive analysis using multi-genomic profiles Bioinformatics (IF 5.8) Pub Date : 2023-12-13 William DeGroat, Dinesh Mendhe, Atharva Bhusari, Habiba Abdelhalim, Saman Zeeshan, Zeeshan Ahmed
In this article, we present IntelliGenes, a novel machine learning (ML) pipeline for multi-genomics exploration to discover biomarkers significant in disease prediction with high accuracy. IntelliGenes is based on a novel approach, which consists of a nexus of conventional statistical techniques and cutting-edge ML algorithms using multi-genomic, clinical, and demographic data. IntelliGenes introduces
-
Rarity: Discovering rare cell populations from single-cell imaging data Bioinformatics (IF 5.8) Pub Date : 2023-12-12 Kaspar Märtens, Michele Bortolomeazzi, Lucia Montorsi, Jo Spencer, Francesca Ciccarelli, Christopher Yau
Motivation Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited where we can list all cell types and their respective marker genes a priori. While unsupervised clustering algorithms look for groups of cells with similar expression properties
-
GDmicro: classifying host disease status with GCN and Deep adaptation network based on the human gut microbiome data Bioinformatics (IF 5.8) Pub Date : 2023-12-12 Herui Liao, Jiayu Shang, Yanni Sun
Motivation With advances in metagenomic sequencing technologies, there are accumulating studies revealing the associations between the human gut microbiome and some human diseases. These associations shed light on using gut microbiome data to distinguish case and control samples of a specific disease, which is also called host disease status classification. Importantly, using learning-based models
-
PDBImages: A Command Line Tool for Automated Macromolecular Structure Visualization Bioinformatics (IF 5.8) Pub Date : 2023-12-12 Adam Midlik, Sreenath Nair, Stephen Anyango, Mandar Deshpande, David Sehnal, Mihaly Varadi, Sameer Velankar
Summary PDBImages is an innovative, open-source Node.js package that harnesses the power of the popular macromolecule structure visualization software Mol*. Designed for use by the scientific community, PDBImages provides a means to generate high-quality images for PDB and AlphaFold DB models. Its unique ability to render and save images directly to files in a browserless mode sets it apart, offering
-
OBMeta: a comprehensive web server to analyze and validate gut microbial features and biomarkers for obesity-associated metabolic diseases Bioinformatics (IF 5.8) Pub Date : 2023-12-09 Cuifang Xu, Jiating Huang, Yongqiang Gao, Weixing Zhao, Yiqi Shen, Feihong Luo, Gang Yu, Feng Zhu, Yan Ni
Motivation Gut dysbiosis is closely associated with obesity and related metabolic diseases including type 2 diabetes (T2D) and non-alcoholic fatty liver disease (NAFLD). The gut microbial features and biomarkers have been increasingly investigated in many studies, which require further validation due to the limited sample size and various confounding factors that may affect microbial compositions in
-
CytoCopasi: A Chemical Systems Biology Target and Drug Discovery Visual Data Analytics Platform Bioinformatics (IF 5.8) Pub Date : 2023-12-09 Hikmet Emre Kaya, Kevin J Naidoo
Motivation Target discovery and drug evaluation for diseases with complex mechanisms call for a streamlined chemical systems analysis platform. Currently available tools lack the emphasis on reaction kinetics, access to relevant databases, and algorithms to visualize perturbations on a chemical scale providing quantitative details as well as streamlined visual data analytics functionality. Results CytoCopasi
-
MaxCLK: discovery of cancer driver genes via maximal clique and information entropy of modules Bioinformatics (IF 5.8) Pub Date : 2023-12-09 Jian Liu, Fubin Ma, Yongdi Zhu, Naiqian Zhang, Lingming Kong, Jia Mi, Haiyan Cong, Rui Gao, Mingyi Wang, Yusen Zhang
Motivation Cancer is caused by the accumulation of somatic mutations in multiple pathways, in which driver mutations typically have the properties of high coverage and high exclusivity in patients. Identifying cancer driver genes has a pivotal role in understanding the mechanisms of oncogenesis and treatment. Results Here, we introduced MaxCLK, an algorithm for identifying cancer driver genes, which
-
EPIC-TRACE: predicting TCR binding to unseen epitopes using attention and contextualized embeddings Bioinformatics (IF 5.8) Pub Date : 2023-12-09 Dani Korpela, Emmi Jokinen, Alexandru Dumitrescu, Jani Huuhtanen, Satu Mustjoki, Harri Lähdesmäki
Motivation T cells play an essential role in the adaptive immune system to fight pathogens and cancer but may also give rise to autoimmune diseases. The recognition of a peptide-MHC (pMHC) complex by a T cell receptor (TCR) is required to elicit an immune response. Many machine learning models have been developed to predict the binding, but generalizing predictions to pMHCs outside the training data remains
-
MMCL-CDR: Enhancing Cancer Drug Response Prediction with Multi-Omics and Morphology Images Contrastive Representation Learning Bioinformatics (IF 5.8) Pub Date : 2023-12-09 Yang Li, Zihou Guo, Xin Gao, Guohua Wang
Motivation Cancer is a complex disease that results in a significant number of global fatalities. Treatment strategies can vary among patients, even if they have the same type of cancer. The application of precision medicine in cancer shows promise for treating different types of cancer, reducing healthcare expenses, and improving recovery rates. To achieve personalized cancer treatment, machine learning
-
Drug repositioning with adaptive graph convolutional networks Bioinformatics (IF 5.8) Pub Date : 2023-12-08 Xinliang Sun, Xiao Jia, Zhangli Lu, Jing Tang, Min Li
Motivation Drug repositioning is an effective strategy to identify new indications for existing drugs, providing the quickest possible transition from bench to bedside. With the rapid development of deep learning, graph convolutional networks (GCNs) have been widely adopted for drug repositioning tasks. However, prior GCN-based methods have limitations in deeply integrating node features and topological
-
Bibliometric analysis of neuroscience publications quantifies the impact of data sharing Bioinformatics (IF 5.8) Pub Date : 2023-12-08 Herve Emissah, Bengt Ljungquist, Giorgio A Ascoli
Summary Neural morphology, the branching geometry of brain cells, is an essential cellular substrate of nervous system function and pathology. Despite the accelerating production of digital reconstructions of neural morphology, the public accessibility of data remains a core issue in neuroscience. Deficiencies in the availability of existing data create redundancy of research efforts and limit synergy
-
CellWalker: A user-friendly and modular computational pipeline for morphological analysis of microscopy images Bioinformatics (IF 5.8) Pub Date : 2023-12-07 Harshavardhan Khare, Nathaly Dongo Mendoza, Chiara Zurzolo
Summary The implementation of computational tools for analysis of microscopy images has been one of the most important technological innovations in biology, providing researchers unmatched capabilities to comprehend cell shape and connectivity. While numerous tools exist for image annotation and segmentation, there is a noticeable gap when it comes to morphometric analysis of microscopy images. Most
-
LMdist: Local Manifold distance accurately measures beta diversity in ecological gradients Bioinformatics (IF 5.8) Pub Date : 2023-12-07 Susan L Hoops, Dan Knights
Motivation Differentiating ecosystems poses a complex, high-dimensional problem constrained by capturing relevant variation across species profiles. Researchers use pairwise distances and subsequent dimensionality reduction to highlight variation in a few dimensions. Despite popularity in analysis of ecological data, these low-dimensional visualizations can contain geometric abnormalities such as “arch”
-
Local Disordered Region Sampling (LDRS) for Ensemble Modeling of Proteins with Experimentally Undetermined or Low Confidence Prediction Segments Bioinformatics (IF 5.8) Pub Date : 2023-12-07 Zi Hao Liu, João M C Teixeira, Oufan Zhang, Thomas E Tsangaris, Jie Li, Claudiu C Gradinaru, Teresa Head-Gordon, Julie D Forman-Kay
Summary The Local Disordered Region Sampling (LDRS, pronounced loaders) tool is a new module developed for IDPConformerGenerator (Teixeira et al. 2022), a previously validated approach to model intrinsically disordered proteins (IDPs). The IDPConformerGenerator LDRS module provides a method for generating all-atom conformations of intrinsically disordered regions (IDRs) at N- and C-termini of and in
-
TADA: Taxonomy-Aware Dataset Aggregator Bioinformatics (IF 5.8) Pub Date : 2023-12-07 Emil Hägglund, Siv G E Andersson, Lionel Guy
The profusion of sequenced genomes across the bacterial and archaeal domains offers unprecedented possibilities for phylogenetic and comparative genomic analyses. In general, phylogenetic reconstruction is improved by the use of more data. However, including all available data is (i) not computationally tractable, and (ii) prone to biases, as the abundance of genomes is very unequally distributed over
-
FunctanSNP: an R package for functional analysis of dense SNP data (with interactions) Bioinformatics (IF 5.8) Pub Date : 2023-12-07 Rui Ren, Kuangnan Fang, Qingzhao Zhang, Shuangge Ma
Summary Densely measured SNP data is routinely analyzed but faces challenges due to its high dimensionality, especially when gene-environment (G-E) interactions are incorporated. In recent literature, a functional analysis strategy has been developed, which treats dense SNP measurements as a realization of a genetic function and can “bypass” the dimensionality challenge. However, there is a lack of