-
scMLC: an accurate and robust multiplex community detection method for single-cell multi-omics data Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Yuxuan Chen, Ruiqing Zheng, Jin Liu, Min Li
Clustering cells based on single-cell multi-modal sequencing technologies provides an unprecedented opportunity to create high-resolution cell atlases, reveal cellular critical states and study health and disease. However, effectively integrating different sequencing data for cell clustering remains a challenging task. Motivated by the successful application of Louvain in scRNA-seq data, we propose
-
Effective multi-modal clustering method via skip aggregation network for parallel scRNA-seq and scATAC-seq data Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Dayu Hu, Ke Liang, Zhibin Dong, Jun Wang, Yawei Zhao, Kunlun He
In recent years, there has been a growing trend in the realm of parallel clustering analysis for single-cell RNA-seq (scRNA) and single-cell Assay of Transposase Accessible Chromatin (scATAC) data. However, prevailing methods often treat these two data modalities as equals, neglecting the fact that the scRNA mode holds significantly richer information compared to the scATAC. This disregard hinders
-
DeTox: a pipeline for the detection of toxins in venomous organisms Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Allan Ringeval, Sarah Farhat, Alexander Fedosov, Marco Gerdol, Samuele Greco, Lou Mary, Maria Vittoria Modica, Nicolas Puillandre
Venomous organisms have independently evolved the ability to produce toxins 101 times during their evolutionary history, resulting in over 200 000 venomous species. Collectively, these species produce millions of toxins, making them a valuable resource for bioprospecting and understanding the evolutionary mechanisms underlying genetic diversification. RNA-seq is the preferred method for characterizing
-
Benchmarking multi-omics integration algorithms across single-cell RNA and ATAC data Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Chuxi Xiao, Yixin Chen, Qiuchen Meng, Lei Wei, Xuegong Zhang
Recent advancements in single-cell sequencing technologies have generated extensive omics data in various modalities and revolutionized cell research, especially in the single-cell RNA and ATAC data. The joint analysis across scRNA-seq data and scATAC-seq data has paved the way to comprehending the cellular heterogeneity and complex cellular regulatory networks. Multi-omics integration is gaining attention
-
Translational bioinformatics and data science for biomarker discovery in mental health: an analytical review Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Krithika Bhuvaneshwar, Yuriy Gusev
Translational bioinformatics and data science play a crucial role in biomarker discovery as it enables translational research and helps to bridge the gap between the bench research and the bedside clinical applications. Thanks to newer and faster molecular profiling technologies and reducing costs, there are many opportunities for researchers to explore the molecular and physiological mechanisms of
-
Incorporating network diffusion and peak location information for better single-cell ATAC-seq data analysis Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Jiating Yu, Jiacheng Leng, Zhichao Hou, Duanchen Sun, Ling-Yun Wu
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data provided new insights into the understanding of epigenetic heterogeneity and transcriptional regulation. With the increasing abundance of dataset resources, there is an urgent need to extract more useful information through high-quality data analysis methods specifically designed for scATAC-seq. However, analyzing
-
FusionNW, a potential clinical impact assessment of kinases in pan-cancer fusion gene network Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Chengyuan Yang, Himansu Kumar, Pora Kim
Kinase fusion genes are the most active fusion gene group in human cancer fusion genes. To help choose the clinically significant kinase so that the cancer patients that have fusion genes can be better diagnosed, we need a metric to infer the assessment of kinases in pan-cancer fusion genes rather than relying on the sample frequency expressed fusion genes. Most of all, multiple studies assessed human
-
scENCORE: leveraging single-cell epigenetic data to predict chromatin conformation using graph embedding Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Ziheng Duan, Siwei Xu, Shushrruth Sai Srinivasan, Ahyeon Hwang, Che Yu Lee, Feng Yue, Mark Gerstein, Yu Luan, Matthew Girgenti, Jing Zhang
Dynamic compartmentalization of eukaryotic DNA into active and repressed states enables diverse transcriptional programs to arise from a single genetic blueprint, whereas its dysregulation can be strongly linked to a broad spectrum of diseases. While single-cell Hi-C experiments allow for chromosome conformation profiling across many cells, they are still expensive and not widely available for most
-
The rise of taxon-specific epitope predictors Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-17 Felipe Campelo, Francisco P Lobo
Computational predictors of immunogenic peptides, or epitopes, are traditionally built based on data from a broad range of pathogens without consideration for taxonomic information. While this approach may be reasonable if one aims to develop one-size-fits-all models, it may be counterproductive if the proteins for which the model is expected to generalize are known to come from a specific subset of
-
metaProbiotics: a tool for mining probiotic from metagenomic binning data based on a language model Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-15 Shufang Wu, Tao Feng, Waijiao Tang, Cancan Qi, Jie Gao, Xiaolong He, Jiaxuan Wang, Hongwei Zhou, Zhencheng Fang
Beneficial bacteria remain largely unexplored. Lacking systematic methods, understanding probiotic community traits becomes challenging, leading to various conclusions about their probiotic effects among different publications. We developed language model–based metaProbiotics to rapidly detect probiotic bins from metagenomes, demonstrating superior performance in simulated benchmark datasets. Testing
-
MHCpLogics: an interactive machine learning-based tool for unsupervised data visualization and cluster analysis of immunopeptidomes Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-15 Mohammad Shahbazy, Sri H Ramarathinam, Chen Li, Patricia T Illing, Pouya Faridi, Nathan P Croft, Anthony W Purcell
The major histocompatibility complex (MHC) encodes a range of immune response genes, including the human leukocyte antigens (HLAs) in humans. These molecules bind peptide antigens and present them on the cell surface for T cell recognition. The repertoires of peptides presented by HLA molecules are termed immunopeptidomes. The highly polymorphic nature of the genes that encode the HLA molecules confers
-
MRSL: a causal network pruning algorithm based on GWAS summary data Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-15 Lei Hou, Zhi Geng, Zhongshang Yuan, Xu Shi, Chuan Wang, Feng Chen, Hongkai Li, Fuzhong Xue
Causal discovery is a powerful tool to disclose underlying structures by analyzing purely observational data. Genetic variants can provide useful complementary information for structure learning. Recently, Mendelian randomization (MR) studies have provided abundant marginal causal relationships of traits. Here, we propose a causal network pruning algorithm MRSL (MR-based structure learning algorithm)
-
Single-residue linear and conformational B cell epitopes prediction using random and ESM-2 based projections Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-15 Sapir Israeli, Yoram Louzoun
B cell epitope prediction methods are separated into linear sequence-based predictors and conformational epitope predictions that typically use the measured or predicted protein structure. Most linear predictions rely on the translation of the sequence to biologically based representations and the applications of machine learning on these representations. We here present CALIBER ‘Conformational And
-
PANCDR: precise medicine prediction using an adversarial network for cancer drug response Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-15 Juyeon Kim, Sung-Hye Park, Hyunju Lee
Pharmacogenomics aims to provide personalized therapy to patients based on their genetic variability. However, accurate prediction of cancer drug response (CDR) is challenging due to genetic heterogeneity. Since clinical data are limited, most studies predicting drug response use preclinical data to train models. However, such models might not be generalizable to external clinical data due to differences
-
High-throughput prediction of enzyme promiscuity based on substrate–product pairs Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-15 Huadong Xing, Pengli Cai, Dongliang Liu, Mengying Han, Juan Liu, Yingying Le, Dachuan Zhang, Qian-Nan Hu
The screening of enzymes for catalyzing specific substrate–product pairs is often constrained in the realms of metabolic engineering and synthetic biology. Existing tools based on substrate and reaction similarity predominantly rely on prior knowledge, demonstrating limited extrapolative capabilities and an inability to incorporate custom candidate-enzyme libraries. Addressing these limitations, we
-
scGIR: deciphering cellular heterogeneity via gene ranking in single-cell weighted gene correlation networks Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-15 Fei Xu, Huan Hu, Hai Lin, Jun Lu, Feng Cheng, Jiqian Zhang, Xiang Li, Jianwei Shuai
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular heterogeneity through high-throughput analysis of individual cells. Nevertheless, challenges arise from prevalent sequencing dropout events and noise effects, impacting subsequent analyses. Here, we introduce a novel algorithm, Single-cell Gene Importance Ranking (scGIR), which utilizes a single-cell gene
-
Cracking the black box of deep sequence-based protein–protein interaction prediction Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-06 Judith Bernett, David B Blumenthal, Markus List
Identifying protein–protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared
-
Partial order relation–based gene ontology embedding improves protein function prediction Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-06 Wenjing Li, Bin Wang, Jin Dai, Yan Kou, Xiaojun Chen, Yi Pan, Shuangwei Hu, Zhenjiang Zech Xu
Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Prediction of a protein annotation with proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation with accompanying semantic
-
A novel approach to study multi-domain motions in JAK1’s activation mechanism based on energy landscape Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-06 Shengjie Sun, Georgialina Rodriguez, Gaoshu Zhao, Jason E Sanchez, Wenhan Guo, Dan Du, Omar J Rodriguez Moncivais, Dehua Hu, Jing Liu, Robert Arthur Kirken, Lin Li
The family of Janus Kinases (JAKs) associated with the JAK–signal transducers and activators of transcription signaling pathway plays a vital role in the regulation of various cellular processes. The conformational change of JAKs is the fundamental step for activation, affecting multiple intracellular signaling pathways. However, the transitional process from inactive to active kinase is still a mystery
-
New classifications for quantum bioinformatics: Q-bioinformatics, QCt-bioinformatics, QCg-bioinformatics, and QCr-bioinformatics Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-06 Majid Mokhtari, Samane Khoshbakht, Kobra Ziyaei, Mohammad Esmaeil Akbari, Sayyed Sajjad Moravveji
Bioinformatics has revolutionized biology and medicine by using computational methods to analyze and interpret biological data. Quantum mechanics has recently emerged as a promising tool for the analysis of biological systems, leading to the development of quantum bioinformatics. This new field employs the principles of quantum mechanics, quantum algorithms, and quantum computing to solve complex problems
-
Prediction of protein–ligand binding affinity via deep learning models Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-06 Huiwen Wang
Accurately predicting the binding affinity between proteins and ligands is crucial in drug screening and optimization, but it is still a challenge in computer-aided drug design. The recent success of AlphaFold2 in predicting protein structures has brought new hope for deep learning (DL) models to accurately predict protein–ligand binding affinity. However, the current DL models still face limitations
-
Integrating network pharmacology and in silico analysis deciphers Withaferin-A’s anti-breast cancer potential via hedgehog pathway and target network interplay Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-06 Mythili Srinivasan, Apeksha Gangurde, Ashwini Y Chandane, Amol Tagalpallewar, Anil Pawar, Akshay M Baheti
This study examines the remarkable effectiveness of Withaferin-A (WA), a withanolide obtained from Withania somnifera (Ashwagandha), in encountering the mortiferous breast malignancy, a global peril. The predominant objective is to investigate WA’s intrinsic target proteins and hedgehog (Hh) pathway proteins in breast cancer targeting through the application of in silico computational techniques and
-
Enhanced polygenic risk score incorporating gene–environment interaction suggests the association of major depressive disorder with cardiac and lung function Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-04 Chuyu Pan, Bolun Cheng, Xiaoyue Qin, Shiqiang Cheng, Li Liu, Xuena Yang, Peilin Meng, Na Zhang, Dan He, Qingqing Cai, Wenming Wei, Jingni Hui, Yan Wen, Yumeng Jia, Huan Liu, Feng Zhang
Background Depression has been linked to an increased risk of cardiovascular and respiratory diseases; however, its impact on cardiac and lung function remains unclear, especially when accounting for potential gene–environment interactions. Methods We developed a novel polygenic and gene–environment interaction risk score (PGIRS) integrating the major genetic effect and gene–environment interaction
-
Benchmarking enrichment analysis methods with the disease pathway network Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-04 Davide Buzzao, Miguel Castresana-Aguirre, Dimitri Guala, Erik L L Sonnhammer
Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality
-
scDOT: enhancing single-cell RNA-Seq data annotation and uncovering novel cell types through multi-reference integration Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-04 Yi-Xuan Xiong, Xiao-Fei Zhang
The proliferation of single-cell RNA-seq data has greatly enhanced our ability to comprehend the intricate nature of diverse tissues. However, accurately annotating cell types in such data, especially when handling multiple reference datasets and identifying novel cell types, remains a significant challenge. To address these issues, we introduce Single Cell annotation based on Distance metric learning
-
RNAdvisor: a comprehensive benchmarking tool for the measure and prediction of RNA structural model quality Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-04 Clement Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi
RNA is a complex macromolecule that plays central roles in the cell. While it is well known that its structure is directly related to its functions, understanding and predicting RNA structures is challenging. Assessing the real or predictive quality of a structure is also at stake with the complex 3D possible conformations of RNAs. Metrics have been developed to measure model quality while scoring
-
Exploring miRNA–target gene pair detection in disease with coRmiT Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-04 Jose Cordoba-Caballero, James R Perkins, Federico García-Criado, Diana Gallego, Alicia Navarro-Sánchez, Mireia Moreno-Estellés, Concepción Garcés, Fernando Bonet, Carlos Romá-Mateo, Rocio Toro, Belén Perez, Pascual Sanz, Matthias Kohl, Elena Rojano, Pedro Seoane, Juan A G Ranea
A wide range of approaches can be used to detect micro RNA (miRNA)–target gene pairs (mTPs) from expression data, differing in the ways the gene and miRNA expression profiles are calculated, combined and correlated. However, there is no clear consensus on which is the best approach across all datasets. Here, we have implemented multiple strategies and applied them to three distinct rare disease datasets
-
Innovative super-resolution in spatial transcriptomics: a transformer model exploiting histology images and spatial gene expression Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-04 Chongyue Zhao, Zhongli Xu, Xinjun Wang, Shiyue Tao, William A MacDonald, Kun He, Amanda C Poholek, Kong Chen, Heng Huang, Wei Chen
Spatial transcriptomics technologies have shed light on the complexities of tissue structures by accurately mapping spatial microenvironments. Nonetheless, a myriad of methods, especially those utilized in platforms like Visium, often relinquish spatial details owing to intrinsic resolution limitations. In response, we introduce TransformerST, an innovative, unsupervised model anchored in the Transformer
-
CRISPRlnc: a machine learning method for lncRNA-specific single-guide RNA design of CRISPR/Cas9 system Brief. Bioinform. (IF 9.5) Pub Date : 2024-03-01 Zitian Yang, Zexin Zhang, Jing Li, Wen Chen, Changning Liu
CRISPR/Cas9 is a promising RNA-guided genome editing technology, which consists of a Cas9 nuclease and a single-guide RNA (sgRNA). So far, a number of sgRNA prediction tools have been developed. However, they were usually designed for protein-coding genes without considering that long non-coding RNA (lncRNA) genes may have different characteristics. In this study, we first evaluated the performances
-
TG468: a text graph convolutional network for predicting clinical response to immune checkpoint inhibitor therapy Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-23 Kun Wang, Jiangshan Shi, Xiaochu Tong, Ning Qu, Xiangtai Kong, Shengkun Ni, Jing Xing, Xutong Li, Mingyue Zheng
Enhancing cancer treatment efficacy remains a significant challenge in human health. Immunotherapy has witnessed considerable success in recent years as a treatment for tumors. However, due to the heterogeneity of diseases, only a fraction of patients exhibit a positive response to immune checkpoint inhibitor (ICI) therapy. Various single-gene-based biomarkers and tumor mutational burden (TMB) have
-
ADH-Enhancer: an attention-based deep hybrid framework for enhancer identification and strength prediction Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Faiza Mehmood, Shazia Arshad, Muhammad Shoaib
Enhancers play an important role in the process of gene expression regulation. The abundance or absence of enhancers in a DNA sequence, and irregularities in enhancer strength, affect the gene expression process, leading to the initiation and propagation of diverse types of genetic diseases such as hemophilia, bladder cancer, diabetes and congenital disorders. Enhancer identification and strength prediction
-
Kled: an ultra-fast and sensitive structural variant detection tool for long-read sequencing data Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Zhendong Zhang, Tao Jiang, Gaoyang Li, Shuqi Cao, Yadong Liu, Bo Liu, Yadong Wang
Structural Variants (SVs) are a crucial type of genetic variant that can significantly impact phenotypes. Therefore, the identification of SVs is an essential part of modern genomic analysis. In this article, we present kled, an ultra-fast and sensitive SV caller for long-read sequencing data given the specially designed approach with a novel signature-merging algorithm, custom refinement strategies
-
STW-MD: a novel spatio-temporal weighting and multi-step decision tree method for considering spatial heterogeneity in brain gene expression data Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Shanjun Mao, Xiao Huang, Runjiu Chen, Chenyang Zhang, Yizhu Diao, Zongjin Li, Qingzhe Wang, Shan Tang, Shuixia Guo
Gene expression during brain development or abnormal development is a biological process that is highly dynamic in both space and time. Previous studies have mainly focused on individual brain regions or a certain developmental stage. Our motivation is to address this gap by incorporating spatio-temporal information to gain a more complete understanding of brain development or abnormal brain development
-
ABAG-docking benchmark: a non-redundant structure benchmark dataset for antibody–antigen computational docking Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Nan Zhao, Bingqing Han, Cuicui Zhao, Jinbo Xu, Xinqi Gong
Accurate prediction of antibody–antigen complex structures is pivotal in drug discovery, vaccine design and disease treatment and can facilitate the development of more effective therapies and diagnostics. In this work, we first review the antibody–antigen docking (ABAG-docking) datasets. Then, we present the creation and characterization of a comprehensive benchmark dataset of antibody–antigen complexes
-
Language model enables end-to-end accurate detection of cancer from cell-free DNA Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Hongru Shen, Jilei Liu, Kexin Chen, Xiangchun Li
We present a language model Affordable Cancer Interception and Diagnostics (ACID) that can achieve high classification performance in the diagnosis of cancer exclusively from using raw cfDNA sequencing reads. We formulate ACID as an autoregressive language model. ACID is pretrained with language sentences that are obtained from concatenation of raw sequencing reads and diagnostic labels. We benchmark
-
Hound: a novel tool for automated mapping of genotype to phenotype in bacterial genomes assembled de novo Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Carlos Reding, Naphat Satapoomin, Matthew B Avison
Increasing evidence suggests that microbial species have a strong within species genetic heterogeneity. This can be problematic for the analysis of prokaryote genomes, which commonly relies on a reference genome to guide the assembly process. Differences between reference and sample genomes will therefore introduce errors in final assembly, jeopardizing the detection from structural variations to point
-
BiGATAE: a bipartite graph attention auto-encoder enhancing spatial domain identification from single-slice to multi-slices Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Yuhao Tao, Xiaoang Sun, Fei Wang
Recent advancements in spatial transcriptomics technology have revolutionized our ability to comprehensively characterize gene expression patterns within the tissue microenvironment, enabling us to grasp their functional significance in a spatial context. One key field of research in spatial transcriptomics is the identification of spatial domains, which refers to distinct regions within the tissue
-
ChemMORT: an automatic ADMET optimization platform using deep learning and multi-objective particle swarm optimization Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-22 Jia-Cai Yi, Zi-Yi Yang, Wen-Tao Zhao, Zhi-Jiang Yang, Xiao-Chen Zhang, Cheng-Kun Wu, Ai-Ping Lu, Dong-Sheng Cao
Drug discovery and development constitute a laborious and costly undertaking. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. As a multiple parameter objective, the optimization of the ADMET
-
PB-LKS: a python package for predicting phage–bacteria interaction through local K-mer strategy Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-12 Jingxuan Qiu, Wanchun Nie, Hao Ding, Jia Dai, Yiwen Wei, Dezhi Li, Yuxi Zhang, Junting Xie, Xinxin Tian, Nannan Wu, Tianyi Qiu
Bacteriophages can help the treatment of bacterial infections yet require in-silico models to deal with the great genetic diversity between phages and bacteria. Despite the tolerable prediction performance, the application scope of current approaches is limited to the prediction at the species level, which cannot accurately predict the relationship of phages across strain mutants. This has hindered
-
VirGrapher: a graph-based viral identifier for long sequences from metagenomes Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-12 Yan Miao, Zhenyuan Sun, Chenjing Ma, Chen Lin, Guohua Wang, Chunxue Yang
Viruses are the most abundant biological entities on earth and are important components of microbial communities. A metagenome contains all microorganisms from an environmental sample. Correctly identifying viruses from these mixed sequences is critical in viral analyses. It is common to identify long viral sequences, which have already been passed through pipelines of assembly and binning. Existing
-
Introducing π-HelixNovo for practical large-scale de novo peptide sequencing Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-10 Tingpeng Yang, Tianze Ling, Boyan Sun, Zhendong Liang, Fan Xu, Xiansong Huang, Linhai Xie, Yonghong He, Leyuan Li, Fuchu He, Yu Wang, Cheng Chang
De novo peptide sequencing is a promising approach for novel peptide discovery, highlighting the performance improvements for the state-of-the-art models. The quality of mass spectra often varies due to unexpected missing of certain ions, presenting a significant challenge in de novo peptide sequencing. Here, we use a novel concept of complementary spectra to enhance ion information of the experimental
-
On the core segmentation algorithms of copy number variation detection tools Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-10 Yibo Zhang, Wenyu Liu, Junbo Duan
Shotgun sequencing is a high-throughput method used to detect copy number variants (CNVs). Although there are numerous CNV detection tools based on shotgun sequencing, their quality varies significantly, leading to performance discrepancies. Therefore, we conducted a comprehensive analysis of next-generation sequencing-based CNV detection tools over the past decade. Our findings revealed that the majority
-
SynergyX: a multi-modality mutual attention network for interpretable drug synergy prediction Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-10 Yue Guo, Haitao Hu, Wenbo Chen, Hao Yin, Jian Wu, Chang-Yu Hsieh, Qiaojun He, Ji Cao
Discovering effective anti-tumor drug combinations is crucial for advancing cancer therapy. Taking full account of intricate biological interactions is highly important in accurately predicting drug synergy. However, the extremely limited prior knowledge poses great challenges in developing current computational methods. To address this, we introduce SynergyX, a multi-modality mutual attention network
-
Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-10 Giovanni Visonà, Emmanuelle Bouzigon, Florence Demenais, Gabriele Schweikert
Motivation Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remains challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led
-
Deciphering principles of nucleosome interactions and impact of cancer-associated mutations from comprehensive interaction network analysis Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-08 Wang Xu, Houfang Zhang, Wenhan Guo, Lijun Jiang, Yunjie Zhao, Yunhui Peng
Nucleosomes represent hubs in chromatin organization and gene regulation and interact with a plethora of chromatin factors through different modes. In addition, alterations in histone proteins such as cancer mutations and post-translational modifications have profound effects on histone/nucleosome interactions. To elucidate the principles of histone interactions and the effects of those alterations
-
Integrative open workflow for confident annotation and molecular networking of metabolomics MSE/DIA data Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-07 Albert Katchborian-Neto, Matheus F Alves, Paula C P Bueno, Karen de Jesus Nicácio, Miller S Ferreira, Tiago B Oliveira, Henrique Barbosa, Michael Murgu, Ana C C de Paula Ladvocat, Danielle F Dias, Marisi G Soares, João H G Lago, Daniela A Chagas-Paula
Liquid chromatography coupled with high-resolution mass spectrometry data-independent acquisition (LC-HRMS/DIA), including MSE, enables comprehensive metabolomics analyses, though it poses challenges for data processing with automatic annotation and molecular networking (MN) implementation. This motivated the present proposal, in which we introduce DIA-IntOpenStream, a new integrated workflow combining
-
Likelihood-based feature representation learning combined with neighborhood information for predicting circRNA–miRNA associations Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-07 Lu-Xiang Guo, Lei Wang, Zhu-Hong You, Chang-Qing Yu, Meng-Lei Hu, Bo-Wei Zhao, Yang Li
Connections between circular RNAs (circRNAs) and microRNAs (miRNAs) assume a pivotal position in the onset, evolution, diagnosis and treatment of diseases and tumors. Selecting the most potential circRNA-related miRNAs and taking advantage of them as the biological markers or drug targets could be conducive to dealing with complex human diseases through preventive strategies, diagnostic procedures
-
Spatially contrastive variational autoencoder for deciphering tissue heterogeneity from spatially resolved transcriptomics Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-07 Yaofeng Hu, Kai Xiao, Hengyu Yang, Xiaoping Liu, Chuanchao Zhang, Qianqian Shi
Recent advances in spatially resolved transcriptomics (SRT) have brought ever-increasing opportunities to characterize expression landscape in the context of tissue spatiality. Nevertheless, there still exist multiple challenges to accurately detect spatial functional regions in tissue. Here, we present a novel contrastive learning framework, SPAtially Contrastive variational AutoEncoder (SpaCAE),
-
scDecouple: decoupling cellular response from infected proportion bias in scCRISPR-seq Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-07 Qiuchen Meng, Lei Wei, Kun Ma, Ming Shi, Xinyi Lin, Joshua W K Ho, Yinqing Li, Xuegong Zhang
Single-cell clustered regularly interspaced short palindromic repeats-sequencing (scCRISPR-seq) is an emerging high-throughput CRISPR screening technology where the true cellular response to perturbation is coupled with infected proportion bias of guide RNAs (gRNAs) across different cell clusters. The mixing of these effects introduces noise into scCRISPR-seq data analysis and thus obstacles to relevant
-
ChIP-GPT: a managed large language model for robust data extraction from biomedical database records Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-05 Olivier Cinquin
Increasing volumes of biomedical data are amassing in databases. Large-scale analyses of these data have wide-ranging applications in biology and medicine. Such analyses require tools to characterize and process entries at scale. However, existing tools, mainly centered on extracting predefined fields, often fail to comprehensively process database entries or correct evident errors—a task humans can
-
Deqformer: high-definition and scalable deep learning probe design method Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-02 Yantong Cai, Jia Lv, Rui Li, Xiaowen Huang, Shi Wang, Zhenmin Bao, Qifan Zeng
Target enrichment sequencing techniques are gaining widespread use in the field of genomics, prized for their economic efficiency and swift processing times. However, their success depends on the performance of probes and the evenness of sequencing depth across probes. To accurately predict probe coverage depth, this study proposes a model called Deqformer. Deqformer utilizes the oligonucleotides
-
MoDAFold: a strategy for predicting the structure of missense mutant protein based on AlphaFold2 and molecular dynamics Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-02 Lingyan Zheng, Shuiyang Shi, Xiuna Sun, Mingkun Lu, Yang Liao, Sisi Zhu, Hongning Zhang, Ziqi Pan, Pan Fang, Zhenyu Zeng, Honglin Li, Zhaorong Li, Weiwei Xue, Feng Zhu
Protein structure prediction is a longstanding issue crucial for identifying new drug targets and providing a mechanistic understanding of protein functions. To advance progress in this field, a spectrum of computational methodologies has been developed. AlphaFold2 has exhibited exceptional precision in predicting wild-type protein structures, with performance exceeding that of other methods.
-
Structure prediction of linear and cyclic peptides using CABS-flex Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-02 Aleksandra Badaczewska-Dawid, Karol Wróblewski, Mateusz Kurcinski, Sebastian Kmiecik
The structural modeling of peptides can be a useful aid in the discovery of new drugs and a deeper understanding of the molecular mechanisms of life. Here we present a novel multiscale protocol for the structure prediction of linear and cyclic peptides. The protocol combines two main stages: coarse-grained simulations using the CABS-flex standalone package and an all-atom reconstruction-optimization
-
Challenges in distinguishing functional proteins from polyproteins in databases: implications for drug discovery Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-02 Ariadna Llop-Peiró, Gerard Pujadas, Santiago Garcia-Vallvé
This opinion article addresses a major issue in molecular biology and drug discovery by highlighting the complications that arise from combining polyproteins and their functional products within the same database entry. This problem, exemplified by the discovery of novel inhibitors for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease, has an influence on our ability to
-
scMMT: a multi-use deep learning approach for cell annotation, protein prediction and embedding in single-cell RNA-seq data Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-01 Songqi Zhou, Yang Li, Wenyuan Wu, Li Li
Accurate cell type annotation in single-cell RNA-sequencing data is essential for advancing biological and medical research, particularly in understanding disease progression and tumor microenvironments. However, existing methods are constrained by single feature extraction approaches, lack of adaptability to immune cell types with similar molecular profiles but distinct functions and a failure to
-
labelSeg: segment annotation for tumor copy number alteration profiles Brief. Bioinform. (IF 9.5) Pub Date : 2024-02-01 Hangjia Zhao, Michael Baudis
Somatic copy number alterations (SCNAs) are a predominant type of oncogenomic alterations that affect a large proportion of the genome in the majority of cancer samples. Current technologies allow high-throughput measurement of such copy number aberrations, generating results consisting of frequently large sets of SCNA segments. However, the automated annotation and integration of such data are particularly
-
Multi-modal features-based human-herpesvirus protein–protein interaction prediction by using LightGBM Brief. Bioinform. (IF 9.5) Pub Date : 2024-01-27 Xiaodi Yang, Stefan Wuchty, Zeyin Liang, Li Ji, Bingjie Wang, Jialin Zhu, Ziding Zhang, Yujun Dong
The identification of human-herpesvirus protein–protein interactions (PPIs) is an essential entry point for understanding the mechanisms of viral infection, especially in malignant tumor patients with common herpesvirus infection. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion
-
The CUT&RUN greenlist: genomic regions of consistent noise are effective normalizing factors for quantitative epigenome mapping Brief. Bioinform. (IF 9.5) Pub Date : 2024-01-27 Fabio N de Mello, Ana C Tahira, Maria Gabriela Berzoti-Coelho, Sergio Verjovski-Almeida
Cleavage Under Targets and Release Using Nuclease (CUT&RUN) is a recent development for epigenome mapping, but its unique methodology can hamper proper quantitative analyses. As traditional normalization approaches have been shown to be inaccurate, we sought to determine endogenous normalization factors based on the human genome regions of constant nonspecific signal. This constancy was determined
-
Interpretable feature extraction and dimensionality reduction in ESM2 for protein localization prediction Brief. Bioinform. (IF 9.5) Pub Date : 2024-01-27 Zeyu Luo, Rui Wang, Yawen Sun, Junhao Liu, Zongqing Chen, Yu-Juan Zhang
As the application of large language models (LLMs) has broadened into the realm of biological predictions, leveraging their capacity for self-supervised learning to create feature representations of amino acid sequences, these models have set a new benchmark in tackling downstream challenges, such as subcellular localization. However, previous studies have primarily focused on either the structural
-
The landscape of the methodology in drug repurposing using human genomic data: a systematic review Brief. Bioinform. (IF 9.5) Pub Date : 2024-01-27 Lijuan Wang, Ying Lu, Doudou Li, Yajing Zhou, Lili Yu, Ines Mesa Eguiagaray, Harry Campbell, Xue Li, Evropi Theodoratou
The process of drug development is expensive and time-consuming. In contrast, drug repurposing can be introduced to clinical practice more quickly and at a reduced cost. Over the last decade, there has been a significant expansion of large biobanks that link genomic data to electronic health record data, public availability of various databases containing biological and clinical information and rapid