-
De Novo Prediction of Drug–Target Interactions Using Laplacian Regularized Schatten p-Norm Minimization J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-21 Gaoyan Wu; Mengyun Yang; Yaohang Li; Jianxin Wang
In pharmaceutical sciences, a crucial step in drug discovery is the identification of drug–target interactions (DTIs). However, only a small portion of the DTIs have been experimentally validated. Moreover, it is an extremely laborious, expensive, and time-consuming procedure to capture new interactions between drugs and targets through traditional biochemical experiments. Therefore, designing
-
Prediction of Virus–Receptor Interactions Based on Improving Similarities J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-21 Lingzhi Zhu; Cheng Yan; Guihua Duan
Viral infectious diseases seriously threaten human health. Receptor binding is the first step of viral infection. Predicting virus–receptor interactions helps to clarify the interaction mechanism of viruses and receptors and to find effective ways of preventing and treating viral infectious diseases, so as to reduce the morbidity and mortality caused by viruses. Some computation
-
Supervised Adversarial Alignment of Single-Cell RNA-seq Data J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-19 Songwei Ge; Haohan Wang; Amir Alavi; Eric Xing; Ziv Bar-Joseph
Dimensionality reduction is an important first step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. In addition to enabling the visualization of the profiled cells, such representations are used by many downstream analysis methods ranging from pseudo-time reconstruction to clustering to alignment of scRNA-seq data from different experiments, platforms, and laboratories. Both supervised
-
A Novel Multi-Ensemble Method for Identifying Essential Proteins J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-13 Wei Dai; Bingxi Chen; Wei Peng; Xia Li; Jiancheng Zhong; Jianxin Wang
Essential proteins possess critical functions for cell survival. Identifying essential proteins improves our understanding of how a cell works and also plays a vital role in the research fields of disease treatment and drug development. Recently, some machine-learning methods and ensemble learning methods have been proposed to identify essential proteins by introducing effective protein features. However
-
NetMix: A Network-Structured Mixture Model for Reduced-Bias Estimation of Altered Subnetworks J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-05 Matthew A. Reyna; Uthsav Chitra; Rebecca Elyanow; Benjamin J. Raphael
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared with other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are
-
Predicting Protein Functions Based on Differential Co-expression and Neighborhood Analysis. J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Jael Sanyanda Wekesa; Yushi Luan; Jun Meng
Proteins are polypeptides essential in biological processes. Protein physical interactions are complemented by other types of functional relationship data including genetic interactions, knowledge about co-expression, and evolutionary pathways. Existing algorithms integrate protein interaction and gene expression data to retrieve context-specific subnetworks composed of genes/proteins with known and
-
Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-05 Brooks Paige; James Bell; Aurélien Bellet; Adrià Gascón; Daphne Ezer
Some organizations such as 23andMe and the UK Biobank have large genomic databases that they re-use for multiple different genome-wide association studies. Even research studies that compile smaller genomic databases often utilize these databases to investigate many related traits. It is common for the study to report a genetic risk score (GRS) model for each trait within the publication. Here, we
-
Mathematical Analyses on the Effects of Control Measures for a Waterborne Disease Model with Socioeconomic Conditions. J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Obiora Cornelius Collins; Kevin Jan Duffy
Waterborne diseases present major health problems to humanity, especially in rural communities where many individuals belong to the lower socioeconomic classes (SECs). The impacts of introducing waterborne disease control measures for such communities are investigated by considering a waterborne disease model. The model is extended by introducing treatment of infected individuals and water purification
-
Inferring MicroRNA-Disease Associations Based on the Identification of a Functional Module. J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Buwen Cao; Shuguang Deng; Hua Qin; Jiawei Luo; Guanghui Li; Cheng Liang
Inferring potential associations between microRNAs (miRNAs) and human diseases can help people understand the pathogenesis of complex human diseases. Several computational approaches have been presented to discover novel miRNA-disease associations based on a top-ranked association model. However, some top-ranked miRNAs are not easily used to reveal the association between miRNAs and diseases. This
-
Identification of Selective Inhibitors of LdDHFR Enzyme Using Pharmacoinformatic Methods J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Vishnu Kumar Sharma; Prasad V. Bharatam
Dihydrofolate reductase (DHFR) is a well-known enzyme of the folate metabolic pathway and it is a validated drug target for leishmaniasis. However, only a few leads are reported against Leishmania donovani DHFR (LdDHFR), and thus, there is a need to identify new inhibitors. In this article, pharmacoinformatic tools such as molecular docking, virtual screening, absorption, distribution, metabolism,
-
Identification of Differentially Expressed Genes Associated with Idiopathic Pulmonary Arterial Hypertension by Integrated Bioinformatics Approaches. J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Enfa Zhao; Hang Xie; Yushun Zhang
Idiopathic pulmonary arterial hypertension (IPAH) is a fatal cardiovascular disease event with significant morbidity and mortality. However, its potential molecular mechanisms and potential key genes have not been fully evaluated. The gene expression profile of GSE33463, including 30 individuals diagnosed with IPAH and 41 normal controls, was downloaded from the Gene Expression Omnibus database. The
-
Identification of Potential Hub Genes of Atherosclerosis Through Bioinformatic Analysis J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Yang Yin; Yang-fan Zou; Yu Xiao; Tian-xi Wang; Ya-ni Wang; Zhi-cheng Dong; Yu-hu Huo; Bo-chen Yao; Ling-bing Meng; Shuang-xia Du
Cardiovascular and cerebrovascular diseases, which mainly consist of atherosclerosis (AS), are major causes of death. A great deal of research has been carried out to clarify the molecular mechanisms of AS. However, the etiology of AS remains poorly understood. To screen the potential genes of AS occurrence and development, GSE43292 and GSE57691 were obtained from the Gene Expression Omnibus (GEO)
-
Screening of Clinical Factors Related to Prognosis of Breast Cancer Based on the Cox Proportional Risk Model. J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Wan Tang; Degong Mu; Ling Han; Xianmin Guo; Bing Han; Dong Song
Proper evaluation of the relevant clinical factors for the prognosis of breast cancer is particularly important in the selection of appropriate therapeutic strategies. To further screen and identify the clinically significant factors associated with breast cancer, the Cox risk regression model analysis was performed in this study. The follow-up data of intact breast cancer patients were downloaded
-
Exploration of DNA Methylation-Driven Genes in Papillary Thyroid Carcinoma Based on the Cancer Genome Atlas. J. Comput. Biol. (IF 1.054) Pub Date : 2021-01-06 Yanwei Chen; Keke Wang; Mengyuan Shang; Shuangshuang Zhao; Zheng Zhang; Haizhen Yang; Zheming Chen; Rui Du; Qilong Wang; Baoding Chen
Although the incidence of thyroid carcinoma is reported to be the highest among malignancies of the endocrine system, its diagnosis is still unsatisfactory. This study sought to explore the key DNA methylation-driven genes in the development of papillary thyroid carcinoma (PTC) via a bioinformatic analysis based on the Cancer Genome Atlas (TCGA) database and was validated using the Gene Expression Omnibus
-
Toward an Information Theory of Quantitative Genetics J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-31 David J. Galas; James Kunert-Graf; Lisa Uechi; Nikita A. Sakhanenko
Quantitative genetics has evolved dramatically in the past century, and the proliferation of genetic data, in quantity as well as type, enables the characterization of complex interactions and mechanisms beyond the scope of its theoretical foundations. In this article, we argue that revisiting the framework for analysis is important and we begin to lay the foundations of an alternative formulation
-
Computing the Rearrangement Distance of Natural Genomes J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-30 Leonard Bohnenkämper; Marília D.V. Braga; Daniel Doerr; Jens Stoye
The computation of genomic distances has been a very active field of computational comparative genomics over the past 25 years. Substantial results include the polynomial-time computability of the inversion distance by Hannenhalli and Pevzner in 1995 and the introduction of the double cut and join distance by Yancopoulos et al. in 2005. Both results, however, rely on the assumption that the genomes
-
Energy Consumption and Entropy Production in a Stochastic Formulation of BCM Learning J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-23 Gastone Castellani; Leon N. Cooper; Luciana Renata De Oliveira; Brian S. Blais
In a series of previous studies, we provided a stochastic description of a theory of synaptic plasticity. This theory, called BCM from the names of the three authors, has been formulated in two ways: the original formulation, where the plasticity threshold is defined as the square of the time-averaged neuronal activity, and a newer formulation, where the plasticity threshold is defined as the time
-
Metric Labeling and Semimetric Embedding for Protein Annotation Prediction J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-23 Emre Sefer; Carl Kingsford
Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, the task is modeled as a multilabel classification problem where the labels (functions) are treated independently or semi-independently. However, databases
-
Polynomial-Time Statistical Estimation of Species Trees Under Gene Duplication and Loss J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-15 Brandon Legried; Erin K. Molloy; Tandy Warnow; Sébastien Roch
Phylogenomics—the estimation of species trees from multilocus data sets—is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this article, we address the challenge of estimating the
-
A Tool for Detecting Complementary Single Nucleotide Polymorphism Pairs in Genome-Wide Association Studies for Epistasis Testing J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-15 Gizem Caylak; Oznur Tastan; A. Ercument Cicek
Detecting interacting loci pairs has been instrumental in understanding disease etiology when single locus associations do not fully account for the underlying heritability. However, the number of loci to test is prohibitively large. Epistasis test prioritization algorithms rank likely epistatic single nucleotide polymorphism (SNP) pairs to limit the number of statistical tests. Potpourri detects epistatic
-
Lower Density Selection Schemes via Small Universal Hitting Sets with Short Remaining Path Length J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-15 Hongyu Zheng; Carl Kingsford; Guillaume Marçais
Universal hitting sets (UHS) are sets of words that are unavoidable: every long enough sequence is hit by the set (i.e., it contains a word from the set). There is a tight relationship between UHS and minimizer schemes, where minimizer schemes with low density (i.e., efficient schemes) correspond to UHS of small size. Local schemes are a generalization of minimizer schemes that can be used as replacement
-
Microarray Analysis for Differentially Expressed Genes Between Stromal and Epithelial Cells in Development and Metastasis of Invasive Breast Cancer. J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-04 Rong Wang; Lei Fu; Jinbin Li; Di Zhao; Yulan Zhao; Ling Yin
Both epithelium and stroma are involved in breast cancer invasion and metastasis. This study aimed at identifying the roles of the stroma in breast cancer tumorigenesis and metastasis. The gene expression profile GSE10797 was downloaded from the Gene Expression Omnibus database, and it included 28 paired stroma and epithelium breast tissue samples from invasive breast cancer patients and 10 paired normal
-
A Comprehensive Repertoire of Transfer RNA-Derived Fragments and Their Regulatory Networks in Colorectal Cancer. J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-04 Xiaojie Wang; Yiyi Zhang; Waleed M Ghareeb; Shuangming Lin; Xingrong Lu; Ying Huang; Shenghui Huang; Zongbin Xu; Pan Chi
This study aimed to provide systematic insight into the composition and expression of the transfer RNA (tRNA) derivative transcriptome in colorectal cancer (CRC). tRNA derivative expression profiles in three pairs of CRC and adjacent normal colon tissues were obtained by tRNA-derived small RNA fragments (tRFs) and tRNA halves (tiRNA) sequencing, and microarray data of transcriptomes from CRC and paired controls were
-
Comprehensive Analysis Identifying Wnt Ligands Gene Family for Biochemical Recurrence in Prostate Adenocarcinoma and Construction of a Nomogram J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-04 Maolin Hu; Jiangling Xie; Zhifeng Liu; Xuan Wang; Ming Liu; Jianye Wang
Little research has explored the relationship between the Wnt ligands gene family and biochemical recurrence of prostate adenocarcinoma. The purpose of this study was to systematically evaluate the role of the Wnt ligands gene family in biochemical recurrence in prostate adenocarcinoma. RNA-seq transcriptome data and clinicopathological data of 489 prostate adenocarcinoma tissues and 51 nontumor tissues
-
DNA Methylation Heterogeneity Induced by Collaborations Between Enhancers. J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-04 Yusong Ye; Zhuoqin Yang; Jinzhi Lei
During mammalian embryo development, reprogramming of DNA methylation plays important roles in the erasure of parental epigenetic memory and the establishment of naive pluripotent cells. Multiple enzymes that regulate the processes of methylation and demethylation work together to shape the pattern of genome-scale DNA methylation and guide the process of cell differentiation. Recent availability of
-
Drug-Target Interaction Network Analysis of Gene-Phenotype Connectivity Maintained by Genistein J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-04 Baoshan Li; Yi Jiang; Jingxin Chu; Qian Zhou
Genistein is a type of isoflavone, which has been widely described as an antitumor agent in many cancers. The present study aimed to provide information on the mechanisms of genistein's activity and thus enable a wider range of targeted therapies in hepatitis B virus (HBV)-related liver cancer. We searched the DrugBank database for direct targets of genistein, which were then analyzed through the STRING
-
A New Method Based on Coding Sequence Density to Cluster Bacteria. J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-04 Nan Sun; Rui Dong; Shaojun Pei; Changchuan Yin; Stephen S-T Yau
Bacterial evolution is an important field of study, and biological sequences are often used to construct phylogenetic relationships. Multiple sequence alignment is very time-consuming and cannot deal with large-scale bacterial genome sequences in a reasonable time. Hence, a new mathematical method, the joining density vector method, is proposed to cluster bacteria, which characterizes the features of coding
-
A Fast Algorithm for Computing the Fourier Spectrum of a Fractional Period J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-07 Jiasong Wang; Changchuan Yin
Directly computing Fourier power spectra at fractional periods of real sequences can be beneficial in many digital signal processing applications. In this article, we present a fast algorithm to compute the fractional Fourier power spectra of real sequences. For a real sequence of length […], we may deduce its congruence derivative sequence with a length of l. The discrete Fourier transform of the original
-
Representation of k-Mer Sets Using Spectrum-Preserving String Sets J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-07 Amatur Rahman; Paul Medvedev
Given the popularity and elegance of k-mer-based tools, finding a space-efficient way to represent a set of k-mers is important for improving the scalability of bioinformatics analyses. One popular approach is to convert the set of k-mers into the more compact set of unitigs. We generalize this approach and formulate it as the problem of finding a smallest spectrum-preserving string set (SPSS) representation
-
Potpourri: An Epistasis Test Prioritization Algorithm via Diverse SNP Selection J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-03 Gizem Caylak; Oznur Tastan; A. Ercument Cicek
Genome-wide association studies (GWAS) explain a fraction of the underlying heritability of genetic diseases. Investigating epistatic interactions between two or more loci helps to close this gap. Unfortunately, the sheer number of loci combinations to process and hypotheses to test prohibits the process both computationally and statistically. Epistasis test prioritization algorithms rank likely epistatic single
-
A Study of Potential SARS-CoV-2 Antiviral Drugs and Preliminary Research of Their Molecular Mechanism, Based on Anti-SARS-CoV Drug Screening and Molecular Dynamics Simulation J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-01 Xiaomeng Zhao; Ruixia Liu; Zhi Miao; Nan Ye; Wenyu Lu
This research used virtual docking screening and molecular dynamics simulation to determine which of the 30 drugs analyzed had the best inhibitory effect on 3CL protease (Mpro) hydrolase. AutoDock Vina was used for molecular docking. In molecular docking, the binding affinities of saquinavir and raltegravir to the protein are higher than those of the other candidate drugs; they are −9.1 kcal/mol
-
Multiscale Feedback Loops in SARS-CoV-2 Viral Evolution J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-01 Christopher Barrett; Andrei C. Bura; Qijun He; Fenix W. Huang; Thomas J.X. Li; Michael S. Waterman; Christian M. Reidys
COVID-19 is an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The viral genome is considered to be relatively stable and the mutations that have been observed and reported thus far are mainly focused on the coding region. This article provides evidence that macrolevel pandemic dynamics, such as social distancing, modulate the genomic evolution of SARS-CoV-2
-
EPTool: A New Enhancing PSSM Tool for Protein Secondary Structure Prediction J. Comput. Biol. (IF 1.054) Pub Date : 2020-12-01 Yuzhi Guo; Jiaxiang Wu; Hehuan Ma; Sheng Wang; Junzhou Huang
Recently, a deep learning-based method for enhancing the Position-Specific Scoring Matrix (PSSM), Bagging Multiple Sequence Alignment (MSA) Learning (Guo et al.), has been proposed, and its effectiveness has been demonstrated empirically. The program EPTool is the implementation of Bagging MSA Learning, which provides a complete training and evaluation workflow for the enhancing PSSM model. It is capable of handling different
-
Deep Learning of Sequence Patterns for CCCTC-Binding Factor-Mediated Chromatin Loop Formation J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-25 Shuzhen Kuang; Liangjiang Wang
The three-dimensional (3D) organization of the human genome is of crucial importance for gene regulation, and the CCCTC-binding factor (CTCF) plays an important role in chromatin interactions. However, it is still unclear what sequence patterns in addition to CTCF motif pairs determine chromatin loop formation. To discover the underlying sequence patterns, we have developed a deep learning model, called
-
Human MicroRNA Target Prediction via Multi-Hypotheses Learning J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-25 Mohammad Mohebbi; Liang Ding; Russell L. Malmberg; Liming Cai
MicroRNAs are involved in many critical cellular activities through binding to their mRNA targets, for example, in cell proliferation, differentiation, death, growth control, and developmental timing. Prediction of microRNA targets can assist in efficient experimental investigations on the functional roles of these small noncoding RNAs. Their accurate prediction, however, remains a challenge due to
-
The Agility of a Neuron: Phase Shift Between Sinusoidal Current Input and Firing Rate Curve J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-17 Chu-Yu Cheng; Chung-Chin Lu
The response of a neuron when receiving a periodic input current signal is a periodic spike firing rate signal. The frequency of an input sinusoidal current and the surrounding environment such as background noises are two important factors that affect the firing rate output signal of a neuron model. This study focuses on the phase shift between input and output signals, and here we present a new concept:
-
Optimized Fluorescence-Based Detection in Single Molecule Synthesis Process J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-17 Hsin-Hao Chen; Chung-Chin Lu
Single molecule sequencing is imperative to overall genetic analysis in areas such as genomics, transcriptomics, clinical testing, drug development, and cancer screening. In addition, fluorescence-based sequencing is, among other methods, primarily applied in single molecule sequencing, specifically in the field of DNA sequencing. Modern-day fluorescence labeling methods exploit a charge-coupled device
-
GeneDMRs: An R Package for Gene-Based Differentially Methylated Regions Analysis J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-13 Xiao Wang; Dan Hao; Haja N. Kadarmideen
DNA methylation in a gene or gene body could influence gene transcription. Moreover, methylation in gene regions along with CpG island regions could modulate the transcription to undetectable gene expression levels. Therefore, it is necessary to investigate the methylation levels within the gene, gene body, CpG island regions, and their overlapped regions and then identify the gene-based differentially
-
Integrated Analysis of an lncRNA-Associated ceRNA Network Reveals Potential Biomarkers for Hepatocellular Carcinoma J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-04 Jie Yang; Qing-chun Xu; Zhen-yu Wang; Xun Lu; Liu-kui Pan; Jun Wu; Chen Wang
Hepatocellular carcinoma (HCC) is a common malignant tumor worldwide. In this study, we aimed to explore the potential biomarkers and key regulatory pathways related to HCC using integrated bioinformatic analysis and validation. The microarray data of GSE12717 and GSE54238 were downloaded from the Gene Expression Omnibus database. A competing endogenous RNA (ceRNA) network was constructed based on
-
A New Paradigm for Identifying Reconciliation-Scenario Altering Mutations Conferring Environmental Adaptation J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-05 Roni Zoller; Meirav Zehavi; Michal Ziv-Ukelson
An important goal in microbial computational genomics is to identify crucial events in the evolution of a gene that severely alter the duplication, loss, and mobilization patterns of the gene within the genomes in which it disseminates. In this article, we formalize this microbiological goal as a new pattern-matching problem in the domain of gene tree and species tree reconciliation, denoted “Reconciliation-Scenario
-
Conditional Random Fields with Least Absolute Shrinkage and Selection Operator to Classifying the Barley Genes Based on Expression Level Affected by the Fungal Infection J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-05 Xiyuan Liu; Di Gao; Gang Shen
The classical methods for the classification problem include hypothesis testing with the Benjamini–Hochberg method, the hidden Markov chain model, and the support vector machine. One major application of the classification problem is gene expression analysis, for example, detecting the host genes that interact with a pathogen. The classical methods can be applied and have a good performance when the number
-
Bioinformatic Analysis Suggests That Three Hub Genes May Be a Vital Prognostic Biomarker in Pancreatic Ductal Adenocarcinoma J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-05 Xin Chang; Mei-Feng Yang; Wei Fan; Li-Sheng Wang; Jun Yao; Zhao-Shen Li; De-Feng Li
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal malignancies worldwide due to its ineffective diagnosis and poor prognosis. It is essential to identify differentially expressed genes (DEGs) in PDAC to gain new insights into its underlying molecular mechanisms, as well as identify potential diagnostic and therapeutic targets. We screened 135 DEGs from the GSE15417, GSE16515, and GSE28735
-
Gene Prioritization in Parkinson's Disease Using Human Protein-Protein Interaction Network. J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-05 Rutvi Prajapati; Isaac Arnold Emerson
Parkinson's disease (PD) is the second-most common neurodegenerative disorder, and the actual cause of this disease is still unknown. Identifying the target genes that are associated with disease plays an essential role in the treatment of PD. Various genetic studies have determined the significant target genes for disease progression, although this continues to be challenging in the field of drug
-
Conserved High Free Energy Sites in Human Coronavirus Spike Glycoprotein Backbones. J. Comput. Biol. (IF 1.054) Pub Date : 2020-11-05 Robert C Penner
Methods previously developed by the author are applied to uncover several sites of interest in the spike glycoproteins of all known human coronaviruses (hCoVs), including SARS-CoV-2 that causes COVID-19. The sites comprise three-dimensional neighborhoods of peptides characterized by four key properties: (1) they pinpoint regions of high free energy in the backbone whose obstruction might interrupt
-
Rank-Similarity Measures for Comparing Gene Prioritizations: A Case Study in Autism J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-27 Concettina Guerra; Sarang Joshi; Yinquan Lu; Francesco Palini; Umberto Ferraro Petrillo; Jarek Rossignac
We discuss the challenge of comparing three gene prioritization methods: network propagation, integer linear programming rank aggregation (RA), and statistical RA. These methods are based on different biological categories and estimate disease–gene association. Previously proposed comparison schemes are based on three measures of performance: receiver operating curve, area under the curve, and median
-
Genome Rearrangement Distance with Reversals, Transpositions, and Indels J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-20 Alexsandro Oliveira Alexandrino; Andre Rodrigues Oliveira; Ulisses Dias; Zanoni Dias
The rearrangement distance is a well-known problem in the field of comparative genomics. Given two genomes, the rearrangement distance is the minimum number of rearrangements in a set of allowed rearrangements (rearrangement model), which transforms one genome into the other. In rearrangement distance problems, a genome is modeled as a string, where each element represents a conserved region within
-
PopInf: An Approach for Reproducibly Visualizing and Assigning Population Affiliation in Genomic Samples of Uncertain Origin J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-19 Angela M. Taravella Oill; Anagha J. Deshpande; Heini M. Natri; Melissa A. Wilson
Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may not have identifying ancestry information. In this study, we describe a flexible computational pipeline, PopInf, to visualize principal component analysis output and assign ancestry to samples with unknown genetic ancestry, given a reference population panel
-
Diagnosis of Autism Spectrum Disorder Based on Functional Brain Networks with Deep Learning J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-19 Wutao Yin; Sakib Mostafa; Fang-xiang Wu
Autism spectrum disorder (ASD) is a neurological and developmental disorder. Traditional diagnosis of ASD is typically performed through the observation of behaviors and interview of a patient. However, these diagnosis methods are time-consuming and can sometimes be misleading. By integrating machine learning algorithms with neuroimages, a diagnosis method can possibly be established to detect ASD subjects
-
Backbone Free Energy Estimator Applied to Viral Glycoproteins J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-13 Robert C. Penner
Earlier analysis of the Protein Data Bank derived the distribution of rotations from the plane of a protein hydrogen bond donor peptide group to the plane of its acceptor peptide group. The quasi Boltzmann formalism of Pohl–Finkelstein is employed to estimate free energies of protein elements with these hydrogen bonds, pinpointing residues with a high propensity for conformational change. This is applied
-
Coexpression of PBX1 and EMP2 as Prognostic Biomarkers in Estrogen Receptor-Negative Breast Cancer via Data Mining J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-13 Yier Qiu; Guowen Lu; Yingjie Wu
Previous studies revealed that PBX1 ranked third among the differentially expressed genes related to the development and progression of breast cancer (BC). Nevertheless, the role of PBX1 in the progression of BC has not been evaluated. Here, on the basis of the ONCOMINE and GOBO databases, we compared BC samples with normal controls with respect to the expression of PBX1 in various types of cancers, as well as their
-
Identification of Biomarkers Related to Systemic Sclerosis With or Without Pulmonary Hypertension Using Co-expression Analysis J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-13 Yiyang Tang; Lihuang Zha; Xiaofang Zeng; Zaixin Yu
Systemic sclerosis (SSc), also known as scleroderma, is an autoimmune disease with multiple system involvement, and pulmonary complications, including pulmonary hypertension (PH), are leading causes of death. This study aimed to develop early biomarkers to distinguish SSc with or without PH from normal population using bioinformatics approaches. The gene expression profile GSE22356, which contains
-
Identification Six Metabolic Genes as Potential Biomarkers for Lung Adenocarcinoma J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-13 Shusen Zhang; Yuanyuan Lu; Zhongxin Liu; Xiaopeng Li; Zhihua Wang; Zhigang Cai
Metabolic genes have been reported to play crucial roles in tumor progression. Lung adenocarcinoma (LUAD) is one of the most common cancers worldwide. This study aimed to predict the potential mechanism and novel markers of metabolic signature in LUAD. The gene expression profiles and the clinical parameters were obtained from The Cancer Genome Atlas-Lung adenocarcinoma (TCGA-LUAD) and Gene Expression
-
Target Specificity of the CRISPR-Cas9 System in Arabidopsis thaliana, Oryza sativa, and Glycine max Genomes J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-13 Pan Zou; Lijin Duan; Shasha Zhang; Xue Bai; Zhenghui Liu; Fengmei Jin; Haibo Sun; Wentao Xu; Rui Chen
Clustered regularly interspaced short palindromic repeats (CRISPR), a class of immune-associated sequences in bacteria, have been developed as a powerful tool for editing eukaryotic genomes in diverse cells and organisms in recent years. The CRISPR-Cas9 system can recognize upstream 20 nucleotides (guide sequence) adjacent to the protospacer-adjacent motif site and trigger double-stranded DNA cleavage
-
Tumor Mutation Burden Computation in Two Pan-Cancer Precision Medicine Next-Generation Sequencing Panels J. Comput. Biol. (IF 1.054) Pub Date : 2020-10-13 Yongqian Shu; Xiaohong Wu; Jia Shen; Dongdong Luo; Xiang Li; Hailong Wang; Yuanhua Tom Tang
FD-180 and FD-600 are two next-generation sequencing panels developed by First Dimension Biosciences Co. for detecting mutations in cancer tissues and providing therapeutic guidance in precision medicine applications. FD-180 includes the coding exons of about 180 genes, including all the known drug target genes and some important driver genes, whereas FD-600 includes the coding exons of 578 cancer
-
Screening and Identification of Key Biomarkers in Melanoma: Evidence from Bioinformatic Analyses J. Comput. Biol. (IF 1.054) Pub Date : 2020-09-25 Yijun Xia; Juan Xie; Jun Zhao; Yin Lou; Dongsheng Cao
Melanoma is an extremely malignant and occult tumor. To identify candidate genes related to melanoma carcinogenesis and progression, the microarray data sets GSE83583, GSE130244, and GSE31879 were retrieved from the Gene Expression Omnibus (GEO) database using the GEO2R analytical tool provided by the National Center for Biotechnology Information (NCBI). Gene expression analysis was carried out using
-
Machine-Learning Models for Multicenter Prostate Cancer Treatment Plans J. Comput. Biol. (IF 1.054) Pub Date : 2020-09-25 Khajamoinuddin Syed; William Sleeman; Payal Soni; Michael Hagan; Jatinder Palta; Rishabh Kapoor; Preetam Ghosh
Clinical factors, including T-stage, Gleason score, and baseline prostate-specific antigen, are used to stratify patients with prostate cancer (PCa) into risk groups. This provides prognostic information for a heterogeneous disease such as PCa and guides treatment selection. In this article, we hypothesize that nonclinical factors may also impact treatment selection and their adherence to treatment
-
R/PY-SUMMA: An R/Python Package for Unsupervised Ensemble Learning for Binary Classification Problems in Bioinformatics J. Comput. Biol. (IF 1.054) Pub Date : 2020-09-04 Mehmet Eren Ahsen; Robert Vogel; Gustavo A Stolovitzky
The increasing availability of complex data in biology and medicine has promoted the use of machine learning in classification tasks to address important problems in translational and fundamental science. Two important obstacles, however, may limit the unraveling of the full potential of machine learning in these fields: the lack of generalization of the resulting models and the limited number of labeled
-
Identification of Potential Biomarkers for Intervertebral Disc Degeneration Using the Genome-Wide Expression Analysis J. Comput. Biol. (IF 1.054) Pub Date : 2020-09-04 Zongjiang Fan; Wanqiu Zhao; Shengning Fan; Chunxiao Li; Jing Qiao; Yongqing Xu
Intervertebral disc degeneration (IDD) is the major cause of low back pain. The current study aimed to further elucidate its underlying mechanisms. The microarray data set GSE70362, containing Thompson degeneration grades I–V, was divided into a control group and a degenerative group and analyzed. Differentially expressed genes (DEGs) were screened and clustered, followed by functional enrichment
-
Variant-Kudu: An Efficient Tool kit Leveraging Distributed Bitmap Index for Analysis of Massive Genetic Variation Datasets J. Comput. Biol. (IF 1.054) Pub Date : 2020-09-04 Jianye Fan; Shoubin Dong; Bo Wang
The storage and analysis of massive genetic variation datasets in variant call format (VCF) have become a great challenge with the rapid growth of genetic variation data in recent years. Traditional single-process-based tool kits become increasingly inefficient when analyzing massive genetic variation data. While emerging distributed storage technology such as Apache Kudu offers an attractive solution, it
-
New Approximate Statistical Significance of Gapped Alignments Based on the Greedy Extension Model J. Comput. Biol. (IF 1.054) Pub Date : 2020-09-04 Amirhossein Karami; Afshin Fayyaz Movaghar; Sabine Mercier; Louis Ferre
Sequence alignment is a fundamental concept in bioinformatics for distinguishing regions of similarity among various sequences. The degree of similarity is treated as a score. There are various methods to find the statistical significance of similarity in the gapped and ungapped cases. In this article, we improve the statistical significance accuracy of the local score by introducing
Contents have been reproduced by permission of the publishers.