-
MendelVar: gene prioritisation at GWAS loci using phenotypic enrichment of mendelian disease genes Bioinformatics (IF 5.61) Pub Date : 2021-01-16 Sobczyk M K; Gaunt T R; Paternoster L
Gene prioritisation at human GWAS loci is challenging due to linkage disequilibrium and long-range gene regulatory mechanisms. However, identifying the causal gene is crucial to enable identification of potential drug targets and better understanding of molecular mechanisms. Mapping GWAS traits to known phenotypically-relevant Mendelian disease genes near a locus is a promising approach to gene prioritisation
-
A two-step approach to testing overall effect of gene-environment interaction for multiple phenotypes Bioinformatics (IF 5.61) Pub Date : 2021-01-16 Arunabha Majumdar; Kathryn S Burch; Tanushree Haldar; Sriram Sankararaman; Bogdan Pasaniuc; W James Gauderman; John S Witte
While gene-environment (GxE) interactions contribute importantly to many different phenotypes, detecting such interactions requires well-powered studies and has proven difficult. To address this, we combine two approaches to improve GxE power: simultaneously evaluating multiple phenotypes and using a two-step analysis approach. Previous work shows that the power to identify a main genetic effect can
-
ECCB2020: the 19th European Conference on Computational Biology Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Capella-Gutierrez S, Alloza E, Gallastegui E, et al.
This volume of Bioinformatics includes the proceedings papers of the 19th European Conference on Computational Biology (ECCB), an annual international conference for research in computational biology and bioinformatics.
-
Machine-OlF-Action: A unified framework for developing and interpreting machine-learning models for chemosensory research Bioinformatics (IF 5.61) Pub Date : 2021-01-08 Anku Gupta; Mohit Choudhary; Sanjay Kumar Mohanty; Aayushi Mittal; Krishan Gupta; Aditya Arya; Suvendu Kumar; Nikhil Katyayan; Nilesh Kumar Dixit; Siddhant Kalra; Manshi Goel; Megha Sahni; Vrinda Singhal; Tripti Mishra; Debarka Sengupta; Gaurav Ahuja
Machine Learning-based techniques are emerging as state-of-the-art methods in chemoinformatics to selectively, effectively, and speedily identify biologically-relevant molecules from large databases. So far, a multitude of such techniques have been proposed, but unfortunately, due to their sparse availability and the dependency on high-end computational literacy, their wider adoption faces challenges
-
SoluProt: Prediction of Soluble Protein Expression in Escherichia coli Bioinformatics (IF 5.61) Pub Date : 2021-01-08 Jiri Hon; Martin Marusiak; Tomas Martinek; Antonin Kunka; Jaroslav Zendulka; David Bednar; Jiri Damborsky
Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritisation
-
Interactive gene networks with KNIT Bioinformatics (IF 5.61) Pub Date : 2021-01-08 D S Magruder; A M Liebhoff; J Bethune; S Bonn
KNIT is a web application that provides a hierarchical, directed graph on how a set of genes is connected to a particular gene of interest. Its primary aim is to aid researchers in discerning direct from indirect effects that a gene might have on the expression of other genes and molecular pathways, a very common problem in omics analysis. As such, KNIT provides deep contextual information for experiments
-
xGAP: A python based efficient, modular, extensible and fault tolerant genomic analysis pipeline for variant discovery Bioinformatics (IF 5.61) Pub Date : 2021-01-08 Aditya Gorla; Brandon Jew; Luke Zhang; Jae Hoon Sul
Since the first human genome was sequenced in 2001, there has been a rapid growth in the number of bioinformatic methods to process and analyze next generation sequencing (NGS) data for research and clinical studies that aim to identify genetic variants influencing diseases and traits. To achieve this goal, one first needs to call genetic variants from NGS data which requires multiple computationally
-
Recognition of small molecule-RNA binding sites using RNA sequence and structure Bioinformatics (IF 5.61) Pub Date : 2021-01-08 Hong Su; Zhenling Peng; Jianyi Yang
RNA molecules have become attractive small-molecule drug targets for treating disease in recent years. Computer-aided drug design can be facilitated by detecting the RNA sites that bind small molecules. However, very limited progress has been reported for the prediction of small molecule-RNA binding sites.
-
The VCBS superfamily forms a third supercluster of β-propellers that includes tachylectin and integrins Bioinformatics (IF 5.61) Pub Date : 2021-01-08 Joana Pereira; Andrei N Lupas
β-Propellers are found in great variety across all kingdoms of life. They assume many cellular roles, primarily as scaffolds for macromolecular interactions and catalysis. Despite their diversity, most β-propeller families clearly originated by amplification from the same ancient peptide—the “blade”. In cluster analyses, β-propellers of the WD40 superfamily always formed the largest group, to which
-
The iPPI-DB initiative: A Community-centered database of Protein-Protein Interaction modulators Bioinformatics (IF 5.61) Pub Date : 2021-01-08 Rachel Torchet; Karen Druart; Luis Checa Ruano; Alexandra Moine-Franel; Hélène Borges; Olivia Doppelt-Azeroual; Bryan Brancotte; Fabien Mareuil; Michael Nilges; Hervé Ménager; Olivier Sperandio
One avenue to address the paucity of clinically testable targets is to reinvestigate the druggable genome by tackling complicated types of targets such as Protein-Protein Interactions (PPIs). Given the challenge to target those interfaces with small chemical compounds, it has become clear that learning from successful examples of PPI modulation is a powerful strategy. Freely-accessible databases of
-
NinimHMDA: Neural integration of neighborhood information on a multiplex heterogeneous network for multiple types of human Microbe-Disease association Bioinformatics (IF 5.61) Pub Date : 2021-01-08 Yuanjing Ma; Hongmei Jiang
Many computational methods have been recently proposed to identify differentially abundant microbes related to a single disease; however, few studies have focused on large-scale microbe-disease association prediction using existing experimentally verified associations. This area is critically important. For example, it can help to rank and select potential candidate microbes for different diseases at scale
-
Accurate, scalable cohort variant calls using DeepVariant and GLnexus Bioinformatics (IF 5.61) Pub Date : 2021-01-05 Taedong Yun; Helen Li; Pi-Chuan Chang; Michael F Lin; Andrew Carroll; Cory Y McLean
Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging.
-
Dementia key gene identification with multi-layered SNP-gene-disease network Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Dong-gi Lee; Myungjun Kim; Sang Joon Son; Chang Hyung Hong; Hyunjung Shin
Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat dementia. However, gene finding involves tremendous cost, time and
-
panRGP: a pangenome-based method to predict genomic islands and explore their diversity Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Adelme Bazin; Guillaume Gautreau; Claudine Médigue; David Vallenet; Alexandra Calteau
Horizontal gene transfer (HGT) is a major source of variability in prokaryotic genomes. Regions of genome plasticity (RGPs) are clusters of genes located in highly variable genomic regions. Most of them arise from HGT and correspond to genomic islands (GIs). The study of those regions at the species level has become increasingly difficult with the data deluge of genomes. To date, no methods are available
-
GRaSP: a graph-based residue neighborhood strategy to predict binding sites Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Charles A Santana; Sabrina de A Silveira; João P A Moraes; Sandro C Izidoro; Raquel C de Melo-Minardi; António J M Ribeiro; Jonathan D Tyzack; Neera Borkakoti; Janet M Thornton
The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low-cost.
-
FBA reveals guanylate kinase as a potential target for antiviral therapies against SARS-CoV-2 Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Alina Renz; Lina Widerspick; Andreas Dräger
The novel coronavirus (SARS-CoV-2) is currently spreading worldwide, causing the disease COVID-19. The number of infections increases daily, without any approved antiviral therapy. The recently released viral nucleotide sequence enables the identification of therapeutic targets, e.g. by analyzing integrated human-virus metabolic models. Investigations of changed metabolic processes after virus infections
-
MirCure: a tool for quality control, filter and curation of microRNAs of animals and plants Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Guillem Ylla; Tianyuan Liu; Ana Conesa
microRNAs (miRNAs) are essential components of gene expression regulation at the post-transcriptional level. miRNAs have a well-defined molecular structure and this has facilitated the development of computational and high-throughput approaches to predict miRNAs genes. However, due to their short size, miRNAs have often been incorrectly annotated in both plants and animals. Consequently, published
-
Exploring chromatin conformation and gene co-expression through graph embedding Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Marco Varrone; Luca Nanni; Giovanni Ciriello; Stefano Ceri
The relationship between gene co-expression and chromatin conformation is of great biological interest. Thanks to high-throughput chromosome conformation capture technologies (Hi-C), researchers are gaining insights on the tri-dimensional organization of the genome. Given the high complexity of Hi-C data and the difficult definition of gene co-expression networks, the development of proper computational
-
Feasible-metabolic-pathway-exploration technique using chemical latent space Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Taiki Fuji; Shiori Nakazawa; Kiyoto Ito
Exploring metabolic pathways is one of the key techniques for developing highly productive microbes for the bioproduction of chemical compounds. To explore feasible pathways, not only examining a combination of well-known enzymatic reactions but also finding potential enzymatic reactions that can catalyze the desired structural changes are necessary. To achieve this, most conventional techniques use
-
Ensembling graph attention networks for human microbe–drug association prediction Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Yahui Long; Min Wu; Yong Liu; Chee Keong Kwoh; Jiawei Luo; Xiaoli Li
Human microbes are closely involved in a wide variety of complex human diseases and have become new drug targets. In silico methods for identifying potential microbe–drug associations provide an effective complement to conventional experimental methods, which can not only benefit screening candidate compounds for drug development but also facilitate novel knowledge discovery for understanding microbe–drug
-
A general near-exact k-mer counting method with low memory consumption enables de novo assembly of 106× human sequence data in 2.7 hours Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Christina Huan Shi; Kevin Y. Yip
In de novo sequence assembly, a standard pre-processing step is k-mer counting, which computes the number of occurrences of every length-k sub-sequence in the sequencing reads. Sequencing errors can produce many k-mers that do not appear in the genome, leading to the need for an excessive amount of memory during counting. This issue is particularly serious when the genome to be assembled is large,
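The k-mer counting step described in this abstract can be illustrated with a minimal Python sketch. It is an exact, in-memory counter, not the near-exact low-memory method of the paper; the function name `count_kmers` and the toy reads are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count occurrences of every length-k sub-sequence across all reads.

    Illustrative exact counter: it stores every distinct k-mer in memory,
    which is the cost that specialised k-mer counters try to avoid.
    """
    counts = Counter()
    for read in reads:
        # A read shorter than k yields an empty range and contributes nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy reads (hypothetical data), counted with k = 3.
reads = ["ACGTAC", "CGTACG"]
kmer_counts = count_kmers(reads, 3)
```

Because every erroneous k-mer also gets its own dictionary entry, memory use in a counter like this grows with the number of sequencing errors as well as with genome size.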
-
Adversarial deconfounding autoencoder for learning robust gene expression embeddings Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Ayse B Dincer; Joseph D Janizek; Su-In Lee
The increasing number of gene expression profiles has enabled the use of complex models, such as deep unsupervised neural networks, to extract a latent space from these profiles. However, expression profiles, especially when collected in large numbers, inherently contain variations introduced by technical artifacts (e.g. batch effects) and uninteresting biological variables (e.g. age) in addition to the
-
DeepCDR: a hybrid graph convolutional network for predicting cancer drug response Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Qiao Liu; Zhiqiang Hu; Rui Jiang; Mu Zhou
Accurate prediction of cancer drug response (CDR) is challenging due to the uncertainty of drug efficacy and heterogeneity of cancer patients. Strong evidence indicates that CDR depends heavily on the tumor genomic and transcriptomic profiles of individual patients. Precise identification of CDR is crucial in both guiding anti-cancer drug design and understanding cancer biology.
-
CLPred: a sequence-based protein crystallization predictor using BLSTM neural network Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Wenjing Xuan; Ning Liu; Neng Huang; Yaohang Li; Jianxin Wang
Determining the structures of proteins is a critical step toward understanding their biological functions. The crystallography-based X-ray diffraction technique is the main method for experimental protein structure determination. However, the underlying crystallization process, which needs multiple time-consuming and costly experimental steps, has a high attrition rate. To overcome this issue, a series of in
-
Conditional out-of-distribution generation for unpaired data using transfer VAE Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Mohammad Lotfollahi; Mohsen Naghipourfar; Fabian J Theis; F Alexander Wolf
While generative models have shown great success in sampling high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), their generation out-of-distribution poses fundamental problems due to the difficulty of learning compact joint distribution across conditions. The canonical example of the conditional variational
-
Supervised learning on phylogenetically distributed data Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Elliot Layne; Erika N Dort; Richard Hamelin; Yue Li; Mathieu Blanchette
The ability to develop robust machine-learning (ML) models is considered imperative to the adoption of ML techniques in biology and medicine. This challenge is particularly acute when data available for training is not independent and identically distributed (iid), in which case trained models are vulnerable to out-of-distribution generalization problems. Of particular interest are problems
-
Matrix (factorization) reloaded: flexible methods for imputing genetic interactions with cross-species and side information Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Jason Fan; Xuan Cindy Li; Mark Crovella; Mark D M Leiserson
Mapping genetic interactions (GIs) can reveal important insights into cellular function and has potential translational applications. There has been great progress in developing high-throughput experimental systems for measuring GIs (e.g. with double knockouts) as well as in defining computational methods for inferring (imputing) unknown interactions. However, existing computational methods for imputation
-
The effect of kinship in re-identification attacks against genomic data sharing beacons Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Kerem Ayoz; Miray Aysen; Erman Ayday; A Ercument Cicek
The big data era in genomics promises a breakthrough in medicine, but the need to share data in a privacy-preserving manner limits the pace of the field. The widely accepted ‘genomic data sharing beacon’ protocol provides a standardized and secure interface for querying genomic datasets. The data are only shared if the desired information (e.g. a certain variant) exists in the dataset. Various studies have shown that beacons are vulnerable
-
PathFinder: Bayesian inference of clone migration histories in cancer Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Sudhir Kumar; Antonia Chroni; Koichiro Tamura; Maxwell Sanderford; Olumide Oladeinde; Vivian Aly; Tracy Vu; Sayaka Miura
Metastases cause a vast majority of cancer morbidity and mortality. Metastatic clones are formed by dispersal of cancer cells to secondary tissues, and are not medically detected or visible until later stages of cancer development. Clone phylogenies within patients provide a means of tracing the otherwise inaccessible dynamic history of migrations of cancer cells.
-
Probabilistic graphlets capture biological function in probabilistic molecular networks Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Sergio Doria-Belenguer; Markus K. Youssef; René Böttcher; Noël Malod-Dognin; Nataša Pržulj
Molecular interactions have been successfully modeled and analyzed as networks, where nodes represent molecules and edges represent the interactions between them. These networks revealed that molecules with similar local network structure also have similar biological functions. The most sensitive measures of network structure are based on graphlets. However, graphlet-based methods thus far are only
-
svMIL: predicting the pathogenic effect of TAD boundary-disrupting somatic structural variants through multiple instance learning Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Marleen M. Nieboer; Jeroen de Ridder
Despite the fact that structural variants (SVs) play an important role in cancer, methods to predict their effect, especially for SVs in non-coding regions, are lacking, leaving them often overlooked in the clinic. Non-coding SVs may disrupt the boundaries of Topologically Associated Domains (TADs), thereby affecting interactions between genes and regulatory elements such as enhancers. However, it
-
Inferring signaling pathways with probabilistic programming Bioinformatics (IF 5.61) Pub Date : 2020-12-29 David Merrell; Anthony Gitter
Cells regulate themselves via dizzyingly complex biochemical processes called signaling pathways. These are usually depicted as a network, where nodes represent proteins and edges indicate their influence on each other. In order to understand diseases and therapies at the cellular level, it is crucial to have an accurate understanding of the signaling pathways at work. Since signaling pathways can
-
Joint epitope selection and spacer design for string-of-beads vaccines Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Emilio Dorigatti; Benjamin Schubert
Conceptually, epitope-based vaccine design poses two distinct problems: (i) selecting the best epitopes to elicit the strongest possible immune response and (ii) arranging and linking them through short spacer sequences to string-of-beads vaccines, so that their recovery likelihood during antigen processing is maximized. Current state-of-the-art approaches solve this design problem sequentially. Consequently
-
APOD: accurate sequence-based predictor of disordered flexible linkers Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Zhenling Peng; Qian Xing; Lukasz Kurgan
Disordered flexible linkers (DFLs) are abundant and functionally important intrinsically disordered regions that connect protein domains and structural elements within domains and which facilitate disorder-based allosteric regulation. Although computational estimates suggest that thousands of proteins have DFLs, they were annotated experimentally in <200 proteins. This substantial annotation gap can
-
RAINFOREST: a random forest approach to predict treatment benefit in data from (failed) clinical drug trials Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Joske Ubels; Tilman Schaefers; Cornelis Punt; Henk-Jan Guchelaar; Jeroen de Ridder
When phase III clinical drug trials fail their endpoint, enormous resources are wasted. Moreover, even if a clinical trial demonstrates a significant benefit, the observed effects are often small and may not outweigh the side effects of the drug. Therefore, there is a great clinical need for methods to identify genetic markers that can identify subgroups of patients which are likely to benefit from
-
FastSK: fast sequence analysis with gapped string kernels Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Derrick Blakely; Eamon Collins; Ritambhara Singh; Andrew Norton; Jack Lanchantin; Yanjun Qi
Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task’s alphabet size.
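To make the cost described above concrete, here is a naive Python sketch of a gapped k-mer feature map and the resulting kernel value. It is not the FastSK algorithm or any gkm-SVM implementation; the names `gapped_kmer_features` and `gapped_kernel`, the parameters `g` (window length) and `k` (informative positions), and the `_` wildcard symbol are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_features(seq, g, k):
    """Count gapped k-mers: length-g windows with k kept positions.

    The remaining g - k positions are masked with '_' (assumed wildcard),
    so each window contributes C(g, k) features.
    """
    counts = Counter()
    for start in range(len(seq) - g + 1):
        window = seq[start:start + g]
        for kept in combinations(range(g), k):
            key = "".join(window[i] if i in kept else "_" for i in range(g))
            counts[key] += 1
    return counts


def gapped_kernel(x, y, g, k):
    """Naive kernel: inner product of the two gapped k-mer count vectors."""
    fx = gapped_kmer_features(x, g, k)
    fy = gapped_kmer_features(y, g, k)
    # Counter returns 0 for features absent from y.
    return sum(count * fy[key] for key, count in fx.items())


# Toy sequences (hypothetical data): windows of length 3 with 2 kept positions.
similarity = gapped_kernel("ACG", "ACT", g=3, k=2)
```

Enumerating every subset of kept positions per window is what makes this direct approach scale poorly as the feature length and the number of masked positions grow.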
-
A Siamese neural network model for the prioritization of metabolic disorders by integrating real and simulated data Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Gian Marco Messa; Francesco Napolitano; Sarah H. Elsea; Diego di Bernardo; Xin Gao
Untargeted metabolomic approaches hold great promise as a diagnostic tool for inborn errors of metabolism (IEMs) in the near future. However, the complexity of the involved data makes their application difficult and time consuming. Computational approaches, such as metabolic network simulations and machine learning, could significantly help to exploit metabolomic data to aid the diagnostic process
-
Using a GTR+Γ substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Jose Barba-Montoya; Qiqing Tao; Sudhir Kumar
As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model
-
Finding orthologous gene blocks in bacteria: the computational hardness of the problem and novel methods to address it Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Huy N Nguyen; Alexey Markin; Iddo Friedberg; Oliver Eulenstein
The evolution of complexity is one of the most fascinating and challenging problems in modern biology, and tracing the evolution of complex traits is an open problem. In bacteria, operons and gene blocks provide a model of tractable evolutionary complexity at the genomic level. Gene blocks are structures of co-located genes with related functions, and operons are gene blocks whose genes are co-transcribed
-
New mixture models for decoy-free false discovery rate estimation in mass spectrometry proteomics Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Yisu Peng; Shantanu Jain; Yong Fuga Li; Michal Greguš; Alexander R. Ivanov; Olga Vitek; Predrag Radivojac
Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the
-
A neuro-evolution approach to infer a Boolean network from time-series gene expressions Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Shohag Barman; Yung-Keun Kwon
In systems biology, it is challenging to accurately infer a regulatory network from time-series gene expression data, and a variety of methods have been proposed. Most of them were computationally inefficient in inferring very large networks, though, because of the increasing number of candidate regulatory genes. Although a recent approach called GABNI (genetic algorithm-based Boolean network inference)
-
An efficient framework to identify key miRNA–mRNA regulatory modules in cancer Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Milad Mokhtaridoost; Mehmet Gönen
Micro-RNAs (miRNAs) are known to be important components of RNA silencing and post-transcriptional gene regulation, and they interact with messenger RNAs (mRNAs) either by degradation or by translational repression. miRNA alterations have a significant impact on the formation and progression of human cancers. Accordingly, it is important to establish computational methods with high predictive performance
-
SCHNEL: scalable clustering of high dimensional single-cell data Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Tamim Abdelaal; Paul de Raadt; Boudewijn P F Lelieveldt; Marcel J T Reinders; Ahmed Mahfouz
Single-cell data measure multiple cellular markers at the single-cell level for thousands to millions of cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering these data. Dimensionality-reduction-based clustering tools are either not scalable to large datasets containing millions of cells, or not fully automated requiring
-
Detecting evolutionary patterns of cancers using consensus trees Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Sarah Christensen; Juho Kim; Nicholas Chia; Oluwasanmi Koyejo; Mohammed El-Kebir
While each cancer is the result of an isolated evolutionary process, there are repeated patterns in tumorigenesis defined by recurrent driver mutations and their temporal ordering. Such repeated evolutionary trajectories hold the potential to improve stratification of cancer patients into subtypes with distinct survival and therapy response profiles. However, current cancer phylogeny methods infer
-
Padhoc: a computational pipeline for pathway reconstruction on the fly Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Salvador Casaní-Galdón; Cecile Pereira; Ana Conesa
Molecular pathway databases represent cellular processes in a structured and standardized way. These databases support the community-wide utilization of pathway information in biological research and the computational analysis of high-throughput biochemical data. Although pathway databases are critical in genomics research, the fast progress of biomedical sciences prevents databases from staying up-to-date
-
SCIM: universal single-cell matching with unpaired feature sets Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Stefan G Stark; Joanna Ficek; Francesco Locatello; Ximena Bonilla; Stéphane Chevrier; Franziska Singer; Tumor Profiler Consortium ; Rudolf Aebersold; Faisal S Al-Quaddoomi; Jonas Albinus; Ilaria Alborelli; Sonali Andani; Per-Olof Attinger; Marina Bacac; Daniel Baumhoer; Beatrice Beck-Schimmer; Niko Beerenwinkel; Christian Beisel; Lara Bernasconi; Anne Bertolini; Bernd Bodenmiller; Ximena Bonilla; Ruben
Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells
-
DeepSELEX: inferring DNA-binding preferences from HT-SELEX data using multi-class CNNs Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Maor Asif; Yaron Orenstein
Transcription factor (TF) DNA-binding is a central mechanism in gene regulation. Biologists would like to know where and when these factors bind DNA. Hence, they require accurate DNA-binding models to enable binding prediction to any DNA sequence. Recent technological advancements measure the binding of a single TF to thousands of DNA sequences. One of the prevailing techniques, high-throughput SELEX
-
Graph convolutional networks for epigenetic state prediction using both sequence and 3D genome data Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Jack Lanchantin; Yanjun Qi
Predictive models of DNA chromatin profile (i.e. epigenetic state), such as transcription factor binding, are essential for understanding regulatory processes and developing gene therapies. It is known that the 3D genome, or spatial structure of DNA, is highly influential in the chromatin profile. Deep neural networks have achieved state of the art performance on chromatin profile prediction by using
-
PROBselect: accurate prediction of protein-binding residues from proteins sequences via dynamic predictor selection Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Fuhao Zhang; Wenbo Shi; Jian Zhang; Min Zeng; Min Li; Lukasz Kurgan
Knowledge of protein-binding residues (PBRs) improves our understanding of protein−protein interactions, contributes to the prediction of protein functions and facilitates protein−protein docking calculations. While many sequence-based predictors of PBRs were published, they offer modest levels of predictive performance and most of them cross-predict residues that interact with other partners. One
-
Geometricus represents protein structures as shape-mers derived from moment invariants Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Janani Durairaj; Mehmet Akdel; Dick de Ridder; Aalt D J van Dijk
As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding
-
Batch equalization with a generative adversarial network Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Wesley Wei Qian; Cassandra Xia; Subhashini Venugopalan; Arunachalam Narayanaswamy; Michelle Dimon; George W Ashdown; Jake Baum; Jian Peng; D Michael Ando
Advances in automation and imaging have made it possible to capture a large image dataset that spans multiple experimental batches of data. However, accurate biological comparison across the batches is challenged by batch-to-batch variation (i.e. batch effect) due to uncontrollable experimental noise (e.g. varying stain intensity or cell density). Previous approaches to minimize the batch effect have
-
DriverGroup: a novel method for identifying driver gene groups Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Vu V H Pham; Lin Liu; Cameron P Bracken; Gregory J Goodall; Jiuyong Li; Thuc D Le
Identifying cancer driver genes is a key task in cancer informatics. Most existing methods are focused on individual cancer drivers which regulate biological processes leading to cancer. However, the effect of a single gene may not be sufficient to drive cancer progression. Here, we hypothesize that there are driver gene groups that work in concert to regulate cancer, and we develop a novel computational
-
Enhancing statistical power in temporal biomarker discovery through representative shapelet mining Bioinformatics (IF 5.61) Pub Date : 2020-12-29 Thomas Gumbsch; Christian Bock; Michael Moor; Bastian Rieck; Karsten Borgwardt
Temporal biomarker discovery in longitudinal data is based on detecting reoccurring trajectories, the so-called shapelets. The search for shapelets requires considering all subsequences in the data. While the accompanying issue of multiple testing has been mitigated in previous work, the redundancy and overlap of the detected shapelets result in an a priori unbounded number of highly similar and structurally
-
BiCoN: Network-constrained biclustering of patients and omics data Bioinformatics (IF 5.61) Pub Date : 2020-12-26 Olga Lazareva; Stefan Canzar; Kevin Yuan; Jan Baumbach; David B Blumenthal; Paolo Tieri; Tim Kacprowski; Markus List
Unsupervised learning approaches are frequently employed to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, clustering and biclustering techniques are oblivious to the functional relationship of genes and are thus not ideally suited to pinpoint molecular mechanisms along with patient subgroups.
-
A non-linear regression method for estimation of gene-environment heritability Bioinformatics (IF 5.61) Pub Date : 2020-12-26 Matthew Kerin; Jonathan Marchini
Gene-environment (GxE) interactions are one of the least studied aspects of the genetic architecture of human traits and diseases. The environment of an individual is inherently high dimensional, evolves through time and can be expensive and time consuming to measure. The UK Biobank study, with all 500,000 participants having undergone an extensive baseline questionnaire, represents a unique opportunity
-
Single Cell Systems Analysis: Decision Geometry In Outliers Bioinformatics (IF 5.61) Pub Date : 2020-12-26 Lianne Abrahams
Anti-cancer therapeutics of the highest calibre currently focus on combinatorial targeting of specific oncoproteins and tumour suppressors. Clinical relapse depends upon intratumoral heterogeneity, which serves as substrate variation during evolution of resistance to therapeutic regimens.
-
Prediction Of Histone Post-Translational Modifications Using Deep Learning Bioinformatics (IF 5.61) Pub Date : 2020-12-26 Dipankar Ranjan Baisya; Stefano Lonardi
Histone post-translational modifications (PTMs) are involved in a variety of essential regulatory processes in the cell, including transcription control. Recent studies have shown that histone PTMs can be accurately predicted from the knowledge of transcription factor binding or DNase hypersensitivity data. Similarly, it has been shown that one can predict PTMs from the underlying DNA primary sequence
-
VINYL: Variant prIoritizatioN by survivaL analysis Bioinformatics (IF 5.61) Pub Date : 2020-12-26 Matteo Chiara; Pietro Mandreoli; Marco Antonio Tangaro; Anna Maria D’Erchia; Sandro Sorrentino; Cinzia Forleo; David S Horner; Federico Zambelli; Graziano Pesole
Clinical applications of genome re-sequencing technologies typically generate large amounts of data that need to be carefully annotated and interpreted to identify genetic variants potentially associated with pathological conditions. In this context, accurate and reproducible methods for the functional annotation and prioritization of genetic variants are of fundamental importance.
-
MethPanel: a parallel pipeline and interactive analysis tool for multiplex bisulphite PCR sequencing to assess DNA methylation biomarker panels for disease detection Bioinformatics (IF 5.61) Pub Date : 2020-12-26 Phuc-Loi Luu; Phuc-Thinh Ong; Tran Thai Huu Loc; Dilys Lam; Ruth Pidsley; Clare Stirzaker; Susan J Clark
DNA methylation patterns in a cell are associated with gene expression and the phenotype of a cell, including disease states. Bisulphite PCR sequencing is commonly used to assess the methylation profile of genomic regions between different cells. Here we have developed MethPanel, a computational pipeline with an interactive graphical interface to rapidly analyse multiplex bisulphite PCR sequencing
-
MultiPaths: a python framework for analyzing multi-layer biological networks using diffusion algorithms Bioinformatics (IF 5.61) Pub Date : 2020-12-26 Josep Marín-Llaó; Sarah Mubeen; Alexandre Perera-Lluna; Martin Hofmann-Apitius; Sergio Picart-Armada; Daniel Domingo-Fernández
High-throughput screening yields vast amounts of biological data which can be highly challenging to interpret. In response, knowledge-driven approaches emerged as possible solutions to analyze large datasets by leveraging prior knowledge of biomolecular interactions represented in the form of biological networks. Nonetheless, given their size and complexity, their manual investigation quickly becomes
Contents have been reproduced by permission of the publishers.