-
Infrared: a declarative tree decomposition-powered framework for bioinformatics Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-03-16 Hua-Ting Yao, Bertrand Marchand, Sarah J. Berkemer, Yann Ponty, Sebastian Will
Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations. We introduce the Infrared framework to overcome such hindrances for a large class of problems
-
Median quartet tree search algorithms using optimal subtree prune and regraft Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-03-13 Shayesteh Arasti, Siavash Mirarab
Gene trees can be different from the species tree due to biological processes and inference errors. One way to obtain a species tree is to find one that maximizes some measure of similarity to a set of gene trees. The number of shared quartets between a potential species tree and gene trees provides a statistically justifiable score; if maximized properly, it could result in a statistically consistent
-
Suffix sorting via matching statistics Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-03-12 Zsuzsanna Lipták, Francesco Masillo, Simon J. Puglisi
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized
-
Finding maximal exact matches in graphs Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-03-11 Nicola Rizzo, Manuel Cáceres, Veli Mäkinen
We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class of seeds, often used in seed-chain-extend type of practical alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least $$\kappa$$ ( $$\kappa$$
-
SparseRNAfolD: optimized sparse RNA pseudoknot-free folding with dangle consideration Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-03-03 Mateo Gray, Sebastian Will, Hosna Jabbari
Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity ( $$O(n^3)$$ time and $$O(n^2)$$ space for pseudoknot-free structures) for longer RNA sequences
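To illustrate the cubic-time, quadratic-space shape of such dynamic programs, here is a minimal sketch that uses Nussinov-style base-pair maximization as a simplified stand-in for free energy minimization. It is not the SparseRNAfolD algorithm and ignores dangles and real energy parameters; the function name `max_base_pairs` and the `min_loop` parameter are assumptions made for this example.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested (pseudoknot-free) base pairs.

    Fills an n x n table (O(n^2) space); each cell scans O(n) split
    points, giving O(n^3) time overall.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            # case: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

The table `dp` is what dominates memory for long sequences; sparsification techniques aim to avoid storing or scanning most of it.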
-
Recombinations, chains and caps: resolving problems with the DCJ-indel model Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-02-27 Leonard Bohnenkämper
One of the most fundamental problems in genome rearrangement studies is the (genomic) distance problem. It is typically formulated as finding the minimum number of rearrangements under a model that are needed to transform one genome into the other. A powerful multi-chromosomal model is the Double Cut and Join (DCJ) model. While the DCJ model is not able to deal with some situations that occur in practice
-
Unifying duplication episode clustering and gene-species mapping inference Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-02-14 Paweł Górecki, Natalia Rutecka, Agnieszka Mykowiecka, Jarosław Paszek
We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of partially leaf-labeled gene trees by minimizing the size of duplication episode clustering (EC). This problem is particularly relevant in metagenomics, where incomplete data often poses a challenge in the accurate reconstruction of gene histories. To solve MetaEC, we propose a polynomial
-
Predicting horizontal gene transfers with perfect transfer networks Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-02-06 Alitzel López Sánchez, Manuel Lafond
Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of
-
Global exact optimisations for chloroplast structural haplotype scaffolding Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-02-06 Victor Epain, Rumen Andonov
Scaffolding is an intermediate stage of fragment assembly. It consists in orienting and ordering the contigs obtained by the assembly of the sequencing reads. In the general case, the problem has been studied extensively using distance data between the contigs. Here we focus on a dedicated scaffolding for the chloroplast genomes. As these genomes are small, circular and with few specific repeats
-
Co-linear chaining on pangenome graphs Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-01-27 Jyotshna Rajput, Ghanshyam Chandra, Chirag Jain
Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear references lack. However, efficiently aligning sequences to these graphs with complex topology and cycles can be challenging. The seed-chain-extend based alignment algorithms use co-linear chaining as a standard technique to identify a good cluster of exact
-
Fulgor: a fast and compact k-mer index for large-scale matching and color queries Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-01-22 Jason Fan, Jamshed Khan, Noor Pratap Singh, Giulio Ermanno Pibiri, Rob Patro
The problem of sequence identification or matching—determining the subset of reference sequences from a given collection that are likely to contain a short, queried nucleotide sequence—is relevant for many important tasks in Computational Biology, such as metagenomics and pangenome analysis. Due to the complex nature of such analyses and the large scale of the reference collections a resource-efficient
-
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-01-08 Junyan Dai, Tobias Rubel, Yunheng Han, Erin K. Molloy
The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes
-
Investigating the complexity of the double distance problems Algorithms Mol. Biol. (IF 1.0) Pub Date : 2024-01-04 Marília D. V. Braga, Leonie R. Brockmann, Katharina Klerx, Jens Stoye
Two genomes $$\mathbb {A}$$ and $$\mathbb {B}$$ over the same set of gene families form a canonical pair when each of them has exactly one gene from each family. Denote by $$n_*$$ the number of common families of $$\mathbb {A}$$ and $$\mathbb {B}$$ . Different distances of canonical genomes can be derived from a structure called breakpoint graph, which represents the relation between the two given
-
EMMA: a new method for computing multiple sequence alignments given a constraint subset alignment Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-12-07 Chengze Shen, Baqiao Liu, Kelly P. Williams, Tandy Warnow
Adding sequences into an existing (possibly user-provided) alignment has multiple applications, including updating a large alignment with new data, adding sequences into a constraint alignment constructed using biological knowledge, or computing alignments in the presence of sequence length heterogeneity. Although this is a natural problem, only a few tools have been developed to use this information
-
Correction: Constructing founder sets under allelic and non-allelic homologous recombination Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-12-06 Konstantinn Bonnet, Tobias Marschall, Daniel Doerr
Correction: Algorithms for Molecular Biology (2023) 18:15 https://doi.org/10.1186/s13015-023-00241-3 Additional file 1, as originally published, contained errors. It has now been replaced with the correct file. The original article [1] has been corrected. Bonnet K, Marschall T, Doerr D. Constructing founder sets under allelic and non-allelic homologous recombination. Algorithms Mol Biol. 2023;18:15
-
Quartets enable statistically consistent estimation of cell lineage trees under an unbiased error and missingness model Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-12-01 Yunheng Han, Erin K. Molloy
Cancer progression and treatment can be informed by reconstructing its evolutionary history from tumor cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume the input data are error-free and the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing
-
Automated design of dynamic programming schemes for RNA folding with pseudoknots Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-12-01 Bertrand Marchand, Sebastian Will, Sarah J. Berkemer, Yann Ponty, Laurent Bulteau
Although RNA secondary structure prediction is a textbook application of dynamic programming (DP) and routine task in RNA structure analysis, it remains challenging whenever pseudoknots come into play. Since the prediction of pseudoknotted structures by minimizing (realistically modelled) energy is NP-hard, specialized algorithms have been proposed for restricted conformation classes that capture the
-
New algorithms for structure informed genome rearrangement Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-12-01 Eden Ozeri, Meirav Zehavi, Michal Ziv-Ukelson
We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, $$\mathsf {Constrained \ TreeToString \ Divergence}$$
-
Relative timing information and orthology in evolutionary scenarios Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-11-08 David Schaller, Tom Hartmann, Manuel Lafond, Peter F. Stadler, Nicolas Wieseke, Marc Hellmuth
Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers
-
Constructing founder sets under allelic and non-allelic homologous recombination Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-09-29 Konstantinn Bonnet, Tobias Marschall, Daniel Doerr
Homologous recombination between the maternal and paternal copies of a chromosome is a key mechanism for human inheritance and shapes population genetic properties of our species. However, a similar mechanism can also act between different copies of the same sequence, then called non-allelic homologous recombination (NAHR). This process can result in genomic rearrangements—including deletion, duplication
-
Efficient gene orthology inference via large-scale rearrangements Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-09-28 Diego P. Rubert, Marília D. V. Braga
Recently we developed a gene orthology inference tool based on genome rearrangements (Journal of Bioinformatics and Computational Biology 19:6, 2021). Given a set of genomes our method first computes all pairwise gene similarities. Then it runs pairwise ILP comparisons to compute optimal gene matchings, which minimize, by taking the similarities into account, the weighted rearrangement distance between
-
Constructing phylogenetic networks via cherry picking and machine learning Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-09-16 Giulia Bernardini, Leo van Iersel, Esther Julien, Leen Stougie
Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. Existing methods are computationally expensive and can either handle only small numbers of phylogenetic trees or are limited to severely restricted classes of networks. In this paper, we apply the recently-introduced theoretical framework of cherry picking
-
The solution surface of the Li-Stephens haplotype copying model Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-08-09 Yifan Jin, Jonathan Terhorst
The Li-Stephens (LS) haplotype copying model forms the basis of a number of important statistical inference procedures in genetics. LS is a probabilistic generative model which supposes that a sampled chromosome is an imperfect mosaic of other chromosomes found in a population. In the frequentist setting which is the focus of this paper, the output of LS is a “copying path” through chromosome space
-
phyBWT2: phylogeny reconstruction via eBWT positional clustering Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-08-03 Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, Lorenzo Tattini
Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights about the origin and the evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet, several tools require pre-processed
-
A topology-marginal composite likelihood via a generalized phylogenetic pruning algorithm Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-07-31 Seong-Hwan Jun, Hassan Nasif, Chris Jennings-Shaffer, David H Rich, Anna Kooperberg, Mathieu Fourment, Cheng Zhang, Marc A Suchard, Frederick A Matsen
Bayesian phylogenetics is a computationally challenging inferential problem. Classical methods are based on random-walk Markov chain Monte Carlo (MCMC), where random proposals are made on the tree parameter and the continuous parameters simultaneously. Variational phylogenetics is a promising alternative to MCMC, in which one fits an approximating distribution to the unnormalized phylogenetic posterior
-
On the complexity of non-binary tree reconciliation with endosymbiotic gene transfer Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-07-30 Mathieu Gascon, Nadia El-Mabrouk
Reconciling a non-binary gene tree with a binary species tree can be done efficiently in the absence of horizontal gene transfers, but becomes NP-hard in the presence of gene transfers. Here, we focus on the special case of endosymbiotic gene transfers (EGT), i.e. transfers between the mitochondrial and nuclear genome of the same species. More precisely, given a multifurcated (non-binary) gene tree
-
Mono-valent salt corrections for RNA secondary structures in the ViennaRNA package Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-07-29 Hua-Ting Yao, Ronny Lorenz, Ivo L. Hofacker, Peter F. Stadler
RNA features a highly negatively charged phosphate backbone that attracts a cloud of counter-ions that reduce the electrostatic repulsion in a concentration dependent manner. Ion concentrations thus have a large influence on folding and stability of RNA structures. Despite their well-documented effects, salt effects are not handled consistently by currently available secondary structure prediction
-
Locality-sensitive bucketing functions for the edit distance Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-07-24 Ke Chen, Mingfu Shao
Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing
-
Weighted ASTRID: fast and accurate species trees from weighted internode distances Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-07-19 Baqiao Liu, Tandy Warnow
Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., “gene tree heterogeneity”)
-
Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-07-04 Sebastian Schmidt, Jarno N. Alanko
A fundamental operation in computational genomics is to reduce the input sequences to their constituent k-mers. For maximum performance of downstream applications it is important to store the k-mers in small space, while keeping the representation easy and efficient to use (i.e. without k-mer repetitions and in plain text). Recently, heuristics were presented to compute a near-minimum such representation
-
A classification algorithm based on dynamic ensemble selection to predict mutational patterns of the envelope protein in HIV-infected patients Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-06-19 Mohammad Fili, Guiping Hu, Changze Han, Alexa Kort, John Trettin, Hillel Haim
Therapeutics against the envelope (Env) proteins of human immunodeficiency virus type 1 (HIV-1) effectively reduce viral loads in patients. However, due to mutations, new therapy-resistant Env variants frequently emerge. The sites of mutations on Env that appear in each patient are considered random and unpredictable. Here we developed an algorithm to estimate for each patient the mutational state
-
On weighted k-mer dictionaries Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-06-17 Giulio Ermanno Pibiri
We consider the problem of representing a set of $$k$$ -mers and their abundance counts, or weights, in compressed space so that assessing membership and retrieving the weight of a $$k$$ -mer is efficient. The representation is called a weighted dictionary of $$k$$ -mers and finds application in numerous tasks in Bioinformatics that usually count $$k$$ -mers as a pre-processing step. In fact, $$k$$
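As a point of reference, an uncompressed weighted k-mer dictionary can be sketched with a plain hash map. This is only a baseline showing the two queries mentioned above (membership and weight retrieval), not the compressed representation the paper proposes; `build_weighted_kmer_dict` is a hypothetical name, and the sketch does not merge reverse complements.

```python
from collections import Counter


def build_weighted_kmer_dict(sequences, k):
    """Map every k-mer in the input sequences to its abundance count."""
    counts = Counter()
    for seq in sequences:
        # slide a window of length k over the sequence
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


weights = build_weighted_kmer_dict(["ACGTAC", "CGT"], k=3)
is_member = "CGT" in weights   # membership query
weight = weights["CGT"]        # weight (abundance) query
```

A hash map like this stores every k-mer explicitly, which is the space cost that compressed weighted dictionaries are designed to reduce.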
-
Pangenomic genotyping with the marker array Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-05-05 Taher Mun, Naga Sai Kavya Vaddadi, Ben Langmead
We present a new method and software tool called rowbowt that applies a pangenome index to the problem of inferring genotypes from short-read sequencing data. The method uses a novel indexing structure called the marker array. Using the marker array, we can genotype variants with respect to large panels like the 1000 Genomes Project while reducing the reference bias that results when aligning to
-
All galls are divided into three or more parts: recursive enumeration of labeled histories for galled trees Algorithms Mol. Biol. (IF 1.0) Pub Date : 2023-02-13 Shaili Mathur, Noah A. Rosenberg
In mathematical phylogenetics, a labeled rooted binary tree topology can possess any of a number of labeled histories, each of which represents a possible temporal ordering of its coalescences. Labeled histories appear frequently in calculations that describe the combinatorics of phylogenetic trees. Here, we generalize the concept of labeled histories from rooted phylogenetic trees to rooted phylogenetic
-
Correction: Heuristic shortest hyperpaths in cell signaling hypergraphs Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-12-29 Spencer Krieger, John Kececioglu
Following the publication of the original article [1], the authors identified formatting errors in the definitions, lemmas, theorems, and mathematical proofs throughout the paper. The formatting issues have been corrected in the original version of the article. Please access the link given below to view the updated original article. There are no scientific content errors noted in the
-
On a greedy approach for genome scaffolding Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-10-29 Tom Davot, Annie Chateau, Rohan Fossé, Rodolphe Giroudeau, Mathias Weller
Scaffolding is a bioinformatics problem aimed at completing the contig assembly process by determining the relative position and orientation of these contigs. It can be seen as a paths and cycles cover problem of a particular graph called the “scaffold graph”. We provide some NP-hardness and inapproximability results on this problem. We also adapt a greedy approximation algorithm on complete graphs
-
Treewidth-based algorithms for the small parsimony problem on networks Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-08-20 Celine Scornavacca, Mathias Weller
Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the Small Parsimony problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T so as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different
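For a rooted binary tree and a single character, the Small Parsimony problem defined above can be solved with the classical Fitch algorithm. The sketch below is that textbook tree algorithm, not the treewidth-based network method of the paper; the nested-tuple tree encoding and the name `fitch` are assumptions made for this example.

```python
def fitch(tree, leaf_states):
    """Bottom-up pass of Fitch's algorithm on a rooted binary tree.

    tree        -- a leaf name (str) or a pair (left_subtree, right_subtree)
    leaf_states -- dict mapping each leaf name to its character state
    Returns (candidate_states_at_root, parsimony_score).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # children can agree on a state: no extra change needed here
        return common, left_cost + right_cost
    # children disagree: one additional state change is unavoidable
    return left_set | right_set, left_cost + right_cost + 1


tree = (("a", "b"), ("c", "d"))
states = {"a": "A", "b": "C", "c": "A", "d": "A"}
root_states, score = fitch(tree, states)
```

The returned score is the minimum number of edges whose endpoints carry different states; a top-down pass (not shown) would pick concrete states for the internal nodes.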
-
Binning long reads in metagenomics datasets using composition and coverage information Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-07-11 Anuradha Wickramarachchi, Yu Lin
Advancements in metagenomics sequencing allow the study of microbial communities directly from their environments. Metagenomics binning is a key step in the species characterisation of microbial communities. Next-generation sequencing reads are usually assembled into contigs for metagenomics binning mainly due to the limited information within short reads. Third-generation sequencing provides much
-
Two metrics on rooted unordered trees with labels Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-06-06 Yue Wang
The early development of a zygote can be mathematically described by a developmental tree. To compare developmental trees of different species, we need to define distances on trees. If children cells after a division are not distinguishable, developmental trees are represented by the space $${\mathcal {T}}$$ of rooted trees with possibly repeated labels, where all vertices are unordered. If children
-
Heuristic shortest hyperpaths in cell signaling hypergraphs Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-05-26 Spencer Krieger, John Kececioglu
Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed
-
Embedding gene trees into phylogenetic networks by conflict resolution algorithms Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-05-19 Marcin Wawerka, Dawid Dąbkowski, Natalia Rutecka, Agnieszka Mykowiecka, Paweł Górecki
Phylogenetic networks are mathematical models of evolutionary processes involving reticulate events such as hybridization, recombination, or horizontal gene transfer. One of the crucial notions in phylogenetic network modelling is the displayed tree, which is obtained from a network by removing a set of reticulation edges. Displayed trees may represent an evolutionary history of a gene family if the evolution
-
Bi-alignments with affine gaps costs Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-05-16 Peter F. Stadler, Sebastian Will
Commonly, sequence and structure elements are assumed to evolve congruently, such that homologous sequence positions correspond to homologous structural features. Assuming congruent evolution, alignments based on sequence and structure similarity can therefore optimize both similarities at the same time in a single alignment. To model incongruent evolution, where sequence and structural features diverge
-
Efficient privacy-preserving variable-length substring match for genome sequence Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-04-26 Yoshiki Nakagawa, Satsuya Ohata, Kana Shimizu
The development of a privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies FM-index for a secret-sharing scheme. More precisely, we developed an algorithm that can achieve a secure table
-
Tree diet: reducing the treewidth to unlock FPT algorithms in RNA bioinformatics Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-04-02 Bertrand Marchand, Yann Ponty, Laurent Bulteau
Hard graph problems are ubiquitous in Bioinformatics, inspiring the design of specialized Fixed-Parameter Tractable algorithms, many of which rely on a combination of tree-decomposition and dynamic programming. The time/space complexities of such approaches hinge critically on low values for the treewidth tw of the input graph. In order to extend their scope of applicability, we introduce the Tree-Diet
-
Adding hydrogen atoms to molecular models via fragment superimposition Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-03-29 Patrick Kunzmann, Jacob Marcel Anter, Kay Hamacher
Most experimentally determined structures of biomolecules lack annotated hydrogen positions due to their low electron density. However, thorough structure analysis and simulations require knowledge about the positions of hydrogen atoms. Existing methods for their prediction are either limited to a certain range of molecules or only work effectively on small compounds. We present a novel algorithm that
-
Perplexity: evaluating transcript abundance estimation in the absence of ground truth Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-03-25 Jason Fan, Skylar Chan, Rob Patro
There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates
-
Space-efficient representation of genomic k-mer count tables Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-03-21 Yoshihiro Shibuya, Djamal Belazzougui, Gregory Kucherov
k-mer counting is a common task in bioinformatic pipelines, with many dedicated tools available. Many of these tools output k-mer count tables containing both k-mers and counts, easily reaching tens of GB. Furthermore, such tables do not support efficient random-access queries in general. In this work, we design an efficient representation of k-mer count tables supporting fast random-access
-
Fast characterization of segmental duplication structure in multiple genome assemblies Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-03-18 Hamza Išerić, Can Alkan, Faraz Hach, Ibrahim Numanagić
The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time
-
Parsimonious Clone Tree Integration in cancer Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-03-14 Palash Sashittal, Simone Zaccaria, Mohammed El-Kebir
Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from
-
Efficiently sparse listing of classes of optimal cophylogeny reconciliations Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-02-15 Yishu Wang, Arnaud Mary, Marie-France Sagot, Blerina Sinaimeri
Cophylogeny reconciliation is a powerful method for analyzing host-parasite (or host-symbiont) co-evolution. It models co-evolution as an optimization problem where the set of all optimal solutions may represent different biological scenarios which thus need to be analyzed separately. Despite the significant research done in the area, few approaches have addressed the problem of helping the biologist
-
A new 1.375-approximation algorithm for sorting by transpositions Algorithms Mol. Biol. (IF 1.0) Pub Date : 2022-01-15 Luiz Augusto G. Silva, Luis Antonio B. Kowada, Noraí Romeu Rocco, Maria Emília M. T. Walter
Sorting by transpositions (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be $$\mathcal {NP}$$ -hard and the best approximation algorithm with a 1.375 ratio was proposed in 2006 by Elias and Hartman (EH algorithm). Their algorithm employs simplification, a technique used to transform an input permutation $$\pi$$ into a simple permutation $${\hat{\pi }}$$ , presumably
-
An optimized FM-index library for nucleotide and amino acid search Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-12-31 Tim Anderson, Travis J. Wheeler
Pattern matching is a key step in a variety of biological sequence analysis pipelines. The FM-index is a compressed data structure for pattern matching, with search run time that is independent of the length of the database text. Implementation of the FM-index is reasonably complicated, so that increased adoption will be aided by the availability of a fast and flexible FM-index library. We present
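To make the idea of FM-index search concrete, here is a deliberately naive sketch of backward search that counts pattern occurrences. It is a teaching toy, not the optimized library described above: the index is built by sorting all suffixes, the occurrence table is stored uncompressed, and the names `build_fm_index` and `count_occurrences` are assumptions. It also assumes the text does not contain the sentinel `$` and that `$` sorts before every text character.

```python
def build_fm_index(text):
    """Build a toy FM-index: the C array and a full occurrence table."""
    text += "$"  # sentinel, assumed absent from the input text
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0)
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count_occurrences(pattern, index):
    """Count occurrences of pattern with one step per pattern character."""
    C, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

The loop in `count_occurrences` runs once per pattern character and does constant work each time, which is why the search time does not depend on the length of the indexed text.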
-
An improved approximation algorithm for the reversal and transposition distance considering gene order and intergenic sizes Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-12-29 Klairton L. Brito, Andre R. Oliveira, Alexsandro O. Alexandrino, Ulisses Dias, Zanoni Dias
In the comparative genomics field, one of the goals is to estimate a sequence of genetic changes capable of transforming a genome into another. Genome rearrangement events are mutations that can alter the genetic content or the arrangement of elements from the genome. Reversal and transposition are two of the most studied genome rearrangement events. A reversal inverts a segment of a genome while a
-
A simpler linear-time algorithm for the common refinement of rooted phylogenetic trees on a common leaf set Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-12-06 David Schaller, Marc Hellmuth, Peter F. Stadler
The supertree problem, i.e., the task of finding a common refinement of a set of rooted trees, is an important topic in mathematical phylogenetics. The special case of a common leaf set L is known to be solvable in linear time. Existing approaches refine one input tree using information of the others and then test whether the results are isomorphic. An O(k|L|) algorithm, LinCR, for constructing the
-
Testing the agreement of trees with internal labels Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-12-04 David Fernández-Baca, Lei Liu
A semi-labeled tree is a tree where all leaves as well as, possibly, some internal nodes are labeled with taxa. Semi-labeled trees encompass ordinary phylogenetic trees and taxonomies. Suppose we are given a collection $${\mathcal {P}}= \{{\mathcal {T}}_1, {\mathcal {T}}_2, \ldots , {\mathcal {T}}_k\}$$ of semi-labeled trees, called input trees, over partially overlapping sets of taxa. The agreement
-
Approximation algorithm for rearrangement distances considering repeated genes and intergenic regions Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-10-13 Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Zanoni Dias
The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverses a block of the genome. When dealing with such problems, seminal works represented genomes
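The two events named above are easy to state on a genome modeled as a list of genes. The following is a minimal sketch of the operations themselves, not of any distance or approximation algorithm; it ignores gene orientation and intergenic regions, and the half-open index convention and function names are assumptions.

```python
def reversal(genome, i, j):
    """Reverse the block genome[i:j] in place within the gene order."""
    return genome[:i] + genome[i:j][::-1] + genome[j:]


def transposition(genome, i, j, k):
    """Exchange the consecutive blocks genome[i:j] and genome[j:k]."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


genome = [1, 2, 3, 4, 5]
after_reversal = reversal(genome, 1, 4)               # block 2 3 4 is reversed
after_transposition = transposition(genome, 0, 2, 5)  # blocks 1 2 and 3 4 5 swap
```

The rearrangement distance between two genomes is then the length of a shortest sequence of such operations transforming one into the other.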
-
DeepGRP: engineering a software tool for predicting genomic repetitive elements using Recurrent Neural Networks with attention Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-08-23 Fabian Hausmann, Stefan Kurtz
Repetitive elements contribute a large part of eukaryotic genomes. For example, about 40 to 50% of human, mouse and rat genomes are repetitive. So identifying and classifying repeats is an important step in genome annotation. This annotation step is traditionally performed using alignment based methods, either in a de novo approach or by aligning the genome sequence to a species specific set of repetitive
-
Heuristic algorithms for best match graph editing Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-08-17 David Schaller, Manuela Geiß, Marc Hellmuth, Peter F. Stadler
Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures
-
A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-08-14 Naveed Ahmed Azam, Jianshen Zhu, Yanming Sun, Yu Shi, Aleksandar Shurbevski, Liang Zhao, Hiroshi Nagamochi, Tatsuya Akutsu
Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a novel two-phase framework
-
INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis Algorithms Mol. Biol. (IF 1.0) Pub Date : 2021-08-10 Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, Leonid Chindelevitch
Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains