Current journal: Algorithms for Molecular Biology
  • gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-09-22
    Felipe A. Louza; Guilherme P. Telles; Simon Gog; Nicola Prezza; Giovanna Rosone

    The construction of a suffix array for a collection of strings is a fundamental task in Bioinformatics and in many other applications that process strings. Related data structures, such as the Longest Common Prefix array, the Burrows–Wheeler transform, and the document array, are often needed to accompany the suffix array to efficiently solve a wide variety of problems. While several algorithms have been
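
    To make the four structures named in this abstract concrete, the following is a naive Python sketch. It is an illustration only and is not the gsufsort algorithm: it sorts suffixes directly, which is far slower than dedicated constructions. The function name `naive_collection_index`, the `(document, position)` representation of suffixes, and the use of `$` as an end marker (assumed not to occur in the input) are all assumptions made for this example.

```python
from typing import List, Tuple


def naive_collection_index(
    strings: List[str],
) -> Tuple[List[Tuple[int, int]], List[int], List[int], str]:
    """Naive suffix array, document array, LCP array and BWT for a collection.

    Hypothetical helper for illustration; quadratic or worse in the input size.
    """
    # One suffix per (document, position), including the empty suffix that
    # plays the role of each string's end marker.
    suffixes = [(d, p) for d, s in enumerate(strings) for p in range(len(s) + 1)]
    # Sort by suffix text; equal texts from different documents are ordered
    # by document index.
    suffixes.sort(key=lambda dp: (strings[dp[0]][dp[1]:], dp[0]))

    # Document array: which string each sorted suffix comes from.
    doc_array = [d for d, _ in suffixes]

    def common_prefix(a: str, b: str) -> int:
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    # LCP array: length of the common prefix of each suffix with its predecessor.
    texts = [strings[d][p:] for d, p in suffixes]
    lcp = [0] + [common_prefix(texts[i - 1], texts[i]) for i in range(1, len(texts))]

    # BWT: the character preceding each sorted suffix, or "$" at a string start.
    bwt = "".join(strings[d][p - 1] if p > 0 else "$" for d, p in suffixes)
    return suffixes, doc_array, lcp, bwt
```

    For example, `naive_collection_index(["ab", "a"])` places the two empty suffixes first, then `a` from the second string, then `ab` and `b` from the first.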

  • A linear-time algorithm that avoids inverses and computes Jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-09-19
    John L Spouge,Joseph M Ziegelbauer,Mileidy Gonzalez

    Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given $n$ elements $g_{0}, g_{1}, \ldots, g_{n-1}$ in a set
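
    The leave-one-out problem named in the title can be illustrated with the standard prefix/suffix technique, which needs no inverses and uses a linear number of operations. This is a minimal Python sketch of that textbook idea and may differ from the authors' algorithm; the function name `leave_one_out` is hypothetical, and `None` is used as a placeholder for "nothing combined yet" because a semigroup need not have an identity (so the elements themselves are assumed not to be `None`).

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def leave_one_out(items: List[T], op: Callable[[T, T], T]) -> List[Optional[T]]:
    """For each i, combine all items except items[i] under the operation `op`.

    Uses about 3n applications of `op` and never inverts an element.
    """
    n = len(items)
    # prefix[i] is the combination of items[0..i-1], or None if that range is empty.
    prefix: List[Optional[T]] = [None] * n
    acc: Optional[T] = None
    for i in range(n):
        prefix[i] = acc
        acc = items[i] if acc is None else op(acc, items[i])

    result: List[Optional[T]] = [None] * n
    suffix: Optional[T] = None  # combination of items[i+1..n-1]
    for i in range(n - 1, -1, -1):
        left = prefix[i]
        if left is None:
            result[i] = suffix
        elif suffix is None:
            result[i] = left
        else:
            result[i] = op(left, suffix)
        suffix = items[i] if suffix is None else op(items[i], suffix)
    return result
```

    With integer multiplication, `leave_one_out([2, 3, 5, 7], operator.mul)` gives `[105, 70, 42, 30]`, and the same call works when an element is zero, where dividing the full product by each element would fail.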

  • Reconstruction of time-consistent species trees.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-08-20
    Manuel Lafond,Marc Hellmuth

    The history of gene families, which are equivalent to event-labeled gene trees, can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are “biologically feasible”, which is the case if one can find a species tree with which the gene

  • On an enhancement of RNA probing data using information theory.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-08-07
    Thomas J X Li,Christian M Reidys

    Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem, via considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree,

  • Algorithms for the quantitative Lock/Key model of cytoplasmic incompatibility.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-07-22
    Tiziana Calamoneri,Mattia Gastaldello,Arnaud Mary,Marie-France Sagot,Blerina Sinaimeri

    Cytoplasmic incompatibility (CI) relates to the manipulation by the parasite Wolbachia of its host reproduction. Despite its widespread occurrence, the molecular basis of CI remains unclear and theoretical models have been proposed to understand the phenomenon. We consider in this paper the quantitative Lock-Key model which currently represents a good hypothesis that is consistent with the data available

  • Fast computation of genome-metagenome interaction effects.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-07-01
    Florent Guinot,Marie Szafranski,Julien Chiquet,Anouk Zancarini,Christine Le Signor,Christophe Mougel,Christophe Ambroise

    Association studies have been widely used to search for associations between common genetic variants observations and a given phenotype. However, it is now generally accepted that genes and environment must be examined jointly when estimating phenotypic variance. In this work we consider two types of biological markers: genotypic markers, which characterize an observation in terms of inherited genetic

  • Evolution through segmental duplications and losses: a Super-Reconciliation approach.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-05-26
    Mattéo Delabre,Nadia El-Mabrouk,Katharina T Huber,Manuel Lafond,Vincent Moulton,Emmanuel Noutahi,Miguel Sautie Castellanos

    The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here

  • Precise parallel volumetric comparison of molecular surfaces and electrostatic isopotentials.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-05-25
    Georgi D Georgiev,Kevin F Dodd,Brian Y Chen

    Geometric comparisons of binding sites and their electrostatic properties can identify subtle variations that select different binding partners and subtle similarities that accommodate similar partners. Because subtle features are central for explaining how proteins achieve specificity, algorithmic efficiency and geometric precision are central to algorithmic design. To address these concerns, this

  • Context-aware seeds for read mapping.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-05-23
    Hongyi Xin,Mingfu Shao,Carl Kingsford

    Most modern seed-and-extend NGS read mappers employ a seeding scheme that requires extracting t non-overlapping seeds in each read in order to find all valid mappings under an edit distance threshold of t. As t grows, this seeding scheme forces mappers to use more and shorter seeds, which increases the seed hits (seed frequencies) and therefore reduces the efficiency of mappers. We propose a novel

  • Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-05-15
    Yutong Qiu,Cong Ma,Han Xie,Carl Kingsford

    Transcriptomic structural variants (TSVs), large-scale transcriptome sequence changes due to structural variation, are common in cancer. TSV detection from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. To improve

  • The distance and median problems in the single-cut-or-join model with single-gene duplications.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-05-04
    Aniket C Mane,Manuel Lafond,Pedro C Feijao,Cedric Chauve

    Background In the field of genome rearrangement algorithms, models accounting for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and

  • Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-04-16
    Wei Wang,Jack Smith,Hussein A Hejase,Kevin J Liu

    Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related

  • From pairs of most similar sequences to phylogenetic best matches.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-04-09
    Peter F Stadler,Manuela Geiß,David Schaller,Alitzel López Sánchez,Marcos González Laffitte,Dulce I Valdivia,Marc Hellmuth,Maribel Hernández Rosales

    Background Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionary most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever

  • GrpClassifierEC: a novel classification approach based on the ensemble clustering space.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-02-13
    Loai Abdallah,Malik Yousef

    Background Advances in molecular biology have resulted in big and complicated data sets; therefore, a clustering approach that is able to capture the actual structure and the hidden patterns of the data is required. Moreover, the geometric space may not reflect the actual similarity between the different objects. As a result, in this research we use a clustering-based space that converts the geometric space

  • Finding all maximal perfect haplotype blocks in linear time.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2020-02-10
    Jarno Alanko,Hideo Bannai,Bastien Cazaux,Pierre Peterlongo,Jens Stoye

    Recent large-scale community sequencing efforts allow at an unprecedented level of detail the identification of genomic regions that show signatures of natural selection. Traditional methods for identifying such regions from individuals' haplotype data, however, require excessive computing times and therefore are not applicable to current datasets. In 2019, Cunha et al. (Advances in bioinformatics

  • NANUQ: a method for inferring species networks from gene trees under the coalescent model.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-12-06
    Elizabeth S Allman,Hector Baños,John A Rhodes

    Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the network multispecies coalescent model, individual gene trees arising from a network can have any topology, but arise with frequencies dependent on the network structure and numerical parameters. We propose a new algorithm for statistical inference of a level-1 species network

  • TMRS: an algorithm for computing the time to the most recent substitution event from a multiple alignment column.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-11-18
    Hisanori Kiryu,Yuto Ichikawa,Yasuhiro Kojima

    Background As the number of sequenced genomes grows, researchers have access to an increasingly rich source for discovering detailed evolutionary information. However, the computational technologies for inferring biologically important evolutionary events are not sufficiently developed. Results We present algorithms to estimate the evolutionary time ($t_{\mathrm{MRS}}$) to the most recent substitution event

  • Super short operations on both gene order and intergenic sizes.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-11-12
    Andre R Oliveira,Géraldine Jean,Guillaume Fertin,Ulisses Dias,Zanoni Dias

    Background The evolutionary distance between two genomes can be estimated by computing a minimum length sequence of operations, called genome rearrangements, that transform one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most of the studies in the genome rearrangement literature consist in shaping biological scenarios into mathematical models. For instance

  • Bayesian localization of CNV candidates in WGS data within minutes.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-10-02
    John Wiedenhoeft,Alex Cagan,Rimma Kozhemyakina,Rimma Gulevich,Alexander Schliep

    Background Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands. A recently introduced approach to perform Forward-Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet, the problem remains challenging in practice

  • Implications of non-uniqueness in phylogenetic deconvolution of bulk DNA samples of tumors.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-09-10
    Yuanyuan Qi,Dikshant Pradhan,Mohammed El-Kebir

    Background Tumors exhibit extensive intra-tumor heterogeneity, the presence of groups of cellular populations with distinct sets of somatic mutations. This heterogeneity is the result of an evolutionary process, described by a phylogenetic tree. In addition to enabling clinicians to devise patient-specific treatment plans, phylogenetic trees of tumors enable researchers to decipher the mechanisms of

  • A branching process for homology distribution-based inference of polyploidy, speciation and loss.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-08-08
    Yue Zhang,Chunfang Zheng,David Sankoff

    Background The statistical distribution of the similarity or difference between pairs of paralogous genes, created by whole genome doubling, or between pairs of orthologous genes in two related species is an important source of information about genomic evolution, especially in plants. Methods We derive the mixture of distributions of sequence similarity for duplicate gene pairs generated by repeated

  • A multi-labeled tree dissimilarity measure for comparing "clonal trees" of tumor progression.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-08-03
    Nikolai Karpov,Salem Malikic,Md Khaledur Rahman,S Cenk Sahinalp

    We introduce a new dissimilarity measure between a pair of "clonal trees", each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone

  • A general framework for genome rearrangement with biological constraints.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-07-31
    Pijus Simonaitis,Annie Chateau,Krister M Swenson

    This paper generalizes previous studies on genome rearrangement under biological constraints, using double cut and join (DCJ). We propose a model for weighted DCJ, along with a family of optimization problems called φ-MCPS (Minimum Cost Parsimonious Scenario), that are based on labeled graphs. We show how to compute solutions to general instances of φ-MCPS, given an algorithm to compute φ-MCPS on

  • Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-07-31
    Erin K Molloy,Tandy Warnow

    Background Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems,

  • Prefix-free parsing for building big BWTs.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-06-01
    Christina Boucher,Travis Gagie,Alan Kuhnle,Ben Langmead,Giovanni Manzini,Taher Mun

    High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly repetitive, a characteristic that can be exploited to ease the computation of the

  • Linear time minimum segmentation enables scalable founder reconstruction.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-05-28
    Tuukka Norri,Bastien Cazaux,Dmitry Kosolobov,Veli Mäkinen

    Background We study a preprocessing routine relevant in pan-genomic analyses: consider a set of aligned haplotype sequences of complete human chromosomes. Due to the enormous size of such data, one would like to represent this input set with a few founder sequences that retain as well as possible the contiguities of the original sequences. Such a smaller set gives a scalable way to exploit pan-genomic

  • An average-case sublinear forward algorithm for the haploid Li and Stephens model.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-04-17
    Yohei M Rosen,Benedict J Paten

    Background Hidden Markov models of haplotype inheritance such as the Li and Stephens model allow for computationally tractable probability calculations using the forward algorithm as long as the representative reference panel used in the model is sufficiently small. Specifically, the monoploid Li and Stephens model and its variants are linear in reference panel size unless heuristic approximations

  • Differentially mutated subnetworks discovery.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-04-13
    Morteza Chalabi Hajkarim,Eli Upfal,Fabio Vandin

    Problem We study the problem of identifying differentially mutated subnetworks of a large gene-gene interaction network, that is, subnetworks that display a significant difference in mutation frequency in two sets of cancer samples. We formally define the associated computational problem and show that the problem is NP-hard. Algorithm We propose a novel and efficient algorithm, called DAMOKLE, to identify

  • Repairing Boolean logical models from time-series data using Answer Set Programming.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-04-10
    Alexandre Lemos,Inês Lynce,Pedro T Monteiro

    Background Boolean models of biological signalling-regulatory networks are increasingly used to formally describe and understand complex biological processes. These models may become inconsistent as new data become available and need to be repaired. In the past, the focus has been shed on the inference of (classes of) models given an interaction network and time-series data sets. However, repair of

  • Kermit: linkage map guided long read assembly.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-04-02
    Riku Walve,Pasi Rastas,Leena Salmela

    Background With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately

  • Reconciling multiple genes trees via segmental duplications and losses.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-04-02
    Riccardo Dondi,Manuel Lafond,Celine Scornavacca

    Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events

  • External memory BWT and LCP computation for sequence collections with applications.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-03-23
    Lavinia Egidi,Felipe A Louza,Giovanni Manzini,Guilherme P Telles

    Background Sequencing technologies produce larger and larger collections of biosequences that have to be stored in compressed indices supporting fast search operations. Many compressed indices are based on the Burrows-Wheeler Transform (BWT) and the longest common prefix (LCP) array. Because of the sheer size of the input it is important to build these data structures in external memory and time using

  • Connectivity problems on heterogeneous graphs.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-03-23
    Jimmy Wu,Alex Khodaverdian,Benjamin Weitz,Nir Yosef

    Background Network connectivity problems are abundant in computational biology research, where graphs are used to represent a range of phenomena: from physical interactions between molecules to more abstract relationships such as gene co-expression. One common challenge in studying biological networks is the need to extract meaningful, small subgraphs out of large databases of potential interactions

  • Semi-nonparametric modeling of topological domain formation from epigenetic data.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-03-15
    Emre Sefer,Carl Kingsford

    Background Hi-C experiments capturing the 3D genome architecture have led to the discovery of topologically-associated domains (TADs) that form an important part of the 3D genome organization and appear to play a role in gene regulation and other functions. Several histone modifications have been independently associated with TAD formation, but their combinatorial effects on domain formation remain

  • Automated partial atomic charge assignment for drug-like molecules: a fast knapsack approach.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-03-07
    Martin S Engler,Bertrand Caron,Lourens Veen,Daan P Geerke,Alan E Mark,Gunnar W Klau

    A key factor in computational drug design is the consistency and reliability with which intermolecular interactions between a wide variety of molecules can be described. Here we present a procedure to efficiently, reliably and automatically assign partial atomic charges to atoms based on known distributions. We formally introduce the molecular charge assignment problem, where the task is to select

  • Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-03-07
    Qiuyi Zhang,Satish Rao,Tandy Warnow

    Background Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of leaves in the tree (once the shortest and longest branch weights are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was

  • SNPs detection by eBWT positional clustering.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2019-03-07
    Nicola Prezza,Nadia Pisanti,Marinella Sciortino,Giovanna Rosone

    Background Sequencing technologies keep on turning cheaper and faster, thus putting a growing pressure for data structures designed to efficiently store raw data, and possibly perform analysis therein. In this view, there is a growing interest in alignment-free and reference-free variants calling methods that only make use of (suitably indexed) raw reads data. Results We develop the positional clustering

  • Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-12-18
    Morten Muhlig Nielsen,Paula Tataru,Tobias Madsen,Asger Hobolth,Jakob Skou Pedersen

    Background Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools suffer from limitations in searching large motif spaces

  • Superbubbles revisited.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-12-07
    Fabian Gärtner,Lydia Müller,Peter F Stadler

    Background Superbubbles are distinctive subgraphs in directed graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact that they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles

  • Coordinate systems for supergenomes.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-09-28
    Fabian Gärtner,Christian Höner Zu Siederdissen,Lydia Müller,Peter F Stadler

    Background Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize -omics data simultaneously for multiple species are sorely lacking. A first and crucial step

  • Improved de novo peptide sequencing using LC retention time information.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-09-06
    Yves Frank,Tomas Hruz,Thomas Tschager,Valentin Venzin

    Background Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of

  • Sorting signed circular permutations by super short operations.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-08-02
    Andre R Oliveira,Guillaume Fertin,Ulisses Dias,Zanoni Dias

    Background One way to estimate the evolutionary distance between two given genomes is to determine the minimum number of large-scale mutations, or genome rearrangements, that are necessary to transform one into the other. In this context, genomes can be represented as ordered sequences of genes, each gene being represented by a signed integer. If no gene is repeated, genomes are thus modeled as signed

  • Split-inducing indels in phylogenomic analysis.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-07-22
    Alexander Donath,Peter F Stadler

    Background Most phylogenetic studies using molecular data treat gaps in multiple sequence alignments as missing data or even completely exclude alignment columns that contain gaps. Results Here we show that gap patterns in large-scale, genome-wide alignments are themselves phylogenetically informative and can be used to infer reliable phylogenies provided the gap data are properly filtered to reduce

  • Locus-aware decomposition of gene trees with respect to polytomous species trees.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-06-09
    Michał Aleksander Ciach,Anna Muszewska,Paweł Górecki

    Background Horizontal gene transfer (HGT), a process of acquisition and fixation of foreign genetic material, is an important biological phenomenon. Several approaches to HGT inference have been proposed. However, most of them either rely on approximate, non-phylogenetic methods or on the tree reconciliation, which is computationally intensive and sensitive to parameter values. Results We investigate

  • A fast and accurate enumeration-based algorithm for haplotyping a triploid individual.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-06-09
    Jingli Wu,Qian Zhang

    Background Haplotype assembly, reconstructing haplotypes from sequence data, is one of the major computational problems in bioinformatics. Most of the current methodologies for haplotype assembly are designed for diploid individuals. In recent years, genomes having more than two sets of homologous chromosomes have attracted many research groups that are interested in the genomics of disease, phylogenetics

  • Finding local genome rearrangements.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-05-15
    Pijus Simonaitis,Krister M Swenson

    Background The double cut and join (DCJ) model of genome rearrangement is well studied due to its mathematical simplicity and power to account for the many events that transform gene order. These studies have mostly been devoted to the understanding of minimum length scenarios transforming one genome into another. In this paper we search instead for rearrangement scenarios that minimize the number

  • FSH: fast spaced seed hashing exploiting adjacent hashes.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-03-29
    Samuele Girotto,Matteo Comin,Cinzia Pizzi

    Background Patterns with wildcards in specified positions, namely spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to the given spaced seed, or to multiple
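
    As a rough illustration of what hashing every position with respect to a spaced seed means, here is a direct Python sketch. It is the straightforward baseline that recomputes each hash from scratch, not the FSH method, which per its title exploits adjacent hashes. The function name `spaced_seed_hashes`, the `1`/`0` seed notation (`1` marks a position that is read, `0` a wildcard), and the 2-bit nucleotide encoding are assumptions made for this example.

```python
from typing import List

# Assumed 2-bit encoding of the DNA alphabet.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hashes(sequence: str, seed: str) -> List[int]:
    """Hash every window of `sequence` using only the seed's '1' positions."""
    # Offsets inside a window that the seed actually reads.
    care_positions = [i for i, c in enumerate(seed) if c == "1"]
    hashes = []
    for start in range(len(sequence) - len(seed) + 1):
        h = 0
        for offset in care_positions:
            # Append the 2-bit code of the selected character.
            h = (h << 2) | ENCODING[sequence[start + offset]]
        hashes.append(h)
    return hashes
```

    For the sequence `ACGT` and seed `101`, the two windows `ACG` and `CGT` are read as `AG` and `CT`, giving the hashes 2 and 7.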

  • Outlier detection in BLAST hits.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-03-29
    Nidhi Shah,Stephen F Altschul,Mihai Pop

    Background An important task in a metagenomic analysis is the assignment of taxonomic labels to sequences in a sample. Most widely used methods for taxonomy assignment compare a sequence in the sample to a database of known sequences. Many approaches use the best BLAST hit(s) to assign the taxonomic label. However, it is known that the best BLAST hit may not always correspond to the best taxonomic

  • OCTAL: Optimal Completion of gene trees in polynomial time.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-03-24
    Sarah Christensen,Erin K Molloy,Pranjal Vachaspati,Tandy Warnow

    Background For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable. Results We introduce the Optimal Tree Completion

  • Derivative-free neural network for optimizing the scoring functions associated with dynamic programming of pairwise-profile alignment.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-02-23
    Kazunori D Yamada

    Background A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. By definition, these functions

  • Fast phylogenetic inference from typing data.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-02-23
    João A Carriço,Maxime Crochemore,Alexandre P Francisco,Solon P Pissis,Bruno Ribeiro-Gonçalves,Cátia Vaz

    Background Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This led to development of several novel methods and several

  • A safe and complete algorithm for metagenomic assembly.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-02-16
    Nidia Obscura Acosta,Veli Mäkinen,Alexandru I Tomescu

    Background Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes

  • Time-consistent reconciliation maps and forbidden time travel.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-02-15
    Nikolai Nøjgaard,Manuela Geiß,Daniel Merkle,Peter F Stadler,Nicolas Wieseke,Marc Hellmuth

    Background In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the

  • Gene tree parsimony for incomplete gene trees: addressing true biological loss.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2018-02-02
    Md Shamsuzzoha Bayzid,Tandy Warnow

    Motivation Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known

  • Phylogeny reconstruction based on the length distribution of k-mismatch common substrings.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2017-12-15
    Burkhard Morgenstern,Svenja Schöbel,Chris-André Leimeister

    Background Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. Results In this paper, we study the

  • Generalized enhanced suffix array construction in external memory.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2017-12-14
    Felipe A Louza,Guilherme P Telles,Steve Hoffmann,Cristina D A Ciferri

    Background Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory. Results In this article we present and analyze [Formula: see text]

  • HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2017-10-14
    Shixiang Wan,Quan Zou

    BACKGROUND Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing data has resulted in a shortage of efficient ultra-large biological sequence alignment approaches that can cope with different sequence types. METHODS Distributed and parallel computing represents a crucial technique for accelerating

  • Algorithms for matching partially labelled sequence graphs.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2017-10-13
    William R Taylor

    BACKGROUND In order to find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together belong to those that interact in their species of origin. When each protein is unique, the species name is sufficient to guide this match; however, when there

  • Biologically feasible gene trees, reconciliation maps and informative triples.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2017-09-02
    Marc Hellmuth

    BACKGROUND The history of gene families, which are equivalent to event-labeled gene trees, can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are biologically feasible, that is, if there is a possible true history that would explain a given gene

  • Partially local three-way alignments and the sequence signatures of mitochondrial genome rearrangements.
    Algorithms Mol. Biol. (IF 1.432) Pub Date : 2017-08-31
    Marwa Al Arab,Matthias Bernt,Christian Höner Zu Siederdissen,Kifah Tout,Peter F Stadler

    BACKGROUND Genomic DNA frequently undergoes rearrangement of the gene order that can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence

Contents have been reproduced by permission of the publishers.