Current journal: Systematic Biology
  • The Impact of Cross-Species Gene Flow on Species Tree Estimation
    Syst. Biol. (IF 10.266) Pub Date : 2020-01-24
    Jiao X, Flouris T, Rannala B, et al.

    Recent analyses of genomic sequence data suggest cross-species gene flow is common in both plants and animals, posing challenges to species tree estimation. We examine the levels of gene flow needed to mislead species tree estimation with three species and either episodic introgressive hybridization or continuous migration between an outgroup and one ingroup species. Several species tree estimation methods are examined, including the majority-vote method based on the most common gene tree topology (with either the true or reconstructed gene trees used), the UPGMA method based on the average sequence distances (or average coalescent times) between species, and the full-likelihood method based on multi-locus sequence data. Our results suggest that the majority-vote method based on gene tree topologies is more robust to gene flow than the UPGMA method based on coalescent times and both are more robust than likelihood assuming a multispecies coalescent (MSC) model with no cross-species gene flow. Comparison of the continuous migration model with the episodic introgression model suggests that a small amount of gene flow per generation can cause drastic changes to the genetic history of the species and mislead species tree methods, especially if the species diverged through radiative speciation events. Estimates of parameters under the MSC with gene flow suggest that African mosquito species in the Anopheles gambiae species complex constitute such an example of extreme impact of gene flow on species phylogeny.

    Updated: 2020-01-24
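
    The majority-vote method mentioned in the abstract above takes the most common gene tree topology as the species tree estimate. A minimal sketch of that idea follows, assuming each gene tree has already been reduced to a canonical topology string. The function name and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote_species_tree(gene_tree_topologies):
    """Return the most common topology and its share of all gene trees.

    Assumes each topology is a canonical string, so identical trees
    compare equal. This is an illustrative helper, not the authors' code.
    """
    if not gene_tree_topologies:
        raise ValueError("need at least one gene tree topology")
    counts = Counter(gene_tree_topologies)
    topology, count = counts.most_common(1)[0]
    return topology, count / len(gene_tree_topologies)


# Toy data: three species, ten loci (made-up counts for illustration).
gene_trees = ["((A,B),C)"] * 6 + ["((A,C),B)"] * 3 + ["((B,C),A)"] * 1
best, share = majority_vote_species_tree(gene_trees)
print(best, share)
```

    With three species there are only three rooted topologies, so the vote reduces to comparing three counts. Gene flow can shift those counts, which is the effect the abstract examines.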
  • Hierarchical Hybrid Enrichment: Multi-tiered genomic data collection across evolutionary scales, with application to chorus frogs (Pseudacris)
    Syst. Biol. (IF 10.266) Pub Date : 2019-12-30
    Banker S, Lemmon A, Hassinger A, et al.

    Determining the optimal targets of genomic sub-sampling for phylogenomics, phylogeography, and population genomics remains a challenge for evolutionary biologists. Of the available methods for sub-sampling the genome, hybrid enrichment (sequence capture) has become one of the primary means of data collection for systematics, due to the flexibility and cost efficiency of this approach. Despite the utility of this method, information is lacking as to what genomic targets are most appropriate for addressing questions at different evolutionary scales. In this study, first we compare the benefits of target loci developed for deep- and shallow-scales by comparing these loci at each of three taxonomic levels: within a genus (phylogenetics), within a species (phylogeography) and within a hybrid zone (population genomics). Specifically, we target evolutionarily conserved loci that are appropriate for deeper phylogenetic scales and more rapidly evolving loci that are informative for phylogeographic and population genomic scales. Second, we assess the efficacy of targeting multiple locus sets for different taxonomic levels in the same hybrid enrichment reaction, an approach we term hierarchical hybrid enrichment. Third, we apply this approach to the North American chorus frog genus Pseudacris to answer key evolutionary questions across taxonomic and temporal scales. We demonstrate that in this system the type of genomic target that produces the most resolved gene trees differs depending on the taxonomic level, although the potential for error is substantially lower for the deep-scale loci at all levels. We successfully recover data for the two different locus sets with high efficiency. 
Using hierarchical data targeting deep and shallow levels: we (a) resolve the phylogeny of the genus Pseudacris and introduce a novel visual and hypothesis-testing method that uses nodal heat maps to examine the robustness of branch support values to the removal of sites and loci; (b) estimate the phylogeographic history of P. feriarum, which reveals up to five independent invasions leading to sympatry with congener P. nigrita to form replicated reinforcement contact zones with ongoing gene flow into sympatry; and (c) quantify with high confidence the frequency of hybridization in one of these zones between P. feriarum and P. nigrita, which is lower than microsatellite-based estimates. We find that the hierarchical hybrid enrichment approach offers an efficient, multi-tiered data collection method for simultaneously addressing questions spanning multiple evolutionary scales.

    Updated: 2019-12-30
  • Interdependent Phenotypic and Biogeographic Evolution Driven by Biotic Interactions
    Syst. Biol. (IF 10.266) Pub Date : 2019-12-20
    Quintero I, Landis M.

    Biotic interactions are hypothesized to be one of the main processes shaping trait and biogeographic evolution during lineage diversification. Theoretical and empirical evidence suggests that species with similar ecological requirements either spatially exclude each other, by preventing the colonization of competitors or by driving coexisting populations to extinction, or show niche divergence when in sympatry. However, the extent and generality of the effect of interspecific competition in trait and biogeographic evolution has been limited by a dearth of appropriate process-generating models to directly test the effect of biotic interactions. Here, we formulate a phylogenetic parametric model that allows interdependence between trait and biogeographic evolution, thus enabling a direct test of central hypotheses on how biotic interactions shape these evolutionary processes. We adopt a Bayesian data augmentation approach to estimate the joint posterior distribution of trait histories, range histories, and co-evolutionary process parameters under this analytically intractable model. Through simulations, we show that our model is capable of distinguishing alternative scenarios of biotic interactions. We apply our model to the radiation of Darwin’s finches—a classic example of adaptive divergence—and find limited support for in situ trait divergence in beak size, but stronger evidence for convergence in traits such as beak shape and tarsus length and for competitive exclusion throughout their evolutionary history. These findings are more consistent with pre-sympatric, rather than post-sympatric, niche divergence. Our modeling framework opens new possibilities for testing more complex hypotheses about the processes underlying lineage diversification. More generally, it provides a robust probabilistic methodology to model correlated evolution of continuous and discrete characters.

    Updated: 2019-12-20
  • The implications of lineage-specific rates for divergence time estimation
    Syst. Biol. (IF 10.266) Pub Date : 2019-12-06
    Carruthers T, Sanderson M, Scotland R.

    Rate variation adds considerable complexity to divergence time estimation in molecular phylogenies. Here, we evaluate the impact of lineage-specific rates – which we define as among-branch-rate-variation that acts consistently across the entire genome. We compare its impact to residual rates – defined as among-branch-rate-variation that shows a different pattern of rate variation at each sampled locus, and gene-specific rates – defined as variation in the average rate across all branches at each sampled locus. We show that lineage-specific rates lead to erroneous divergence time estimates, regardless of how many loci are sampled. Further, we show that stronger lineage-specific rates lead to increasing error. This contrasts to residual rates and gene-specific rates, where sampling more loci significantly reduces error. If divergence times are inferred in a Bayesian framework, we highlight that error caused by lineage-specific rates significantly reduces the probability that the 95% highest posterior density (HPD) includes the correct value, and leads to sensitivity to the prior. Use of a more complex rate prior – which has recently been proposed to model rate variation more accurately – does not affect these conclusions. Finally, we show that the scale of lineage-specific rates used in our simulation experiments is comparable to that of an empirical dataset for the angiosperm genus Ipomoea. Taken together, our findings demonstrate that lineage-specific rates cause error in divergence time estimates, and that this error is not overcome by analysing genomic scale multi-locus datasets.

    Updated: 2019-12-06
  • Estimating diversification rates on incompletely-sampled phylogenies: theoretical concerns and practical solutions
    Syst. Biol. (IF 10.266) Pub Date : 2019-12-05
    Chang J, Rabosky D, Alfaro M.

    Molecular phylogenies are a key source of information about the tempo and mode of species diversification. However, most empirical phylogenies do not contain representatives of all species, such that diversification rates are typically estimated from incompletely sampled data. Most researchers recognize that incomplete sampling can lead to biased rate estimates, but the statistical properties of methods for accommodating incomplete sampling remain poorly known. In this point of view, we demonstrate theoretical concerns with the widespread use of analytical sampling corrections for sparsely sampled phylogenies of higher taxonomic groups. In particular, corrections based on “sampling fractions” can lead to low statistical power to infer rate variation when it is present, depending on the likelihood function used for inference. In the extreme, the sampling fraction correction can lead to spurious patterns of diversification that are driven solely by unbalanced sampling across the tree in concert with low overall power to infer shifts. Stochastic polytomy resolution provides an alternative to sampling fraction approaches that avoids some of these biases. We show that stochastic polytomy resolvers can greatly improve the power of common analyses to estimate shifts in diversification rates. We introduce a new stochastic polytomy resolution method (TACT: Taxonomic Addition for Complete Trees) that uses birth-death-sampling estimators across an ultrametric phylogeny to estimate branching times for unsampled taxa, with taxonomic information to compatibly place new taxa onto a backbone phylogeny. We close with practical recommendations for diversification inference under several common scenarios of incomplete sampling.

    Updated: 2019-12-05
  • Craniodental and postcranial characters of non-avian Dinosauria often imply different trees
    Syst. Biol. (IF 10.266) Pub Date : 2019-11-26
    Li Y, Ruta M, Wills M.

    Despite the increasing importance of molecular sequence data, morphology still makes an important contribution to resolving the phylogeny of many groups, and is the only source of data for most fossils. Most systematists sample morphological characters as broadly as possible on the principle of total evidence. However, it is not uncommon for sampling to be focussed on particular aspects of anatomy, either because characters therein are believed to be more informative, or because preservation biases restrict what is available. Empirically, the optimal trees from partitions of morphological data sets often represent significantly different hypotheses of relationships. Previous work on hard-part versus soft-part characters across animal phyla revealed significant differences in about a half of sampled studies. Similarly, studies of the craniodental versus postcranial characters of vertebrates revealed significantly different trees in about one third of cases, with the highest rates observed in non-avian dinosaurs. We test whether this is a generality here with a much larger sample of 81 published data matrices across all major dinosaur groups. Using the incongruence length difference (ILD) test and two variants of the incongruence relationship difference (IRD) test, we found significant incongruence in about 50% of cases. Incongruence is not uniformly distributed across major dinosaur clades, being highest (63%) in Theropoda and lowest (25%) in Thyreophora. As in previous studies, our partition tests show some sensitivity to matrix dimensions and the amount and distribution of missing entries. Levels of homoplasy and retained synapomorphy are similar between partitions, such that incongruence must partly reflect differences in patterns of homoplasy between partitions, which may itself be a function of modularity and mosaic evolution. Finally, we implement new tests to determine which partition yields trees most similar to those from the entire matrix. 
Despite no bias across dinosaurs overall, there are striking differences between major groups. The craniodental characters of Ornithischia and the postcranial characters of Saurischia yield trees most similar to the ‘total evidence’ trees derived from the entire matrix. Trees from these same character partitions also tend to be most stratigraphically congruent: a mutual consilience suggesting that those partitions yield more accurate trees.

    Updated: 2019-11-27
  • PHYLOGENETIC CONFLICTS, COMBINABILITY, AND DEEP PHYLOGENOMICS IN PLANTS
    Syst. Biol. (IF 10.266) Pub Date : 2019-11-20
    Smith S, Walker-Hale N, Walker J, et al.

    Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a dataset in order to resolve recalcitrant relationships and, importantly, identify what the dataset is unable to resolve. These procedures extend upon methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant dataset. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific dataset to address deep phylogenetic relationships while also identifying the inferential boundaries of the dataset.

    Updated: 2019-11-20
  • A phenotype-genotype codon model for detecting adaptive evolution
    Syst. Biol. (IF 10.266) Pub Date : 2019-11-15
    Jones C, Youssef N, Susko E, et al.

    A central objective in biology is to link adaptive evolution in a gene to structural and/or functional phenotypic novelties. Yet most analytic methods make inferences mainly from either phenotypic data or genetic data alone. A small number of models have been developed to infer correlations between the rate of molecular evolution and changes in a discrete or continuous life history trait. But such correlations are not necessarily evidence of adaptation. Here we present a novel approach called the phenotype-genotype branch-site model (PG-BSM) designed to detect evidence of adaptive codon evolution associated with discrete-state phenotype evolution. An episode of adaptation is inferred under standard codon substitution models when there is evidence of positive selection in the form of an elevation in the nonsynonymous-to-synonymous rate ratio ω to a value ω > 1. As it is becoming increasingly clear that ω > 1 can occur without adaptation, the PG-BSM was formulated to infer an instance of adaptive evolution without appealing to evidence of positive selection. The null model makes use of a covarion-like component to account for general heterotachy (i.e., random changes in the evolutionary rate at a site over time). The alternative model employs samples of the phenotypic evolutionary history to test for phenomenological patterns of heterotachy consistent with specific mechanisms of molecular adaptation. These include (i) a persistent increase/decrease in ω at a site following a change in phenotype (the pattern) consistent with an increase/decrease in the functional importance of the site (the mechanism); and (ii) a transient increase in ω at a site along a branch over which the phenotype changed (the pattern) consistent with a change in the site’s optimal amino acid (the mechanism). 
Rejection of the null is followed by post hoc analyses to identify sites with strongest evidence for adaptation in association with changes in the phenotype as well as the most likely evolutionary history of the phenotype. Simulation studies based on a novel method for generating mechanistically realistic signatures of molecular adaptation show that the PG-BSM has good statistical properties. Analyses of three real alignments show that site patterns identified post hoc are consistent with the specific mechanisms of adaptation included in the alternate model. Further simulation studies show that the covarion-like component of the PG-BSM plays a crucial role in mitigating recently discovered statistical pathologies associated with confounding by accounting for heterotachy-by-any-means.

    Updated: 2019-11-15
  • Species Selection Regime and Phylogenetic Tree Shape
    Syst. Biol. (IF 10.266) Pub Date : 2019-11-15
    Verboom G, Boucher F, Ackerly D, et al.

    Species selection, the effect of heritable traits in generating between-lineage diversification rate differences, provides a valuable conceptual framework for understanding the relationship between traits, diversification and phylogenetic tree shape. An important challenge, however, is that the nature of real diversification landscapes – curves or surfaces which describe the propensity of species-level lineages to diversify as a function of one or more traits – remains poorly understood. Here we present a novel, time-stratified extension of the QuaSSE model in which speciation/extinction rate is specified as a static or temporally-shifting Gaussian or skewed-Gaussian function of the diversification trait. We then use simulations to show that the generally imbalanced nature of real phylogenetic trees, as well as their generally greater-than-expected frequency of deep branching events, are typical outcomes when diversification is treated as a dynamic, trait-dependent process. Focusing on four basic models (Gaussian-speciation with and without background extinction; skewed-speciation; Gaussian-extinction), we also show that particular features of the species selection regime produce distinct tree shape signatures and that, consequently, a combination of tree shape metrics has the potential to reveal the species selection regime under which a particular lineage diversified. We evaluate this idea empirically by comparing the phylogenetic trees of plant lineages diversifying within climatically- and geologically-stable environments of the Greater Cape Floristic Region, with those of lineages diversifying in environments that have experienced major change through the Late Miocene-Pliocene. Consistent with our expectations, the trees of lineages diversifying in a dynamic context are less balanced, show a greater concentration of branching events close to the present, and display stronger diversification rate-trait correlations. 
We suggest that species selection plays an important role in shaping phylogenetic trees but recognize the need for an explicit probabilistic framework within which to assess the likelihoods of alternative diversification scenarios as explanations of a particular tree shape.

    Updated: 2019-11-15
  • Reticulate evolution helps explain apparent homoplasy in floral biology and pollination in baobabs (Adansonia; Bombacoideae; Malvaceae)
    Syst. Biol. (IF 10.266) Pub Date : 2019-11-06
    Karimi N, Grover C, Gallagher J, et al.

    Baobabs (Adansonia) are a cohesive group of tropical trees with a disjunct distribution in Australia, Madagascar, and continental Africa, and diverse flowers associated with two pollination modes. We used custom targeted sequence capture in conjunction with new and existing phylogenetic comparative methods to explore the evolution of floral traits and pollination systems while allowing for reticulate evolution. Our analyses suggest that relationships in Adansonia are confounded by reticulation, with network inference methods supporting at least one reticulation event. The best supported hypothesis involves introgression between A. rubrostipa and core Longitubae, both of which are hawkmoth pollinated with yellow/red flowers, but there is also some support for introgression between the African lineage and Malagasy Brevitubae, which are both mammal-pollinated with white flowers. New comparative methods for phylogenetic networks were developed that allow maximum-likelihood inference of ancestral states and were applied to study the apparent homoplasy in floral biology and pollination mode seen in Adansonia. This analysis supports a role for introgressive hybridization in morphological evolution even in a clade with highly divergent and geographically widespread species. Our new comparative methods for discrete traits on species networks are implemented in the software PhyloNetworks.

    Updated: 2019-11-06
  • Impossibility Results on Stability of Phylogenetic Consensus Methods
    Syst. Biol. (IF 10.266) Pub Date : 2019-11-06
    Delucchi E, Hoessly L, Paolini G.

    We answer two questions raised by Bryant, Francis and Steel in their work on consensus methods in phylogenetics. Consensus methods apply to every practical instance where it is desired to aggregate a set of given phylogenetic trees (say, gene evolution trees) into a resulting, “consensus” tree (say, a species tree). Various stability criteria have been explored in this context, seeking to model desirable consistency properties of consensus methods as the experimental data is updated (e.g., more taxa, or more trees, are mapped). However, such stability conditions can be incompatible with some basic regularity properties that are widely accepted to be essential in any meaningful consensus method. Here, we prove that such an incompatibility does arise in the case of extension stability on binary trees and in the case of associative stability. Our methods combine general theoretical considerations with the use of computer programs tailored to the given stability requirements.

    Updated: 2019-11-06
  • Model-based species delimitation: are coalescent species reproductively isolated?
    Syst. Biol. (IF 10.266) Pub Date : 2019-11-05
    Campillo L, Barley A, Thomson R.

    A large and growing fraction of systematists define species as independently evolving lineages that may be recognized by analyzing the population genetic history of alleles sampled from individuals belonging to those species. This has motivated the development of increasingly sophisticated statistical models rooted in the multispecies coalescent process. Specifically, these models allow for simultaneous estimation of the number of species present in a sample of individuals and the phylogenetic history of those species using only DNA sequence data from independent loci. These methods hold extraordinary promise for increasing the efficiency of species discovery, but require extensive validation to ensure that they are accurate and precise. Whether the species identified by these methods correspond to the species that would be recognized by alternative species recognition criteria (such as measurements of reproductive isolation) is currently an open question and a subject of vigorous debate. Here we perform an empirical test of these methods by making use of a classic model system in the history of speciation research, flies of the genus Drosophila. Specifically, we use the uniquely comprehensive data on reproductive isolation that is available for this system, along with DNA sequence data, to ask whether Drosophila species inferred under the multispecies coalescent model correspond to those recognized by many decades of speciation research. We found that coalescent based and reproductive isolation based methods of inferring species boundaries are concordant for 77% of the species pairs. We explore and discuss potential explanations for these discrepancies. We also found that the amount of prezygotic isolation between two species is a strong predictor of the posterior probability of species boundaries based on DNA sequence data, regardless of whether the species pairs are sympatrically or allopatrically distributed.

    Updated: 2019-11-05
  • A Relaxed Directional Random Walk Model for Phylogenetic Trait Evolution.
    Syst. Biol. (IF 10.266) Pub Date : 2016-11-01
    Mandev S Gill, Lam Si Tung Ho, Guy Baele, Philippe Lemey, Marc A Suchard

    Understanding the processes that give rise to quantitative measurements associated with molecular sequence data remains an important issue in statistical phylogenetics. Examples of such measurements include geographic coordinates in the context of phylogeography and phenotypic traits in the context of comparative studies. A popular approach is to model the evolution of continuously varying traits as a Brownian diffusion process acting on a phylogenetic tree. However, standard Brownian diffusion is quite restrictive and may not accurately characterize certain trait evolutionary processes. Here, we relax one of the major restrictions of standard Brownian diffusion by incorporating a nontrivial estimable mean into the process. We introduce a relaxed directional random walk (RDRW) model for the evolution of multivariate continuously varying traits along a phylogenetic tree. Notably, the RDRW model accommodates branch-specific variation of directional trends while preserving model identifiability. Furthermore, our development of a computationally efficient dynamic programming approach to compute the data likelihood enables scaling of our method to large data sets frequently encountered in phylogenetic comparative studies and viral evolution. We implement the RDRW model in a Bayesian inference framework to simultaneously reconstruct the evolutionary histories of molecular sequence data and associated multivariate continuous trait data, and provide tools to visualize evolutionary reconstructions. We demonstrate the performance of our model on synthetic data, and we illustrate its utility in two viral examples. First, we examine the spatiotemporal spread of HIV-1 in central Africa and show that the RDRW model uncovers a clearer, more detailed picture of the dynamics of viral dispersal than standard Brownian diffusion. Second, we study antigenic evolution in the context of HIV-1 resistance to three broadly neutralizing antibodies. 
Our analysis reveals evidence of a continuous drift at the HIV-1 population level towards enhanced resistance to neutralization by the VRC01 monoclonal antibody over the course of the epidemic. [Brownian Motion; Diffusion Processes; Phylodynamics; Phylogenetics; Phylogeography; Trait Evolution.].

    Updated: 2019-11-01
  • Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.
    Syst. Biol. (IF 10.266) Pub Date : 2015-11-04
    Guy Baele, Philippe Lemey, Marc A Suchard

    Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse, but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of "working distributions" to facilitate--or shorten--the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this purpose, we introduce a "working" distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different "working" distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation in marginal likelihood when using PS and SS, while still retrieving the correct marginal likelihood using both GSS approaches. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.

    Updated: 2019-11-01
  • Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent.
    Syst. Biol. (IF 10.266) Pub Date : 2016-11-01
    Sha Zhu, James H Degnan

    Recent work in estimating species relationships from gene trees has included inferring networks assuming that past hybridization has occurred between species. Probabilistic models using the multispecies coalescent can be used in this framework for likelihood-based inference of both network topologies and parameters, including branch lengths and hybridization parameters. A difficulty for such methods is that it is not always clear whether, or to what extent, networks are identifiable, that is, whether there could be two distinct networks that lead to the same distribution of gene trees. For cases in which incomplete lineage sorting occurs in addition to hybridization, we demonstrate a new representation of the species network likelihood that expresses the probability distribution of the gene tree topologies as a linear combination of gene tree distributions given a set of species trees. This representation makes it clear that in some cases in which two distinct networks give the same distribution of gene trees when sampling one allele per species, the two networks can be distinguished theoretically when multiple individuals are sampled per species. This result means that network identifiability is not only a function of the trees displayed by the networks but also depends on allele sampling within species. We additionally give an example in which two networks that display exactly the same trees can be distinguished from their gene trees even when there is only one lineage sampled per species. [gene tree, hybridization, identifiability, maximum likelihood, species tree, phylogeny.].

    Updated: 2019-11-01
  • Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates.
    Syst. Biol. (IF 10.266) Pub Date : 2016-07-03
    Mandev S Gill, Philippe Lemey, Shannon N Bennett, Roman Biek, Marc A Suchard

    Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics and evolutionary biology. Kingman's coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. Major goals of demographic reconstruction include identifying driving factors of effective population size, and understanding the association between the effective population size and such factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates that exploit Gaussian Markov random fields to achieve temporal smoothing of effective population size trajectories. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size is more reflective of incidence rate. Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change. [Coalescent; effective population size; Gaussian Markov random fields; phylodynamics; phylogenetics; population genetics.]

    Updated: 2019-11-01
  • Estimating Bayesian Phylogenetic Information Content.
    Syst. Biol. (IF 10.266) Pub Date : 2016-05-08
    Paul O Lewis,Ming-Hui Chen,Lynn Kuo,Louise A Lewis,Karolina Fučíková,Suman Neupane,Yu-Bo Wang,Daoyuan Shi

    Measuring the phylogenetic information content of data has a long history in systematics. Here we explore a Bayesian approach to information content estimation. The entropy of the posterior distribution compared with the entropy of the prior distribution provides a natural way to measure information content. If the data have no information relevant to ranking tree topologies beyond the information supplied by the prior, the posterior and prior will be identical. Information in data discourages consideration of some hypotheses allowed by the prior, resulting in a posterior distribution that is more concentrated (has lower entropy) than the prior. We focus on measuring information about tree topology using marginal posterior distributions of tree topologies. We show that both the accuracy and the computational efficiency of topological information content estimation improve with use of the conditional clade distribution, which also allows topological information content to be partitioned by clade. We explore two important applications of our method: providing a compelling definition of saturation and detecting conflict among data partitions that can negatively affect analyses of concatenated data. [Bayesian; concatenation; conditional clade distribution; entropy; information; phylogenetics; saturation.]
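
    The entropy comparison described in this abstract can be illustrated with a minimal sketch. It assumes a uniform prior over unrooted binary topologies and estimates the posterior from raw sample frequencies of topology labels; this naive frequency estimator is not the conditional clade estimator the abstract recommends, and the function names are hypothetical.

```python
import math
from collections import Counter


def num_unrooted_topologies(n_taxa):
    """Count unrooted binary topologies for n_taxa >= 3, i.e. (2n - 5)!!."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def topology_information(sampled_topologies, n_taxa):
    """Return (information, fraction of the maximum possible information).

    Assumption: the prior is uniform over all unrooted topologies, so its
    entropy is log(number of topologies). The posterior is approximated by
    the frequency of each topology label in the sample.
    """
    prior_entropy = math.log(num_unrooted_topologies(n_taxa))
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    posterior_entropy = entropy(c / total for c in counts.values())
    information = prior_entropy - posterior_entropy
    fraction = information / prior_entropy if prior_entropy > 0 else 0.0
    return information, fraction
```

    With four taxa there are three unrooted topologies, so a sample concentrated on one topology yields the maximum information, while a sample spread evenly over all three yields none.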

    Updated: 2019-11-01
  • A Rapid and Scalable Method for Multilocus Species Delimitation Using Bayesian Model Comparison and Rooted Triplets.
    Syst. Biol. (IF 10.266) Pub Date : 2016-04-09
    Tomochika Fujisawa,Amr Aswad,Timothy G Barraclough

    Multilocus sequence data provide far greater power to resolve species limits than the single locus data typically used for broad surveys of clades. However, current statistical methods based on a multispecies coalescent framework are computationally demanding, because of the number of possible delimitations that must be compared and time-consuming likelihood calculations. New methods are therefore needed to open up the power of multilocus approaches to larger systematic surveys. Here, we present a rapid and scalable method that introduces 2 innovations. First, the method reduces the complexity of likelihood calculations by decomposing the tree into rooted triplets. The distribution of topologies for a triplet across multiple loci has a uniform trinomial distribution when the 3 individuals belong to the same species, but a skewed distribution if they belong to separate species with a form that is specified by the multispecies coalescent. A Bayesian model comparison framework was developed, and the best delimitation is found by comparing the product of posterior probabilities of all triplets. The second innovation is a new dynamic programming algorithm for finding the optimum delimitation from all those compatible with a guide tree by successively analyzing subtrees defined by each node. This algorithm removes the need for heuristic searches used by current methods and guarantees that the best solution is found; it could potentially be used in other systematic applications. We assessed the performance of the method with simulated, published, and newly generated data. Analyses of simulated data demonstrate that the combined method has favorable statistical properties and scalability with increasing sample sizes. Analyses of empirical data from both eukaryotes and prokaryotes demonstrate its potential for delimiting species in real cases.
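
    The contrast between the uniform and skewed triplet distributions can be sketched as follows. This is a simplified illustration, not the authors' Bayesian model comparison: it assumes a known internal branch length t (in coalescent units), treats the most frequent topology as the concordant one, and compares plain log-likelihoods. It relies on the standard multispecies coalescent result that, for three species, the concordant topology has probability 1 - (2/3)exp(-t) and each discordant topology has probability (1/3)exp(-t). Function names are hypothetical.

```python
import math


def triplet_probs(t):
    """Topology probabilities (concordant, discordant, discordant) for three
    species whose internal branch has length t in coalescent units."""
    discordant = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discordant, discordant, discordant)


def log_likelihood(counts, probs):
    """Trinomial log-likelihood without the multinomial coefficient, which is
    identical under both models and cancels in the comparison."""
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def log_likelihood_ratio(counts, t):
    """Log-likelihood of 'separate species' minus 'same species'.

    Assumption: the most frequent topology is taken as concordant, and t is
    supplied rather than estimated or integrated out.
    """
    ordered = sorted(counts, reverse=True)
    separate = log_likelihood(ordered, triplet_probs(t))
    same = log_likelihood(ordered, (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0))
    return separate - same
```

    Strongly skewed counts such as (90, 5, 5) favor the separate-species model for t = 1, whereas nearly even counts such as (34, 33, 33) favor the same-species model.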

    Updated: 2019-11-01
  • Does Gene Tree Discordance Explain the Mismatch between Macroevolutionary Models and Empirical Patterns of Tree Shape and Branching Times?
    Syst. Biol. (IF 10.266) Pub Date : 2016-03-13
    Tanja Stadler,James H Degnan,Noah A Rosenberg

    Classic null models for speciation and extinction give rise to phylogenies that differ in distribution from empirical phylogenies. In particular, empirical phylogenies are less balanced and have branching times closer to the root compared to phylogenies predicted by common null models. This difference might be due to null models of the speciation and extinction process being too simplistic, or due to the empirical datasets not being representative of random phylogenies. A third possibility arises because phylogenetic reconstruction methods often infer gene trees rather than species trees, producing an incongruity between models that predict species tree patterns and empirical analyses that consider gene trees. We investigate the extent to which the difference between gene trees and species trees under a combined birth-death and multispecies coalescent model can explain the difference in empirical trees and birth-death species trees. We simulate gene trees embedded in simulated species trees and investigate their difference with respect to tree balance and branching times. We observe that the gene trees are less balanced and typically have branching times closer to the root than the species trees. Empirical trees from TreeBase are also less balanced than our simulated species trees, and model gene trees can explain an imbalance increase of up to 8% compared to species trees. However, we see a much larger imbalance increase in empirical trees, about 100%, meaning that additional features must also be causing imbalance in empirical trees. This simulation study highlights the necessity of revisiting the assumptions made in phylogenetic analyses, as these assumptions, such as equating the gene tree with the species tree, might lead to a biased conclusion.

    Updated: 2019-11-01
  • Comparison of Target-Capture and Restriction-Site Associated DNA Sequencing for Phylogenomics: A Test in Cardinalid Tanagers (Aves, Genus: Piranga).
    Syst. Biol. (IF 10.266) Pub Date : 2016-01-30
    Joseph D Manthey,Luke C Campillo,Kevin J Burns,Robert G Moyle

    Restriction-site associated DNA sequencing (RAD-seq) and target capture of specific genomic regions, such as ultraconserved elements (UCEs), are emerging as two of the most popular methods for phylogenomics using reduced-representation genomic data sets. These two methods were designed to target different evolutionary timescales: RAD-seq was designed for population-genomic level questions and UCEs for deeper phylogenetics. The utility of both data sets to infer phylogenies across a variety of taxonomic levels has not been adequately compared within the same taxonomic system. Additionally, the effects of uninformative gene trees on species tree analyses (for target capture data) have not been explored. Here, we utilize RAD-seq and UCE data to infer a phylogeny of the bird genus Piranga. The group has a range of divergence dates (0.5-6 myr), contains 11 recognized species, and lacks a resolved phylogeny. We compared two species tree methods for the RAD-seq data and six species tree methods for the UCE data. Additionally, in the UCE data, we analyzed a complete matrix as well as data sets with only highly informative loci. A complete matrix of 189 UCE loci with 10 or more parsimony informative (PI) sites, and an approximately 80% complete matrix of 1128 PI single-nucleotide polymorphisms (SNPs) (from RAD-seq) yield the same fully resolved phylogeny of Piranga. We inferred non-monophyletic relationships of Piranga lutea individuals, with all other a priori species identified as monophyletic. Finally, we found that species tree analyses that included predominantly uninformative gene trees provided strong support for different topologies, with consistent phylogenetic results when limiting species tree analyses to highly informative loci or only using less informative loci with concatenation or methods meant for SNPs alone.

    Updated: 2019-11-01
  • A Bayesian Supertree Model for Genome-Wide Species Tree Reconstruction.
    Syst. Biol. (IF 10.266) Pub Date : 2014-10-05
    Leonardo De Oliveira Martins,Diego Mallo,David Posada

    Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS, and therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models.

    Updated: 2019-11-01
  • Best practices for justifying fossil calibrations.
    Syst. Biol. (IF 10.266) Pub Date : 2011-11-23
    James F Parham,Philip C J Donoghue,Christopher J Bell,Tyler D Calway,Jason J Head,Patricia A Holroyd,Jun G Inoue,Randall B Irmis,Walter G Joyce,Daniel T Ksepka,José S L Patané,Nathan D Smith,James E Tarver,Marcel van Tuinen,Ziheng Yang,Kenneth D Angielczyk,Jenny M Greenwood,Christy A Hipsley,Louis Jacobs,Peter J Makovicky,Johannes Müller,Krister T Smith,Jessica M Theodor,Rachel C M Warnock,Michael J Benton

    Updated: 2019-11-01
  • Testing and quantifying phylogenetic signals and homoplasy in morphometric data.
    Syst. Biol. (IF 10.266) Pub Date : 2010-06-09
    Christian Peter Klingenberg,Nelly A Gidaszewski

    The relationship between morphometrics and phylogenetic analysis has long been controversial. Here we propose an approach that is based on mapping morphometric traits onto phylogenies derived from other data and thus avoids the pitfalls encountered by previous studies. This method treats shape as a single, multidimensional character. We propose a test for the presence of a phylogenetic signal in morphometric data, which simulates the null hypothesis of the complete absence of phylogenetic structure by permutation of the shape data among the terminal taxa. We also propose 2 measures of the fit of morphometric data to the phylogeny that are direct extensions of the consistency index and retention index used in traditional cladistics. We apply these methods to a small study of the evolution of wing shape in the Drosophila melanogaster subgroup, for which a very strongly supported phylogeny is available. This case study reveals a significant phylogenetic signal and a relatively low degree of homoplasy. Despite the low homoplasy, the shortest tree computed from landmark data on wing shape is inconsistent with the well-supported phylogenetic tree from molecular data, underscoring that morphometric data may not provide reliable information for inferring phylogeny.

    Updated: 2019-11-01
  • The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times.
    Syst. Biol. (IF 10.266) Pub Date : 2010-06-09
    Jun Inoue,Philip C J Donoghue,Ziheng Yang

    Bayesian inference provides a powerful framework for integrating different sources of information (in particular, molecules and fossils) to derive estimates of species divergence times. Indeed, it is currently the only framework that can adequately account for uncertainties in fossil calibrations. We use 2 Bayesian Markov chain Monte Carlo programs, MULTIDIVTIME and MCMCTREE, to analyze 3 empirical datasets to estimate divergence times in amphibians, actinopterygians, and felids. We evaluate the impact of various factors, including the priors on rates and times, fossil calibrations, substitution model, the violation of the molecular clock and the rate-drift model, and the exact and approximate likelihood calculation. Assuming the molecular clock caused seriously biased time estimates when the clock is violated, but 2 different rate-drift models produced similar estimates. The prior on times, which incorporates fossil-calibration information, had the greatest impact on posterior time estimation. In particular, the strategies used by the 2 programs to incorporate minimum- and maximum-age bounds led to very different time priors and were responsible for large differences in posterior time estimates in a previous study. The results highlight the critical importance of fossil calibrations to molecular dating and the need for probabilistic modeling of fossil depositions, preservations, and sampling to provide statistical summaries of information in the fossil record concerning species divergence times.

    Updated: 2019-11-01
  • Species concepts and species delimitation.
    Syst. Biol. (IF 10.266) Pub Date : 2007-11-21
    Kevin De Queiroz

    The issue of species delimitation has long been confused with that of species conceptualization, leading to a half century of controversy concerning both the definition of the species category and methods for inferring the boundaries and numbers of species. Alternative species concepts agree in treating existence as a separately evolving metapopulation lineage as the primary defining property of the species category, but they disagree in adopting different properties acquired by lineages during the course of divergence (e.g., intrinsic reproductive isolation, diagnosability, monophyly) as secondary defining properties (secondary species criteria). A unified species concept can be achieved by treating existence as a separately evolving metapopulation lineage as the only necessary property of species and the former secondary species criteria as different lines of evidence (operational criteria) relevant to assessing lineage separation. This unified concept of species has several consequences for species delimitation, including the following: First, the issues of species conceptualization and species delimitation are clearly separated; the former secondary species criteria are no longer considered relevant to species conceptualization but only to species delimitation. Second, all of the properties formerly treated as secondary species criteria are relevant to species delimitation to the extent that they provide evidence of lineage separation. Third, the presence of any one of the properties (if appropriately interpreted) is evidence for the existence of a species, though more properties and thus more lines of evidence are associated with a higher degree of corroboration. Fourth, and perhaps most significantly, a unified species concept shifts emphasis away from the traditional species criteria, encouraging biologists to develop new methods of species delimitation that are not tied to those properties.

    Updated: 2019-11-01
  • Toward understanding Anophelinae (Diptera, Culicidae) phylogeny: insights from nuclear single-copy genes and the weight of evidence.
    Syst. Biol. (IF 10.266) Pub Date : 2002-07-16
    J Krzywinski,R C Wilkerson,N J Besansky

    A phylogeny of the mosquito subfamily Anophelinae was inferred from fragments of two protein-coding nuclear genes, G6pd (462 bp) and white (801 bp), and from a combined data set (2,136 bp) that included a portion of the mitochondrial gene ND5 and the D2 region of the ribosomal 28S gene. Sixteen species from all three anopheline genera and six Anopheles subgenera were sampled, along with six species of other mosquitoes used as an outgroup. Each of four genes analyzed individually recovered the same well-supported clades; topological incongruence was limited to unsupported or poorly supported nodes. As assessed by the incongruence length difference test, most of the conflicting signal was contributed by third codon positions. Strong structural constraints, as observed in white and G6pd, apparently had little impact on phylogenetic inference. Compared with the other genes, white provided a superior source of phylogenetic information. However, white appears to have experienced accelerated rates of evolution in a few lineages, the affinities of which are therefore suspect. In combined analyses, most of the inferred relationships were well-supported and in agreement with previous studies: monophyly of Anophelinae, basal position of Chagasia, monophyly of Anopheles subgenera, and subgenera Nyssorhynchus + Kerteszia as sister taxa. The results also suggested a monophyletic origin of subgenera Cellia + Anopheles, and the white gene analysis supported genus Bironella as a sister taxon to Anopheles. The present data and other available evidence suggest a South American origin of Anophelinae, probably in the Mesozoic; a rapid diversification of Bironella and basal subgeneric lineages of Anopheles, potentially associated with the breakup of Gondwanaland; and a relatively recent and rapid dispersion of subgenus Anopheles.

    Updated: 2019-11-01
  • Phylogenetic utility of different types of molecular data used to infer evolutionary relationships among stalk-eyed flies (Diopsidae).
    Syst. Biol. (IF 10.266) Pub Date : 2002-07-16
    R H Baker,G S Wilkinson,R DeSalle

    A phylogenetic hypothesis of relationships among 33 species of stalk-eyed flies was generated from a molecular data set comprising three mitochondrial and three nuclear gene regions. A combined analysis of all the data equally weighted produced a single most-parsimonious cladogram with relatively strong support at the majority of nodes. The phylogenetic utility of different classes of molecular data was also examined. In particular, using a number of different measures of utility in both a combined and separate analysis framework, we focused on the distinction between mitochondrial and nuclear genes and between faster-evolving characters and slower-evolving characters. For the first comparison, by nearly any measure of utility, the nuclear genes are substantially more informative for resolving diopsid relationships than are the mitochondrial genes. The nuclear genes exhibit less homoplasy, are less incongruent with one another and with the combined data, and contribute more support to the combined analysis topology than do the mitochondrial genes. Results from the second comparison, however, provide little evidence of a clear difference in utility. Despite indications of rapid divergence and saturation, faster-evolving characters in both the nuclear and mitochondrial data sets still provide substantial phylogenetic signal. In general, inclusion of the more rapidly evolving data consistently improves the congruence among partitions.

    Updated: 2019-11-01
  • Nematode small subunit phylogeny correlates with alignment parameters.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Ashleigh B Smythe,Michael J Sanderson,Steven A Nadler

    The number of nuclear small subunit (SSU) ribosomal RNA (rRNA) sequences for Nematoda has increased dramatically in recent years, and although their use in constructing phylogenies has also increased, relatively little attention has been given to their alignment. Here we examined the sensitivity of the nematode SSU data set to different alignment parameters and to the removal of alignment ambiguous regions. Ten alignments were created with CLUSTAL W using different sets of alignment parameters (10 full alignments), and each alignment was examined by eye and alignment ambiguous regions were removed (creating 10 reduced alignments). These alignment ambiguous regions were analyzed as a third type of data set, culled alignments. Maximum parsimony, neighbor-joining, and parsimony bootstrap analyses were performed. The resulting phylogenies were compared to each other by the symmetric difference distance tree comparison metric (SymD). The correlation of the phylogenies with the alignment parameters was tested by comparing matrices from SymD with corresponding matrices of Manhattan distances representing the alignment parameters. Differences among individual parsimony trees from the full alignments were frequently correlated with the differences among alignment parameters (580/1000 tests), as were trees from the culled alignments (403/1000 tests). Differences among individual parsimony trees from the reduced alignments were less frequently correlated with the differences among alignment parameters (230/1000 tests). Differences among majority-rule consensus trees (50%) from the parsimony analysis of the full alignments were significantly correlated with the differences among alignment parameters, whereas consensus trees from the reduced and culled analyses were not correlated with the alignment parameters. These patterns of correlation confirm that choice of alignment parameters has the potential to bias the resultant phylogenies for the nematode SSU data set, and suggest that the removal of alignment ambiguous regions reduces this effect. Finally, we discuss the implications of conservative phylogenetic hypotheses for Nematoda produced by exploring alignment space and removing alignment ambiguous regions for SSU rDNA.
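
    The two distances compared in this abstract can be sketched in a few lines. The sketch assumes each tree has already been reduced to its set of nontrivial splits, with every split written as the side that excludes a fixed reference taxon; Newick parsing and the matrix correlation test are not shown. The parameter vectors (for example, gap opening and gap extension penalties) and all function names are hypothetical illustrations, since the abstract does not list the parameters used.

```python
def normalize_split(side, taxa, reference):
    """Represent a bipartition by the side that excludes the reference taxon."""
    side = frozenset(side)
    if reference in side:
        side = frozenset(taxa) - side
    return side


def symmetric_difference_distance(splits_a, splits_b):
    """Number of splits found in exactly one of the two trees (SymD)."""
    return len(set(splits_a) ^ set(splits_b))


def manhattan_distance(params_a, params_b):
    """Sum of absolute differences between two equal-length parameter vectors."""
    if len(params_a) != len(params_b):
        raise ValueError("parameter vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(params_a, params_b))


# Example with five taxa and reference taxon "E".
taxa = {"A", "B", "C", "D", "E"}
tree_1 = {normalize_split(s, taxa, "E") for s in [{"A", "B"}, {"C", "D"}]}
tree_2 = {normalize_split(s, taxa, "E") for s in [{"A", "C"}, {"B", "D"}]}
tree_3 = {normalize_split(s, taxa, "E") for s in [{"A", "B"}, {"D", "E"}]}
```

    Here tree_1 and tree_2 share no splits, so their distance is 4, while tree_1 and tree_3 share the split {A, B} and differ by 2. Computing both distances for every pair of alignments gives the two matrices whose correlation the abstract describes testing.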

    Updated: 2019-11-01
  • Mapping uncertainty and phylogenetic uncertainty in ancestral character state reconstruction: an example in the moss genus Brachytheciastrum.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    A Vanderpoorten,B Goffinet

    The evolution of species traits along a phylogeny can be examined through an increasing number of possible, but not necessarily complementary, approaches. In this paper, we assess whether deriving ancestral states of discrete morphological characters from a model whose parameters are (I) optimized by ML on a most likely tree; (II) optimized by ML onto each of a Bayesian sample of trees; and (III) sampled by an MCMC visiting the space of a Bayesian sample of trees affects the reconstruction of ancestral states in the moss genus Brachytheciastrum. In the first two methods, the choice of a single- or two-rate model and of a genetic distance (wherein branch lengths are used to determine the probabilities of change) or speciational (wherein changes are only driven by speciation events) model based upon a likelihood-ratio test strongly depended on the sampled trees. Despite these differences in model selection, reconstructions of ancestral character states were strongly correlated to each other across nodes, often at r > 0.9, for all the characters. The Bayesian approach of ancestral character state reconstruction offers, however, a series of advantages over the single-tree approach or the ML model optimization on a Bayesian sample of trees because it does not involve restricting model parameters prior to reconstructing ancestral states, but rather allows a range of model parameters and ancestral character states to be sampled according to their posterior probabilities. From the distribution of the latter, conclusions on trait evolution can be made in a more satisfactory way than when a substantial part of the uncertainty of the results is obscured by the focus on a single set of model parameters and associated ancestral states. The reconstructions of ancestral character states in Brachytheciastrum reveal rampant parallel morphological evolution. Most species previously described based on phenetic grounds are thus resolved as being of polyphyletic origin. Species polyphyly has been increasingly reported among mosses, raising severe reservations regarding current species definition.

    Updated: 2019-11-01
  • Short interspersed elements (SINEs) in plants: origin, classification, and use as phylogenetic markers.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Jean-Marc Deragon,Xiaoyu Zhang

    Short interspersed elements (SINEs) are a class of dispersed mobile sequences that use RNA as an intermediate in an amplification process called retroposition. The presence-absence of a SINE at a given locus has been used as a meaningful classification criterion to evaluate phylogenetic relations among species. We review here recent developments in the characterisation of plant SINEs and their use as molecular markers to retrace phylogenetic relations among wild and cultivated Oryza and Brassica species. In Brassicaceae, further use of SINE markers is limited by our partial knowledge of endogenous SINE families (their origin and evolution histories) and by the absence of a clear classification. To solve this problem, phylogenetic relations among all known Brassicaceae SINEs were analyzed and a new classification, grouping SINEs in 15 different families, is proposed. The relative age and size of each Brassicaceae SINE family was evaluated and new phylogenetically supported subfamilies were described. We also present evidence suggesting that new potentially active SINEs recently emerged in Brassica oleracea from the shuffling of preexisting SINE portions. Finally, the comparative evolution history of SINE families present in Arabidopsis thaliana and Brassica oleracea revealed that SINEs were in general more active in the Brassica lineage. The importance of these new data for the use of Brassicaceae SINEs as molecular markers in future applications is discussed.

    Updated: 2019-11-01
  • Automated scanning for phylogenetically informative transposed elements in rodents.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Astrid Farwick,Ursula Jordan,Georg Fuellen,Dorothée Huchon,François Catzeflis,Jürgen Brosius,Jürgen Schmitz

    Transposed elements constitute an attractive, useful source of phylogenetic markers to elucidate the evolutionary history of their hosts. Frequent and successive amplifications over evolutionary time are important requirements for utilizing their presence or absence as landmarks of evolution. Although transposed elements are well distributed in rodent taxa, the generally high degree of genomic sequence divergence among species complicates our access to presence/absence data. With this in mind we developed a novel, high-throughput computational strategy, called CPAL (Conserved Presence/Absence Locus-finder), to identify genome-wide distributed, phylogenetically informative transposed elements flanked by highly conserved regions. From a total of 232 extracted chromosomal mouse loci we randomly selected 14 of these plus 2 others from previous test screens and attempted to amplify them via PCR in representative rodent species. All loci were amplifiable and ultimately contributed 31 phylogenetically informative markers distributed throughout the major groups of Rodentia.

    Updated: 2019-11-01
  • SINEs of a nearly perfect character.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    David A Ray,Jinchuan Xing,Abdel-Halim Salem,Mark A Batzer

    Mobile elements have been recognized as powerful tools for phylogenetic and population-level analyses. However, issues regarding potential sources of homoplasy and other misleading events have been raised. We have collected available data for all phylogenetic and population level studies of primates utilizing Alu insertion data and examined them for potentially homoplasious and other misleading events. Very low levels of each potential confounding factor in a phylogenetic or population analysis (i.e., lineage sorting, parallel insertions, and precise excision) were found. Although taxa known to be subject to high levels of these types of events may indeed be subject to problems when using SINE analysis, we propose that most taxa will respond as the order Primates has--by the resolution of several long-standing problems observed using sequence-based methods.

    Updated: 2019-11-01
  • Extensive morphological convergence and rapid radiation in the evolutionary history of the family Geoemydidae (old world pond turtles) revealed by SINE insertion analysis.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Takeshi Sasaki,Yuichirou Yasukawa,Kazuhiko Takahashi,Seiko Miura,Andrew M Shedlock,Norihiro Okada

    The family Geoemydidae is one of three in the superfamily Testudinoidea and is the most diversified family of extant turtle species. The phylogenetic relationships in this family and among related families have been vigorously investigated from both morphological and molecular viewpoints. The evolutionary history of Geoemydidae, however, remains controversial. Therefore, to elucidate the phylogenetic relationships of Geoemydidae and related species, we applied the SINE insertion method to investigate 49 informative SINE loci in 28 species. We detected four major evolutionary lineages (Testudinidae, Batagur group, Siebenrockiella group, and Geoemyda group) in the clade Testuguria (a clade of Geoemydidae + Testudinidae). All five specimens of Testudinidae form a monophyletic clade. The Batagur group comprises five batagurines. The Siebenrockiella group has one species, Siebenrockiella crassicollis. The Geoemyda group comprises 15 geoemydines (including three former batagurines, Mauremys reevesii, Mauremys sinensis, and Heosemys annandalii). Among these four groups, the SINE insertion patterns were inconsistent at four loci, suggesting that an ancestral species of Testuguria radiated and rapidly diverged into the four lineages during the initial stage of its evolution. Furthermore, within the Geoemyda group we identified three evolutionary lineages, namely Mauremys, Cuora, and Heosemys. The Heosemys lineage comprises Heosemys, Sacalia, Notochelys, and Melanochelys species, and its monophyly is a novel assemblage in Geoemydidae. Our SINE phylogenetic tree demonstrates extensive convergent morphological evolution between the Batagur group and the three species of the Geoemyda group, M. reevesii, M. sinensis, and H. annandalii.

    Updated: 2019-11-01
  • Phylogenomic investigation of CR1 LINE diversity in reptiles.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Andrew M Shedlock

    It is unlikely that taxonomically diverse phylogenetic studies will be completed rapidly in the near future for nonmodel organisms on a whole-genome basis. However, one approach to advancing the field of "phylogenomics" is to estimate the structure of poorly known genomes by mining libraries of clones from suites of taxa, rather than from single species. The present analysis adopts this approach by taking advantage of megabase-scale end-sequence scanning of reptilian genomic clones to characterize diversity of CR1-like LINEs, the dominant family of transposable elements (TEs) in the sister group of mammals. As such, it helps close an important gap in the literature on the molecular systematics and evolution of retroelements in nonavian reptiles. Results from aligning more than 14 Mb of sequence from the American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta), Bahamian green anole (Anolis smaragdinus), Tuatara (Sphenodon punctatus), Emu (Dromaius novaehollandiae), and Zebra Finch (Taeniopygia guttata) against a comprehensive library of approximately 3000 TE-encoding peptides reflect an increasing abundance of LINE and non-long-terminal-repeat (non-LTR) retrotransposon repeat types with the age of common ancestry among exemplar reptilian clades. The hypothesis that repeat diversity is correlated with basal metabolic rate was tested using comparative methods and a significant nonlinear relationship was indicated. This analysis suggests that the age of divergence between an exemplary clade and its sister group as well as metabolic correlates should be considered in addition to genome size in explaining patterns of retroelement diversity. The first phylogenetic analysis of the largely unexplored chicken repeat 1 (CR1) 3' reverse transcriptase (RT) conserved domains 8 and 9 in nonavian reptiles reveals a pattern of multiple lineages with variable branch lengths, suggesting the presence of both old and young elements and the existence of several distinct well-supported clades not apparent from previous characterization of CR1 subfamily structure in birds and the turtle. This mode of CR1 evolution contrasts with historical patterns of LINE 1 diversification in mammals and hints toward the existence of a rich but still largely unexplored diversity of nonavian retroelements of importance to advancing both comparative vertebrate genomics and amniote systematics.

    Updated: 2019-11-01
  • Phylogenomic analysis of the L1 retrotransposons in Deuterostomia.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Dusan Kordis,Nika Lovsin,Franc Gubensek

    L1 retrotransposons constitute the largest single component of mammalian genomes. In contrast to the single remaining lineage of L1 retrotransposons in mammalian genomes, some teleost fishes contain a highly diverse L1 retrotransposon repertoire. Major evolutionary changes in L1 retrotransposon repertoires have therefore taken place in the land vertebrates (Tetrapoda). The lack of sequence data for L1 retrotransposons in the basal living Tetrapoda lineages prompted an investigation of their distribution and evolution in the genomes of the key tetrapod lineages, amphibians and reptiles, and in lungfishes. In this study, we combined genome database searches with PCR analysis to demonstrate that L1 retrotransposons are present in the genomes of lungfishes, amphibians, and lepidosaurs. Phylogenomic analysis shows that the genomes of Deuterostomia possess three highly divergent groups of L1 retrotransposons, with distinct distribution patterns. The analysis of L1 diversity shows the presence of a very large number of diverse L1 families, each with very low copy numbers, at the time of the origin of tetrapods. During the evolution of synapsids, all but one L1 lineage have been lost. This study establishes that the loss of L1 diversity and explosion in copy numbers occurred in the synapsid ancestors of mammals, and was most probably caused by severe population bottlenecks.

    Updated: 2019-11-01
  • Distribution and phylogeny of Penelope-like elements in eukaryotes.
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Irina R Arkhipova

    Penelope-like elements (PLEs) are a relatively little-studied class of eukaryotic retroelements, distinguished by the presence of the GIY-YIG endonuclease domain, the ability of some representatives to retain introns, and the similarity of PLE-encoded reverse transcriptases to telomerases. Although these retrotransposons are abundant in many animal genomes, the reverse transcriptase moiety can also be found in several protists, fungi, and plants, indicating its ancient origin. A comprehensive phylogenetic analysis of PLEs was conducted, based on extended sequence alignments and a considerably expanded data set. PLEs exhibit a pattern of evolution similar to that of non-LTR retrotransposons, which form deep-branching clades dating back to the Precambrian era. However, PLEs seem to have experienced a much higher degree of lineage losses than non-LTR retrotransposons. It is suggested that PLEs and non-LTR retrotransposons be included in a larger eTPRT (eukaryotic target-primed) group of retroelements, characterized by 5' truncation, variable target-site duplication, and the potential of the 3' end to participate in formation of non-autonomous derivatives.

    Updated: 2019-11-01
  • Exploring frontiers in the DNA landscape: an introduction to the symposium "Genome Analysis and the Molecular Systematics of Retroelements".
    Syst. Biol. (IF 10.266) Pub Date : 2007-03-10
    Andrew M Shedlock

    The emerging field of phylogenomics is influencing both the amount and type of characters being brought to bear on long-standing problems in systematic biology. Moreover, the proliferation of sequence information from genome projects in concert with the development of new informatics tools is widening access to comparative data on retroelements to a broad cross section of investigators. Motivated by this, the Society of Systematic Biologists sponsored a symposium entitled "Genome Analysis and the Molecular Systematics of Retroelements," and the resulting papers illustrate this theme of new discoveries and cover three basic areas of research: (i) the taxonomic distribution and phylogenetic structure of families of retroelements; (ii) the use of SINE and LINE insertions for phylogenetic inference; and (iii) the informatics and classification of repetitive elements. Contributions of each article are briefly discussed in this context and particularly fruitful directions for future research illuminated by results of this symposium are reviewed.

    Updated: 2019-11-01
  • The effect of combining molecular and morphological data in published phylogenetic analyses.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-15
    Alexandra H Wortley,Robert W Scotland

    Updated: 2019-11-01
  • Increasing data transparency and estimating phylogenetic uncertainty in supertrees: Approaches using nonparametric bootstrapping.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-15
    Brian R Moore,Stephen A Smith,Michael J Donoghue

    The estimation of ever larger phylogenies requires consideration of alternative inference strategies, including divide-and-conquer approaches that decompose the global inference problem to a set of smaller, more manageable component problems. A prominent locus of research in this area is the development of supertree methods, which estimate a composite tree by combining a set of partially overlapping component topologies. Although promising, the use of component tree topologies as the primary data dissociates supertrees from complexities within the underlying character data and complicates the evaluation of phylogenetic uncertainty. We address these issues by exploring three approaches that variously incorporate nonparametric bootstrapping into a common supertree estimation algorithm (matrix representation with parsimony, although any algorithm might be used), including bootstrap-weighting, source-tree bootstrapping, and hierarchical bootstrapping. We illustrate these procedures by means of hypothetical and empirical examples. Our preliminary experiments suggest that these methods have the potential to improve the correspondence of supertree estimates to those derived from simultaneous analysis of the combined data and to allow uncertainty in supertree topologies to be quantified. The ability to increase the transparency of supertrees to the underlying character data has several practical implications and sheds new light on an old debate. These methods have been implemented in the freely available program, tREeBOOT.

    Updated: 2019-11-01
  • A geometric approach to tree shape statistics.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-15
    Frederick A Matsen

    This article presents a new way to quantify the descriptive ability of tree shape statistics. Where before, tree shape statistics were chosen by their ability to distinguish between macroevolutionary models, the resolution presented in this paper quantifies the ability of a statistic to differentiate between similar and different trees. This is termed the geometric approach to differentiate it from the model-based approach previously explored. A distinct advantage of this perspective is that it allows evaluation of multiple tree shape statistics describing different aspects of tree shape. After developing the methodology, it is applied here to make specific recommendations for a suite of three statistics that may prove useful in applications. The article ends with an application of the statistics to clarify the impact of taxa omission on tree shape.

    Updated: 2019-11-01
  • Maximizing phylogenetic diversity in biodiversity conservation: Greedy solutions to the Noah's Ark problem.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-15
    Klaas Hartmann,Mike Steel

    The Noah's Ark Problem (NAP) is a comprehensive cost-effectiveness methodology for biodiversity conservation that was introduced by Weitzman (1998) and utilizes the phylogenetic tree containing the taxa of interest to assess biodiversity. Given a set of taxa, each of which has a particular survival probability that can be increased at some cost, the NAP seeks to allocate limited funds to conserving these taxa so that the future expected biodiversity is maximized. Finding optimal solutions using this framework is a computationally difficult problem to which a simple and efficient "greedy" algorithm has been proposed in the literature and applied to conservation problems. We show that, although algorithms of this type cannot produce optimal solutions for the general NAP, there are two restricted scenarios of the NAP for which a greedy algorithm is guaranteed to produce optimal solutions. The first scenario requires the taxa to have equal conservation cost; the second scenario requires an ultrametric tree. The NAP assumes a linear relationship between the funding allocated to conservation of a taxon and the increased survival probability of that taxon. This relationship is briefly investigated and one variation is suggested that can also be solved using a greedy algorithm.
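    As a rough illustration of the equal-cost scenario mentioned in this abstract, the sketch below computes expected phylogenetic diversity on a toy rooted tree and funds taxa one at a time by largest marginal gain. The tree, branch lengths, survival probabilities, and the names `EDGES`, `expected_pd`, and `greedy_nap` are all hypothetical. Funding is simplified to an all-or-nothing choice per taxon, which is an assumption made here for brevity, and this is one plausible reading of a "greedy" allocation rather than the specific algorithm analyzed in the paper.

```python
from math import prod

# Hypothetical rooted tree: each edge is (branch_length, set_of_descendant_taxa).
EDGES = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (2.0, {"A", "B"}),
    (3.0, {"C"}),
]

def expected_pd(survival):
    """Expected phylogenetic diversity: an edge counts if any descendant survives."""
    return sum(
        length * (1.0 - prod(1.0 - survival[t] for t in taxa))
        for length, taxa in EDGES
    )

def greedy_nap(base, funded, budget):
    """Fund `budget` taxa one at a time, always taking the largest marginal gain.

    base[t]   -- survival probability of taxon t without funding
    funded[t] -- survival probability of taxon t if funded (equal cost per taxon)
    """
    current = dict(base)
    chosen = []
    for _ in range(budget):
        candidates = [t for t in current if t not in chosen]
        best = max(
            candidates,
            key=lambda t: expected_pd({**current, t: funded[t]}),
        )
        current[best] = funded[best]
        chosen.append(best)
    return chosen, expected_pd(current)

# Illustrative numbers only (assumptions, not data from the paper).
base = {"A": 0.5, "B": 0.5, "C": 0.5}
funded = {"A": 1.0, "B": 1.0, "C": 1.0}
chosen, value = greedy_nap(base, funded, budget=1)
```

    With these toy numbers the long, unshared branch leading to taxon C gives the largest gain, so C is funded first.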

    Updated: 2019-11-01
  • Detecting the node-density artifact in phylogeny reconstruction.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-15
    Chris Venditti,Andrew Meade,Mark Pagel

    The node-density effect is an artifact of phylogeny reconstruction that can cause branch lengths to be underestimated in areas of the tree with fewer taxa. Webster, Payne, and Pagel (2003, Science 301:478) introduced a statistical procedure (the "delta" test) to detect this artifact, and here we report the results of computer simulations that examine the test's performance. In a sample of 50,000 random data sets, we find that the delta test detects the artifact in 94.4% of cases in which it is present. When the artifact is not present (n = 10,000 simulated data sets) the test showed a type I error rate of approximately 1.69%, incorrectly reporting the artifact in 169 data sets. Three measures of tree shape or "balance" failed to predict the size of the node-density effect. This may reflect the relative homogeneity of our randomly generated topologies, but emphasizes that nearly any topology can suffer from the artifact, the effect not being confined only to highly unevenly sampled or otherwise imbalanced trees. The ability to screen phylogenies for the node-density artifact is important for phylogenetic inference and for researchers using phylogenetic trees to infer evolutionary processes, including their use in molecular clock dating.

    Updated: 2019-11-01
  • Incorporating allelic variation for reconstructing the evolutionary history of organisms from multiple genes: An example from Rosa in North America.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-15
    Simon Joly,Anne Bruneau

    Allelic variation within individuals holds information regarding the relationships of organisms, which is expected to be particularly important for reconstructing the evolutionary history of closely related taxa. However, little effort has been committed to incorporate such information for reconstructing the phylogeny of organisms. Haplotype trees represent a solution when one nonrecombinant marker is considered, but there is no satisfying method when multiple genes are to be combined. In this paper, we propose an algorithm that converts a distance matrix of alleles to a distance matrix among organisms. This algorithm allows the incorporation of allelic variation for reconstructing the phylogeny of organisms from one or more genes. The method is applied to reconstruct the phylogeny of the seven native diploid species of Rosa sect. Cinnamomeae in North America. The glyceraldehyde 3-phosphate dehydrogenase (GAPDH), the triose phosphate isomerase (TPI), and the malate synthase (MS) genes were sequenced for 40 individuals from these species. The three genes had little genetic variation, and most species showed incomplete lineage sorting, suggesting these species have a recent origin. Despite these difficulties, the networks (NeighborNet) of organisms reconstructed from the matrix obtained with the algorithm recovered groups that more closely match taxonomic boundaries than did the haplotype trees. The combined network of individuals shows that species west of the Rocky Mountains, Rosa gymnocarpa and R. pisocarpa, form exclusive groups and that together they are distinct from eastern species. In the east, three groups were found to be exclusive: R. nitida-R. palustris, R. foliolosa, and R. blanda-R. woodsii. These groups are congruent with the morphology and the ecology of species. The method is also useful for representing hybrid individuals when the relationships are reconstructed using a phylogenetic network.
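    The abstract does not spell out how allele distances are converted to organism distances, so the sketch below shows only one simple possibility: the distance between two individuals is taken as the smallest mean allele-to-allele distance over all one-to-one pairings of their alleles. This pairing rule, the allele names, the distances, and the function name `organism_distance` are assumptions for illustration, not the published algorithm.

```python
from itertools import permutations

def organism_distance(alleles_x, alleles_y, allele_dist):
    """Smallest mean allele-to-allele distance over all one-to-one pairings.

    alleles_x, alleles_y -- equal-length lists of allele names (two each for diploids)
    allele_dist          -- dict mapping frozenset({a, b}) to a distance
    """
    def d(a, b):
        # Identical alleles are at distance zero.
        return 0.0 if a == b else allele_dist[frozenset((a, b))]

    return min(
        sum(d(a, b) for a, b in zip(alleles_x, perm)) / len(alleles_x)
        for perm in permutations(alleles_y)
    )

# Hypothetical allele distances for one gene.
allele_dist = {
    frozenset(("a1", "a2")): 2.0,
    frozenset(("a1", "a3")): 4.0,
    frozenset(("a2", "a3")): 6.0,
}

# Hypothetical diploid genotypes.
genotypes = {"ind1": ["a1", "a2"], "ind2": ["a2", "a3"], "ind3": ["a1", "a1"]}

# Organism-by-organism distance matrix built from the allele matrix.
names = sorted(genotypes)
matrix = {
    (x, y): organism_distance(genotypes[x], genotypes[y], allele_dist)
    for x in names
    for y in names
}
```

    For several genes, one option would be to compute such a matrix per gene and average them, although whether any rescaling is needed first is left open here.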

    Updated: 2019-11-01
  • Dating dispersal and radiation in the gymnosperm Gnetum (Gnetales)--clock calibration when outgroup relationships are uncertain.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-15
    Hyosig Won,Susanne S Renner

    Most implementations of molecular clocks require resolved topologies. However, one of the Bayesian relaxed clock approaches accepts input topologies that include polytomies. We explored the effects of resolved and polytomous input topologies in a rate-heterogeneous sequence data set for Gnetum, a member of the seed plant lineage Gnetales. Gnetum has 10 species in South America, 1 in tropical West Africa, and 20 to 25 in tropical Asia, and explanations for the ages of these disjunctions involve long-distance dispersal and/or the breakup of Gondwana. To resolve relationships within Gnetum, we sequenced most of its species for six loci from the chloroplast (rbcL, matK, and the trnT-trnF region), the nucleus (rITS/5.8S and the LEAFY gene second intron), and the mitochondrion (nad1 gene second intron). Because Gnetum has no fossil record, we relied on fossils from other Gnetales and from the seed plant lineages conifers, Ginkgo, cycads, and angiosperms to constrain a molecular clock and obtain absolute times for within-Gnetum divergence events. Relationships among Gnetales and the other seed plant lineages are still unresolved, and we therefore used differently resolved topologies, including one that contained a basal polytomy among gymnosperms. For a small set of Gnetales exemplars (n = 13) in which rbcL and matK satisfied the clock assumption, we also obtained time estimates from a strict clock, calibrated with one outgroup fossil. The changing hierarchical relationships among seed plants (and accordingly changing placements of distant fossils) resulted in small changes of within-Gnetum estimates because topologically closest constraints overrode more distant constraints. Regardless of the seed plant topology assumed, relaxed clock estimates suggest that the extant clades of Gnetum began diverging from each other during the Upper Oligocene. Strict clock estimates imply a mid-Miocene divergence. These estimates, together with the phylogeny for Gnetum from the six combined data sets, imply that the single African species of Gnetum is not a remnant of a once Gondwanan distribution. Miocene and Pliocene range expansions are inferred for the Asian subclades of Gnetum, which stem from an ancestor that arrived from Africa. These findings fit with seed dispersal by water in several species of Gnetum, morphological similarities among apparently young species, and incomplete concerted evolution in the nuclear ITS region.

    Updated: 2019-11-01
  • Sequence-based species delimitation for the DNA taxonomy of undescribed insects.
    Syst. Biol. (IF 10.266) Pub Date : 2006-09-14
    Joan Pons,Timothy G Barraclough,Jesus Gomez-Zurita,Anabela Cardoso,Daniel P Duran,Steaphan Hazell,Sophien Kamoun,William D Sumlin,Alfried P Vogler

    Cataloging the very large number of undescribed species of insects could be greatly accelerated by automated DNA-based approaches, but procedures for large-scale species discovery from sequence data are currently lacking. Here, we use mitochondrial DNA variation to delimit species in a poorly known beetle radiation in the genus Rivacindela from arid Australia. Among 468 individuals sampled from 65 sites and multiple morphologically distinguishable types, sequence variation in three mtDNA genes (cytochrome oxidase subunit 1, cytochrome b, 16S ribosomal RNA) was strongly partitioned between 46 or 47 putative species identified with quantitative methods of species recognition based on fixed unique ("diagnostic") characters. The boundaries between groups were also recognizable from a striking increase in branching rate in clock-constrained calibrated trees. Models of stochastic lineage growth (Yule models) were combined with coalescence theory to develop a new likelihood method that determines the point of transition from species-level (speciation and extinction) to population-level (coalescence) evolutionary processes. Fitting the location of the switches from speciation to coalescent nodes on the ultrametric tree of Rivacindela produced a transition in branching rate occurring at 0.43 Mya, leading to an estimate of 48 putative species (confidence interval for the threshold ranging from 47 to 51 clusters within 2 logL units). Entities delimited in this way exhibited biological properties of traditionally defined species, showing coherence of geographic ranges, broad congruence with morphologically recognized species, and levels of sequence divergence typical for closely related species of insects. The finding of discontinuous evolutionary groupings that are readily apparent in patterns of sequence variation permits largely automated species delineation from DNA surveys of local communities as a scaffold for taxonomy in this poorly known insect group.

    Updated: 2019-11-01
  • Characters, states, and homology.
    Syst. Biol. (IF 10.266) Pub Date : 2006-01-03
    John V Freudenstein

    Updated: 2019-11-01
  • Hastings ratio of the LOCAL proposal used in Bayesian phylogenetics.
    Syst. Biol. (IF 10.266) Pub Date : 2006-01-03
    Mark T Holder,Paul O Lewis,David L Swofford,Bret Larget

    Updated: 2019-11-01
  • A tale of two processes.
    Syst. Biol. (IF 10.266) Pub Date : 2006-01-03
    Peter Lockhart,Mike Steel

    Updated: 2019-11-01
  • Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of metazoa, and consequences for phylogenetic inferences.
    Syst. Biol. (IF 10.266) Pub Date : 2005-07-16
    Alexandre Hassanin,Nelly Léger,Jean Deutsch

    Mitochondrial DNA (mtDNA) sequences are commonly used for inferring phylogenetic relationships. However, the strand-specific bias in the nucleotide composition of the mtDNA, which is thought to reflect asymmetric mutational constraints, combined with the important compositional heterogeneity among taxa, are known to be highly problematic for phylogenetic analyses. Here, nucleotide composition was compared across 49 species of Metazoa (34 arthropods, 2 annelids, 2 molluscs, and 11 deuterostomes), and analyzed for a mtDNA fragment including six protein-coding genes, i.e., atp6, atp8, cox1, cox2, cox3, and nad2. The analyses show that most metazoan species present a clear strand asymmetry, where one strand is biased in favor of A and C, whereas the other strand has the reverse bias, i.e., in favor of T and G. The origin of this strand bias can be related to asymmetric mutational constraints involving deaminations of A and C nucleotides during the replication and/or transcription processes. The analyses reveal that six unrelated genera are characterized by a reversal of the usual strand bias, i.e., Argiope (Araneae), Euscorpius (Scorpiones), Tigriopus (Maxillopoda), Branchiostoma (Cephalochordata), Florometra (Echinodermata), and Katharina (Mollusca). It is proposed that asymmetric mutational constraints have been independently reversed in these six genera, through an inversion of the control region, i.e., the region that contains most regulatory elements for replication and transcription of the mtDNA. We show that reversals of asymmetric mutational constraints have dramatic consequences on the phylogenetic analyses, as taxa characterized by reverse strand bias tend to group together due to long-branch attraction artifacts. We propose a new method for limiting this specific problem in tree reconstruction under the Bayesian approach. We apply our method to deal with the question of phylogenetic relationships of the major lineages of Arthropoda. This new approach provides a better congruence with nuclear analyses. Based on mtDNA sequences, our data suggest that Chelicerata, Crustacea, Myriapoda, Pancrustacea, and Paradoxopoda are monophyletic.
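    The abstract does not say which statistic was used to quantify strand bias. A common choice is nucleotide skew, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), and the sketch below assumes that measure. The sequence and the function names `strand_skews` and `reverse_complement` are hypothetical. A strand biased toward A and C gives a positive AT skew and a negative GC skew, and its complementary strand gives the opposite signs.

```python
from collections import Counter

def strand_skews(seq):
    """Return (AT skew, GC skew) for one strand of a nucleotide sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    A skew is reported as 0.0 when its denominator is zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

seq = "AAACCCAACT"  # hypothetical fragment, rich in A and C
fwd = strand_skews(seq)
rev = strand_skews(reverse_complement(seq))
```

    Under this measure, a genus with reversed strand bias would show skew signs on a given strand opposite to those of most other taxa.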

    Updated: 2019-11-01
  • Morphology's role in phylogeny reconstruction: perspectives from paleontology.
    Syst. Biol. (IF 10.266) Pub Date : 2005-04-05
    Nathan Smith,Alan Turner

    Updated: 2019-11-01
  • Aligned 18S and insect phylogeny.
    Syst. Biol. (IF 10.266) Pub Date : 2004-10-27
    Karl M Kjer

    The nuclear small subunit rRNA (18S) has played a dominant role in the estimation of relationships among insect orders from molecular data. In previous studies, 18S sequences have been aligned by unadjusted automated approaches (computer alignments that are not manually readjusted), most recently with direct optimization (simultaneous alignment and tree building using a program called "POY"). Parsimony has been the principal optimality criterion. Given the problems associated with the alignment of rRNA, and the recent availability of the doublet model for the analysis of covarying sites using Bayesian MCMC analysis, a different approach is called for in the analysis of these data. In this paper, nucleotide sequence data from the 18S small subunit rRNA gene of insects are aligned manually with reference to secondary structure, and analyzed under Bayesian phylogenetic methods with both GTR+I+G and doublet models in MrBayes. A credible phylogeny of Insecta is recovered that is independent of the morphological data and (unlike many other analyses of 18S in insects) not contradictory to traditional ideas of insect ordinal relationships based on morphology. Hexapoda, including Collembola, are monophyletic. Paraneoptera are the sister taxon to a monophyletic Holometabola but weakly supported. Ephemeroptera are supported as the sister taxon of Neoptera, and this result is interpreted with respect to the evolution of direct sperm transfer and the evolution of flight. Many other relationships are well-supported but several taxa remain problematic, e.g., there is virtually no support for relationships among orthopteroid orders. A website is made available that provides aligned 18S data in formats that include structural symbols and Nexus formats.

    Updated: 2019-11-01
  • Molecular phylogenetic dating of asterid flowering plants shows early Cretaceous diversification.
    Syst. Biol. (IF 10.266) Pub Date : 2004-10-27
    Kare Bremer,Else Marie Friis,Birgitta Bremer

    We present a phylogenetic dating of asterids, based on a 111-taxon tree representing all major groups and orders and 83 of the 102 families of asterids, with an underlying data set comprising six chloroplast DNA markers totaling 9914 positions. Phylogenetic dating was done with semiparametric rate smoothing by penalized likelihood. Confidence intervals were calculated by bootstrapping. Six reference fossils were used for calibration. To explore the effects of various sources of error, we repeated the analyses with alternative dating methods (nonparametric rate smoothing and the Langley-Fitch clock-based method), alternative tree topologies, reduced taxon sampling (22 of the 111 taxa deleted), partitioning the data into three genes and three noncoding regions, and calibrating with single reference fossils. The analyses with alternative topologies, reduced taxon sampling, and coding versus noncoding sequences all yielded small or in some cases no deviations. The choice of method influenced the age estimates of a few nodes considerably. Calibration with reference fossils is a critical issue, and use of single reference fossils yielded different results depending on the fossil. The bootstrap confidence intervals were generally small. Our results show that asterids and their major subgroups euasterids, campanulids, and lamiids diversified during the Early Cretaceous. Cornales, Ericales, and Aquifoliales also have crown node ages from the Early Cretaceous. Dipsacales and Solanales are from the Mid-Cretaceous, the other orders of core campanulids and core lamiids from the Late Cretaceous. The considerable diversity exhibited by asterids almost from their first appearance in the fossil record also supports an origin and first phase of diversification in the Early Cretaceous.

    Updated: 2019-11-01
  • Modeling compositional heterogeneity.
    Syst. Biol. (IF 10.266) Pub Date : 2004-10-27
    Peter G Foster

    Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, and that the data may not need to be modelled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, then the model does not fit, and incorrect trees are found; but when composition is accommodated, the model then fits, and the known correct phylogenies are obtained.

    Updated: 2019-11-01
  • Cytogenetics and cladistics.
    Syst. Biol. (IF 10.266) Pub Date : 2004-10-27
    Gauthier Dobigny,Jean-François Ducroz,Terence J Robinson,Vitaly Volobouev

    Chromosomal data have been underutilized in phylogenetic investigations despite the obvious potential that cytogenetic studies have to reveal both structural and functional homologies among taxa. In large part this is associated with difficulties in scoring conventional and molecular cytogenetic information for phylogenetic analysis. The manner in which chromosomal data have been used by most authors in the past was often conceptually flawed in terms of the methods and principles underpinning modern cladistics. We present herein a review of the different methods employed, examine their relative strengths, and then outline a simple approach that considers the chromosomal change as the character, and its presence or absence the character state. We test this using one simulated and several empirical data sets. Features that are unique to cytogenetic investigations, including B-chromosomes, heterochromatic additions/deletions, and the location and number of nucleolar organizer regions (NORs), as well as the weighting of chromosomal characters, are critically discussed with regard to their suitability for phylogenetic reconstruction. We conclude that each of these classes of data has inherent problems that limit its usefulness in phylogenetic analyses and in most of these instances, inclusion should be subject to rigorous appraisal that addresses the criterion of unequivocal homology.
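    The coding scheme described in this abstract, where each chromosomal change is a character and its presence or absence is the state, can be pictured as a binary taxon-by-character matrix. The sketch below is a minimal illustration under that reading. The taxa and the change names are hypothetical, and how homology of each change is established is outside its scope.

```python
# Hypothetical chromosomal changes observed in each taxon.
observed = {
    "taxon1": {"fusion_1_2", "inversion_5"},
    "taxon2": {"fusion_1_2"},
    "taxon3": set(),
}

# Each distinct change is one character; columns are kept in sorted order.
characters = sorted(set().union(*observed.values()))

# State 1 = change present, 0 = change absent.
matrix = {
    taxon: [1 if ch in changes else 0 for ch in characters]
    for taxon, changes in observed.items()
}
```

    A matrix of this shape could then be passed to a standard parsimony or other character-based analysis.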

    Updated: 2019-11-01
  • Data partitions and complex models in Bayesian analysis: the phylogeny of Gymnophthalmid lizards.
    Syst. Biol. (IF 10.266) Pub Date : 2004-10-27
    Todd A Castoe,Tiffany M Doan,Christopher L Parkinson

    Phylogenetic studies incorporating multiple loci, and multiple genomes, are becoming increasingly common. Coincident with this trend in genetic sampling, model-based likelihood techniques including Bayesian phylogenetic methods continue to gain popularity. Few studies, however, have examined model fit and sensitivity to such potentially heterogeneous data partitions within combined data analyses using empirical data. Here we investigate the relative model fit and sensitivity of Bayesian phylogenetic methods when alternative site-specific partitions of among-site rate variation (with and without autocorrelated rates) are considered. Our primary goal in choosing a best-fit model was to employ the simplest model that was a good fit to the data while optimizing topology and/or Bayesian posterior probabilities. Thus, we were not interested in complex models that did not practically affect our interpretation of the topology under study. We applied these alternative models to a four-gene data set including one protein-coding nuclear gene (c-mos), one protein-coding mitochondrial gene (ND4), and two mitochondrial rRNA genes (12S and 16S) for the diverse yet poorly known lizard family Gymnophthalmidae. Our results suggest that the best-fit model partitioned among-site rate variation separately among the c-mos, ND4, and 12S + 16S gene regions. We found this model yielded identical topologies to those from analyses based on the GTR+I+G model, but significantly changed posterior probability estimates of clade support. This partitioned model also produced more precise (less variable) estimates of posterior probabilities across generations of long Bayesian runs, compared to runs employing a GTR+I+G model estimated for the combined data. We use this three-way gamma partitioning in Bayesian analyses to reconstruct a robust phylogenetic hypothesis for the relationships of genera within the lizard family Gymnophthalmidae. We then reevaluate the higher-level taxonomic arrangement of the Gymnophthalmidae. Based on our findings, we discuss the utility of nontraditional parameters for modeling among-site rate variation and the implications and future directions for complex model building and testing.

    Updated: 2019-11-01
  • A molecular supermatrix of the rabbits and hares (Leporidae) allows for the identification of five intercontinental exchanges during the Miocene.
    Syst. Biol. (IF 10.266) Pub Date : 2004-10-27
    Conrad A Matthee,Bettine Jansen van Vuuren,Diana Bell,Terence J Robinson

    The hares and rabbits belonging to the family Leporidae have a nearly worldwide distribution and approximately 72% of the genera have geographically restricted distributions. Despite several attempts using morphological, cytogenetic, and mitochondrial DNA evidence, a robust phylogeny for the Leporidae remains elusive. To provide phylogenetic resolution within this group, a molecular supermatrix was constructed for 27 taxa representing all 11 leporid genera. Five nuclear (SPTBN1, PRKCI, THY, TG, and MGF) and two mitochondrial (cytochrome b and 12S rRNA) gene fragments were analyzed singly and in combination using parsimony, maximum likelihood, and Bayesian inference. The analysis of each gene fragment separately as well as the combined mtDNA data almost invariably failed to provide strong statistical support for intergeneric relationships. In contrast, the combined nuclear DNA topology based on 3601 characters greatly increased phylogenetic resolution among leporid genera, as was evidenced by the number of topologies in the 95% confidence interval and the number of significantly supported nodes. The final molecular supermatrix contained 5483 genetic characters and analysis thereof consistently recovered the same topology across a range of six arbitrarily chosen model specifications. Twelve unique insertion-deletions were scored and all could be mapped to the tree to provide additional support without introducing any homoplasy. Dispersal-vicariance analyses suggest that the most parsimonious solution explaining the current geographic distribution of the group involves an Asian or North American origin for the leporids followed by at least nine dispersals and five vicariance events. Of these dispersals, at least three intercontinental exchanges occurred between North America and Asia via the Bering Strait and an additional three independent dispersals into Africa could be identified. A relaxed Bayesian molecular clock applied to the seven loci used in this study indicated that most of the intercontinental exchanges occurred between 14 and 9 million years ago and this period is broadly coincidental with the onset of major Antarctic expansions causing land bridges to be exposed.

    Updated: 2019-11-01
Contents have been reproduced by permission of the publishers.