Resolving Arthropod Phylogeny: Exploring Phylogenetic Signal within 41 kb of Protein-Coding Nuclear Gene Sequence
Systematic Biology (IF 6.5), Pub Date: 2008-12-01, DOI: 10.1080/10635150802570791
Jerome C Regier, Jeffrey W Shultz, Austen R D Ganley, April Hussey, Diane Shi, Bernard Ball, Andreas Zwick, Jason E Stajich, Michael P Cummings, Joel W Martin, Clifford W Cunningham

This study attempts to resolve relationships among and within the four basal arthropod lineages (Pancrustacea, Myriapoda, Euchelicerata, Pycnogonida) and to assess the widespread expectation that remaining phylogenetic problems will yield to increasing amounts of sequence data. Sixty-eight regions of 62 protein-coding nuclear genes (approximately 41 kilobases (kb)/taxon) were sequenced for 12 taxonomically diverse arthropod taxa and a tardigrade outgroup. Parsimony, likelihood, and Bayesian analyses of total nucleotide data generally strongly supported the monophyly of each of the basal lineages represented by more than one species. Other relationships within the Arthropoda were also supported, with support levels depending on method of analysis and inclusion/exclusion of synonymous changes. Removing third codon positions, where the assumption of base compositional homogeneity was rejected, altered the results. Removing the final class of synonymous mutations--first codon positions encoding leucine and arginine, which were also compositionally heterogeneous--yielded a data set that was consistent with a hypothesis of base compositional homogeneity. Furthermore, under such a data-exclusion regime, all 68 gene regions individually were consistent with base compositional homogeneity. Restricting likelihood analyses to nonsynonymous change recovered trees with strong support for the basal lineages but not for other groups that were variably supported with more inclusive data sets. In a further effort to increase phylogenetic signal, three types of data exploration were undertaken. (1) Individual genes were ranked by their average rate of nonsynonymous change, and three rate categories were assigned--fast, intermediate, and slow. Then, bootstrap analysis of each gene was performed separately to see which taxonomic groups received strong support. 
Five taxonomic groups were strongly supported independently by two or more genes, and these genes mostly belonged to the slow or intermediate categories, whereas groups supported only by a single gene region tended to be from genes of the fast category, arguing that fast genes provide a less consistent signal. (2) A sensitivity analysis was performed in which increasing numbers of genes were excluded, beginning with the fastest. The number of strongly supported nodes increased up to a point and then decreased slightly. Recovery of Hexapoda required removal of fast genes. Support for Mandibulata (Pancrustacea + Myriapoda) also increased, at times to "strong" levels, with removal of the fastest genes. (3) Concordance selection was evaluated by clustering genes according to their ability to recover Pancrustacea, Euchelicerata, or Myriapoda and analyzing the three clusters separately. All clusters of genes recovered the three concordance clades but were at times inconsistent in the relationships recovered among and within these clades, a result that indicates that the a priori concordance criteria may bias phylogenetic signal in unexpected ways. In a further attempt to increase support of taxonomic relationships, sequence data from 49 additional taxa for three slow genes (i.e., EF-1 alpha, EF-2, and Pol II) were combined with the various 13-taxon data sets. The 62-taxon analyses supported the results of the 13-taxon analyses and provided increased support for additional pancrustacean clades found in an earlier analysis including only EF-1 alpha, EF-2, and Pol II.
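
The data-exclusion step described above (removing third codon positions, then first codon positions of leucine and arginine codons) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function name `filter_codon_positions`, the dictionary-based alignment format, and the rule that a first position is dropped for the whole column whenever any taxon has a leucine or arginine codon there are all assumptions made for illustration. The abstract does not say how this exclusion was applied across taxa.

```python
# Leucine and arginine codons under the standard genetic code.
LEU_ARG_CODONS = {
    "TTA", "TTG", "CTT", "CTC", "CTA", "CTG",  # leucine
    "CGT", "CGC", "CGA", "CGG", "AGA", "AGG",  # arginine
}


def filter_codon_positions(alignment, drop_leu_arg_first=True):
    """Remove third codon positions from an in-frame codon alignment.

    alignment: dict mapping taxon name -> nucleotide sequence. All sequences
    must have the same length, and that length must be a multiple of 3.

    If drop_leu_arg_first is True, the first codon position is also removed
    at every codon where at least one taxon has a Leu or Arg codon.
    (Hypothetical column-wise rule; an assumption, not taken from the paper.)
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")

    filtered = {taxon: [] for taxon in alignment}
    for start in range(0, length, 3):
        codons = {
            taxon: seq[start:start + 3].upper()
            for taxon, seq in alignment.items()
        }
        drop_first = drop_leu_arg_first and any(
            codon in LEU_ARG_CODONS for codon in codons.values()
        )
        for taxon, codon in codons.items():
            # Keep position 2 only, or positions 1 and 2; position 3 always goes.
            filtered[taxon].append(codon[1] if drop_first else codon[:2])

    return {taxon: "".join(parts) for taxon, parts in filtered.items()}
```

For a toy alignment such as `{"A": "ATGCTGAAA", "B": "ATGTTAAAG"}`, the middle codon encodes leucine in both taxa, so only its second position is kept, while the flanking codons keep their first and second positions. Applying the rule per column keeps the filtered sequences aligned across taxa, which is why the sketch uses it.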

Updated: 2008-12-01