ON COMBINING PROTEIN SEQUENCES AND NUCLEIC ACID SEQUENCES IN PHYLOGENETIC ANALYSIS: THE HOMEOBOX PROTEIN CASE
Cladistics (IF 3.6), Pub Date: 1996-03-01, DOI: 10.1111/j.1096-0031.1996.tb00193.x
D Agosti, D Jacobs, R DeSalle

Abstract: Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that combining the nucleic acid character states and the translated amino acid character states in the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and the overall slow rate of change of non-silent nucleotide positions and amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. Second, one character set may display no information concerning a phylogenetic hypothesis while the other character set may impart information to a hypothesis. These two possibilities are cases of non-independence; however, we argue that congruence in such cases can be thought of as increasing the weight of the particular phylogenetic hypothesis that is supported by those characters. In the third case, the two sources of character information for a particular codon may be entirely incongruent with respect to phylogenetic hypotheses concerning the taxa examined. In this last case the two character sets are independent in that information from neither can predict the character states of the other. Examples of these possibilities are discussed, and the general applicability of combining these two sources of information for protein coding genes is presented using sequences from the homeobox region of 46 homeobox genes from Drosophila melanogaster to develop a hypothesis of genealogical relationship of these genes in this large multigene family.
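The idea of placing nucleotide and translated amino acid characters in one data matrix can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names (`translate`, `combine`, `codon_variation`) and the three toy sequences are hypothetical, the standard genetic code is assumed, and the sequences are assumed to be aligned, in frame, uppercase DNA without gaps. The per-codon labels only report whether the nucleotides and the amino acid vary across taxa; judging congruence with a phylogenetic hypothesis would additionally require a tree.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA string; trailing partial codons are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def combine(seqs):
    """Append each taxon's translated amino acids to its nucleotide characters."""
    return {name: s + translate(s) for name, s in seqs.items()}


def codon_variation(seqs):
    """Label each codon by which character set varies across taxa."""
    seq_list = list(seqs.values())
    n_codons = min(len(s) for s in seq_list) // 3
    labels = []
    for k in range(n_codons):
        codons = {s[3 * k:3 * k + 3] for s in seq_list}
        amino_acids = {CODON_TABLE[c] for c in codons}
        if len(codons) == 1:
            labels.append("invariant")      # neither character set varies
        elif len(amino_acids) == 1:
            labels.append("silent")         # only the nucleotides vary
        else:
            labels.append("replacement")    # the amino acid varies as well
    return labels


# Hypothetical toy alignment of three genes, three codons each.
toy = {
    "geneA": "ATGTTACGT",
    "geneB": "ATGCTACGT",
    "geneC": "ATGCTAAGT",
}
combined = combine(toy)
labels = codon_variation(toy)
```

In this toy alignment the second codon differs only at its first position (TTA versus CTA, both leucine), so only the nucleotide characters carry variation there, while the third codon changes both the nucleotide and the amino acid characters. A codon labeled "replacement" may still contain silent differences among some taxa; the label only records that at least one amino acid change is present.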

Updated: 1996-03-01