Integrated synteny- and similarity-based inference on the polyploidization–fractionation cycle
Interface Focus (IF 4.4), Pub Date: 2021-06-11, DOI: 10.1098/rsfs.2020.0059
Yue Zhang, Zhe Yu, Chunfang Zheng, David Sankoff

Whole-genome doubling, tripling or replicating to a greater degree, due to fixation of polyploidization events, is attested in almost all lineages of the flowering plants, recurring in the ancestry of some plants two, three or more times in retracing their history to the earliest angiosperm. This major mechanism in plant genome evolution, which generally appears instantaneous on the evolutionary time scale, sets in motion a compensatory process called fractionation, the loss of duplicate genes, initially rapid, but continuing at a diminishing rate over millions and tens of millions of years. We study this process by statistically comparing the distribution of duplicate gene pairs as a function of their time of creation through polyploidization, as measured by sequence similarity. The stochastic model that accounts for this distribution, though exceedingly simple, still has too many parameters to be estimated based only on the similarity distribution, while the computational procedures for compiling the distribution from annotated genomic data are heavily biased against earlier polyploidization events, a phenomenon called syntenic 'crumble'. Other parameters, such as the size of the initial gene complement and the ploidy of the various events giving rise to duplicate gene pairs, are even more inaccessible to estimation. Here, we show how the frequency of unpaired genes, identified via their embedding in stretches of duplicate pairs, together with previously established constraints among some parameters, adds enormously to the range of successive polyploidization events that can be analysed. This also allows us to estimate the initial gene complement and to correct for the bias due to crumble. We explore the applicability of our methodology to four flowering plant genomes covering a range of different polyploidization histories.
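The abstract does not spell out the model, so the following is only a toy illustration of two of its claims, not the authors' method. It assumes a single whole-genome doubling and a constant, independent per-pair loss rate (the rate, the time unit of millions of years, and all function names are hypothetical). Under those assumptions, fractionation is fast at first and slower later, and counting unpaired genes alongside surviving pairs recovers the initial gene complement. The sketch ignores crumble, higher ploidy and successive events.

```python
import math
import random


def simulate_fractionation(n_genes, loss_rate, times, seed=0):
    """Toy fractionation after one whole-genome doubling (hypothetical model).

    Each of the n_genes ancestral genes starts as a duplicate pair; each pair
    independently loses one copy after an exponential waiting time with the
    given rate (per million years). Returns (t, pairs, unpaired) for each t.
    """
    rng = random.Random(seed)
    # Time at which each pair is reduced to a single, unpaired gene.
    loss_times = [rng.expovariate(loss_rate) for _ in range(n_genes)]
    results = []
    for t in times:
        pairs = sum(1 for lt in loss_times if lt > t)  # pairs still intact at t
        results.append((t, pairs, n_genes - pairs))
    return results


def expected_pairs(n_genes, loss_rate, t):
    """Expected surviving pairs under the toy model: n * exp(-rate * t)."""
    return n_genes * math.exp(-loss_rate * t)


def expected_losses(n_genes, loss_rate, t0, t1):
    """Expected number of pairs fractionated between times t0 and t1."""
    return expected_pairs(n_genes, loss_rate, t0) - expected_pairs(n_genes, loss_rate, t1)


def estimate_initial_complement(pairs, unpaired):
    """In this single-doubling toy model every ancestral gene survives either
    as a pair or as an unpaired gene, so the initial complement is P + U."""
    return pairs + unpaired


if __name__ == "__main__":
    n, lam = 1000, 0.05  # hypothetical values, not estimates from the paper
    for t, p, u in simulate_fractionation(n, lam, [0, 10, 20, 40]):
        print(f"t={t:>2} Myr  pairs={p:>4}  unpaired={u:>4}  "
              f"n_hat={estimate_initial_complement(p, u)}")
    # Expected losses per 10 Myr window shrink over time.
    for t0 in (0, 10, 20):
        print(t0, round(expected_losses(n, lam, t0, t0 + 10)))
    # prints "0 393", "10 239", "20 145" on separate lines
```

Even a constant per-pair rate reproduces the "initially rapid, then diminishing" pattern, because the absolute number of losses scales with the number of pairs still intact. In real data the unpaired genes have to be identified through synteny, and successive events of different ploidy, together with crumble, break the simple identity n = P + U used here.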


Updated: 2021-06-11