Stochastic gain and loss of novel transcribed open reading frames in the human lineage
Genome Biology and Evolution (IF 3.3), Pub Date: 2020-09-16, DOI: 10.1093/gbe/evaa194
Daniel Dowling, Jonathan F Schmitz, Erich Bornberg-Bauer

In addition to known genes, much of the human genome is transcribed into RNA. Chance formation of novel open reading frames (ORFs) can lead to the translation of myriad new proteins. Some of these ORFs may yield advantageous, adaptive de novo proteins. However, widespread translation of non-coding DNA can also produce hazardous protein molecules, which can misfold and/or form toxic aggregates. How de novo proteins emerge from potentially toxic raw materials, and what influences their long-term survival, is unknown. Here, using transcriptomic data from human and five other primates, we generate a set of transcribed human ORFs at six conservation levels to investigate which properties influence the early emergence and long-term retention of these expressed ORFs. As these taxa diverged from each other relatively recently, we present a fine-scale view of the evolution of novel sequences over recent evolutionary time. We find that novel human-restricted ORFs are preferentially located on GC-rich, gene-dense chromosomes, suggesting that their retention is linked to pre-existing genes. Sequence properties such as intrinsic structural disorder and aggregation propensity, which have been proposed to play a role in the survival of de novo genes, remain relatively unchanged over time. Even very young sequences code for proteins with low aggregation propensities, suggesting that genomic regions with many novel transcribed ORFs are also less likely to produce ORFs that code for harmful, toxic proteins. Our data indicate that the survival of these novel ORFs is largely stochastic rather than shaped by selection.

Updated: 2020-09-16