Comprehensive genomic analyses with 115 plastomes from algae to seed plants: structure, gene contents, GC contents, and introns.
Genes & Genomics (IF 2.1), Pub Date: 2020-03-21, DOI: 10.1007/s13258-020-00923-x
Eun-Chae Kwon 1, Jong-Hwa Kim 2,3, Nam-Soo Kim 1,4

BACKGROUND Chloroplasts are a shared feature of plants. During plant evolution, the chloroplasts of each lineage have shaped their own genomes (plastomes) through structural changes and the transfer of many genes to the nuclear genome. Some plastid genes contain introns, most of which are group II introns.

OBJECTIVE This study aimed to gain genomic and evolutionary insights into plastomes from green algae to flowering plants.

METHODS Plastomes of 115 species of green algae, bryophytes, pteridophytes (spore-bearing vascular plants), gymnosperms, and angiosperms were retrieved from the NCBI organelle genome database. Plastome structure, gene content, and GC content were analyzed with in-house Python code. Intron features, including presence or absence, length, and intron phase, were analyzed manually from the annotations in NCBI.

RESULTS The canonical quadripartite structure was retained in most plastomes; the exceptions were a few plastomes that had lost one inverted repeat (IR). Expansion, reduction, or deletion of IRs accounted for the variation in plastome length. The number of protein-coding genes ranged from 40 to 92, with an average of 79.43 ± 5.84 per plastome, and gene losses were apparent in specific lineages. The number of trn genes ranged from 13 to 33, with an average of 21.19 ± 2.42 per plastome. Ribosomal RNA genes (rrn) were located in the IRs and were therefore present in duplicate, except in species that had lost one IR. GC content varied from 24.9 to 51.0%, with an average of 38.21 ± 3.27%, indicating a bias toward high AT content. Plastid introns were present in 18 protein-coding genes, six trn genes, and one rrn gene. Intron losses occurred among orthologous genes in different plant lineages. Plastid introns were long compared with nuclear introns, which might be related to the difference between spliceosomal nuclear introns and self-splicing group II plastid introns. The trnK-UUU intron contained the maturase-encoding matK gene, except in the chlorophyte algae and monilophyte ferns, in which trnK-UUU was lost but matK was retained. There were many annotation artefacts in the intron positions in the NCBI database. In the analysis of intron phases, phase 0 introns were more frequent than phase 1 and phase 2 introns. Phase polymorphism, derived from nucleotide insertion, was observed in the introns of clpP. Plastid trn introns were long compared with archaeal or eukaryotic nuclear tRNA introns. Of the six plastid trn introns, one was located in the D loop and the other five in the anticodon loop. The insertion sites were conserved among archaeal, eukaryotic nuclear, and plastid tRNA genes.

CONCLUSIONS The current study revisited and updated previous findings on structural variation, gene content, and GC content of chloroplast genomes from green algae to flowering plants. The study also included some novel findings and discussion on plastome introns, including their length variation and phase variation. We also presented and corrected some false annotations of introns in protein-coding and tRNA genes in the genome database; these corrections could be confirmed by chloroplast transcriptome analysis in the future.
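
As a rough illustration of two quantities reported in the abstract, the sketch below computes GC content and intron phase in Python. It is not the authors' in-house code, which is not shown here; the function names `gc_content` and `intron_phases`, the toy sequence, and the exon lengths are all hypothetical. GC content is taken here as the percentage of G and C among unambiguous bases, and intron phase as the number of coding nucleotides upstream of the intron modulo 3 (phase 0 falls between codons, phase 1 after the first codon position, phase 2 after the second).

```python
from typing import List, Sequence


def gc_content(seq: str) -> float:
    """Return the percentage of G+C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def intron_phases(cds_exon_lengths: Sequence[int]) -> List[int]:
    """Return the phase (0, 1, or 2) of each intron in a coding sequence.

    cds_exon_lengths lists the coding length of each exon in transcript
    order, starting at the start codon. An intron follows every exon
    except the last one.
    """
    phases = []
    upstream = 0  # coding nucleotides seen before the current intron
    for length in cds_exon_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


if __name__ == "__main__":
    # Toy inputs, made up for illustration only (not data from the study).
    toy_sequence = "ATGCATTANNATGC"
    toy_exon_lengths = [9, 4, 8]

    print(f"GC content: {gc_content(toy_sequence):.2f}%")
    print(f"Intron phases: {intron_phases(toy_exon_lengths)}")
```

For a real plastome, the sequence and the exon coordinates would come from the GenBank record, and annotation errors of the kind noted in the abstract would change the computed phases, so the coordinates are worth checking before they are trusted.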

Updated: 2020-03-21