Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination
BMC Genomics (IF 3.5), Pub Date: 2020-04-06, DOI: 10.1186/s12864-020-6565-5
Ramón Cervantes-Rivera, Sophie Tronnet, Andrea Puhar

Shigella is a Gram-negative facultative intracellular bacterium that causes bacillary dysentery in humans. Shigella invades cells of the colonic mucosa owing to its virulence plasmid-encoded Type 3 Secretion System (T3SS), and multiplies in the target cell cytosol. Although the laboratory reference strain S. flexneri serotype 5a M90T has been extensively used to understand the molecular mechanisms of pathogenesis, its complete genome sequence is not available, thereby greatly limiting studies employing high-throughput sequencing and systems biology approaches.

We have sequenced, assembled, annotated and manually curated the full genome of S. flexneri 5a M90T. This yielded two complete circular contigs, the chromosome and the virulence plasmid (pWR100). To obtain the genome sequence, we employed long-read PacBio DNA sequencing followed by polishing with Illumina RNA-seq data. This provides a new hybrid strategy to prepare gapless, highly accurate genome sequences, which also cover AT-rich tracts or repetitive sequences that are transcribed. Furthermore, we performed a genome-wide analysis of transcriptional start sites (TSS) and determined the length of 5′ untranslated regions (5′-UTRs) under the culture conditions typically used for the inoculum of in vitro infection experiments. We identified 6723 primary TSS (pTSS) and 7328 secondary TSS (sTSS). The annotated S. flexneri 5a M90T genome sequence and the transcriptional start sites are integrated into the RegulonDB (http://regulondb.ccg.unam.mx) and RSAT (http://embnet.ccg.unam.mx/rsat/) databases, so that their analysis tools can be used on the S. flexneri 5a M90T genome.

We provide the first complete genome for S. flexneri serotype 5a, specifically the laboratory reference strain M90T. Our work opens the possibility of employing S. flexneri M90T in high-quality systems biology studies such as transcriptomic and differential expression analyses or in genome evolution studies. Moreover, the catalogue of TSS that we report here can be used in molecular pathogenesis studies as a resource for determining which genes are transcribed before infection of host cells. The genome sequence, together with the analysis of transcriptional start sites, is also a valuable tool for precise genetic manipulation of S. flexneri 5a M90T. Further, we present a new hybrid strategy to prepare gapless, highly accurate genome sequences. Unlike currently used hybrid strategies combining long- and short-read DNA sequencing technologies to maximize accuracy, our workflow using long-read DNA sequencing and short-read RNA sequencing provides the added value of using non-redundant technologies, which yield distinct, exploitable datasets.
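
The abstract mentions determining 5′-UTR lengths from the mapped TSS. As an illustration only, the arithmetic behind such a measurement can be sketched as below. This is not the authors' pipeline: the `Gene` record, the `utr5_length` function, the 1-based inclusive coordinate convention and the example coordinates are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record (1-based, inclusive coordinates)."""
    name: str
    start: int   # leftmost coordinate of the coding sequence
    end: int     # rightmost coordinate of the coding sequence
    strand: str  # "+" or "-"


def utr5_length(tss: int, gene: Gene) -> int:
    """Count transcribed bases from the TSS up to, but not including,
    the first base of the start codon."""
    if gene.strand == "+":
        # On the forward strand the start codon begins at gene.start.
        length = gene.start - tss
    elif gene.strand == "-":
        # On the reverse strand the start codon begins at gene.end,
        # and transcription runs towards lower coordinates.
        length = tss - gene.end
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    if length < 0:
        raise ValueError("TSS lies downstream of the start codon")
    return length


# Made-up coordinates, for illustration only.
forward = Gene("geneA", start=130, end=430, strand="+")
reverse = Gene("geneB", start=200, end=500, strand="-")
print(utr5_length(100, forward))  # TSS upstream of a forward-strand gene
print(utr5_length(520, reverse))  # TSS upstream of a reverse-strand gene
print(utr5_length(130, forward))  # leaderless transcript: TSS at the start codon
```

A length of zero corresponds to a leaderless transcript, where transcription starts at the first base of the start codon.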

Updated: 2020-04-22