iSoybean: A database for the mutational fingerprints of soybean
Plant Biotechnology Journal ( IF 13.8 ) Pub Date : 2022-05-17 , DOI: 10.1111/pbi.13844
Mengzhu Zhang 1 , Xiyu Zhang 1 , Xinyu Jiang 1 , Lei Qiu 2 , Guanghong Jia 1 , Longfei Wang 1 , Wenxue Ye 1 , Qingxin Song 1

Soybean (Glycine max L. Merrill) is one of the most important commercial crops worldwide. However, soybean underwent severe genetic bottlenecks during domestication (Hyten et al., 2006). It is therefore essential to exploit novel sources of genetic diversity and to expand gene pools for soybean improvement. Mutation breeding has been widely used by plant breeders to create novel genetic diversity. Ethyl methanesulfonate (EMS), a chemical mutagen believed to induce mainly point mutations, is commonly used to develop mutant populations in soybean (Li et al., 2017; Tsuda et al., 2015). However, the lack of genome-wide characterization of mutations restricts the use of these mutant populations in the soybean community.

To provide novel genetic diversity for soybean breeding, we developed an EMS-induced mutant population and performed whole-genome sequencing (WGS) of 1044 mutant lines to characterize the induced mutations (Figure 1a). About 21.5% of plants in the M2 population showed visible phenotypic variation, including in leaf morphology, plant architecture and seed shape (Figure S1). On average, 76 million reads (11.4 Gb) were generated for each mutant line, giving an average sequencing depth of 11.2× (Table S1). In total, 6 774 731 mutations, comprising 3 141 030 homozygous and 3 633 701 heterozygous mutations, were identified in the 1044 mutant lines, giving an average mutation density of ~1 mutation per 150 kb for each mutant line (~6.7 mutations per kb across all 1044 mutant lines) (Figure 1b, Table S1). EMS primarily induces GC > AT transitions. In total, 4 801 170 GC > AT mutations (71% of all mutations) were detected in the EMS-treated mutant population (Table S1). To estimate the error rate of mutation identification, we randomly selected 105 GC > AT and 45 non-GC > AT mutations for validation by Sanger sequencing (Table S2). Of these, 104 GC > AT (99%) and 43 non-GC > AT (96%) mutations were confirmed, suggesting a low error rate for identification of both GC > AT and non-GC > AT mutations in this study.
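The summary figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the counts quoted in the text, the function and key names are hypothetical, and the "implied span" is simply the per-line mutation count multiplied by 150 kb, not a genome size reported by the authors.

```python
# Counts quoted in the text (Table S1 and Table S2 of the paper).
TOTAL_MUTATIONS = 6_774_731
HOMOZYGOUS = 3_141_030
HETEROZYGOUS = 3_633_701
MUTANT_LINES = 1044
GC_TO_AT = 4_801_170


def is_gc_to_at(ref: str, alt: str) -> bool:
    """True for the typical EMS signature: G>A, or C>T when read on the other strand."""
    return (ref.upper(), alt.upper()) in {("G", "A"), ("C", "T")}


def summarize() -> dict:
    """Recompute the derived statistics from the raw counts (hypothetical helper)."""
    per_line = TOTAL_MUTATIONS / MUTANT_LINES
    return {
        "counts_consistent": HOMOZYGOUS + HETEROZYGOUS == TOTAL_MUTATIONS,
        "mutations_per_line": per_line,
        # Span implied by "~1 mutation per 150 kb" per line, in Mb (derived, not reported).
        "implied_span_mb": per_line * 150 / 1000,
        "gc_to_at_percent": 100 * GC_TO_AT / TOTAL_MUTATIONS,
        "validated_gc_to_at_percent": 100 * 104 / 105,
        "validated_other_percent": 100 * 43 / 45,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

The homozygous and heterozygous counts sum exactly to the reported total, and the rounded percentages match those given in the text.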

Figure 1
(a) Soybean seeds were subjected to EMS treatment. A single M2 plant from each mutant line was used to construct the WGS library and to harvest M3 seeds. The M3 seeds were planted to obtain M4 seeds for seed stock. Nucleotides in red indicate identified mutations. (b) Distribution of the number of mutations per mutant line. (c) Example of a large deletion in mutant line NJAU0044. (d) Functional annotation of mutations located in coding sequences of soybean genes. (e) Knockout of GmKIX8-1 (NJAU1840, stop gained) induced larger seeds. (f) CHH methylation changes in the gene region in gmdcl3, gmmet1a and gmcmt2a mutants compared with wild type. (g) Size distribution profiles of small RNAs derived from leaves of the gmdcl3 mutant and wild type. (h) A snapshot of the iSoybean website.

In addition to point mutations, we identified 22 373 small Indels (<50 bp), an average of 21.2 small Indels per mutant line (Figure S2a). A total of 1018 genes were affected by 1034 small Indels. Compared with point mutations, small Indels were relatively rare in the mutant population (Figure S2b). Previous studies confirmed that EMS mutagenesis can induce large structural variations in rice and wheat (Henry et al., 2014). By calculating coverage variation along chromosomes, we detected 37 large deletions (>20 kb) in 33 mutant lines (Figure 1c). In total, 401 genes were knocked out by these large deletions (Table S3).
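The text does not give the details of the coverage-based deletion scan, so the following is only a minimal sketch of one way such a scan could work. It assumes read depth has already been averaged into fixed-size bins along a chromosome; the function name, the 1 kb bin size and the 10% depth cutoff are all hypothetical choices, and only the >20 kb length threshold comes from the text.

```python
from statistics import median


def find_large_deletions(bin_depths, bin_size=1_000, min_length=20_000, max_ratio=0.1):
    """Return (start, end) half-open bp intervals where depth stays near zero.

    bin_depths : mean read depth per consecutive bin along one chromosome
    bin_size   : bin width in bp (assumed value)
    min_length : report only runs longer than this many bp (the >20 kb threshold)
    max_ratio  : a bin counts as "missing" if depth <= max_ratio * median depth (assumed)
    """
    depths = list(bin_depths)
    cutoff = median(depths) * max_ratio
    deletions = []
    run_start = None
    # The infinite sentinel closes a low-coverage run that reaches the chromosome end.
    for i, depth in enumerate(depths + [float("inf")]):
        if depth <= cutoff:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * bin_size > min_length:
                deletions.append((run_start * bin_size, i * bin_size))
            run_start = None
    return deletions


if __name__ == "__main__":
    # Toy profile: ~11x depth with one 25 kb gap and one 5 kb gap.
    profile = [11] * 50 + [0] * 25 + [11] * 50 + [0] * 5 + [12] * 20
    print(find_large_deletions(profile))
```

A near-zero cutoff like this would only catch homozygous deletions; a heterozygous deletion would appear as roughly half the median depth and would need a different threshold.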

To further analyse the effect of mutations on gene function, we classified the mutations in gene models into truncation mutations (stop gained, start loss and mis-splicing), missense mutations and synonymous mutations (Figure 1d). We identified 34 178 truncation mutations affecting 22 092 protein-coding genes, which account for 41.8% of all soybean genes in the reference genome (Figure 1d). In addition, 87% of soybean genes (48 613 genes) were affected by 385 142 missense mutations. In total, 92.9% of soybean genes were affected by truncation or missense mutations, and 85% of soybean genes contained two or more non-synonymous mutations (Figure S3). For example, consistent with previous studies, knockout of GmKIX8-1 in mutant NJAU1840 produced larger seeds, and knockout of GmE1 in mutant NJAU0143 caused early flowering (Figure 1e, Figure S4) (Nguyen et al., 2021; Xia et al., 2012).
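The three-way classification described here can be sketched as a simple lookup. The text does not say which annotation tool or labels were used, so the Sequence Ontology terms below are an assumed stand-in (in particular, mapping "mis-splicing" to splice donor and acceptor variants is a guess), and the function names and gene identifiers are hypothetical.

```python
from collections import defaultdict

# Assumed Sequence Ontology labels for the "truncation" class named in the text.
TRUNCATION_TERMS = {
    "stop_gained",
    "start_lost",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def classify_effect(so_term: str) -> str:
    """Map an annotation label onto the classes used in the text."""
    if so_term in TRUNCATION_TERMS:
        return "truncation"
    if so_term == "missense_variant":
        return "missense"
    if so_term == "synonymous_variant":
        return "synonymous"
    return "other"


def genes_by_class(records):
    """Group gene IDs by effect class; records is an iterable of (gene_id, so_term)."""
    grouped = defaultdict(set)
    for gene_id, so_term in records:
        grouped[classify_effect(so_term)].add(gene_id)
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder gene IDs, not real soybean gene models.
    demo = [
        ("geneA", "stop_gained"),
        ("geneA", "missense_variant"),
        ("geneB", "synonymous_variant"),
        ("geneC", "splice_donor_variant"),
    ]
    print(genes_by_class(demo))
```

Counting the sizes of the resulting sets against the total number of annotated genes would give per-class percentages of the kind reported above.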

The high density of mutations in gene regions could facilitate functional genomics through forward and reverse genetic approaches. As an example, we examined DNA methylation changes caused by mutations in genes involved in DNA methylation (Figure S5). In plants, DNA methylation is catalysed in CG, CHG and CHH contexts through maintenance and de novo pathways (Figure S5). All homologous genes contained at least one truncation or missense mutation in our mutant population (Figure S5). To examine the effects of these mutations on DNA methylation, we analysed genome-wide DNA methylation changes caused by truncation mutations of GmDCL3 (Glyma.04G057400), GmMET1a (Glyma.04G187600) and GmCMT2a (Glyma.16G103500), compared with wild type (WT) (Figure S6). No obvious DNA methylation changes in the gene region were observed in the gmmet1a and gmcmt2a mutants compared with WT (Figure 1f), which may be due to redundancy of GmMET1 and GmCMT2 genes in the soybean genome (Figure S5). In contrast, soybean has only one homologue of the Arabidopsis DCL3 gene. As expected, the gmdcl3 mutant showed much lower CHH methylation levels in the gene region than WT (Figure 1f). Consistent with the function of DCL3 in the generation of 24-nt small RNAs (smRNAs), small RNA-seq analysis revealed a substantial decrease in 24-nt smRNAs in the gmdcl3 mutant compared with WT (Figure 1g). These results demonstrate the feasibility of using this mutant population to elucidate gene function through reverse genetics.
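For readers unfamiliar with the CG, CHG and CHH contexts, where H stands for A, C or T, the sketch below shows how a cytosine's context is read from the two bases that follow it. It is an illustration of the standard definition, not the authors' methylation pipeline, and the function name is hypothetical.

```python
def cytosine_context(seq: str, pos: int):
    """Classify the forward-strand cytosine at 0-based `pos` as 'CG', 'CHG' or 'CHH'.

    Returns None when the context cannot be determined (sequence end or ambiguous base).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    following = seq[pos + 1:pos + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or any(base not in "ACGT" for base in following):
        return None
    # The first following base is A, C or T (that is, H); the second decides CHG vs CHH.
    return "CHG" if following[1] == "G" else "CHH"


if __name__ == "__main__":
    for sequence in ("CGA", "CAG", "CTA"):
        print(sequence, cytosine_context(sequence, 0))
```

Cytosines on the reverse strand appear as G on the forward strand, so they would be classified the same way after taking the reverse complement of the surrounding sequence.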

To make the mutant population available to soybean researchers, we established a website named iSoybean (www.isoybean.org) (Figure 1h). Users can search for mutations in a specific gene or browse all mutations in the genome using a JBrowse graphical interface. Seeds of the desired mutants can be requested free of charge from Nanjing Agricultural University through the iSoybean website. In conclusion, our sequenced mutant population provides a valuable open-access resource for mutation discovery and will facilitate functional genomic studies to promote genetic breeding in soybean.




Updated: 2022-05-17