Burden tests can be used to map causal genes for a simple metabolic trait in an exome-sequenced polyploid mutant population
Plant Biotechnology Journal (IF 10.1), Pub Date: 2022-07-10, DOI: 10.1111/pbi.13890
Guillaume N Menard, Peter J Eastmond

Forward genetic screens are an excellent tool to assign gene function, but it is often necessary to employ map-based cloning to identify the causal genes. This can be laborious and represents a bottleneck in plant fundamental and applied research. With advances in DNA technology, it is becoming increasingly affordable to sequence large populations. Krasileva et al. (2017) exome sequenced tetraploid and hexaploid wheat ethyl methanesulfonate (EMS) mutagenized populations, primarily to facilitate reverse genetic screens. Gene redundancy allows a very high mutant load of 35–40 mutations per kilobase, and the populations of ~1500 and ~1200 lines each harbour ~22–23 missense or truncation mutations per gene. Here, we show that burden tests, a simple form of rare-variant association analysis developed for human disease genetics (Lee et al., 2014), can be used to identify causal genes in the hexaploid wheat (Triticum aestivum) cv. Cadenza mutant population, without the need for map-based cloning.

The statistical power to detect association with rare variants is very limited (Lee et al., 2014), and most mutations in the Cadenza EMS population are singletons (Krasileva et al., 2017). Burden tests work by collapsing multiple variants within a gene (or other functional group) into a single test score, thereby increasing frequency and providing greater power (Lee et al., 2014). However, this power relies on the selected variants mostly being causal and having the same direction and magnitude of effect (Lee et al., 2014). Such assumptions likely hold for mutant populations, where causal variants are most frequently deleterious (Meinke, 2013) and their severity can be predicted from sequence analysis (Kumar et al., 2009). The absence of genetic structure in mutant populations should simplify association studies, and collapsing homoeologous groups, which lack functional divergence in ‘recent’ polyploids like wheat (Krasileva et al., 2017), should also improve power.
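The collapsing idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they fit a linear model in GAPIT); the function names, line names and trait values are hypothetical, and the carrier versus non-carrier mean difference is only a stand-in for a proper association test.

```python
from statistics import mean

def burden_scores(carriers_by_variant, variant_to_gene, lines):
    """Collapse variants into one 0/1 burden score per gene and line."""
    scores = {}
    for variant, carriers in carriers_by_variant.items():
        gene = variant_to_gene[variant]
        per_line = scores.setdefault(gene, {line: 0 for line in lines})
        for line in carriers:
            per_line[line] = 1  # line carries at least one selected variant
    return scores

def collapsed_frequency(per_line):
    """Fraction of lines carrying at least one collapsed variant."""
    return sum(per_line.values()) / len(per_line)

def carrier_effect(per_line, phenotype):
    """Mean trait difference between carriers and non-carriers."""
    carriers = [phenotype[l] for l, s in per_line.items() if s == 1]
    others = [phenotype[l] for l, s in per_line.items() if s == 0]
    return mean(carriers) - mean(others)

# Toy data (hypothetical, not from the paper):
# three singleton variants in geneA, one singleton in geneB.
lines = ["line1", "line2", "line3", "line4", "line5", "line6"]
carriers_by_variant = {
    "v1": ["line1"], "v2": ["line2"], "v3": ["line3"], "v4": ["line4"],
}
variant_to_gene = {"v1": "geneA", "v2": "geneA", "v3": "geneA", "v4": "geneB"}
phenotype = {"line1": 0.60, "line2": 0.62, "line3": 0.61,
             "line4": 0.70, "line5": 0.70, "line6": 0.70}

scores = burden_scores(carriers_by_variant, variant_to_gene, lines)
freq_a = collapsed_frequency(scores["geneA"])
effect_a = carrier_effect(scores["geneA"], phenotype)
print(f"geneA collapsed frequency: {freq_a:.2f}, carrier effect: {effect_a:.2f}")
```

In this toy example each geneA variant occurs in one line out of six, but the collapsed geneA score is carried by three lines out of six, which is the gain in frequency that the burden test relies on.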

To investigate whether burden tests can be applied to the Cadenza population, we used gas chromatography to measure the fatty acid composition of lipids in individual M4 grains (caryopses) from 1188 exome-sequenced lines and calculated the proportion of unsaturated fatty acids that are polyunsaturated (ω-6 desaturation efficiency, or ω-6DE). This is a simple adaptive metabolic trait (Menard et al., 2017) and a determinant of edible oil quality (Hajiahmadi et al., 2020). As summarized in Figure 1a, we extracted a list of putative deleterious mutations in the M2 population (Krasileva et al., 2017) using BioMart within EnsemblPlants (https://plants.ensembl.org/biomart/martview) and collapsed them by gene and by homoeologous group (triad) (Ramírez-González et al., 2018). These mutations were given equal weight and include stop codon gained, start codon lost, splice donor and acceptor variants, and non-synonymous mutations with a SIFT (sorting intolerant from tolerant) score <0.05 (Kumar et al., 2009). We then performed gene- and triad-based burden tests using a single-locus linear model (CMLM) implemented in GAPIT (genome association and prediction integrated tool) (Lipka et al., 2012).
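The selection and collapsing steps described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: the record layout (`line`, `gene`, `consequence`, `sift`), the consequence labels (Sequence Ontology-style terms assumed to match the BioMart export), the line names and the triad identifier are all hypothetical.

```python
# Consequence classes treated as putatively deleterious regardless of SIFT score.
# The labels are an assumption about how the exported table encodes them.
DELETERIOUS_CONSEQUENCES = {
    "stop_gained",
    "start_lost",
    "splice_donor_variant",
    "splice_acceptor_variant",
}
SIFT_CUTOFF = 0.05  # non-synonymous mutations are kept only below this score

def is_putatively_deleterious(consequence, sift_score=None):
    """Apply the equal-weight selection rule described in the text."""
    if consequence in DELETERIOUS_CONSEQUENCES:
        return True
    if consequence == "missense_variant" and sift_score is not None:
        return sift_score < SIFT_CUTOFF
    return False

def collapse(mutations, gene_to_triad):
    """Return the sets of carrier lines per gene and per triad."""
    by_gene, by_triad = {}, {}
    for m in mutations:
        if not is_putatively_deleterious(m["consequence"], m.get("sift")):
            continue
        by_gene.setdefault(m["gene"], set()).add(m["line"])
        triad = gene_to_triad.get(m["gene"])
        if triad is not None:
            by_triad.setdefault(triad, set()).add(m["line"])
    return by_gene, by_triad

# Hypothetical records; gene IDs are the TaFAD2 homoeologues named in the text.
mutations = [
    {"line": "lineA", "gene": "TraesCS6A02G280000", "consequence": "stop_gained"},
    {"line": "lineB", "gene": "TraesCS6B02G309400",
     "consequence": "missense_variant", "sift": 0.01},
    {"line": "lineC", "gene": "TraesCS6D02G260200",
     "consequence": "missense_variant", "sift": 0.40},  # tolerated, dropped
    {"line": "lineD", "gene": "TraesCS6D02G260200",
     "consequence": "synonymous_variant"},  # not deleterious, dropped
]
gene_to_triad = {
    "TraesCS6A02G280000": "triad_FAD2",
    "TraesCS6B02G309400": "triad_FAD2",
    "TraesCS6D02G260200": "triad_FAD2",
}

by_gene, by_triad = collapse(mutations, gene_to_triad)
print(sorted(by_triad["triad_FAD2"]))
```

Collapsing by triad pools the carrier lines of all three homoeologues, so a triad can pass a frequency threshold even when an individual gene copy does not.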

Figure 1. Applying burden tests to the Cadenza exome-sequenced EMS population to identify genes that control grain ω-6 fatty acid desaturation efficiency (ω-6DE). (a) Workflow diagram. White boxes show resources created by Krasileva et al. (2017). Manhattan plots showing trait associations with (b) 82 950 genes and (c) 17 616 triads. Collapsed variant frequency threshold = 0.002. Dotted line marks significance threshold after Bonferroni correction for α = 0.05. Putative TaFAD2 and TaROD1 genes are highlighted. Quantile–quantile plots shown on right. (d) TaFAD2 expression in grains at hard dough stage (mean ± SE, n = 3). tpm is transcripts per kilobase million. RNA-seq data from Ramírez-González et al. (2018). (e) Box plots for ω-6DE in M4 grain from all mutant lines containing putative deleterious (D) and non-deleterious (ND) variants in each TaFAD2 gene (n = 22–1166) and from two independent BC1F2 homozygous mutants (M) and their wild type segregants (WT) (n = 5). Asterisks denote significant differences (P < 0.05, unpaired Student's t-test). Cadenza line numbers and TaFAD2 mutations leading to amino acid substitutions or premature stop codons* are 0277 (W107*), 0290 (P31S), 1569 (W107*), 1235 (L347F), 1366 (Q167*) and 1423 (W92*).

We identified three genes and two triads that are significantly (P < 0.05) associated with ω-6DE, after applying Bonferroni correction (Figures 1b,c and S1). The three genes TraesCS6A02G280000, TraesCS6B02G309400 and TraesCS6D02G260200 form one triad and are predicted to encode homologues of FATTY ACID DESATURASE 2 (FAD2) (Hajiahmadi et al., 2020). FAD2 is a microsomal ω-6 fatty acid desaturase that is known to control ω-6DE in Arabidopsis thaliana seeds (Menard et al., 2017; Okuley et al., 1994). Hexaploid wheat contains eleven putative FAD2 genes (Hajiahmadi et al., 2020), and TraesCS6A02G280000 (TaFAD2.1), TraesCS6B02G309400 (TaFAD2.6) and TraesCS6D02G260200 (TaFAD2.8) are the most strongly expressed in developing grains of cv. Azhurnaya (Figure 1d; Ramírez-González et al., 2018). The second triad (TraesCS7A02G378300, TraesCS7B02G280100 and TraesCS7D02G375100) encodes putative homologues of REDUCED OLEATE DESATURATION 1 (ROD1), which supplies FAD2 with substrate (Lu et al., 2009).
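As a worked illustration of the Bonferroni correction, the short Python sketch below computes the per-test significance threshold. It assumes that the number of tests equals the gene and triad counts given in the Figure 1 caption (82 950 and 17 616); the function name is hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Test counts assumed from the Figure 1 caption.
for label, n_tests in (("genes", 82950), ("triads", 17616)):
    threshold = bonferroni_threshold(0.05, n_tests)
    print(f"{label}: P < {threshold:.2e} (-log10 P > {-math.log10(threshold):.2f})")
```

Under that assumption, the thresholds are about 6.0 × 10⁻⁷ for genes and 2.8 × 10⁻⁶ for triads. The triad-based test therefore faces a less stringent cut-off because fewer tests are performed.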

TaFAD2 and TaROD1 transcripts are average length for wheat (~1.6 and ~1.5 kb), encoding proteins of ~390 and ~300 amino acid residues, respectively. The 1188 M4 lines that we screened contained 22–24 putative deleterious mutations in each TaFAD2 gene, and 6–9 in each TaROD1 gene, when the M2 generation was exome sequenced (Krasileva et al., 2017). To confirm that disruption of the TaFAD2 genes causes a reduction in ω-6DE, we selected two independent lines with mutations in each gene that had low ω-6DE in our screen (Figure 1e). We backcrossed them to wild type and identified five homozygous mutant and five wild type segregant BC1F2 plants using KASP (kompetitive allele specific PCR) assays and further confirmed their genotype by DNA sequencing (Krasileva et al., 2017). We then analysed the fatty acid composition of their BC1F3 grains and found that ω-6DE is significantly (P < 0.05) lower in all the homozygous TaFAD2 mutants (M) versus wild type (WT) segregants (Figure 1e). The decrease in ω-6DE is small (<9%), but owing to the high broad-sense heritability of the trait (H² ~0.9), the effect size is very large (Cohen's d > 0.8).
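The point that a small absolute difference can still be a large standardized effect can be sketched in Python using the common pooled-standard-deviation form of Cohen's d. Whether the authors used exactly this variant is an assumption, and the trait values below are invented for illustration, not data from the paper.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical ω-6DE values for five mutant and five wild type segregants.
mutant = [0.640, 0.645, 0.650, 0.655, 0.660]
wild_type = [0.690, 0.695, 0.700, 0.705, 0.710]

relative_change = (mean(mutant) - mean(wild_type)) / mean(wild_type)
print(f"relative change: {relative_change:.1%}")
print(f"Cohen's d: {cohens_d(mutant, wild_type):.1f}")
```

In this invented example the mutants differ from wild type by only about 7%, yet the within-group spread is so small that the magnitude of d is far above 0.8. This is the situation expected for a highly heritable trait.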

In conclusion, we show that gene and homoeologous group-based burden tests can identify causal genes for a simple metabolic trait in an exome-sequenced polyploid mutant population. Many rare-variant association analysis methods have been developed and may be applicable, including burden tests with more sophisticated weighting, variance-component and combined tests (Lee et al., 2014). We have collapsed point mutations in the Cadenza population, but deletions are also present (Krasileva et al., 2017) and could be included. The gene redundancy that exists in polyploid mutant populations likely provides a trade-off between power and effect size when applying burden tests. Redundancy allows polyploids to tolerate high mutant loads (Krasileva et al., 2017), providing smaller populations with more collapsible variants per gene (and homoeologous group). However, redundancy also hides the phenotypic effects of variants (Krasileva et al., 2017). It is intuitive that more heritable traits that are controlled by fewer (and larger) genes will likely be more amenable to genetic dissection using burden tests. Mutant populations of tetraploid wheat (Krasileva et al., 2017) and many other polyploid crops such as oilseed rape (Brassica napus) and false flax (Camelina sativa) might also be amenable to burden tests.




Updated: 2022-07-10