American Journal of Hematology (IF 10.1), Pub Date: 2021-12-27, DOI: 10.1002/ajh.26450
Ane Amundarain, Luis V Valcárcel, Raquel Ordoñez, Leire Garate, Estíbaliz Miranda, Xabier Cendoya, Arantxa Carrasco-Leon, María José Calasanz, Bruno Paiva, Cem Meydan, Christopher E Mason, Ari Melnick, Paula Rodriguez-Otero, José I Martín-Subero, Jesús San Miguel, Francisco J Planes, Felipe Prósper, Xabier Agirre
Multiple myeloma (MM) is a hematologic neoplasm characterized by clonal expansion of malignant plasma cells (PCs) in the bone marrow, with clinical, genetic, and epigenetic heterogeneity. Chromosomal translocations are one of the hallmarks of MM and mainly involve the immunoglobulin heavy chain locus (IGH). These translocations usually place various oncogenes under the control of IGH, leading to the up-regulation of genes that provide a selective growth advantage to MM cells [1]. Five recurrent IGH translocations have been described in MM; however, in many cases the partner gene is not defined in routine clinical analyses. In addition, recent studies have reported novel recurrent fusion partners and novel non-IGH fusions beyond the well-known translocations [2, 3]. Nevertheless, these studies did not consider the normal B-cell counterparts of MM cells, which may provide new insights into the role of fusion transcripts (FTs) in MM. Furthermore, MM is also associated with deregulation of long noncoding RNAs (lncRNAs), a group of genes with increasing relevance in cancer [4]. Various studies suggest the involvement of lncRNAs in chromosomal translocations; however, this has not been assessed in MM.
Here, to define the landscape of expressed FTs in MM, we analyzed strand-specific RNA-seq (ssRNA-seq) data from 35 samples covering 6 different B-cell subpopulations (5 naïve, 7 centroblast, 7 centrocyte, 8 memory, 5 tonsillar PC, and 3 bone marrow [BM] PC samples) obtained from 11 healthy donors (8 tonsil and 3 BM donors) and from PCs of 37 MM patients, paying particular attention to FTs involving lncRNAs (lncFTs). Using the STAR-Fusion algorithm, we initially identified 2169 FTs. After applying several computational filtering steps, we defined 1454 FTs expressed in B-cells and MM samples (Figure S1). The highest numbers of FTs were detected in healthy donor PCs (tonsillar plasma cells [TPC] and BMPC) (Figure S2A–B). Based on the biological relevance of IG genes in B-cells and malignant PCs, detected FTs were classified into IG and REST (none of the associated genes corresponded to an IG gene) categories (Figure S2C–E). Of the FTs detected in healthy PCs, 82.5% involved IG genes, harbored very few reads per transcript, and were supported only by junction reads, without any spanning reads covering the non-IG partner gene. Therefore, FTs that were not supported by at least one spanning read were filtered out, resulting in the final detection of 208 expressed FTs in normal B-cells and MM cells (Figure S1A). To validate our results and identify consistently detected FTs, we also applied the ARRIBA and STAR-SEQR algorithms to our cohort with the same filters described above. One hundred and fifty-eight FTs were detected by at least two algorithms and were selected for further analyses after a quality check step (Appendix S1; Figures S1A, S2F–I; Table S1). These expressed FTs were detected in every cell population, with a significantly higher number of FTs in MM cells (median of 3 ± 2.97) (Wilcoxon p-value <.001) (Figure S2G).
A similar number of reads per FT was detected in all cell subpopulations (analysis of variance [ANOVA] p-value .382), suggesting that FTs are expressed consistently at low levels (Figure S2G), and most of the expressed FTs occurred between two non-IG partners (Figure S2H). Characteristic features of B-cells include IG gene rearrangement and active transcription of IG genes, leading to the transcription of thousands of similar transcripts from these loci [5], which could be misidentified as FTs in these cells. Thus, cell-specific features should be considered when implementing an FT detection pipeline for each cell type, in order to exclude false positive events. Furthermore, the presence of FTs in normal B-cells indicates that FTs are not exclusive to tumor cells, suggesting that FTs may contribute to transcriptional diversity in healthy tissues.
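The two filters described above (at least one spanning read, and detection by at least two of the three callers) can be sketched as follows. This is only an illustrative sketch: the record fields (`caller`, `sample`, `gene5`, `gene3`, `junction_reads`, `spanning_reads`), the sample labels, and the placeholder gene names are hypothetical and do not correspond to the actual output columns of STAR-Fusion, ARRIBA, or STAR-SEQR, nor to the authors' pipeline.

```python
from collections import defaultdict


def consensus_fusions(calls, min_spanning=1, min_callers=2):
    """Keep fusions with enough spanning reads that are reported by enough callers.

    `calls` is an iterable of dicts with the hypothetical keys
    caller, sample, gene5, gene3, junction_reads, spanning_reads.
    """
    support = defaultdict(set)
    for call in calls:
        # Drop calls supported only by junction reads (no spanning reads).
        if call["spanning_reads"] < min_spanning:
            continue
        key = (call["sample"], call["gene5"], call["gene3"])
        support[key].add(call["caller"])
    # A fusion passes when at least `min_callers` distinct callers report it.
    return sorted(key for key, callers in support.items()
                  if len(callers) >= min_callers)


# Toy input: all read counts and sample names are made up for illustration.
calls = [
    {"caller": "STAR-Fusion", "sample": "MM_01", "gene5": "IGH", "gene3": "NSD2",
     "junction_reads": 12, "spanning_reads": 5},
    {"caller": "ARRIBA", "sample": "MM_01", "gene5": "IGH", "gene3": "NSD2",
     "junction_reads": 10, "spanning_reads": 4},
    # Junction-only IG call in a normal plasma cell sample: filtered out.
    {"caller": "STAR-Fusion", "sample": "TPC_01", "gene5": "IGK", "gene3": "GENE_X",
     "junction_reads": 2, "spanning_reads": 0},
    {"caller": "ARRIBA", "sample": "TPC_01", "gene5": "IGK", "gene3": "GENE_X",
     "junction_reads": 3, "spanning_reads": 0},
    # Reported by a single caller only: filtered out.
    {"caller": "STAR-SEQR", "sample": "MM_02", "gene5": "GENE_A", "gene3": "GENE_B",
     "junction_reads": 4, "spanning_reads": 2},
]

passing = consensus_fusions(calls)
print(passing)
```

Keying on sample plus gene pair is a simplification; a real pipeline would likely also compare breakpoint coordinates before treating two calls as the same event.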
From the 158 FTs, we filtered out those detected in at least one normal B-cell sample to focus on MM-specific FTs, leading to the identification of 79 expressed FTs (61 unique) (Figure S1A, Table S2), 29.5% of which had not been previously described (Figure 1A). At least one expressed FT was identified in 75.7% of the MM samples (Figure 1B), indicating that some FTs emerge specifically after malignant transformation. A Human Phenotype Ontology analysis of coding genes involved in these 61 unique FTs showed a significant enrichment of genes associated with B lymphocyte dysfunction phenotypes (p-adj <.05), suggesting that genes important for B-cell abnormalities and MM pathogenesis may be more prone to FT formation. Most of the MM-expressed FTs showed an overall low read count, with some exceptions (Figure 1C). As previously described [2], 85.3% of MM FTs were patient-specific, but 9 FTs were recurrently expressed (Figure 1D). A total of 88.5% of MM-specific FTs were derived from the fusion between 2 non-IG partners, and IG FT percentages were lower than those previously reported [2, 3], probably due to our smaller cohort size (Figure 1E). Nevertheless, we identified the IGH-NSD2 expressed FT derived from t(4;14) in two patients and two MM cell lines (Figure S3A–C), and described a novel FT between the genes GBE1 and KIF20B in one MM cell line (Figure S3D–F). Recent studies have shown the involvement of lncFTs in MM, such as lncFTs with PVT1, but a complete characterization of the lncFT transcriptome is still pending [3]. We observed that 27.9% of MM-specific expressed FTs were lncFTs (Figure 1F), some of them leading to the overexpression of the associated lncRNA (Figure 1G,H), as in the case of FTs involving oncogenes [2, 3]. Interestingly, a relevant fraction of MM-specific FTs occurred between two adjacent genes on the same DNA strand and were therefore defined as transcription read-throughs (RTs) (Figure 1I), a novel class of MM-specific FT.
A total of 64.3% of RTs involved a lncRNA as a fusion partner gene (Figure 1J). Some of them were detected at low expression in both MM patient samples and cell lines, and were validated in cell lines through real-time quantitative reverse transcription PCR (qRT-PCR) and Sanger sequencing (Figure S3G–I). Furthermore, three RTs involving lncRNAs were recurrent, such as the FT between AC092691.1 and LSAMP (Figure 1K), which showed increased expression of AC092691.1 in comparison to normal PCs and MM samples without the FT (Figure 1L). The presence of functional oncogenic lncFTs and RTs has been reported for other tumor types [6], suggesting that lncFTs could be important in MM, but additional studies will be needed to determine their role.
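The classification steps described above (removing FTs seen in any normal B-cell sample, separating recurrent from patient-specific FTs, and flagging read-throughs as fusions between adjacent genes on the same strand) can be sketched as follows. Everything here is a toy illustration under stated assumptions: the sample names, the placeholder genes, and the annotation format `(chromosome, strand, rank)` are hypothetical, and "adjacent" is simplified to consecutive ranks in a gene order that the annotation is assumed to provide.

```python
from collections import Counter


def mm_specific(fusions_by_sample, normal_samples):
    """Remove from each MM sample every fusion seen in any normal sample."""
    seen_in_normal = set()
    for sample in normal_samples:
        seen_in_normal |= fusions_by_sample.get(sample, set())
    return {sample: fusions - seen_in_normal
            for sample, fusions in fusions_by_sample.items()
            if sample not in normal_samples}


def split_by_recurrence(mm_fusions, min_samples=2):
    """Split fusions into recurrent (>= min_samples samples) and patient-specific."""
    counts = Counter(f for fusions in mm_fusions.values() for f in fusions)
    recurrent = {f for f, n in counts.items() if n >= min_samples}
    return recurrent, set(counts) - recurrent


def is_read_through(gene5, gene3, annotation):
    """True when both genes are neighbours on the same chromosome and strand.

    `annotation` maps gene -> (chromosome, strand, rank), where rank is the
    gene's position in the (assumed) ordered gene list of that chromosome.
    """
    chrom5, strand5, rank5 = annotation[gene5]
    chrom3, strand3, rank3 = annotation[gene3]
    return chrom5 == chrom3 and strand5 == strand3 and abs(rank5 - rank3) == 1


# Toy data with placeholder gene and sample names.
fusions_by_sample = {
    "NORMAL_1": {("GENE_A", "GENE_B")},
    "MM_1": {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")},
    "MM_2": {("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")},
}
annotation = {
    "GENE_C": ("chrT", "+", 10), "GENE_D": ("chrT", "+", 11),
    "GENE_E": ("chrT", "+", 20), "GENE_F": ("chrU", "-", 3),
}

specific = mm_specific(fusions_by_sample, {"NORMAL_1"})
recurrent, private = split_by_recurrence(specific)
print(recurrent, private)
print(is_read_through("GENE_C", "GENE_D", annotation))
```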
Finally, we analyzed whether lncFTs might have an impact on the outcome of MM patients by analyzing expressed FTs in 599 MM patients included in the MMRF CoMMpass data set release IA15. We used the intersection of STAR-Fusion and ARRIBA, identifying 556 expressed lncFTs. Interestingly, we found that 35% of the MM-specific unique lncFTs defined in our cohort were present in the CoMMpass data set. We observed various FTs between lncRNAs and IG genes (IGK-FAM230C, IGH-LINC-PINT), suggesting that IG translocations in MM patients could involve both coding and noncoding partner genes, and that lncRNAs could explain some of the MM cases in which the partner gene of the IG translocation remained unknown. We validated the robustness of our algorithm by comparing the patients in which we detected the IGH-NSD2 FT with the patients in which t(4;14) was detected by whole-genome sequencing (WGS): we identified the IGH-NSD2 FT in 75 of the 79 samples positive by WGS and additionally detected the expression of this FT in 2 other MM samples in which WGS was negative for t(4;14) (Fisher's exact test p-value = 6.7e-88). To assess whether the lncFTs could be associated with prognosis in MM, we selected those lncFTs that were detected in more than 2% of MM patients (Figure S4A), and we evaluated the combination of lncFTs and the defined high-risk genetic markers [1] (International Staging System [ISS] stage, t(4;14), t(14;16), t(14;20), del(17p), deletion of CDKN2C, del(1p), amp(1q), and mutations of TP53) using a multivariate Cox proportional hazards (coxph) model and the Bayesian information criterion (BIC) to select the optimal number of variables.
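The concordance between the IGH-NSD2 FT calls and the WGS t(4;14) calls reported above can be checked with simple arithmetic. The sketch below computes sensitivity and positive predictive value from the three counts given in the text (75 concordant positives, 4 WGS-positive samples without the FT, 2 FT-positive samples without the WGS call) and a one-sided Fisher's exact test. The number of double-negative samples is not given in the text: the value used here assumes that all 599 patients had WGS data, which is an assumption, and the one-sided test is not claimed to reproduce the reported p-value.

```python
from math import comb


def concordance(tp, fn, fp):
    """Sensitivity and positive predictive value of the FT call versus WGS."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return sensitivity, ppv


def fisher_one_sided(tp, fn, fp, tn):
    """P(overlap >= tp) under the hypergeometric null with fixed margins."""
    total = tp + fn + fp + tn
    wgs_pos = tp + fn          # samples positive by WGS
    ft_pos = tp + fp           # samples positive for the fusion transcript
    denominator = comb(total, ft_pos)
    numerator = sum(
        comb(wgs_pos, k) * comb(total - wgs_pos, ft_pos - k)
        for k in range(tp, min(wgs_pos, ft_pos) + 1)
    )
    return numerator / denominator


TP, FN, FP = 75, 4, 2           # counts stated in the text
ASSUMED_TOTAL = 599             # assumption: every patient had WGS data
TN = ASSUMED_TOTAL - TP - FN - FP

sensitivity, ppv = concordance(TP, FN, FP)
p_value = fisher_one_sided(TP, FN, FP, TN)
print(f"sensitivity={sensitivity:.3f}, ppv={ppv:.3f}")  # sensitivity=0.949, ppv=0.974
print(f"one-sided Fisher p-value (assumed total): {p_value:.2e}")
```

Exact integer binomial coefficients are used so that the very small tail probability is not lost to floating-point underflow before the final division.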
We found that the expression of 3 lncFTs (TEX35-AL37796.1, AL050309.1-KLF8, and PVT1-IGL), together with the ISS stage and TP53 mutations, was associated with significantly shorter progression-free survival (PFS) (global p < .0001), and the model stratified the MM patients into four risk groups according to the number of events they had (an event consists of having any of the five risk factors) (Figure 1M,N). Similarly, expression of 1 lncFT (TEX35-AL37796.1), together with the ISS stage, del(17p), or amp(1q), was also associated with significantly worse overall survival (OS) (global p < .0001), identifying five groups with significant differences in their OS (Figure 1O,P). An ANOVA test comparing the models derived from high-risk genetic factors only with those combining them with lncFTs showed a significant improvement for the combined model for both PFS (p-value = 6.3e-5, Figure S4B) and OS (p-value = .019, Figure S4C). These findings should be validated in other MM cohorts, but our results suggest that lncFTs in MM could contribute to better patient stratification, affecting patient management in terms of treatment choice or contributing to the identification of specific subgroups of patients suitable for personalized therapies.
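The event-count stratification used for PFS can be illustrated with a short sketch. The five risk factors are taken from the text, but several details are assumptions made only for illustration: the text does not say how the ISS stage is turned into a binary event (the hypothetical flag `ISS_high` stands in for it), nor how event counts map onto the four groups (here 0, 1, 2, and 3 or more events). This is not the authors' model, which was fitted with a multivariate Cox regression.

```python
# Risk factors named in the text for the PFS model; "ISS_high" and "TP53_mut"
# are hypothetical flag names chosen for this sketch.
PFS_RISK_FACTORS = (
    "TEX35-AL37796.1",
    "AL050309.1-KLF8",
    "PVT1-IGL",
    "ISS_high",
    "TP53_mut",
)


def count_events(patient, factors=PFS_RISK_FACTORS):
    """Number of risk factors present in a patient record (missing = absent)."""
    return sum(bool(patient.get(factor, False)) for factor in factors)


def risk_group(n_events, top_group=3):
    """Map an event count to a group index; counts above top_group are pooled.

    With top_group=3 this yields four groups (0, 1, 2, 3+), an assumed mapping.
    """
    return min(n_events, top_group)


# Toy patients; the flags are invented for illustration.
patients = {
    "P1": {},
    "P2": {"ISS_high": True},
    "P3": {"PVT1-IGL": True, "TP53_mut": True},
    "P4": {"TEX35-AL37796.1": True, "AL050309.1-KLF8": True,
           "ISS_high": True, "TP53_mut": True},
}

groups = {pid: risk_group(count_events(record)) for pid, record in patients.items()}
print(groups)  # {'P1': 0, 'P2': 1, 'P3': 2, 'P4': 3}
```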
In summary, this study provides the first comprehensive landscape of expressed lncFTs and RTs in MM, demonstrating that FTs may also be expressed in normal B-cells and that expression of recurrent lncFTs may have a significant impact on PFS and OS in MM patients.
Title: Landscape and clinical significance of long noncoding RNAs involved in multiple myeloma expressed fusion transcripts