Introduction

Sarcomas are a rare group of heterogeneous tumors that arise within bone or soft tissues including connective tissue, muscle, nerves, blood vessels, and fat [1]. Sarcomas account for roughly 1% of all adult cancers and around 20% of pediatric solid malignancies [2, 3]. Soft tissue tumors are conventionally classified according to morphological, immunohistochemical, and in many cases molecular characteristics into more than 100 distinct subtypes, which can make differential diagnosis difficult [1]. At a molecular level they are classified into two main categories: sarcomas with specific genetic alterations and sarcomas displaying multiple complex karyotypic abnormalities [1]. Focusing on the first category, most sarcoma-specific genetic alterations involve chromosomal translocations that result in fusion genes. Fusion genes arise from the juxtaposition of two separate genes, resulting in the translation of deregulated or chimeric proteins with altered function and potential oncogenic activity [4]. It has been estimated that approximately one third of sarcomas carry a detectable driver fusion gene [5, 6].

Whilst some fusion genes in tumors of different origin appear to be passenger mutations resulting from genetic instability, in sarcomas the majority are considered strong driver alterations [4]. Certain sarcoma subtypes display recurrent, often pathognomonic translocations resulting in fusion genes, the identification of which can support or define diagnosis [7, 8]. The detection of recurrent translocations that are present in some benign soft tissue and bone tumors is also of diagnostic utility [1]. Copy number changes in particular genes can also support diagnosis. For example, amplification of genes within 12q13-15 including MDM2 and CDK4 genes in adipocytic neoplasms has been shown to be characteristic of well differentiated and dedifferentiated liposarcomas [9]. Detection of MDM2 amplification using FISH is therefore an extremely useful ancillary diagnostic tool [10, 11].

In addition to their diagnostic potential, it has been demonstrated that detection of these chromosomal aberrations and fusion genes may be of use in the treatment of sarcoma patients, highlighting the importance of accurate diagnosis [12,13,14,15,16,17,18]. This is emphasized by the 2016 Clinical Effect of Molecular Methods in Sarcoma Diagnosis (GENSARC) study, in which 53 out of 384 (13.8%) sarcoma diagnoses were amended following molecular findings. These new findings not only modified patient subtype diagnosis but also had implications for patient management as well as prognosis assessment [19].

Due to the vast number of sarcoma histological sub-classifications and overlapping histological and immunohistochemical features, conventional hematoxylin and eosin staining and immunohistochemistry (IHC) are used to support diagnosis. In addition, molecular analyses using reverse transcription polymerase chain reaction (RT-PCR), fluorescence in situ hybridization (FISH) and, more recently, next generation sequencing (NGS) are extremely valuable diagnostic tools [20, 21]. Whilst RT-PCR and FISH are considered the gold standard tools for fusion gene detection in sarcomas in the clinical setting, these tests have certain disadvantages. The need for prior knowledge of the alteration, the lability of RNA and scalability constraints can all pose challenges when using these methods.

NGS however is a high-throughput, cost-effective tool allowing for multiple chromosomal regions to be sequenced in parallel to interrogate a wide range of genetic variation across the genome [22, 23]. Targeted NGS, in particular, allows for a set of genes or regions to be enriched and sequenced in a single assay, reducing cost, turnaround time, and data analysis burden [24]. However, there are currently no commercially available targeted NGS panels for fusion gene detection in sarcoma using DNA, only NGS tests focused on RNA sequencing and fusion gene analysis, most of which do not support copy number variation (CNV) or mutation detection [25, 26].

For this study, we designed a custom capture-based NGS panel for use with genomic DNA extracted from soft tissue tumor and bone tumor formalin-fixed paraffin embedded (FFPE) tissue. Our aims were to assess the feasibility of this panel and determine whether this assay can detect clinically-relevant fusion genes and copy number variations in diagnostic FFPE tissue samples that have previously been characterized by FISH or RT-PCR in clinical laboratories.

Methods

Ethics approval and consent to participate

Local approval was obtained for the molecular analysis of all clinical material in this study according to standard clinical practice. Ethics approval was obtained from the Northern Ireland Biobank (NIB) (NIB17-0231). The RNOH Biobank was approved by the National Research Ethics Committee of the Health Research Committee (reference 15/YH/0311: Integrated Research Application System (IRAS) project identifier: 18309). This specific project was approved by the National Research Ethics Committee-approved UCL/UCLH Biobank Ethics Committee (specific project reference no. EC17.14).

Tumor specimen collection

A total of 113 specimens, comprising resections (n = 83) and core biopsies (n = 30), were selected from FFPE-preserved sarcoma cases dating from 2013–2018 from the Belfast Health and Social Care Trust (Belfast, United Kingdom), Queen Elizabeth Hospital (Birmingham, United Kingdom), Radboud University Medical Center (Nijmegen, Netherlands), Royal Infirmary of Edinburgh (Edinburgh, Scotland) and Royal National Orthopedic Hospital (London, United Kingdom). The selection criteria required all samples to have previously tested positive by FISH/RT-PCR for either a fusion gene or MDM2 amplification during routine diagnostics at the respective collaborating institution, as part of the standard of care for those patients. The 113 samples included 22 tumor subtypes harboring either MDM2 gene amplification or one of 25 distinct chromosomal translocations including ALK-EML4, ASPSCR1-TFE3, BCOR-CCNB3, C11orf95-MKL2, COL1A1-PDGFB, COL1A1-USP6, ETV6-NTRK3, EWSR1-ATF1, EWSR1-ERG, EWSR1-FLI1, EWSR1-NR4A3, EWSR1-WT1, FUS-CREB3L2, FUS-DDIT3, FUS-ERG, HEY1-NCOA2, NAB2-STAT6, PAX3-FOXO1, RANBP2-ALK, SS18-SSX1, SS18-SSX2, TAF15-NR4A3, USP6-FGFR1, USP6-MYH9, and WWTR1-CAMTA1.

DNA Isolation from FFPE

Tumor content was determined by a pathologist by reviewing the hematoxylin and eosin (H&E) stained tissue sections and estimating the percentage of malignant nuclei out of the total nucleated cellularity. DNA was extracted from scrolls or macrodissected unstained sections of FFPE tissue (3–8 × 5 µm thick sections per extraction) using the Maxwell 16 FFPE Tissue LEV DNA Purification Kit (Promega, Wisconsin, US) according to manufacturer’s instructions.

For FFPE sections on slides, prior to extraction, sections were dewaxed on the Tissue-Tek® Film® Prism (Sakura, Japan) involving two xylene 2-min incubations, one xylene 1-min incubation, a 95% ethanol 1-min incubation and a 90% ethanol 1-min incubation. Immediately after dewaxing, using an annotated H&E reference slide to highlight tumor areas, the unstained sections were macro-dissected to enrich for tumor tissue before placing in a collection tube containing 200 μL of incubation buffer. For tissue received in scrolls, no dewaxing or macro-dissection was performed and 200 μL of incubation buffer was added directly to the sample. Proteinase K treatment was performed by incubating the sample at 70 °C overnight before completing the automated DNA extraction using the Maxwell® 16 Instrument, configured for LEV operation, with final elution into 50 μL of nuclease-free water. DNA was quantified using the Qubit 2.0 Fluorometer (Thermo Fisher Scientific, MA, US) and quality assessed using either the Fragment Analyzer™ Automated CE System or 4200 TapeStation System.

Panel design

An in-depth literature review was performed to identify gene translocations involved in sarcomas. Fusion breakpoint sequences were collected from relevant case reports and precise chromosome locations were validated with Ensembl GRCh38.p12 (https://www.ensembl.org) and Integrative Genomics Viewer (IGV) (http://software.broadinstitute.org/software/igv) [27, 28]. The start and end chromosomal location of each intron for all 202 fusion variants was identified using Ensembl. For copy number variation, probes were designed for the chromosomal location of the exons of the genes of interest and surrounding regions spanning 200 kb. All chromosomal locations were checked against areas of poor mappability and CpG islands using the ENCODE Duke Mappability and Uniqueness track and CpG island track available for download from the University of California Santa Cruz (UCSC) Genome Browser. The final design was manufactured through the EZ Choice Probe design (Roche Nimblegen, Madison, WI, USA).
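Because the panel targets introns, it may help to see how intron coordinates follow from an exon list. The sketch below is illustrative only and is not the authors' design workflow: the function names and the example coordinates are hypothetical, and it assumes 1-based inclusive exon coordinates of the kind shown in Ensembl, converted to 0-based half-open BED lines for probe design.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the intervals lying between consecutive exons."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:  # skip abutting or overlapping exons
            introns.append((prev_end + 1, next_start - 1))
    return introns


def to_bed(chrom: str, interval: Interval) -> str:
    """Format a 1-based inclusive interval as a 0-based half-open BED line."""
    start, end = interval
    return f"{chrom}\t{start - 1}\t{end}"


# Hypothetical exon coordinates, for illustration only (not a real gene model).
example_exons = [(1000, 1200), (5000, 5150), (9000, 9300)]
example_introns = introns_from_exons(example_exons)
example_bed = [to_bed("chr22", intron) for intron in example_introns]
```

In practice each candidate interval would still need to be checked against the mappability and CpG island tracks described above before being submitted for probe design.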

NGS library construction and sequencing

Where available, library preparation was performed on 200 or 400 ng of DNA. This decision was based on average fragment size of the DNA measured using either the Fragment Analyzer™ Automated CE System or 4200 TapeStation System. Either 200 ng of good quality DNA (>1000 bp average length) or 400 ng of poorer quality DNA (<1000 bp average length) was loaded into the preparation reaction.

The KAPA HyperPlus Kit (Roche Sequencing Solutions, Inc., California, US), SeqCap EZ adapters and associated SeqCap target enrichment reagents (Roche Sequencing Solutions, Inc., California, US) were used according to the manufacturer’s protocol, including dual size selection of the libraries (250–450 bp). Following enzymatic fragmentation, DNA was end-repaired and A-tailed, indexed adapters were ligated, and the libraries were amplified by PCR (6 cycles) before dual size selection. For each run, 24 libraries were hybridized overnight (16–20 h) using 1 μg of the pooled libraries and custom designed biotinylated DNA baits complementary to the genomic regions of interest (NimbleGen SeqCap EZ library, Roche Nimblegen, Madison, WI, US). Following hybridization and stringency washes, the resulting DNA was PCR amplified (11 cycles), purified using AMPure XP beads (Beckman Coulter, Danvers, MA, US) at a ratio of 1.8×, and quantified using the KAPA Quantification q-PCR Kit (KAPA Biosystems, Wilmington, MA, US) and the Qubit High Sensitivity (HS) assay kit with the Qubit 2.0 Fluorometer (Thermo Fisher Scientific, MA, US). The quality and molarity of the PCR-amplified hybridized library were measured using the Fragment Analyzer™ Automated CE System or 4200 TapeStation System.

Sequencing was performed on a NextSeq 500 (Illumina, San Diego, CA, USA) using the NextSeq 550 System Mid-Output Kit v2 (150 cycles) with 76 bp paired-end read cycles according to the manufacturer’s instructions.

Low input NGS library construction and sequencing

FFPE samples with less than 200 ng of DNA input available were also prepared for sequencing, irrespective of DNA quality, with DNA input varying between 15 and 170 ng. Samples with a DNA input between 40 and 200 ng were processed exactly as above. In samples where DNA input was lower than 40 ng, three additional PCR cycles (9 cycles total) were performed following enzymatic fragmentation, DNA end repair, A-tailing and indexed adapter ligation.
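The input and cycling rules from the standard and low-input protocols can be summarized as a small decision function. This is a minimal sketch rather than a laboratory procedure: the names `PrepPlan` and `plan_library_prep` are hypothetical, and two edge cases that the text does not specify are filled with assumptions (an average fragment length of exactly 1000 bp is treated as poorer quality, and poorer-quality samples with 200–400 ng available are loaded with whatever is available).

```python
from dataclasses import dataclass


@dataclass
class PrepPlan:
    input_ng: float               # DNA loaded into library preparation
    pre_capture_pcr_cycles: int   # PCR cycles before hybridization
    low_input: bool               # True if the low-input route applies


def plan_library_prep(available_ng: float, mean_fragment_bp: float) -> PrepPlan:
    """Pick DNA input and pre-capture PCR cycles from the rules in the text."""
    if available_ng < 200:
        # Low-input route: use what is available, irrespective of DNA quality.
        # Below 40 ng, three extra cycles are added (9 in total).
        cycles = 9 if available_ng < 40 else 6
        return PrepPlan(available_ng, cycles, True)
    # Standard route: 200 ng of good-quality DNA (>1000 bp average length),
    # 400 ng of poorer-quality DNA.
    # Assumption: exactly 1000 bp counts as poorer quality, and if fewer than
    # 400 ng are available we load everything there is.
    target_ng = 200.0 if mean_fragment_bp > 1000 else 400.0
    return PrepPlan(min(available_ng, target_ng), 6, False)


good_quality = plan_library_prep(available_ng=500, mean_fragment_bp=1500)
poor_quality = plan_library_prep(available_ng=500, mean_fragment_bp=800)
very_low_input = plan_library_prep(available_ng=15, mean_fragment_bp=1379)
```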

Structural variant data analysis

Base calls and quality scores were provided by the NextSeq 500 using real time analysis (RTA 2.0). FASTQ data were aligned to the hg38 build of the human reference genome using the Isaac Aligner (iSAAC-03.16.02.20) [29]. Translocations were detected from the aligned sequence data using Manta Structural Variant Caller (v0.28.0) [30]. The analysis was performed blinded to the original results obtained by each respective Center. Any samples in which the analysis pipeline did not detect a translocation were manually analyzed using the Integrative Genomics Viewer (IGV) by color coding reads by insert size and grouping alignments by chromosome of mate [28, 31].
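The idea behind grouping alignments by chromosome of mate can be illustrated with a simple tally of read pairs whose mates align to a different chromosome. This sketch is neither Manta's algorithm nor an IGV feature: the `ReadPair` record, the function name and all coordinates are hypothetical, and a real analysis would read pairs from the aligned BAM file rather than from a hand-written list.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    chrom: str        # chromosome of the read
    pos: int          # alignment position of the read
    mate_chrom: str   # chromosome of its mate
    mate_pos: int     # alignment position of the mate


def mate_chromosome_support(pairs: Iterable[ReadPair], chrom: str,
                            start: int, end: int) -> Counter:
    """Count, per mate chromosome, the read pairs anchored in chrom:start-end
    whose mate aligns to a different chromosome."""
    support = Counter()
    for p in pairs:
        if p.chrom == chrom and start <= p.pos <= end and p.mate_chrom != chrom:
            support[p.mate_chrom] += 1
    return support


# Hypothetical alignments with arbitrary coordinates, for illustration only.
pairs = [
    ReadPair("chr22", 1_000_100, "chr11", 2_000_500),
    ReadPair("chr22", 1_000_150, "chr11", 2_000_620),
    ReadPair("chr22", 1_000_200, "chr22", 1_000_450),  # concordant pair
    ReadPair("chr22", 1_000_300, "chr5", 3_000_000),   # single discordant pair
]
support = mate_chromosome_support(pairs, "chr22", 1_000_000, 1_001_000)
best_partner = support.most_common(1)[0]
```

A cluster of pairs pointing at the same partner chromosome within a targeted intron is the kind of signal a reviewer would then inspect visually; a single stray pair is more likely to be noise.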

Copy number variation data analysis

The FASTQ sequences were aligned to human reference genome version GRCh38/hg38 using the Burrows-Wheeler Aligner (BWA, v0.7.17) [32]. The aligned reads were sorted, merged, and filtered using SAMtools (v4.0.12.0) [33]. Picard tools from GATK were used to sort, mark, and remove duplicates. Local realignment of reads around INDELs and base recalibration were performed using the Genome Analysis Toolkit (GATK) [34]. DNA copy number information was first acquired using the CopywriteR R package (version 2.16.0), which utilizes off-target reads from targeted sequencing data [35]. The results confirmed that the MDM2 and CDK4 genes were not amplified in fusion-positive sarcoma cases (Supplementary File 1: Fig. S1). The CNVPanelizer R package (version 1.16.0), which uses a subsampling strategy to predict CNVs, was then used to detect MDM2 and CDK4 amplification in the respective cases. CNVPanelizer compares MDM2/CDK4-amplified samples with the non-amplified pool of fusion-positive samples.
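The comparison of a test sample against a non-amplified reference pool can be sketched as a log2 ratio of normalized read counts. This is a deliberately simplified illustration and not CNVPanelizer's subsampling procedure: the function names, the median-based normalization, the control region names and all read counts are assumptions made for the example. The default threshold of 1.5 is the one reported in the Results.

```python
import math
from statistics import median
from typing import Dict, List


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale each region's read count by the sample's median region count."""
    scale = median(counts.values())
    return {region: n / scale for region, n in counts.items()}


def log2_ratios(sample: Dict[str, int],
                reference_pool: List[Dict[str, int]]) -> Dict[str, float]:
    """log2 of the normalized sample count over the reference pool median."""
    sample_norm = normalize(sample)
    pool_norm = [normalize(ref) for ref in reference_pool]
    ratios = {}
    for region, value in sample_norm.items():
        baseline = median(ref[region] for ref in pool_norm)
        ratios[region] = math.log2(value / baseline)
    return ratios


def amplified(ratios: Dict[str, float], threshold: float = 1.5) -> List[str]:
    """Regions whose log2 ratio meets or exceeds the threshold."""
    return sorted(r for r, v in ratios.items() if v >= threshold)


# Hypothetical read counts, for illustration only.
regions = ["MDM2", "CDK4", "ctrl_1", "ctrl_2", "ctrl_3", "ctrl_4", "ctrl_5"]
reference_pool = [{r: 100 for r in regions} for _ in range(3)]
sample = {r: 100 for r in regions}
sample.update({"MDM2": 3200, "CDK4": 1600})

ratios = log2_ratios(sample, reference_pool)
calls = amplified(ratios)
```

Normalizing by the median rather than the total read count keeps a strongly amplified region from inflating the scaling factor, which matters when the panel contains few regions.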

Results

Panel design

The overall objective was to develop a sarcoma-specific targeted NGS gene panel to detect both fusion genes and common copy number variants in FFPE-extracted DNA. This gene panel targets 202 unique fusion junctions involving 87 sarcoma-related fusion genes and 7 genes with known copy number variation in sarcoma (Supplementary File 2: Table S1). Fourteen additional fusion junctions were identified during the literature search but were excluded from final panel design due to their large intron size, rarity and/or diagnostic value (Supplementary File 2: Table S1).

Sequencing quality

A validation set of 100 sarcoma samples with >200 ng of DNA, characterized either by FISH (n = 61) or RT-PCR (n = 39) and consisting of 22 soft tissue and bone tumor subtypes covering 24 distinct chromosomal translocations and MDM2/CDK4 gene amplifications, was used. The average tumor content was 70% (range 30–90%). An initial repeatability assessment was performed by repeating the first sequencing analysis (with 24 clinical samples) from library preparation on two different hybridizations and two different sequencing runs on different dates. Both sequencing runs were able to identify the same fusion genes following analysis. The targeted NGS assay yielded an average of 7,595,994 reads per sample (range 2,590,654–13,667,938). The mean depth of unique coverage was 602.6× ± 161.6 SD.

Detection of structural variants

Out of the 100 patient samples, 89 harbored a fusion gene and 87/89 (97.8%) fusion genes were detected by the NGS assay (Table 1). All fusions had previously been confirmed during diagnostic testing using FISH, RT-PCR, or both. In the two discrepant cases, the NGS assay failed to detect WWTR1-CAMTA1 and CIC-DUX4 translocations which were detectable using RT-PCR and FISH, respectively. The analytical sensitivity of the panel for detection of fusion genes in this validation set was 97.8%, and the analytical specificity was 100%, with no false positives observed across any of the fusion genes detectable by the NGS panel in any of the cases.

Table 1 Table of sarcoma subtypes and abnormalities detected.

Eleven of the 100 patient samples were characterized by MDM2 amplification confirmed by FISH. The NGS assay detected both MDM2 and CDK4 amplification in 10/11 (91%) of these samples, failing to detect MDM2 or CDK4 amplification in one sample. As only MDM2 amplification is tested diagnostically and therefore comparable to our test, the analytical sensitivity of the NGS assay in detecting MDM2 amplification in this small cohort of samples is 91% and analytical specificity 100%, with no false positives being detected (Table 1).

To determine minimum DNA input required for the assay, DNA extracted from a further 13 samples from which 200 ng of DNA could not be obtained was used. The DNA quantity available from these samples ranged from 15 to 171 ng with an average fragment size of 1379 bp. For this cohort of patients, the targeted NGS assay yielded an average of 5,049,051 reads per sample (range 2,227,787–7,074,052) and the mean unique target coverage depth was 233× ± 120.8 SD. Out of the 13 samples with lower DNA input, the NGS assay detected all fusion genes previously identified using FISH or RT-PCR (Table 2).

Table 2 Low input DNA samples.

Out of the 113 samples combined, a total of 25 fusion genes out of a possible 87 fusions targeted by the panel were identified (Fig. 1). A full sample list, including the translocations identified and their locations, is also provided (Supplementary File 3: Table S2).

Fig. 1 Chord diagram displaying the 25 fusion gene variants identified in 113 patients.

Test sensitivity and specificity

When combining both fusion gene detection and MDM2/CDK4 amplification in all 113 samples, the assay sensitivity was 97% (92.5–99.1% 95% CI) and the observed specificity remained at 100%. The failure rate of the panel, even when using low DNA inputs, such as those obtained from core biopsies (10 of 13 samples), remained at 0%. A sample would be considered failed if the minimum read depth was less than 100× and no alterations were detected.
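The combined sensitivity can be re-derived from the counts reported above (87 of 89 fusions and 10 of 11 MDM2 amplifications in the validation set, plus 13 of 13 fusions in the low-input set). The text does not state which method was used for the confidence interval; the sketch below assumes a Wilson score interval, which gives bounds close to the reported 92.5–99.1%. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts taken from the Results: validation fusions, MDM2 amplifications,
# and low-input fusions.
detected = 87 + 10 + 13
expected = 89 + 11 + 13

sensitivity = detected / expected
low, high = wilson_interval(detected, expected)
print(f"sensitivity = {sensitivity:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Other interval methods (for example an exact binomial interval) would give slightly different bounds, so the agreement here should be read as a plausibility check rather than confirmation of the method used.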

Analysis performance

Overall, 80% (n = 82) of translocations were called by the structural variant caller Manta during automated analysis, with manual IGV analysis required only for the remaining 20% (n = 20) of fusion-positive cases. MDM2 and CDK4 amplification was detected in the respective ten cases using the R package CNVPanelizer, with a log2ratio of 1.5 or above considered as gene amplification [36]. Figure 2 shows the MDM2 and CDK4 amplified samples including centromeric and telomeric control regions. A significant difference was observed between MDM2 and both its centromeric (p-value = 0.002) and telomeric (p-value = 0.008) control regions, demonstrating gene amplification rather than polysomy.

Fig. 2 Boxplot showing log2 ratio for CDK4, MDM2, centromere and telomere control regions for the 10 MDM2-amplified well/dedifferentiated liposarcoma and intimal sarcoma cases. p-value was calculated using a Student's t-test.

Discussion

The aim of this study was to design and validate a DNA-based custom targeted sequencing panel to detect known and clinically relevant fusion genes and copy number variants in FFPE material from soft-tissue and bone tumors and to assess its potential clinical applicability. In total, 25 different fusions were identified in 102 cases using the targeted gene panel in a cohort of 113 patients, representing only a third of the fusions that the panel can potentially detect. MDM2/CDK4 amplification was detected in 10 out of the 11 cases with MDM2 amplification detected by FISH and in none of the fusion gene positive cases. An expanded patient cohort would be required to evaluate the additional fusion genes and copy number alterations.

FISH can detect translocations by probing for either breakage of one gene using break-apart probes or the joining of two specific gene partners using fusion probes. However, because it lacks multiplexing capability, multiple tests per sample are generally required, resulting in increased tissue requirements, labor and cost [37]. RT-PCR, on the other hand, can be more sensitive than FISH in specimens with lower tumor content [38,39,40]. However, RT-PCR may not detect rare fusion genes or those with unusual breakpoints and cannot easily detect copy number variation, for example MDM2 amplification, making FISH a continual requirement for comprehensive sarcoma testing [41].

In a previous study of 772 EWSR1-rearranged soft tissue sarcoma FFPE samples, FISH failed in 2.5% of cases whereas RT-PCR failed in 18% [41]. In another study involving 50 synovial sarcoma cases, RT-PCR was found to be more sensitive than FISH, yet concordant results were only reported in 76% of cases [42]. Failure rates of up to 25% have also been reported in fusion-positive tumor samples using RNA NGS sequencing, mainly attributed to small sample size, tissue heterogeneity and tissue fixation processes [25, 43]. Conclusive molecular diagnostic test results therefore cannot be achieved in up to a quarter of fusion-positive sarcoma patients with current methods [41, 42, 44].

Due to the lower input requirements, sequencing costs and data storage, panel-based targeted sequencing is currently more suited to the clinical setting than more comprehensive approaches, such as whole genome sequencing (WGS), and there are many examples of the development and implementation of targeted sequencing panels in oncology [45,46,47,48]. The sarcoma-specific NGS panel described here can detect both fusion gene partners without any prior knowledge of the diagnosis, irrespective of whether both gene partners or only one are included in the panel design. This has advantages over both FISH and RT-PCR. In Ewing’s sarcoma, for example, EWSR1 is most frequently fused with FLI1 or ERG, which together account for over 90% of cases [49]. RT-PCR primer design generally focuses on the more common fusion type and false negatives can therefore be reported in cases with rarer fusion transcripts (10% of all Ewing’s sarcomas). Whilst EWSR1 FISH break-apart probes can combat this problem for the majority of cases, a small subset of Ewing’s sarcomas harbor FUS gene rearrangements, substituting for EWSR1, which would consequently give a false negative result using this technique [50]. In addition, with break-apart FISH the fusion gene partner remains unknown, which can be crucial for subtype diagnosis, since EWSR1 rearrangements are also present in multiple sarcoma subtypes other than Ewing’s sarcoma. Similarly, being unable to identify different partners that have an effect on prognosis, such as SSX1 and SSX2 in synovial sarcomas or PAX3 and PAX7 in alveolar rhabdomyosarcomas, can limit the usefulness of FISH [51, 52]. FISH analysis may also be limited by nuclear truncation artefacts that can lead to false positive or negative results. The presence of a split signal caused by a non-functional rearrangement may also lead to false positive results [53]. NGS, on the other hand, provides the exact breakpoint and chromosomal location of both genes involved in the translocation, limiting the possibility of a false positive result.

The targeted gene panel can simultaneously screen multiple genes in multiple samples in a single test, alleviating the need for multiple RT-PCR or FISH tests per sample and thereby reducing the price, the turnaround time and the amount of tissue required to achieve a definitive sarcoma subtype diagnosis. Sequencing with the targeted gene panel can be completed from library preparation to data analysis within 5 days for manual preparation or 3 days using automation. Analysis time and cost are also significantly reduced by the automatic detection of approximately 80% of translocations (based on this cohort). A key advantage of this NGS panel is that it can be performed alongside immunohistochemical techniques instead of at the end of the histological diagnosis, as it does not require prior knowledge of the alterations to look for, potentially reducing the time to arrive at a comprehensive diagnosis. Nonetheless, FISH or RT-PCR may be the preferred method for confirming suspected translocations in urgent cases such as those requiring rapid, integral chemotherapy treatment or in cases with a strongly favored diagnosis.

Our results show that gene fusions can be accurately identified with as little as 15 ng input DNA from FFPE, making the assay suitable for both resection and core biopsy specimens. Assay sensitivity for FISH, RT-PCR, and particularly for WGS, decreases when using FFPE tissue in comparison to fresh tissue or in samples with significant heterogeneity, whereas the NGS targeted panel maintains its high sensitivity with lower FFPE DNA input [54,55,56]. The NGS panel showed an overall sensitivity of 97% with no false positives and 0% failure rate, indicating that it can be implemented in a clinical setting without compromising on performance against FISH and RT-PCR. Furthermore, since the panel is DNA-based it can be applied to circulating cell-free DNA, opening opportunities for non-invasive sarcoma diagnosis and monitoring, as demonstrated recently in a small group of EWSR1-rearranged sarcomas [57].

An obvious limitation of targeted NGS is the ability to detect fusion genes only in targeted regions included in the design. This panel targets intronic regions which have been known to result in low sequencing coverage in highly repetitive GC-rich regions [58]. Whilst low coverage is observed in some target regions, the percentage coverage in this panel is high on average (96%, range 35.6–100%). At least one gene involved in a fusion gene must be targeted for translocation detection. The unidentified WWTR1-CAMTA1 fusion, confirmed by RT-PCR, can be explained by the omission of intron 7–8 of the CAMTA1 gene and intron 4–5 of the WWTR1 gene from panel design, due to the rarity of epithelioid hemangioendothelioma (prevalence of less than 1 in a million), and more specifically the rarity of the particular breakpoints in question [59]. This however could be easily amended by expanding the size of the gene panel and including these intronic regions in an updated version.

The second discrepant case was a round cell sarcoma with a CIC rearrangement confirmed by FISH that was not detected by the NGS assay. This could be due to the rearrangement having a breakpoint not previously reported in the literature and thus not targeted by the panel. It could also be due to poor coverage of the CIC and DUX4 regions on the panel, which contain highly repetitive sequences with poor mappability and, as such, show lower coverage than most other regions in the NGS panel. More cases with this fusion are required to fully assess the limitations of the NGS panel.

Although this NGS panel is extremely comprehensive, it can be expanded to incorporate new fusion sarcomas and new fusion variants in known tumors. The approach to bone and soft tissue tumor diagnosis is well established and incorporates the clinical scenario, histopathological findings including IHC, the radiological impression (which is particularly pertinent for bone tumors) and molecular analysis. As a case may be encountered in which a diagnosis is strongly favored, it is important to consider all of these diagnostic parameters before offering a definitive diagnosis. In select cases, recourse may be taken to FISH or RT-PCR; however, given the breadth of this NGS panel and the option for its further expansion, this may be an uncommon scenario.

The MDM2 gene was considered amplified if the log2ratio was greater than 1.5, as our cohort contained an intimal sarcoma with low level MDM2 amplification. The average log2ratio for well/dedifferentiated liposarcomas was 5.6. The NGS assay failed to detect MDM2 amplification above the 1.5 log2ratio threshold in a case previously confirmed using an MDM2-CEP12 FISH probe. This particular case showed MDM2 amplification in only a fraction of nuclei as well as polysomy (Supplementary File 4: Fig. S2), which may have affected the detection by NGS.

Based on sensitivity and specificity, lower input FFPE material required and cost effectiveness as well as its ability to detect both translocations and gene amplifications all in the one test, the targeted NGS panel is therefore a useful ancillary tool to be used alongside immunohistochemistry to improve sarcoma diagnosis.