Open search unveils modification patterns in formalin-fixed, paraffin-embedded Thermo HCD and SCIEX TripleTOF shotgun proteomes

https://doi.org/10.1016/j.ijms.2019.116266

Highlights

  • We offer a concrete list of five mass shifts to include when identifying proteins from FFPE tissues by shotgun proteomics.

  • We offer a variety of methods that may be used to detect excessive false discovery rates in open search algorithms.

  • We provide workarounds for problems likely to vex SCIEX TripleTOF users when attempting to use open-source software.

Abstract

The application of database search algorithms with very wide precursor mass tolerances for the “Open Search” paradigm has brought new efforts at post-translational modification discovery in shotgun proteomes. This approach has motivated the acceleration of database search tools by incorporating fragment indexing features. In this report, we compare open searches and sequence tag searches of high-resolution tandem mass spectra to seek a common “palette” of modifications when analyzing multiple formalin-fixed, paraffin-embedded (FFPE) tissues from Thermo Q-Exactive and SCIEX TripleTOF instruments. While open search in MSFragger produced some gains in identified spectra, careful FDR control limited the best result to 24% more spectra than narrow search (worst result: a loss of 9%). Open pFind produced high apparent sensitivity for PSMs, but entrapment sequences hinted that the actual error rate may be higher than reported by the software. Combining sequence tagging, open search, and chemical knowledge, we converged on this set of PTMs for our four FFPE sets: mono- and di-methylation (nTerm and Lys), single and double oxidation (Met and Pro), and variable carbamidomethylation (nTerm and Cys).
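The palette named in the abstract can be written down as a small table of variable modifications. The sketch below is illustrative rather than the authors' actual search configuration: the monoisotopic mass shifts are the standard values for these modifications, the site assignments follow the grouping given in the abstract, and the `Modification` type, `PALETTE` name, and `shifts_for_site` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    """One variable modification (hypothetical container type)."""
    name: str
    mass_shift: float  # monoisotopic mass shift in Da
    sites: tuple       # residues (one-letter codes) or "nTerm"


# Five mass shifts, grouped by site as in the abstract.
PALETTE = (
    Modification("methyl", 14.015650, ("nTerm", "K")),
    Modification("dimethyl", 28.031300, ("nTerm", "K")),
    Modification("oxidation", 15.994915, ("M", "P")),
    Modification("dioxidation", 31.989829, ("M", "P")),
    Modification("carbamidomethyl", 57.021464, ("nTerm", "C")),
)


def shifts_for_site(site):
    """Return the sorted mass shifts that the palette allows at one site."""
    return sorted(m.mass_shift for m in PALETTE if site in m.sites)
```

Calling `shifts_for_site("K")` returns the mono- and di-methylation shifts, which is the kind of per-residue lookup a search engine needs when it decorates candidate peptides.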

Introduction

A confluence of technologies in the early 2000s made formalin-fixed, paraffin-embedded (FFPE) tissues a viable resource for biomarker discovery. Rapidly scanning LTQs (linear ion traps) became available in 2003 [1]. LC-MS/MS for protein identification matured considerably as a discipline as Molecular and Cellular Proteomics produced its initial publication guidelines in 2004 [2]. Fractionation strategies for digging deeply into complex mixtures became commonplace as methods such as GeLC-MS (2003) [3] and MudPIT (1999) [4] became widespread. Greater cell type specificity became possible through laser capture microdissection (1999) [5]. Bernard Metz and coworkers began untangling the complex chemistry of FFPE through chemical treatment of synthetic peptides in 2004 [6]. In 2005, Brian Hood and colleagues produced some of the first LC-MS/MS inventories of FFPE tissues [7].

Hood’s early experiment combined laser capture microdissection and gas-phase fractionation with a Thermo Scientific LTQ to identify approximately 2000 distinct peptides from each sample. Their work revealed key themes that would dominate FFPE proteomics for years to come. First, many peptides released from FFPE samples contained “missed cleavages,” internal basic residues that increased their precursor charges; high resolution of precursor ions and eventually fragment ions would make these peptides more accessible in years to come. Second, FFPE peptides less frequently featured lysine residues at their C-termini, hinting at the role of primary amines in FFPE modification chemistry. Hood et al. gamely identified formyl adducts (+12 Da) of lysines and noted that more than half of observed Met residues were oxidized in some of their LC-MS/MS experiments. The broader palette of modifications, however, awaited more powerful informatic tools.

Identifying post-translational modifications under the 1995 Sequest model [8] requires foreknowledge of all PTM masses along with the residues where they may be observed. Because the algorithm must consider an exponentially growing number of decorated peptides as multiple potential modification sites are found in each peptide sequence, PTM searches have gained a reputation for prohibitive slowness as well as a penchant for false positives [9]. Bioinformatics researchers have produced a variety of alternative strategies for the discovery of PTM masses, particularly sequence tagging and spectral alignment. The first of these techniques attempts to infer partial sequences directly from fragment ions and then allows for arbitrary mass shifts to be introduced to either side of this “tag” to reconcile the spectrum to a database peptide sequence [10]. Spectral alignment approaches such as QuickMod [11] or spectral networks [12] compare experimental MS/MS scans from different precursor masses to determine if fragment ions align with or without adding the precursor mass difference as a PTM. These approaches were generally recommended as a strategy for determining which PTMs should be included in a final database search rather than being seen as authoritative identifications in themselves [13]. In 2015, Ying Zhang and colleagues used spectral alignment for PTM discovery to reveal the prominence of lysine methylation in FFPE LC-MS/MS data [14].
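The exponential growth described above is easy to see with a short count. The following is a minimal sketch under simplifying assumptions: each modifiable residue is either unmodified or carries exactly one of its allowed variable modifications, terminal modifications are ignored, and no cap is placed on the number of modified sites per peptide (real search engines usually impose one). The function name and the example peptide are hypothetical.

```python
from math import prod


def count_decorated_forms(peptide, variable_mods):
    """Count the distinct modified forms of a peptide.

    variable_mods maps a residue letter to the number of variable
    modifications permitted on that residue. A residue with k options
    has k + 1 states (unmodified plus each modification), and the
    states of different residues multiply.
    """
    return prod(1 + variable_mods.get(residue, 0) for residue in peptide)


# Two variable modifications on each of Met, Pro, and Lys.
mods = {"M": 2, "P": 2, "K": 2}
forms = count_decorated_forms("MKPMK", mods)  # 3 states at each of 5 sites
```

With five modifiable residues and two options on each, a single five-residue peptide already expands to 3^5 = 243 candidates, which illustrates why the set of modifications to search should be kept small.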

Joel Chick of the Gygi Laboratory published the open search strategy in 2015 [15], offering the attractive promise of identifying 28% more spectra at similar FDRs by simply relaxing the precursor mass filter in Sequest (invoking a “∼10-fold” time penalty). Established proteome informatics teams at the University of Michigan and the Beijing Chinese Academy of Sciences created new search engines (MSFragger [16]) or modified existing ones (Open pFind [17]) to greatly accelerate the database search algorithm and so enable routine open searching. A key enabling development was the use of indexing to determine which experimental spectra matched a calculated fragment ion mass, irrespective of precursor mass, a strategy first published in 2007 for Marshall Bern’s ByOnic algorithm [18]. Kong et al. achieved similar gains in identified spectra to those of the Gygi team through MSFragger, reporting 33.6% more spectra identified with a much lower time penalty. Notably, the MSFragger paper incorporated a single paragraph relating the use of the software to detect methylol adducts (+30 Da) in 442 LC-MS/MS experiments from FFPE breast cancer data. (Kong et al. remained mute, however, concerning the frequency of lysine methylation in these data.) When Hao Chi and colleagues adapted pFind for open search by adding fragment indexing, precursor analysis, sequence tagging database reduction, and machine learning discrimination, they were able to profoundly boost identification rates over their earlier “narrow” searches in pFind, gaining between 61.2% and 88.2% PSMs [17].
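The idea of matching fragments irrespective of precursor mass can be illustrated with a toy fragment index. This is a simplified sketch and not the MSFragger, Open pFind, or ByOnic implementation: it indexes only singly charged b and y ions of unmodified peptides, bins them at a fixed width, and scores candidates by a raw count of matched peaks. The bin width, function names, and example peptides are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565
BIN_WIDTH = 0.02  # Da; hypothetical fragment bin width


def fragment_mzs(peptide):
    """Singly charged b and y ion m/z values, interleaved as b1, y(n-1), b2, ..."""
    masses = [RESIDUE_MASS[r] for r in peptide]
    frags = []
    for i in range(1, len(masses)):
        frags.append(sum(masses[:i]) + PROTON)          # b ion
        frags.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return frags


def to_bin(mz):
    return int(round(mz / BIN_WIDTH))


def build_index(peptides):
    """Map each fragment bin to the set of peptides that produce it."""
    index = defaultdict(set)
    for pep in peptides:
        for mz in fragment_mzs(pep):
            index[to_bin(mz)].add(pep)
    return index


def open_search(index, peaks):
    """Rank peptides by matched peak count, ignoring precursor mass entirely."""
    votes = Counter()
    for mz in peaks:
        b = to_bin(mz)
        hits = set()
        for neighbor in (b - 1, b, b + 1):  # tolerate bin-edge effects
            hits |= index.get(neighbor, set())
        votes.update(hits)
    return votes.most_common()
```

As a usage example, take the b ions of the hypothetical peptide `GASPVLK` unchanged and shift its y ions by +14.01565 Da to mimic methylation of the C-terminal lysine. Searching those peaks against an index built from `GASPVLK` and `MEAVLKR` still ranks `GASPVLK` first on the strength of its unshifted b ions, even though the precursor mass no longer matches; an open search engine would then report the precursor mass difference as the candidate modification.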

To mainstream proteomics users, such gains in identified PSMs might have seemed the answer to long-standing frustrations surrounding the low fraction of MS/MS scans that we identify from experiments (50% of MS/MS scans measured at high resolution is currently thought to be a good result). Long experience in bioinformatics, however, caused our team to feel some skepticism at these claims, and we designed this study to evaluate the gains achievable through open search algorithms. We chose to emphasize FFPE experiments because of their complex and long-studied modification chemistry, and we included both Thermo high-resolution “HCD” [19] RAW files and SCIEX high-resolution “beam-type CID” WIFF files from the TripleTOF [20], since the latter have seen only minimal attention in open search articles to date. Would open search substantially improve the fraction of spectra we identified? Would its discovered palette of PTMs expand greatly upon those visible through sequence tagging?

Results and discussion

The open search approach was tested in the context of four FFPE data sets; “Zimmerman” and “Nair” were generated by Thermo Q-Exactive instruments, while “Nielsen” and “Buthelezi” were produced on SCIEX TripleTOF instruments. Zimmerman includes five technical LC-MS/MS replicates of a human colon tumor and was initially created to support a tutorial workshop. Nair encompasses five Group 3 and five Group 4 medulloblastomas from banked specimens, each analyzed by LC-MS/MS [21]. Nielsen investigated

Data sets

“Nielsen” data were acquired from ProteomeXchange accession PXD000743 [22]. Twenty experiments in WIFF format were generated on a SCIEX TripleTOF 5600+ at Aarhus University in Aarhus, Denmark, during March of 2014. Four LC-MS/MS experiments were conducted from each of five patients (indicated by the numbers 36, 38, 39, 40, and 41), with ‘A’ files representing amyloid tissue and ‘B’ files representing control tissue. “-2” and “-3” suffixes represented two technical replicates from each sample.
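The experimental design described above can be enumerated to confirm the count of twenty experiments. This is only a bookkeeping sketch: the label format (patient number, then tissue letter, then replicate suffix) is a hypothetical convention chosen for illustration and may not match the actual WIFF file names in PXD000743.

```python
from itertools import product

patients = (36, 38, 39, 40, 41)
tissues = ("A", "B")        # A = amyloid tissue, B = control tissue
replicates = ("-2", "-3")   # two technical replicates per sample

# Hypothetical labels; the real file names in the repository may differ.
labels = [f"{p}{t}{r}" for p, t, r in product(patients, tissues, replicates)]

experiments_per_patient = len(tissues) * len(replicates)
total_experiments = len(labels)
```

Two tissues times two replicates gives four experiments per patient, and five patients give the twenty experiments stated in the text.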

Conclusion

As should be apparent from the Nielsen and Buthelezi results, open search is just as applicable to TripleTOF shotgun experiments as it is to Q-Exactive and other Thermo shotgun experiments. The readiness of software pipelines for this purpose, however, could certainly be improved. The SCIEX MS Data Converter has remained in beta since 2012, and yet it represents the only way to produce high-quality ProteinPilot peak lists. ProteoWizard msConvert can produce Analyst centroid peak lists, but it

Author contributions

B.M. managed early data conversion steps for SCIEX WIFF files and visualized key statistics. J.O. contributed feedback on informatics pipeline design. O.N. and J.M.B. provided previously unpublished FFPE data. S.B. and S.S. provided previously unpublished FFPE data.

Declaration of competing interest

The authors declare no conflict of interest.

Acknowledgment

D.L.T. thanks the Laboratory of Alexey Nesvizhskii for discerning problems in the initial configurations of the MSFragger algorithm. The assistance of Hao Chi and Xin-Yi Xu was essential in pairing the Open pFind software with TripleTOF data. The authors are grateful to Nadia Sukusu Nielsen and Jan J. Enghild for making their WIFFs available as PXD000743, and we appreciate Lisa J. Zimmerman for making her FFPE tutorial data available as PXD001651. This work was supported by Research

References (38)

  • J.R. Yates et al.

    Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database

    Anal. Chem.

    (1995)
  • S. Dasari et al.

    Sequence tagging reveals unexpected modifications in toxicoproteomics

    Chem. Res. Toxicol.

    (2011)
  • E. Ahrné et al.

    QuickMod: a tool for open modification spectrum library searches

    J. Proteome Res.

    (2011)
  • N. Bandeira et al.

    Protein identification by spectral networks analysis

    Proc. Natl. Acad. Sci. U.S.A.

    (2007)
  • J.D. Holman et al.

    Informatics of protein and posttranslational modification detection via shotgun proteomics

    Methods Mol. Biol.

    (2013)
  • Y. Zhang et al.

    Unrestricted modification search reveals lysine methylation as major modification induced by tissue formalin fixation and paraffin embedding

    Proteomics

    (2015)
  • J.M. Chick et al.

    A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides

    Nat. Biotechnol.

    (2015)
  • A.T. Kong et al.

    MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics

    Nat. Methods

    (2017)
  • H. Chi et al.

    Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine

    Nat. Biotechnol.

    (2018)