Open search unveils modification patterns in formalin-fixed, paraffin-embedded Thermo HCD and SCIEX TripleTOF shotgun proteomes
Introduction
A confluence of technologies in the early 2000s made formalin-fixed, paraffin-embedded (FFPE) tissues a viable resource for biomarker discovery. Rapidly scanning LTQs (linear ion traps) became available in 2003 [1]. LC-MS/MS for protein identification matured considerably as a discipline as Molecular and Cellular Proteomics produced its initial publication guidelines in 2004 [2]. Fractionation strategies for digging deeply into complex mixtures became commonplace as methods such as GeLC-MS (2003) [3] and MudPIT (1999) [4] became widespread. Greater cell type specificity became possible through laser capture microdissection (1999) [5]. Bernard Metz and coworkers began untangling the complex chemistry of FFPE through chemical treatment of synthetic peptides in 2004 [6]. In 2005, Brian Hood and colleagues produced some of the first LC-MS/MS inventories of FFPE tissues [7].
Hood’s early experiment combined laser capture microdissection and gas-phase fractionation with a Thermo Scientific LTQ to identify approximately 2000 distinct peptides from each sample. Their work revealed key themes that would dominate FFPE proteomics for years to come. First, many peptides released from FFPE samples contained “missed cleavages,” internal basic residues that increased their precursor charges; high resolution of precursor ions and eventually fragment ions would make these peptides more accessible in years to come. Second, FFPE peptides less frequently featured lysine residues at their C-termini, hinting at the role of primary amines in FFPE modification chemistry. Hood et al. gamely identified formyl adducts (+12 Da) of lysines and noted that more than half of observed Met residues were oxidized in some of their LC-MS/MS experiments. The broader palette of modifications, however, awaited more powerful informatic tools.
Identifying post-translational modifications under the 1995 Sequest model [8] requires foreknowledge of all PTM masses along with the residues where they may be observed. Because the algorithm must consider an exponentially growing number of decorated peptides as multiple potential modification sites are found in each peptide sequence, PTM searches have gained a reputation for prohibitive slowness as well as a penchant for false positives [9]. Bioinformatics researchers have produced a variety of alternative strategies for the discovery of PTM masses, particularly sequence tagging and spectral alignment. Sequence tagging attempts to infer partial sequences directly from fragment ions and then allows arbitrary mass shifts to be introduced on either side of this “tag” to reconcile the spectrum to a database peptide sequence [10]. Spectral alignment approaches such as QuickMod [11] or spectral networks [12] compare experimental MS/MS scans from different precursor masses to determine if fragment ions align with or without adding the precursor mass difference as a PTM. These approaches were generally recommended as a strategy for determining which PTMs should be included in a final database search rather than being seen as authoritative identifications in themselves [13]. In 2015, Ying Zhang and colleagues applied spectral alignment to PTM discovery in FFPE LC-MS/MS data, revealing the prominence of lysine methylation [14].
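The combinatorial growth of decorated peptides mentioned in the preceding paragraph can be made concrete with a small counting sketch. It assumes each modifiable residue is either unmodified or carries exactly one of its allowed variable modifications. The peptide sequences and the choice of modifications (one on Met, two on Lys) are illustrative assumptions, not settings taken from any of the cited search engines.

```python
from math import prod


def count_decorated_forms(peptide: str, mods_per_residue: dict[str, int]) -> int:
    """Count the distinct modified forms a database search must consider.

    Each residue contributes (1 + number of variable modifications allowed
    on it): one unmodified state plus one state per modification.
    """
    return prod(1 + mods_per_residue.get(residue, 0) for residue in peptide)


# Hypothetical settings: oxidation on Met; methyl or methylol on Lys.
mods = {"M": 1, "K": 2}

# Illustrative sequences with one, five, and ten modifiable sites.
for peptide in ("AMK", "MKAKMK", "MKAKMKMKAKMK"):
    print(peptide, count_decorated_forms(peptide, mods))
```

Concatenating a peptide with itself squares its count of decorated forms, which suggests why searches that enumerate many variable modifications slow down so sharply.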
Joel Chick of the Gygi Laboratory published the open search strategy in 2015 [15], offering the attractive promise of identifying 28% more spectra at similar FDRs by simply relaxing the precursor mass filter in Sequest (while incurring a “∼10-fold” time penalty). Established proteome informatics teams at the University of Michigan and the Chinese Academy of Sciences in Beijing created a new search engine (MSFragger [16]) or modified an existing one (Open pFind [17]) to greatly accelerate database searching and make open searching routine. A key enabling development was the use of indexing to determine which experimental spectra matched a calculated fragment ion mass, irrespective of precursor mass, a strategy first published in 2007 for Marshall Bern’s ByOnic algorithm [18]. Kong et al. achieved gains in identified spectra similar to those of the Gygi team through MSFragger, reporting 33.6% more spectra identified with a much lower time penalty. Notably, the MSFragger paper incorporated a single paragraph relating the use of the software to detect methylol adducts (+30 Da) in 442 LC-MS/MS experiments from FFPE breast cancer data; Kong et al. remained mute, however, concerning the frequency of lysine methylation in these data. When Hao Chi and colleagues adapted pFind for open search by adding fragment indexing, precursor analysis, sequence tagging database reduction, and machine learning discrimination, they were able to profoundly boost identification rates over their earlier “narrow” searches in pFind, gaining between 61.2% and 88.2% more PSMs [17].
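Relaxing the precursor mass filter, the core of the open search idea, can be sketched in a few lines. The function below is a simplified illustration, not the Sequest, MSFragger, or Open pFind implementation. The two database peptides, the 0.01 Da narrow tolerance, and the 500 Da open tolerance are assumed values chosen for the example, and the fragment-ion scoring that would decide among surviving candidates is omitted.

```python
def precursor_candidates(observed_mass, peptide_masses, tolerance_da):
    """Return {peptide: delta_mass} for database peptides within tolerance.

    delta_mass is the observed precursor mass minus the unmodified peptide
    mass; in an open search, a nonzero delta is a candidate modification mass.
    """
    return {
        peptide: observed_mass - mass
        for peptide, mass in peptide_masses.items()
        if abs(observed_mass - mass) <= tolerance_da
    }


# Neutral monoisotopic masses (Da) for two hypothetical database peptides.
database = {"PEPTIDEK": 927.4549, "SAMPLER": 802.4007}

# Simulate a precursor carrying a +14.01565 Da shift (methylation, CH2).
observed = database["PEPTIDEK"] + 14.01565

# Assumed tolerances: a tight window versus a wide "open" window.
narrow = precursor_candidates(observed, database, tolerance_da=0.01)
open_ = precursor_candidates(observed, database, tolerance_da=500.0)

print("narrow:", narrow)
print("open:", {pep: round(delta, 4) for pep, delta in open_.items()})
```

Under these assumptions the narrow window admits no candidate, while the open window admits both peptides and reports a mass difference for each. The larger candidate pool is one likely contributor to the time penalty that fragment indexing was later used to reduce.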
To mainstream proteomics users, such gains in identified PSMs might have seemed the answer to long-standing frustrations surrounding the low fraction of MS/MS scans that we identify from experiments (identifying 50% of MS/MS scans measured at high resolution is currently thought to be a good result). Long experience in bioinformatics, however, left our team skeptical of these claims, and we designed this study to evaluate the gains achievable through open search algorithms. We chose to emphasize FFPE experiments because of their complex and long-studied modification chemistry, and we included both Thermo high-resolution “HCD” [19] RAW files and SCIEX high-resolution “beam-type CID” WIFF files from the TripleTOF [20], since the latter have seen only minimal attention in open search articles to date. Would open search substantially improve the fraction of spectra we identified? Would its discovered palette of PTMs expand greatly upon those visible through sequence tagging?
Section snippets
Results and discussion
The open search approach was tested in the context of four FFPE data sets: “Zimmerman” and “Nair” were generated by Thermo Q-Exactive instruments, while “Nielsen” and “Buthelezi” were produced on SCIEX TripleTOF instruments. Zimmerman includes five technical LC-MS/MS replicates of a human colon tumor and was initially created to support a tutorial workshop. Nair encompasses five Group 3 and five Group 4 medulloblastomas from banked specimens, each analyzed by LC-MS/MS [21]. Nielsen investigated
Data sets
“Nielsen” data were acquired from ProteomeXchange accession PXD000743 [22]. Twenty experiments in WIFF format were generated on a SCIEX TripleTOF 5600+ at Aarhus University in Aarhus, Denmark, during March of 2014. Four LC-MS/MS experiments were conducted from each of five patients (indicated by the numbers 36, 38, 39, 40, and 41), with ‘A’ files representing amyloid tissue and ‘B’ files representing control tissue. “-2” and “-3” suffixes represented two technical replicates from each sample.
Conclusion
As should be apparent from the Nielsen and Buthelezi results, open search is just as applicable to TripleTOF shotgun experiments as it is to Q-Exactive and other Thermo shotgun experiments. The readiness of software pipelines for this purpose, however, could certainly be improved. The SCIEX MS Data Converter has remained in beta since 2012, and yet it represents the only way to produce high-quality ProteinPilot peak lists. ProteoWizard msConvert can produce Analyst centroid peak lists, but it
Author contributions
B.M. managed early data conversion steps for SCIEX WIFF files and visualized key statistics. J.O. contributed feedback on informatics pipeline design. O.N. and J.M.B. provided previously unpublished FFPE data. S.B. and S.S. provided previously unpublished FFPE data.
Declaration of competing interest
The authors declare no conflict of interest.
Acknowledgment
D.L.T. thanks the Laboratory of Alexey Nesvizhskii for discerning problems in the initial configurations of the MSFragger algorithm. The assistance of Hao Chi and Xin-Yi Xu was essential in pairing the Open pFind software with TripleTOF data. The authors are grateful to Nadia Sukusu Nielsen and Jan J. Enghild for making their WIFFs available as PXD000743, and we appreciate Lisa J. Zimmerman for making her FFPE tutorial data available as PXD001651. This work was supported by Research
References (38)
- et al. A two-dimensional quadrupole ion trap mass spectrometer. J. Am. Soc. Mass Spectrom. (2002)
- et al. The need for guidelines in publication of peptide and protein identification data: Working Group on Publication Guidelines for Peptide and Protein Identification Data. Mol. Cell. Proteom. (2004)
- et al. Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteom. (2003)
- et al. Identification of formaldehyde-induced modifications in proteins: reactions with model peptides. J. Biol. Chem. (2004)
- et al. Proteomic analysis of formalin-fixed prostate cancer tissue. Mol. Cell. Proteom. (2005)
- et al. Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol. Cell. Proteom. (2006)
- et al. Equivalence of protein inventories obtained from formalin-fixed paraffin-embedded and frozen tissue in multidimensional liquid chromatography-tandem mass spectrometry shotgun proteomic analysis. Mol. Cell. Proteom. (2009)
- Quality assessment for clinical proteomics. Clin. Biochem. (2013)
- et al. Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. (1999)
- et al. The potential use of laser capture microdissection to selectively obtain distinct populations of cells for proteomic analysis: preliminary findings. Electrophoresis (1999)
- Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal. Chem.
- Sequence tagging reveals unexpected modifications in toxicoproteomics. Chem. Res. Toxicol.
- QuickMod: a tool for open modification spectrum library searches. J. Proteome Res.
- Protein identification by spectral networks analysis. Proc. Natl. Acad. Sci. U.S.A.
- Informatics of protein and posttranslational modification detection via shotgun proteomics. Methods Mol. Biol.
- Unrestricted modification search reveals lysine methylation as major modification induced by tissue formalin fixation and paraffin embedding. Proteomics
- A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat. Biotechnol.
- MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods
- Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nat. Biotechnol.