Open search unveils modification patterns in formalin-fixed, paraffin-embedded Thermo HCD and SCIEX TripleTOF shotgun proteomes

doi:10.1016/j.ijms.2019.116266

International Journal of Mass Spectrometry

Volume 448, February 2020, 116266

https://doi.org/10.1016/j.ijms.2019.116266

Highlights

• We offer a concrete list of five mass shifts to include when identifying proteins from FFPE tissues by shotgun proteomics.
• We offer a variety of methods that may be used to detect excessive false discovery rates in open search algorithms.
• We provide workarounds for problems likely to vex SCIEX TripleTOF users when attempting to use open-source software.

Abstract

The application of database search algorithms with very wide precursor mass tolerances for the “Open Search” paradigm has brought new efforts at post-translational modification discovery in shotgun proteomes. This approach has motivated the acceleration of database search tools by incorporating fragment indexing features. In this report, we compare open searches and sequence tag searches of high-resolution tandem mass spectra to seek a common “palette” of modifications when analyzing multiple formalin-fixed, paraffin-embedded (FFPE) tissues from Thermo Q-Exactive and SCIEX TripleTOF instruments. While open search in MSFragger produced some gains in identified spectra, careful FDR control limited the best result to 24% more spectra than narrow search (worst result: a loss of 9%). Open pFind produced high apparent sensitivity for PSMs, but entrapment sequences hinted that the actual error rate may be higher than reported by the software. Combining sequence tagging, open search, and chemical knowledge, we converged on this set of PTMs for our four FFPE sets: mono- and di-methylation (nTerm and Lys), single and double oxidation (Met and Pro), and variable carbamidomethylation (nTerm and Cys).
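The palette named above can be written down as a small lookup table. The following is a minimal sketch rather than the authors' search configuration: the mass values are standard monoisotopic shifts, while the dictionary layout, the `match_shift` helper, and the 0.01 Da tolerance are illustrative assumptions.

```python
# Monoisotopic mass shifts (Da) for the five-member palette in the abstract.
# The masses are standard values; names and layout are hypothetical.
FFPE_PALETTE = {
    "methyl": (14.015650, ("N-term", "K")),
    "dimethyl": (28.031300, ("N-term", "K")),
    "oxidation": (15.994915, ("M", "P")),
    "dioxidation": (31.989829, ("M", "P")),
    "carbamidomethyl": (57.021464, ("N-term", "C")),
}


def match_shift(delta_mass, tolerance=0.01):
    """Return palette entries whose mass lies within `tolerance` Da of delta_mass."""
    return [
        name
        for name, (mass, _sites) in FFPE_PALETTE.items()
        if abs(mass - delta_mass) <= tolerance
    ]


if __name__ == "__main__":
    # An observed precursor mass shift of +14.0157 Da maps to methylation.
    print(match_shift(14.0157))
```

A table of this kind is the natural hand-off from an open search (which reports mass shifts) to a conventional narrow search (which needs named variable modifications and their permitted sites).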


Introduction

A confluence of technologies in the early 2000s made formalin-fixed, paraffin-embedded (FFPE) tissues a viable resource for biomarker discovery. Rapidly scanning LTQs (linear ion traps) became available in 2003 [1]. LC-MS/MS for protein identification matured considerably as a discipline as Molecular and Cellular Proteomics produced its initial publication guidelines in 2004 [2]. Fractionation strategies for digging deeply into complex mixtures became commonplace as methods such as GeLC-MS (2003) [3] and MudPIT (1999) [4] became widespread. Greater cell type specificity became possible through laser capture microdissection (1999) [5]. Bernard Metz and coworkers began untangling the complex chemistry of FFPE through chemical treatment of synthetic peptides in 2004 [6]. In 2005, Brian Hood and colleagues produced some of the first LC-MS/MS inventories of FFPE tissues [7].

Hood’s early experiment combined laser capture microdissection and gas-phase fractionation with a Thermo Scientific LTQ to identify approximately 2000 distinct peptides from each sample. Their work revealed key themes that would dominate FFPE proteomics for years to come. First, many peptides released from FFPE samples contained “missed cleavages,” internal basic residues that increased their precursor charges; high resolution of precursor ions and eventually of fragment ions would later make these peptides more accessible. Second, FFPE peptides less frequently featured lysine residues at their C-termini, hinting at the role of primary amines in FFPE modification chemistry. Hood et al. gamely identified formyl adducts (+12 Da) of lysines and noted that more than half of observed Met residues were oxidized in some of their LC-MS/MS experiments. The broader palette of modifications, however, awaited more powerful informatic tools.

Identifying post-translational modifications under the 1995 Sequest model [8] requires foreknowledge of all PTM masses along with the residues where they may be observed. Because the algorithm must consider an exponentially growing number of decorated peptides as multiple potential modification sites are found in each peptide sequence, PTM searches have gained a reputation for prohibitive slowness as well as a penchant for false positives [9]. Bioinformatics researchers have produced a variety of alternative strategies for the discovery of PTM masses, particularly sequence tagging and spectral alignment. Sequence tagging attempts to infer partial sequences directly from fragment ions and then allows arbitrary mass shifts to be introduced on either side of this “tag” to reconcile the spectrum to a database peptide sequence [10]. Spectral alignment approaches such as QuickMod [11] or spectral networks [12] compare experimental MS/MS scans from different precursor masses to determine whether fragment ions align with or without adding the precursor mass difference as a PTM. These approaches were generally recommended as a strategy for determining which PTMs should be included in a final database search rather than being seen as authoritative identifications in themselves [13]. In 2015, Ying Zhang and colleagues used spectral alignment for PTM discovery and revealed the prominence of lysine methylation in FFPE LC-MS/MS data [14].
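The exponential growth described above is easy to see with a small count. The sketch below assumes a simplified model in which every residue is either unmodified or carries exactly one of the variable modifications allowed for it; the function name and the example modification table are hypothetical, and real search engines usually cap the number of modifications per peptide.

```python
def count_modified_forms(sequence, variable_mods):
    """Count the decorated forms of a peptide under a simple variable-mod model.

    Each residue contributes (1 + number of allowed mods) states, so the
    total is the product over residues and grows exponentially with the
    number of modifiable sites.
    """
    total = 1
    for residue in sequence:
        total *= 1 + len(variable_mods.get(residue, ()))
    return total


if __name__ == "__main__":
    # Hypothetical table: two candidate modifications on each of K, M, and P.
    mods = {
        "K": ("methyl", "dimethyl"),
        "M": ("oxidation", "dioxidation"),
        "P": ("oxidation", "dioxidation"),
    }
    # Five modifiable residues, three states each: 3**5 = 243 forms.
    print(count_modified_forms("MKPKM", mods))
```

Each added modifiable site multiplies the number of candidate forms, which is why listing many variable PTMs in a conventional search inflates both run time and the space in which false matches can occur.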

Joel Chick of the Gygi Laboratory published the open search strategy in 2015 [15], offering the attractive promise of identifying 28% more spectra at similar FDRs by simply relaxing the precursor mass filter in Sequest (invoking a “∼10-fold” time penalty). Established proteome informatics teams at the University of Michigan and the Chinese Academy of Sciences in Beijing created new search engines (MSFragger [16]) or modified existing ones (Open pFind [17]) to greatly accelerate the database search algorithm and so enable routine open searching. A key enabling development was the use of indexing to determine which experimental spectra matched a calculated fragment ion mass, irrespective of precursor mass, a strategy first published in 2007 for Marshall Bern’s ByOnic algorithm [18]. Kong et al. achieved gains in identified spectra similar to those of the Gygi team through MSFragger, reporting 33.6% more spectra identified with a much lower time penalty. Notably, the MSFragger paper incorporated a single paragraph relating the use of the software to detect methylol adducts (+30 Da) in 442 LC-MS/MS experiments from FFPE breast cancer data (Kong et al. remained mute, however, concerning the frequency of lysine methylation in these data). When Hao Chi and colleagues adapted pFind for open search by adding fragment indexing, precursor analysis, sequence tagging database reduction, and machine learning discrimination, they were able to profoundly boost identification rates over their earlier “narrow” searches in pFind, gaining between 61.2% and 88.2% PSMs [17].
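The indexing idea can be illustrated with a toy example. This is a sketch of the general principle only, not the data structure used by MSFragger, Open pFind, or ByOnic: experimental peaks are binned by mass, and each calculated fragment mass is then looked up to find every spectrum containing a nearby peak, with no reference to precursor mass. The bin width, function names, and example spectra are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

BIN_WIDTH = 0.02  # Da; hypothetical fragment mass bin width


def build_index(spectra):
    """Map each fragment-mass bin to the ids of spectra with a peak in it."""
    index = defaultdict(set)
    for spectrum_id, peaks in spectra.items():
        for mz in peaks:
            index[int(mz / BIN_WIDTH)].add(spectrum_id)
    return index


def count_matches(index, fragment_masses):
    """Count, per spectrum, how many calculated fragments hit an indexed peak."""
    hits = Counter()
    for mass in fragment_masses:
        key = int(mass / BIN_WIDTH)
        matched = set()
        # Also check adjacent bins so peaks near a bin edge are not missed.
        for neighbor in (key - 1, key, key + 1):
            matched |= index.get(neighbor, set())
        hits.update(matched)
    return hits


if __name__ == "__main__":
    # Made-up peak lists for two spectra with different precursor masses.
    spectra = {"s1": [175.119, 304.161, 500.000], "s2": [175.119, 700.500]}
    index = build_index(spectra)
    print(count_matches(index, [175.119, 304.161]))
```

Because the lookup never consults the precursor mass, a peptide can collect fragment matches from a spectrum whose precursor differs by an arbitrary mass shift, which is the property that makes open search tractable.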

To mainstream proteomics users, such gains in identified PSMs might have seemed the answer to long-standing frustrations surrounding the low fraction of MS/MS scans that we identify from experiments (50% of MS/MS scans measured at high resolution is currently thought to be a good result). Long experience in bioinformatics, however, caused our team to feel some skepticism at these claims, and we designed this study to evaluate the gains achievable through open search algorithms. We chose to emphasize FFPE experiments because of their complex and long-studied modification chemistry, and we included both Thermo high-resolution “HCD” [19] RAW files and SCIEX high-resolution “beam-type CID” WIFF files from the TripleTOF [20], since the latter have seen only minimal attention in open search articles to date. Would open search substantially improve the fraction of spectra we identified? Would its discovered palette of PTMs expand greatly upon those visible through sequence tagging?


Results and discussion

The open search approach was tested in the context of four FFPE data sets; “Zimmerman” and “Nair” were generated by Thermo Q-Exactive instruments, while “Nielsen” and “Buthelezi” were produced on SCIEX TripleTOF instruments. Zimmerman includes five technical LC-MS/MS replicates of a human colon tumor and was initially created to support a tutorial workshop. Nair encompasses five Group 3 and five Group 4 medulloblastomas from banked specimens, each analyzed by LC-MS/MS [21]. Nielsen investigated

Data sets

“Nielsen” data were acquired from ProteomeXchange accession PXD000743 [22]. Twenty experiments in WIFF format were generated on a SCIEX TripleTOF 5600+ at Aarhus University in Aarhus, Denmark, during March of 2014. Four LC-MS/MS experiments were conducted from each of five patients (indicated by the numbers 36, 38, 39, 40, and 41), with ‘A’ files representing amyloid tissue and ‘B’ files representing control tissue. “-2” and “-3” suffixes represented two technical replicates from each sample.

Conclusion

As should be apparent from the Nielsen and Buthelezi results, open search is just as applicable to TripleTOF shotgun experiments as it is to Q-Exactive and other Thermo shotgun experiments. The readiness of software pipelines for this purpose, however, could certainly be improved. The SCIEX MS Data Converter has remained in beta since 2012, and yet it represents the only way to produce high-quality ProteinPilot peak lists. ProteoWizard msConvert can produce Analyst centroid peak lists, but it

Author contributions

B.M. managed early data conversion steps for SCIEX WIFF files and visualized key statistics. J.O. contributed feedback on informatics pipeline design. O.N. and J.M.B. provided previously unpublished FFPE data. S.B. and S.S. provided previously unpublished FFPE data.

Declaration of competing interest

The authors declare no conflict of interest.

Acknowledgment

D.L.T. thanks the Laboratory of Alexey Nesvizhskii for discerning problems in the initial configurations of the MSFragger algorithm. The assistance of Hao Chi and Xin-Yi Xu was essential in pairing the Open pFind software with TripleTOF data. The authors are grateful to Nadia Sukusu Nielsen and Jan J. Enghild for making their WIFFs available as PXD000743, and we appreciate Lisa J. Zimmerman for making her FFPE tutorial data available as PXD001651. This work was supported by Research

References (38)

J.C. Schwartz et al.
A two-dimensional quadrupole ion trap mass spectrometer
J. Am. Soc. Mass Spectrom.
(2002)
S. Carr et al.
The need for guidelines in publication of peptide and protein identification data: working group on publication guidelines for peptide and protein identification data
Mol. Cell. Proteom.
(2004)
M. Schirle et al.
Profiling core proteomes of human cell lines by one-dimensional PAGE and liquid chromatography-tandem mass spectrometry
Mol. Cell. Proteom.
(2003)
B. Metz et al.
Identification of formaldehyde-induced modifications in proteins: reactions with model peptides
J. Biol. Chem.
(2004)
B.L. Hood et al.
Proteomic analysis of formalin-fixed prostate cancer tissue
Mol. Cell. Proteom.
(2005)
A.I. Nesvizhskii et al.
Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data: toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides
Mol. Cell. Proteom.
(2006)
R.W. Sprung et al.
Equivalence of protein inventories obtained from formalin-fixed paraffin-embedded and frozen tissue in multidimensional liquid chromatography-tandem mass spectrometry shotgun proteomic analysis
Mol. Cell. Proteom.
(2009)
D.L. Tabb
Quality assessment for clinical proteomics
Clin. Biochem.
(2013)
A.J. Link et al.
Direct analysis of protein complexes using mass spectrometry
Nat. Biotechnol.
(1999)
R.E. Banks et al.
The potential use of laser capture microdissection to selectively obtain distinct populations of cells for proteomic analysis--preliminary findings
Electrophoresis
(1999)

J.R. Yates et al.
Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database
Anal. Chem.
(1995)
S. Dasari et al.
Sequence tagging reveals unexpected modifications in toxicoproteomics
Chem. Res. Toxicol.
(2011)
E. Ahrné et al.
QuickMod: a tool for open modification spectrum library searches
J. Proteome Res.
(2011)
N. Bandeira et al.
Protein identification by spectral networks analysis
Proc. Natl. Acad. Sci. U.S.A.
(2007)
J.D. Holman et al.
Informatics of protein and posttranslational modification detection via shotgun proteomics
Methods Mol. Biol.
(2013)
Y. Zhang et al.
Unrestricted modification search reveals lysine methylation as major modification induced by tissue formalin fixation and paraffin embedding
Proteomics
(2015)
J.M. Chick et al.
A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides
Nat. Biotechnol.
(2015)
A.T. Kong et al.
MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics
Nat. Methods
(2017)
H. Chi et al.
Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine
Nat. Biotechnol.
(2018)


