An open invitation to the Understudied Proteins Initiative

Kustatscher, Georg; Collins, Tom; Gingras, Anne-Claude; Guo, Tiannan; Hermjakob, Henning; Ideker, Trey; Lilley, Kathryn S.; Lundberg, Emma; Marcotte, Edward M.; Ralser, Markus; Rappsilber, Juri

doi:10.1038/s41587-022-01316-z

Correspondence
Published: 09 May 2022

Nature Biotechnology volume 40, pages 815–817 (2022)

To the Editor — Much of life science research revolves around understanding the biological function of proteins. Some proteins, such as the tumor suppressor p53, have been studied extensively¹. By contrast, thousands of human proteins remain ‘understudied’: their biological function is poorly understood and annotation of their molecular properties is scarce^2,3,4,5,6. However, without a minimal amount of molecular annotation, it is difficult to formulate effective research questions and design experiments to investigate the function of these proteins in mechanistic detail².

The disparity in how much we know about individual proteins leads to a phenomenon known as the ‘streetlight effect’ or the ‘rich-get-richer syndrome’, in which research in a field preferentially targets proteins that are already well-studied⁷. There are many reasons for this, including practical considerations (for example, the abundance, solubility and size of a protein), the ease of designing a research plan that depends on available knowledge (for example, knockout phenotype, molecular interactions) and the availability of tools such as antibodies. In addition, working on proteins that already receive a lot of attention (for example, some disease-associated proteins) increases the chances of high-impact publications and funding. Hypothesis-driven (rather than question-driven) research may also contribute, as hypothesizing about the potential function of a completely uncharacterized protein is nearly impossible. Finally, some proteins may remain understudied because they are not expressed or required in standard laboratory conditions. Ironically, part of this problem stems from the global drive to make research more reproducible by standardizing experimental conditions.

One counter-argument is that the important proteins are already being studied and that the others are less worth pursuing. The evidence suggests otherwise: genome-wide studies show that the bias in research attention does not reflect the importance of genes for cellular processes and human disease^2,5. For example, more than half of the host genes implicated in COVID-19 by genome-wide studies have not been followed up by targeted studies in the COVID-19 field⁸. Furthermore, the creation of a synthetic minimal bacterium required 149 proteins of unknown function⁹. If these proteins are crucial for the survival of the most minimal cell possible, they should be important to us.

As current approaches to studying proteins often reinforce the streetlight effect, we seek to pursue a different approach. We propose that a coordinated effort of the functional proteomics field could be an effective way to systematically advance the basic molecular characterization of understudied proteins, such that detailed studies become more feasible. With the goal of openly discussing, coordinating and initiating efforts to address these challenges, we established the Understudied Proteins Initiative¹⁰, with the participation of the Wellcome Trust (Fig. 1). In essence, for each understudied protein, we aim to provide enough molecular information (for example, protein interactions, colocalization or coexpression) that hypotheses about its putative function can be made. Importantly, this should make it clear which field or laboratory with a particular research focus would be best placed to carry out further detailed studies of the protein. Thus, the giant task of characterizing the many understudied proteins is split into two parts: a large-scale precharacterization by omics laboratories, and a focused detailed investigation by molecular biology laboratories.

**Fig. 1: Roadmap of the Understudied Proteins Initiative.**

Choosing the right tools and experiments for such a large-scale data-generation effort requires critical input before data collection begins. As a first step, we have recently launched an openly accessible survey to better understand which human proteins remain understudied, what minimal information would kick-start their inclusion in mechanistic investigations and where this information should be made available (https://understudiedproteins.org/survey). Scientists who engage in mechanistic investigations are best placed to define this.

As a second step, we will gather experimentalists and computational experts interested in large-scale approaches at a conference (https://understudiedproteins.org/conference) to discuss and identify ways to deliver this information. Ultimately, individual researchers stand to gain from the results of this initiative whenever they encounter new proteins in an ongoing study and need to prioritize novel targets for further investigation.

Survey participants will be shown a randomly selected human protein and asked to assign it to one of three annotation levels. In addition, they will declare which tools and resources were used for that assessment and what information they regard as important before starting experimental work with a new protein. We envision that respondents will need no more than five minutes per protein. Each protein will be presented to multiple participants, allowing us to average responses and capture the range of different interpretations and assessments of a protein’s annotation level. In this way, the survey will deliver a manually curated assessment of the annotation level of human proteins. Although scores exist that express various aspects of protein annotation^3,6,11,12, our survey will return a score that specifically expresses how amenable a protein is to detailed mechanistic investigations.
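The averaging step described above can be illustrated with a short Python sketch. It assumes a hypothetical numeric encoding of the three annotation levels (the level names below are placeholders, since the Correspondence does not name them) and reports, for each protein, the mean vote, the spread of the votes and the number of respondents. It illustrates simple averaging only and is not the initiative's actual scoring procedure.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical numeric encoding of the three annotation levels; the names are placeholders.
LEVELS = {"low": 1, "intermediate": 2, "high": 3}

def aggregate_votes(votes):
    """Group (protein, level) votes by protein and summarize each protein's votes."""
    by_protein = defaultdict(list)
    for protein, level in votes:
        by_protein[protein].append(LEVELS[level])
    return {
        protein: {
            "mean": mean(scores),      # average annotation level across respondents
            "spread": pstdev(scores),  # population standard deviation of the votes
            "n": len(scores),          # number of respondents who rated this protein
        }
        for protein, scores in by_protein.items()
    }

# Invented example votes for two hypothetical proteins.
example_votes = [
    ("protein_A", "low"),
    ("protein_A", "intermediate"),
    ("protein_B", "high"),
    ("protein_B", "high"),
]
summary = aggregate_votes(example_votes)
```

Reporting the spread alongside the mean keeps disagreement between respondents visible instead of collapsing it into a single number; here protein_A receives a mean of 1.5 with a spread of 0.5, whereas protein_B's votes agree completely.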

Next, we will cross-reference this vote-based annotation score with the quantifiable annotation information available for the same protein and its homologs in publicly available resources named by participants and others, which could include PubMed, STRING, BioGRID, UniProt, GeneCards, Wikipedia, Complex Portal and the Human Protein Atlas. This collated information will reveal key characteristics of understudied proteins, such as what type of quantifiable experimental evidence is available or lacking, and where it is accessible. Notably, this understanding is not limited to human proteins and will guide the extension of our efforts to other species.

The free-text answers from survey respondents will allow us to cross-check whether our data-based assessment agrees with what participants think regarding the minimal information that makes a protein a viable target of study, and where and how annotation should be accessible. In addition, on the basis of the annotation score and the cross-referenced quantifiable annotation information, we will train a machine-learning algorithm to automate the annotation scoring. An automated annotation scoring system allows us to keep scores up-to-date, assess proteins of other species and transparently monitor progress in protein annotation over time. Therefore, if a sizeable proportion of the community who reads this Correspondence and the paper in Nature Methods¹⁰ participates in the survey and shares it with colleagues, then we will build a community-driven foundation for the Understudied Proteins Initiative.
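The Correspondence does not specify which algorithm or features will be used for automated annotation scoring. As a deliberately minimal stand-in, the Python sketch below fits an ordinary least-squares line that predicts the vote-based score from a single hypothetical feature (the log-transformed number of publications about a protein) and clamps predictions to the 1 to 3 range of a hypothetical level encoding. The feature, the training values and the function names are all invented for illustration; a realistic model would combine many features drawn from the resources listed above.

```python
import math

def feature(publication_count):
    """Hypothetical feature: log10(1 + number of publications mentioning the protein)."""
    return math.log10(1 + publication_count)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct feature values
    return slope, mean_y - slope * mean_x

def predict(model, x, low=1.0, high=3.0):
    """Predict an annotation score and clamp it to the hypothetical 1-3 scale."""
    slope, intercept = model
    return min(high, max(low, slope * x + intercept))

# Toy training data (invented): publication counts and vote-based scores.
training_pubs = [0, 9, 99]
training_scores = [1.0, 2.0, 3.0]
model = fit_line([feature(p) for p in training_pubs], training_scores)
```

Once fitted, such a model can rescore proteins whenever the underlying resources are updated, which is the property the text relies on for keeping scores current and monitoring annotation progress over time.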

With a clear understanding of what constitutes the experimental information that would make an understudied protein amenable to study, we will then start a discussion with funding agencies on how to set up calls aimed at providing this information. A critical component will be the evaluation of the effect of different information sources, facilitated by our automated annotation scoring. We will reveal the benefit of the respective datasets and approaches by monitoring the rate of annotation of understudied proteins. Measuring the effect of large-scale data will not only inform the effective use of funding but also highlight where technology developments are needed to fill any systematic gaps left by current tools. Rather than simply generating more data, we aim to generate meaningful data. Eventually, thousands of laboratories around the world will be able to add those currently understudied proteins that fall into their own fields of interest to ongoing and future mechanistic investigations, thereby ending the era of understudied proteins. Our initiative complements those that have a strong emphasis either on bacterial proteins (COMBREX¹³ and the Enzyme Function Initiative¹⁴) or on protein–small molecule interactions, such as the Structural Genomics Consortium^5,15, Open Targets¹⁶ and the Illuminating the Druggable Genome program⁶, which aims to improve our understanding of uncharacterized proteins within the three most commonly drug-targeted protein families (G-protein-coupled receptors, ion channels and protein kinases).

By providing a basic molecular characterization of all proteins, the Understudied Proteins Initiative will catalyze mechanistic investigations of understudied proteins, drive new biomedical research, and boost our understanding of the human proteome and its role in disease. We invite the community to get involved by participating in the survey and spreading the word.

References

1. Dolgin, E. Nature 551, 427–431 (2017).
2. Haynes, W. A., Tomczak, A. & Khatri, P. Sci. Rep. 8, 1362 (2018).
3. Wood, V. et al. Open Biol. 9, 180241 (2019).
4. Stoeger, T., Gerlach, M., Morimoto, R. I. & Nunes Amaral, L. A. PLoS Biol. 16, e2006643 (2018).
5. Edwards, A. M. et al. Nature 470, 163–165 (2011).
6. Oprea, T. I. et al. Nat. Rev. Drug Discov. 17, 317–332 (2018).
7. Dunham, I. PLoS Biol. 16, e3000034 (2018).
8. Stoeger, T. & Nunes Amaral, L. A. eLife 9, e61981 (2020).
9. Hutchison, C. A. III et al. Science 351, aad6253 (2016).
10. Kustatscher, G. et al. Nat. Methods https://doi.org/10.1038/s41592-022-01454-x (2022).
11. Sinha, S., Eisenhaber, B., Jensen, L. J., Kalbuaji, B. & Eisenhaber, F. Proteomics 18, e1800093 (2018).
12. UniProt Consortium. Nucleic Acids Res. 47, D506–D515 (2019).
13. Anton, B. P. et al. PLoS Biol. 11, e1001638 (2013).
14. Gerlt, J. A. et al. Biochemistry 50, 9950–9962 (2011).
15. Williamson, A. R. Nat. Struct. Biol. 7, 953 (2000).
16. Koscielny, G. et al. Nucleic Acids Res. 45, D985–D994 (2017).

Author information

Authors and Affiliations

Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, UK
Georg Kustatscher & Juri Rappsilber
Wellcome Trust, London, UK
Tom Collins
Lunenfeld–Tanenbaum Research Institute, Mount Sinai Hospital, Sinai Health System, Toronto, Ontario, Canada
Anne-Claude Gingras
Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
Anne-Claude Gingras
Zhejiang Provincial Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
Tiannan Guo
Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, China
Tiannan Guo
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
Henning Hermjakob
Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
Trey Ideker
Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, UK
Kathryn S. Lilley
Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden
Emma Lundberg
Department of Bioengineering, Stanford University, Stanford, CA, USA
Emma Lundberg
Department of Pathology, Stanford University, Stanford, CA, USA
Emma Lundberg
Chan Zuckerberg Biohub, San Francisco, CA, USA
Emma Lundberg
Department of Molecular Biosciences, Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, TX, USA
Edward M. Marcotte
Department of Biochemistry, Charité University Medicine, Berlin, Germany
Markus Ralser
The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK
Markus Ralser
Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
Juri Rappsilber
Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
Juri Rappsilber

Corresponding authors

Correspondence to Georg Kustatscher or Juri Rappsilber.

Ethics declarations

Competing interests

T.G. is a shareholder of Westlake Omics Inc. T.I. is a cofounder of Data4Cure, is on its Scientific Advisory Board and has an equity interest. T.I. is on the Scientific Advisory Board of Ideaya BioSciences and has an equity interest. E.L. is an advisor for Pixelgen Technologies and Moleculent. E.M.M. is a cofounder, shareholder and scientific board member of Erisyon, Inc. G.K., T.C., A.-C.G., H.H., K.S.L., M.R. and J.R. declare no competing interests.

About this article

Cite this article

Kustatscher, G., Collins, T., Gingras, AC. et al. An open invitation to the Understudied Proteins Initiative. Nat Biotechnol 40, 815–817 (2022). https://doi.org/10.1038/s41587-022-01316-z

Published: 09 May 2022
Issue Date: June 2022
DOI: https://doi.org/10.1038/s41587-022-01316-z

This article is cited by

Did the early full genome sequencing of yeast boost gene function discovery?
- Erwin Tantoso
- Birgit Eisenhaber
- Frank Eisenhaber
Biology Direct (2023)
Co-option of a non-retroviral endogenous viral element in planthoppers
- Hai-Jian Huang
- Yi-Yuan Li
- Jun-Min Li
Nature Communications (2023)
Understudied proteins: opportunities and challenges for functional proteomics
- Georg Kustatscher
- Tom Collins
- Juri Rappsilber
Nature Methods (2022)