Functionathon: a manual data mining workflow to generate functional hypotheses for uncharacterized human proteins and its application by undergraduate students
