Special Section on 3DOR 2020
SHREC 2020: Multi-domain protein shape retrieval challenge
Introduction
Proteins are complex macromolecules with various shapes and sizes, ranging from hundreds to millions of atoms [1]. The 3D arrangement of protein atoms is directly linked to specific functions that are mostly mediated through the protein surface. Protein surfaces are therefore of great interest in drug discovery pipelines, in the study of adverse drug reactions, and in the characterization of cellular processes at the molecular level. However, comparing protein surfaces is challenging because of (a) the dynamic, non-rigid nature of proteins, which allows conformational changes, i.e., modifications of the surface and therefore of specific functions, (b) the intrinsic structure of multi-domain proteins, i.e., the fusion of multiple individual domains into one protein throughout evolution, and (c) the similarity between distinct protein structures and surfaces inherited from their evolutionary relationships.
The SHape REtrieval Challenges (SHREC) are time-restricted challenges that aim to evaluate the effectiveness of 3D-shape retrieval algorithms. Typically, a challenge is opened by releasing a dataset of related shapes to the participants while withholding the class memberships. In the SHape REtrieval Challenge 2020 (SHREC2020) track on multi-domain protein shapes, the participants had 7 weeks from the dataset publication to send their results, together with a description of the methods used to generate them (see Section 4). This track evaluates the current ability of the shape comparison methods proposed by six different groups to tackle the protein surface comparison problem. The participants were asked to send their results in the form of matrices containing all-to-all dissimilarity scores. The results were analyzed and the overall retrieval performances are presented here.
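As an illustration of the requested submission format, the following minimal sketch builds an all-to-all dissimilarity matrix from per-shape feature vectors. It is not any participant's method: the function name `dissimilarity_matrix`, the toy descriptors, and the choice of the Euclidean distance are assumptions made for the example. For the 588 shapes of this track, such a matrix would have 588 rows and 588 columns.

```python
import numpy as np


def dissimilarity_matrix(descriptors):
    """Return the all-to-all Euclidean dissimilarity matrix.

    `descriptors` is a hypothetical (n_shapes, n_features) array holding one
    feature vector per shape; the Euclidean distance is an assumption.
    """
    x = np.asarray(descriptors, dtype=float)
    # Broadcasting yields every pairwise difference in one step.
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy descriptors for three shapes (illustrative values only).
toy_descriptors = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = dissimilarity_matrix(toy_descriptors)
```

With this distance, the matrix is symmetric with a zero diagonal, and lower scores indicate more similar shapes.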
The dataset includes 588 proteins consisting of two domains (the functional units of the proteins); only the corresponding triangulated meshes of their solvent-excluded surfaces (SES) [2] were provided as input to the participants. We then evaluated the ability of each method to retrieve the evolutionary relationships between orthologous proteins (proteins that have the same function in different organisms), and to retrieve the different conformations of an individual protein. Here, we present the results of all the participants and methods, and briefly discuss the trade-off between retrieval performance and computational cost for each method.
Dataset
Proteins are linear polymers (the so-called protein chains) made of amino-acid residues (up to several hundreds), which fold into a specific, well-defined 3D structure. Furthermore, many proteins need to form a complex of several chains to become functional. For instance, human haemoglobin requires two α-globin and two β-globin chains to be fully functional. Domains define the functional units of the proteins, and are usually associated with a specific function and/or interaction; it is
Evaluation
Analyses were performed with scikit-learn [13] and numpy [14], and Figs. 4 and 5 were produced using matplotlib [15].
Nearest Neighbor, First-tier and Second-tier. These retrieval metrics measure the ratio of models that belong to the same class as the query. For Nearest Neighbor (NN), only the first match is considered (the query itself is excluded), while the first |C| - 1 and 2(|C| - 1) matches, where |C| denotes the size of the query’s class, are considered for First-tier (T1) and Second-tier
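The three metrics can be sketched as follows. This is a minimal illustration, not the organizers' evaluation code. It assumes the usual cut-offs of |C| - 1 matches for First-tier and 2(|C| - 1) matches for Second-tier, with both counts normalized by |C| - 1. The function name `retrieval_scores` and the toy data are hypothetical.

```python
import numpy as np


def retrieval_scores(dissimilarity, labels):
    """Return the mean (NN, T1, T2) over all queries.

    Assumed convention: T1 and T2 count same-class models among the first
    |C| - 1 and 2(|C| - 1) matches, both divided by |C| - 1.
    Queries whose class has a single member are skipped.
    """
    d = np.asarray(dissimilarity, dtype=float)
    y = np.asarray(labels)
    nn, t1, t2 = [], [], []
    for q in range(len(y)):
        relevant = int(np.sum(y == y[q])) - 1  # |C| - 1
        if relevant == 0:
            continue
        # Rank all models by increasing dissimilarity, then drop the query.
        order = np.argsort(d[q], kind="stable")
        order = order[order != q]
        hits = y[order] == y[q]
        nn.append(float(hits[0]))
        t1.append(hits[:relevant].sum() / relevant)
        t2.append(hits[:2 * relevant].sum() / relevant)
    return float(np.mean(nn)), float(np.mean(t1)), float(np.mean(t2))


# Toy example: four shapes on a line, two well-separated classes.
positions = np.array([0.0, 1.0, 10.0, 11.0])
toy_matrix = np.abs(positions[:, None] - positions[None, :])
scores = retrieval_scores(toy_matrix, ["A", "A", "B", "B"])
```

Evaluating the same dissimilarity matrix against two label vectors, one per classification level, would give the protein-level and species-level scores respectively.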
Participants & methods
Six groups from five different countries registered for the track and submitted 15 dissimilarity matrices within the requested time (8 weeks), along with a description of their protocol. For ease of reading, we have assigned each group a short name that is used in the following text.
1. CODSEQ by Halim Benhabiles, Karim Hammoudi, Adnane Cabani, Feryal Windal, Mahmoud Melkemi (Section 4.1),
2. 3DZ by Tunde Aderinwale, Genki Terashi, Charles Christoffer, Daisuke Kihara (Section 4.2),
3. WKS/SGWS by Yuxu
Results & discussion
In this section, we assess quantitatively the performance of each method described in Section 4. We analyzed the performance at the protein (Fig. 4 and Table 6) and the species (Fig. 5 and Table 7) levels as described in Section 3.
Protein level
At the protein level, the 588 shapes were gathered into 7 classes of multi-domain orthologous proteins; within each class, all members share at least one common domain while the other domains are different.
This feature allows the methods for having
Conclusion
In the present work, we have presented a dataset of shapes from multi-domain proteins. Six groups, three of which used machine learning approaches in their respective workflows, submitted 15 sets of results. The performances were assessed at the protein and species levels of the SCOPe database.
Shape retrieval methods displayed high-quality results at the protein level. We observed a significant decrease in the performances of all the methods at the species level. These results indicate that
CRediT authorship contribution statement
Florent Langenfeld: Conceptualization, Data curation, Formal analysis, Investigation, Writing - original draft, Supervision. Yuxu Peng: Software, Investigation, Resources, Writing - review & editing. Yu-Kun Lai: Software, Investigation, Resources, Writing - review & editing. Paul L. Rosin: Software, Investigation, Resources, Writing - review & editing. Tunde Aderinwale: Software, Investigation, Resources, Writing - review & editing. Genki Terashi: Software, Investigation, Resources, Writing -
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
Yuxu Peng was supported by the Young teachers growth plan project (2019QJCZ014) funded by Changsha University of Science & Technology.
Stelios Mylonas, Apostolos Axenopoulos and Petros Daras were supported by the ATXN1-MED15 PPI project funded by the GSRT - Hellenic Foundation for Research and Innovation.
Matthieu Montes and Florent Langenfeld were supported by the European Research Council Executive Agency under the research grant number 640283.
References (35)
- et al. SCOPe: manual curation and artifact removal in the structural classification of proteins extended database. J Mol Biol (2017)
- et al. M2DP: a novel 3D point cloud descriptor and its application in loop closure detection. Proceedings of the 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS) (2016)
- et al. Biochemistry (2019)
- Analytical molecular surface calculation. J Appl Crystallogr (1983)
- et al. SCOPe: Structural classification of proteins-extended, integrating SCOP and ASTRAL data and classification of new structures. Nucl Acids Res (2013)
- et al. SCOPe: classification of large macromolecular structures in the structural classification of proteins extended database. Nucl Acids Res (2018)
- et al. The protein data bank. Nucl Acids Res (2000)
- et al. Generating triangulated macromolecular surfaces by Euclidean distance transform. PLoS One (2009)
- et al. Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. J Chem Theory Comput (2011)
- et al. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J Chem Theory Comput (2011)
- SHREC 2018 protein shape retrieval. Proceedings of the Eurographics workshop on 3D object retrieval
- Protein shape retrieval contest
- Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods
- Scikit-learn: machine learning in Python. J Mach Learn Res
- Matplotlib: a 2D graphics environment. Comput Sci Eng
- Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the 31st AAAI conference on artificial intelligence, AAAI17
1 Track organizers and corresponding authors.