GASPIDs Versus Non-GASPIDs - Differentiation Based on Machine Learning Approach
Current Bioinformatics (IF 4), Pub Date: 2020-10-31, DOI: 10.2174/1574893615999200425225729
Fawad Ahmad¹, Saima Ikram¹, Jamshaid Ahmad¹, Waseem Ullah², Fahad Hassan¹, Saeed Ullah Khattak¹, Irshad Ur Rehman¹

Background: Peptidases are a group of enzymes that catalyze the cleavage of peptide bonds. Around 2-3% of the genes in a genome encode proteases, and about one-third of all known proteases are serine proteases, which are classified into 13 clans and 40 families. Serine proteases perform diverse physiological roles, including digestion, blood coagulation, fibrinolysis, processing of proteins and prohormones, signaling, and complement fixation, and they play a vital role in the immune defense system. Based on their functions, they can be broadly divided into two classes: GASPIDs (Granule Associated Serine Peptidases involved in the Immune Defense System) and Non-GASPIDs. GASPIDs in particular carry out immune-associated functions, such as initiating apoptosis to kill virally infected and cancerous cells, modulating cytokines to generate inflammatory responses, and killing pathogens directly through phagosomes.

Methods: In this study, we performed a sequence-based characterization of these two types of serine proteases. We first identified sequences by searching multiple online databases and by analyzing the whole genomes of various orthologous and non-orthologous species. Sequences were identified as GASPIDs or Non-GASPIDs using a distinct criterion devised for this purpose. The translated (protein) sequences were then subjected to feature extraction. Using these distinctive features, we applied multiple supervised machine learning models to differentiate GASPIDs from Non-GASPIDs.
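The abstract does not spell out how the nucleotide sequences were translated or exactly how the features were encoded. The sketch below fills that gap under stated assumptions. It assumes translation with the standard genetic code (NCBI table 1) in reading frame 1, stopping at the first stop codon. It also assumes the tripeptide features are overlapping tripeptide frequencies over the 20 standard amino acids, which gives 20^3 = 8000 dimensions. The function names `translate` and `tripeptide_composition` are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are indexed in
# T, C, A, G order, so codon index = 16*first + 4*second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# All 20**3 = 8000 tripeptides over the 20 standard amino acids.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"
TRIPEPTIDES = ["".join(p) for p in product(STANDARD_AA, repeat=3)]
TRIPEPTIDE_INDEX = {tp: n for n, tp in enumerate(TRIPEPTIDES)}


def translate(cds: str) -> str:
    """Translate a coding DNA/RNA sequence in frame 1 up to the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")  # "X" marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tripeptide_composition(protein: str) -> list[float]:
    """Return the 8000-dim vector of overlapping tripeptide frequencies."""
    vector = [0.0] * len(TRIPEPTIDES)
    windows = [protein[i:i + 3] for i in range(len(protein) - 2)]
    # Skip windows containing non-standard residues such as "X".
    valid = [w for w in windows if w in TRIPEPTIDE_INDEX]
    for w in valid:
        vector[TRIPEPTIDE_INDEX[w]] += 1.0
    if valid:
        vector = [count / len(valid) for count in vector]
    return vector
```

For example, `translate("ATGGCTTGGAAATAA")` gives `"MAWK"`. That peptide has two overlapping tripeptides, MAW and AWK, so each receives a frequency of 0.5 and every other entry of the 8000-dim vector stays 0.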

Results and Conclusion: Our results show that, of the three classifiers used in this study, the SVM classifier coupled with tripeptide features achieved the best accuracy in classifying sequences as GASPIDs or Non-GASPIDs.
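The abstract names an SVM on tripeptide features as the best performer but gives no kernel, hyperparameters, or implementation. The sketch below is a dependency-free stand-in and not the authors' method. It trains a linear SVM with the Pegasos algorithm, which is stochastic sub-gradient descent on the L2-regularized hinge loss without a bias term. It uses sparse tripeptide frequencies as input, with labels +1 for one class (e.g. GASPID) and -1 for the other. The function names and the default values `lam=0.01` and `epochs=200` are hypothetical choices. The sequences in the usage lines are synthetic toy data, not real proteases.

```python
import random
from collections import Counter


def tripeptide_features(protein: str) -> dict[str, float]:
    """Sparse overlapping tripeptide frequencies (hypothetical helper)."""
    counts = Counter(protein[i:i + 3] for i in range(len(protein) - 2))
    total = sum(counts.values())
    return {tp: c / total for tp, c in counts.items()} if total else {}


def dot(w: dict[str, float], x: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(samples: list[dict[str, float]], labels: list[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> dict[str, float]:
    """Pegasos training of a linear SVM (no bias term); labels are +1 / -1."""
    rng = random.Random(seed)
    w: dict[str, float] = {}
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
            x, y = samples[i], labels[i]
            margin = y * dot(w, x)
            # Regularization step: w <- (1 - eta * lambda) * w
            scale = 1.0 - eta * lam
            w = {k: v * scale for k, v in w.items()}
            # Hinge-loss sub-gradient step only when the margin is violated.
            if margin < 1.0:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w: dict[str, float], x: dict[str, float]) -> int:
    """Return +1 or -1 according to the sign of the decision value."""
    return 1 if dot(w, x) >= 0.0 else -1


# Synthetic usage: one toy sequence per class.
train = [tripeptide_features("AAAAAAAA"), tripeptide_features("GGGGGGGG")]
weights = train_linear_svm(train, [1, -1])
print(predict(weights, tripeptide_features("AAAAAG")))
```

Pegasos optimizes the same objective as a linear soft-margin SVM, but it does so in a few lines of standard-library code. A non-linear kernel such as RBF would normally call for a library such as LIBSVM or scikit-learn's `sklearn.svm.SVC`. Accuracy figures comparable to the paper's would also require held-out evaluation, for example cross-validation.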

Updated: 2020-10-31