Clustering of Zika Viruses Originating from Different Geographical Regions using Computational Sequence Descriptors
Current Computer-Aided Drug Design (IF 1.7), Pub Date: 2021-03-31, DOI: 10.2174/1573409916666191226110936
Marjan Vračko 1 , Subhash C Basak 2 , Dwaipayan Sen 3 , Ashesh Nandy 3

Background: This report considers a data set of 310 Zika virus genome sequences from three continents: Africa, Asia, and South America. The sequences, compiled from GenBank, were isolated from several host species: primates (Simiiformes and Homo sapiens) and mosquitoes (Aedes opok, Aedes africanus, Aedes luteocephalus, Aedes dalzieli, and Aedes aegypti).

Methods: For chemometric analysis, each sequence was represented by sequence descriptors derived from its graph or neighborhood matrix. The set was analyzed with three chemometric methods: Mahalanobis distances, principal component analysis (PCA), and self-organizing maps (SOM). All three methods separated the samples well by region of origin.
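The workflow described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the descriptors, so the sketch assumes a simple alignment-free descriptor (the 16 frequencies of adjacent base pairs, i.e. a normalized 4 x 4 neighborhood matrix). The function names `neighborhood_descriptor`, `pca_scores`, and `mahalanobis` are hypothetical, and the sequences are invented toy strings, not Zika genomes. The SOM step is omitted.

```python
from itertools import product

import numpy as np

BASES = "ACGT"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # 16 ordered base pairs


def neighborhood_descriptor(seq):
    """Frequencies of the 16 adjacent base pairs (assumed descriptor, not the paper's)."""
    counts = dict.fromkeys(PAIRS, 0)
    seq = seq.upper()
    for a, b in zip(seq, seq[1:]):
        if a + b in counts:  # skip pairs containing ambiguous symbols such as N
            counts[a + b] += 1
    total = sum(counts.values())
    if total == 0:
        return np.zeros(len(PAIRS))
    return np.array([counts[p] / total for p in PAIRS])


def pca_scores(X, n_components=2):
    """Project mean-centered rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def mahalanobis(x, group):
    """Mahalanobis distance from sample x to the centroid of `group` (rows = samples).

    The pseudo-inverse is used because frequency descriptors sum to 1, which makes
    the covariance matrix singular; zero-variance directions are therefore ignored.
    """
    diff = x - group.mean(axis=0)
    cov = np.cov(group, rowvar=False)
    d2 = diff @ np.linalg.pinv(cov) @ diff
    return float(np.sqrt(max(d2, 0.0)))


# Toy usage with invented sequences (illustration only).
toy_seqs = ["ACGTACGTAACC", "ACGTTCGTAACG", "GGGTTTCCCAAA", "GGCTTTCCGAAA"]
X = np.vstack([neighborhood_descriptor(s) for s in toy_seqs])
scores = pca_scores(X, n_components=2)   # one 2-D point per sequence
d = mahalanobis(X[0], X[1:])             # distance of sequence 0 to the other three
```

In a real analysis, the reference groups for the Mahalanobis distance would be the sets of sequences from each region, and each group would need many more samples than the toy example for the covariance estimate to be meaningful.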

Results: The study covered 310 Zika virus genome sequences from different continents. Its aim was to characterize and compare Zika virus sequences from around the world using alignment-free sequence comparison and chemometric methods.

Conclusion: Mahalanobis distance analysis, self-organizing maps, and principal component analysis were used for the chemometric analysis of the Zika sequence data. The genome sequences cluster by region of origin (continent, country). African samples are well separated from the Asian and South American ones.




Updated: 2021-05-18