Bioinformatics Protocols for Quickly Obtaining Large-Scale Data Sets for Phylogenetic Inferences

Interdisciplinary Sciences: Computational Life Sciences (IF 4.8). Pub Date: 2018-12-05. DOI: 10.1007/s12539-018-0312-5
Hugo López-Fernández (1, 2, 3, 4, 5), Pedro Duque (4, 5, 6), Sílvia Henriques (4, 5), Noé Vázquez (1, 2), Florentino Fdez-Riverola (1, 2, 3), Cristina P Vieira (4, 5), Miguel Reboiro-Jato (1, 2, 3), Jorge Vieira (4, 5)

Useful insight into the evolution of genes and gene families can be provided by the analysis of all available genome datasets rather than just a few, which are usually those of model species. Handling and transforming such datasets into the desired format for downstream analyses is, however, often a difficult and time-consuming task for researchers without a background in informatics. Therefore, we present two simple and fast protocols for data preparation, using an easy-to-install, open-source, cross-platform software application with a user-friendly, rich graphical user interface (SEDA; http://www.sing-group.org/seda/index.html ). The first protocol is a substantial improvement over one recently published (López-Fernández et al. Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88-96 (2019)[1]), which was used to study the evolution of GULO, a gene that encodes the enzyme responsible for the last step of vitamin C synthesis. In this paper, we show how the sequence data file used for the phylogenetic analyses can now be obtained much faster by changing the way coding sequence isoforms are removed, using the newly implemented SEDA operation "Remove isoforms". This protocol can be used to easily show that putative functional GULO genes are present in several Protostomian groups such as Molluscs, Priapulida and Arachnida. Such findings could easily have been missed if only a few Protostomian model species had been used. The second protocol allowed us to identify positively selected amino acid sites in a set of 19 primate HLA immunity genes. Interestingly, the proteins encoded by MHC class II genes can show just as many positively selected amino acid sites as those encoded by classical MHC class I genes. Although a significant percentage of codons, which can be as high as 14.8%, are evolving under positive selection, the main mode of evolution of HLA immunity genes is purifying selection.
When a large number of primate species is used, the probability of failing to identify positively selected amino acid sites is lower. Both projects were completed in less than one week, and most of the time was spent running the analyses rather than preparing the files. Such protocols can be easily adapted to answer many other questions using a phylogenetic approach.
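The abstract says the speed-up comes from changing how coding-sequence isoforms are removed, but it does not give the rule SEDA's "Remove isoforms" operation applies. The sketch below shows one plausible way to collapse isoforms. It treats two sequences as isoforms of the same gene if they share at least one word (substring) of length k. It groups sequences transitively and keeps the longest sequence in each group. The shared-word rule, the parameter `k`, the longest-sequence choice, and the names `kmers` and `remove_isoforms` are all assumptions made for illustration. They are not SEDA's API or a description of its documented behaviour.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return every substring of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_isoforms(records: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Keep one sequence per group of putative isoforms (hypothetical helper).

    Assumed rule (not SEDA's documented algorithm): two sequences are isoforms
    if they share at least one word of length k; groups are the connected
    components of that relation; the longest sequence of each group is kept,
    with ties resolved in favour of the earliest record.
    """
    n = len(records)
    parent = list(range(n))  # union-find forest over record indices

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Map each word to the first record containing it; any later record that
    # contains the same word is merged into that record's group.
    first_seen: dict[str, int] = {}
    for idx, (_, seq) in enumerate(records):
        for word in kmers(seq.upper(), k):
            if word in first_seen:
                union(first_seen[word], idx)
            else:
                first_seen[word] = idx

    groups: dict[int, list[int]] = defaultdict(list)
    for idx in range(n):
        groups[find(idx)].append(idx)

    # Longest sequence wins; -i makes the earliest record win on equal length.
    kept = [max(members, key=lambda i: (len(records[i][1]), -i))
            for members in groups.values()]
    return [records[i] for i in sorted(kept)]  # preserve input order


if __name__ == "__main__":
    example = [
        ("geneA_isoform1", "ATGGCCATTGTAATGGGCCGC"),
        ("geneA_isoform2", "ATGGCCATTGTAATGGGCCGCTGAAAG"),
        ("geneB", "ATGCCCGGGTTTAAACCCGGG"),
    ]
    for name, _ in remove_isoforms(example, k=12):
        print(name)
```

With `k=12`, the two `geneA` records share the word `ATGGCCATTGTA` and are collapsed to the longer `geneA_isoform2`. `geneB` shares no 12-letter word with either of them and is kept. A small `k` risks merging unrelated genes that share short motifs. A large `k` may leave true isoforms unmerged, so in practice this threshold would need tuning per dataset.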

Updated: 2019-11-01