On the Verge of Life: Distribution of Nucleotide Sequences in Viral RNAs
Biosemiotics (IF 2.1), Pub Date: 2021-02-17, DOI: 10.1007/s12304-021-09403-5
Mykola Husev, Andrij Rovenchak

The aim of the study is to analyze viruses using parameters obtained from the distributions of nucleotide sequences in viral RNA. To keep the input data homogeneous, we analyze single-stranded RNA viruses only. Two approaches are used to obtain the nucleotide sequences. In the first, chunks of equal length (four nucleotides) are considered. In the second, the whole RNA genome is divided into parts using adenine, or the most frequent nucleotide, as a “space”. Rank–frequency distributions are studied in both cases. Judging by the nature of their rank–frequency distributions, the nucleotide sequences defined in this way are signs comparable, to a certain extent, to syllables or words. Within the first approach, the Pólya and the negative hypergeometric distributions yield the best fit. For the distributions obtained within the second approach, we calculated a set of parameters, including entropy, mean sequence length, and its dispersion. These parameters became the basis for the classification of viruses. We observed that the proximity of viruses on planes spanned by various pairs of parameters corresponds to related species. In certain cases, such proximity is observed for unrelated species as well, which calls for expanding the set of parameters used in the classification. We also observed that the fifth most frequent nucleotide sequence obtained within the second approach differs in nature among human coronaviruses (different nucleotides for MERS, SARS-CoV, and SARS-CoV-2 versus identical nucleotides for the four other coronaviruses). We expect that our findings will be useful as a supplementary tool in the classification of diseases caused by RNA viruses with respect to severity and contagiousness.
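The two segmentation approaches and the parameters named in the abstract can be illustrated with a short Python sketch. This is not the authors' code. All function names are hypothetical, and several choices are assumptions that the abstract does not specify: a trailing remainder shorter than four nucleotides is dropped, empty sequences produced by consecutive “spaces” are discarded, entropy is measured in bits, and dispersion is taken as the population variance of token lengths. Fitting the Pólya or negative hypergeometric distributions is not shown.

```python
from __future__ import annotations

import math
from collections import Counter


def chunk_sequences(rna: str, k: int = 4) -> list[str]:
    """Approach 1: consecutive, non-overlapping chunks of k nucleotides.

    Assumption: a trailing remainder shorter than k is dropped.
    """
    return [rna[i:i + k] for i in range(0, len(rna) - k + 1, k)]


def split_by_space(rna: str, space: str | None = None) -> list[str]:
    """Approach 2: treat one nucleotide as a 'space' separating sequences.

    If `space` is None, the most frequent nucleotide is used.
    Assumption: empty strings from consecutive spaces are discarded.
    """
    if space is None:
        space = Counter(rna).most_common(1)[0][0]
    return [word for word in rna.split(space) if word]


def rank_frequency(tokens: list[str]) -> list[tuple[str, int]]:
    """Sequence types sorted by decreasing frequency (rank = index + 1)."""
    return Counter(tokens).most_common()


def entropy_bits(tokens: list[str]) -> float:
    """Shannon entropy (base 2) of the relative frequencies of sequence types."""
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


def length_stats(tokens: list[str]) -> tuple[float, float]:
    """Token-weighted mean sequence length and its population variance."""
    lengths = [len(t) for t in tokens]
    mean = sum(lengths) / len(lengths)
    var = sum((x - mean) ** 2 for x in lengths) / len(lengths)
    return mean, var


if __name__ == "__main__":
    toy = "UUAGCAACUUAGCU"  # toy sequence, not real viral data
    words = split_by_space(toy, "A")
    print(words)                                  # ['UU', 'GC', 'CUU', 'GCU']
    print(entropy_bits(words))                    # 2.0
    print(length_stats(words))                    # (2.5, 0.25)
    print(rank_frequency(chunk_sequences(toy)))   # [('UUAG', 2), ('CAAC', 1)]
```

In a real analysis, the input would be a full single-stranded RNA genome rather than a toy string. The resulting parameters, such as entropy, mean length, and variance, would then be plotted pairwise to compare viruses.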




Updated: 2021-02-17