Analyzing Influenza Virus Sequences using Binary Encoding Approach
Scientific Programming (IF 1.672), Pub Date: 2012, DOI: 10.3233/spr-2012-334
Ham Ching Lam, Srinand Sreevatsan, Daniel Boley

Capturing the mutation pattern of each individual influenza virus sequence is often challenging. In this paper, we demonstrate that a binary encoding scheme coupled with a dimension reduction technique can capture the intrinsic mutation pattern of the virus. Our approach looks at the variance between sequences instead of the commonly used p-distance or Hamming distance. We first convert the influenza genetic sequences to binary strings and form a binary sequence alignment matrix, then apply Principal Component Analysis (PCA) to this matrix. PCA also makes it possible to identify reassortant viruses through data projection. Because the binary strings are sparse, we were able to analyze a large volume of influenza sequence data in a very short time. For protein sequences, our scheme also allows the incorporation of the biophysical properties of each amino acid. Here, we present various encouraging results from analyzing influenza nucleotide, protein and genome sequences using the proposed approach.
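
The pipeline described in the abstract (encode aligned sequences as binary strings, stack them into a matrix, apply PCA, project the data) can be illustrated with a minimal sketch. The abstract does not state the exact encoding, so the 4-bit one-hot code per nucleotide used here is an assumption, as are the function names `encode`, `binary_matrix` and `pca_project` and the toy sequences. PCA is computed with NumPy's SVD on the column-centered matrix.

```python
import numpy as np

# Assumed encoding (not taken from the paper): each nucleotide maps to 4 bits.
# Gaps and ambiguous symbols map to all zeros.
CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode(seq):
    """Turn one aligned nucleotide sequence into a flat list of bits."""
    bits = []
    for base in seq.upper():
        bits.extend(CODE.get(base, (0, 0, 0, 0)))
    return bits


def binary_matrix(aligned):
    """Stack encoded sequences into a (n_sequences x 4*length) matrix."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return np.array([encode(s) for s in aligned], dtype=float)


def pca_project(X, k=2):
    """Project the rows of X onto the first k principal components."""
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:k].T
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:k]


# Toy aligned sequences, made up for illustration only.
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]
X = binary_matrix(seqs)
scores, explained = pca_project(X, k=2)
print(X.shape)
print(scores)
print(explained)
```

Each row of `scores` gives the coordinates of one sequence in the reduced space, so sequences with similar mutation patterns land close together. This sketch uses a dense array for simplicity; for large datasets, a sparse matrix representation would be the natural way to exploit the sparsity the abstract mentions.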

Updated: 2020-09-25