Modelling RNA-Seq data with a zero-inflated mixture Poisson linear model.
Genetic Epidemiology (IF 1.7), Pub Date: 2019-07-22, DOI: 10.1002/gepi.22246
Siyun Liu, Yuan Jiang, Tao Yu

RNA sequencing (RNA-Seq) is widely used in genomic studies and has generated a vast amount of data. RNA-Seq data consist of two parts: (a) a sequence of nucleotides of the genome; and (b) a corresponding sequence of counts, each giving the number of short reads whose mapped positions start at that position of the genome. One common feature of these count data is that they are typically nonuniform; recent studies have shown that the nonuniformity is partly due to a systematic bias resulting from sequencing preference. Existing work models the nonuniformity with a single-component Poisson linear model that incorporates the effects of sequencing preference. However, we consistently observe that the short reads mapped to a gene may have a mixture structure and can be zero-inflated, so a single-component model may not suffice to capture the complexity of such data. In this paper, we propose a zero-inflated mixture Poisson linear model for RNA-Seq count data and derive a fast expectation-maximisation-based algorithm for estimating the unknown parameters. Numerical studies are conducted to illustrate the effectiveness of our method.
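
To make the idea of a zero-inflated Poisson mixture fitted by expectation-maximisation concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not give the model's exact form, so this example assumes an intercept-only simplification in which each count is either a structural zero (probability `pi0`) or drawn from one of K Poisson components with constant rates, and it omits the sequencing-preference covariates of the paper's linear model. The function name `fit_zip_mixture`, its parameters, and the simulated data are all hypothetical.

```python
import numpy as np
from math import lgamma


def fit_zip_mixture(y, n_components=2, n_iter=200, tol=1e-8):
    """EM for P(y) = pi0 * 1[y == 0] + sum_k pi_k * Poisson(y; lam_k).

    Assumption: rates are constants (no covariates), unlike the
    Poisson *linear* model described in the abstract.
    """
    y = np.asarray(y, dtype=float)
    is_zero = y == 0
    log_fact = np.array([lgamma(v + 1.0) for v in y])  # log(y!)

    # Initialise rates from quantiles of the positive counts, slightly
    # spread out so that components do not start out identical.
    pos = y[y > 0]
    qs = np.linspace(0.25, 0.75, n_components)
    lam = np.quantile(pos, qs) if pos.size else np.ones(n_components)
    lam = np.maximum(lam, 1e-3) * (1.0 + 0.05 * np.arange(n_components))
    pi0 = 0.5 * is_zero.mean()
    pi = np.full(n_components, (1.0 - pi0) / n_components)

    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in log space.
        log_pois = y[:, None] * np.log(lam)[None, :] - lam[None, :] - log_fact[:, None]
        with np.errstate(divide="ignore"):  # log(0) -> -inf is intended here
            log_w0 = np.where(is_zero, np.log(pi0), -np.inf)
            log_wk = np.log(pi)[None, :] + log_pois
        log_w = np.column_stack([log_w0, log_wk])
        m = log_w.max(axis=1, keepdims=True)
        log_total = m[:, 0] + np.log(np.exp(log_w - m).sum(axis=1))
        resp = np.exp(log_w - log_total[:, None])  # shape (n, K + 1)
        ll = log_total.sum()  # observed-data log-likelihood

        # M-step: closed-form updates for the weights and rates.
        pi0 = resp[:, 0].mean()
        pi = resp[:, 1:].mean(axis=0)
        weight = np.maximum(resp[:, 1:].sum(axis=0), 1e-12)
        lam = np.maximum((resp[:, 1:] * y[:, None]).sum(axis=0) / weight, 1e-8)

        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi0, pi, lam, ll


if __name__ == "__main__":
    # Simulated counts: 30% structural zeros, the rest from two Poisson rates.
    rng = np.random.default_rng(0)
    labels = rng.choice(3, size=4000, p=[0.3, 0.4, 0.3])
    counts = rng.poisson(np.array([0.0, 3.0, 25.0])[labels])
    pi0_hat, pi_hat, lam_hat, loglik = fit_zip_mixture(counts, n_components=2)
    print(pi0_hat, pi_hat, lam_hat, loglik)
```

Extending this sketch towards a Poisson linear model would mean replacing each constant rate `lam[k]` with a log-linear function of covariates and solving a weighted Poisson regression in the M-step; that step is left out here because the abstract does not specify the covariates.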

Updated: 2019-11-01