OutPyR: Bayesian inference for RNA-Seq outlier detection
Journal of Computational Science (IF 3.3), Pub Date: 2020-10-31, DOI: 10.1016/j.jocs.2020.101245
Edin Salkovic, Mostafa M. Abbas, Samir Brahim Belhaouari, Khaoula Errafii, Halima Bensmail

High-throughput RNA sequencing (RNA-Seq) has recently come into use as a tool for helping diagnose rare genetic disorders, because it can reveal abnormal gene expression counts, a telltale sign of genetic pathology. Existing solutions either require a large number of samples or do not provide proper statistical significance testing.

We present a Bayesian model (OutPyR) for identifying abnormal RNA-Seq gene expression counts in datasets, particularly those with a small number of samples. The model incorporates recently introduced data-augmentation techniques to efficiently and accurately infer the parameters of the underlying negative binomial process, while also assessing the uncertainty of the inference and making it possible to generate simulated data. The model's software implementation is object-oriented and thus easily extensible, and it provides parameter-trace exploration, fault tolerance, and recovery during the parameter estimation process. We also develop a p-value-based outlier score that stems naturally from our model. We apply the model to real and simulated datasets, for different organisms and tissues, and present comparisons with existing models.
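
To make the idea of a p-value-based outlier score under a negative binomial model more concrete, here is a minimal sketch. It is not OutPyR's actual score or API: the function names, the mean/dispersion parameterization (`mu`, `r`), and the two-sided tail rule are all assumptions chosen for illustration, and the parameters are taken as given rather than inferred by a Bayesian sampler as in the paper.

```python
import math


def nb_log_pmf(k, mu, r):
    """Log-probability of count k under a negative binomial with
    mean mu > 0 and dispersion r > 0 (variance = mu + mu**2 / r).
    This parameterization is an assumption made for this sketch."""
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def nb_cdf(k, mu, r):
    """P(X <= k), computed by summing the pmf directly (fine for modest counts)."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, mu, r)) for i in range(k + 1))
    return min(1.0, total)


def two_sided_p_value(k, mu, r):
    """Hypothetical outlier score: twice the smaller tail probability, capped at 1."""
    lower = nb_cdf(k, mu, r)            # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mu, r)  # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


if __name__ == "__main__":
    # Illustrative values only: one gene with mean 100 and dispersion 10.
    for count in (0, 100, 1000):
        print(count, two_sided_p_value(count, mu=100.0, r=10.0))
```

Under these assumptions, a count close to the mean yields a large p-value, while counts far in either tail yield values near zero and would be flagged as candidate outliers. In practice such scores would be computed for every gene and sample and then corrected for multiple testing.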

Our model is implemented purely in Python and its standalone source code is available at https://github.com/esalkovic/outpyr.



Updated: 2020-11-06