Data-based RNA-seq simulations by binomial thinning.
BMC Bioinformatics (IF 2.9), Pub Date: 2020-05-24, DOI: 10.1186/s12859-020-3450-9
David Gerard

BACKGROUND: With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, as real data often exhibit violations of theoretical models, this can result in unsubstantiated claims about a method's performance.

RESULTS: Rather than generate data from a theoretical model, in this paper we develop methods to add signal to real RNA-seq datasets. Since the resulting simulated data are not generated from an unrealistic theoretical model, they exhibit realistic (annoying) attributes of real data. This lets RNA-seq methods developers assess their procedures in non-ideal (model-violating) scenarios. Our procedures may be applied to both single-cell and bulk RNA-seq. We show that our simulation method results in more realistic datasets and can alter the conclusions of a differential expression analysis study. We also demonstrate our approach by comparing various factor analysis techniques on RNA-seq datasets.

CONCLUSIONS: Using data simulated from a theoretical model can substantially impact the results of a study. We developed more realistic simulation techniques for RNA-seq data. Our tools are available in the seqgendiff R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=seqgendiff.
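To make the idea of adding signal to real counts concrete, the following is a minimal Python sketch of binomial thinning for a two-group comparison. It is an illustration under stated assumptions, not the seqgendiff implementation: the function names `binomial_thin` and `add_two_group_signal`, the toy count matrix, and the convention that the log2 fold change describes group 1 relative to group 0 are all hypothetical. For each gene, the group that should end up with lower expression has each of its reads kept independently with probability 2^(-|log2 fold change|), while the other group and all null genes are left untouched.

```python
import random


def binomial_thin(count, keep_prob, rng):
    """Draw Binomial(count, keep_prob) by keeping each read independently."""
    return sum(1 for _ in range(count) if rng.random() < keep_prob)


def add_two_group_signal(counts, groups, log2_fold_changes, seed=0):
    """Add a two-group effect to a genes-by-samples count matrix by thinning.

    counts            : list of rows, one row of integer counts per gene
    groups            : 0/1 group label per sample (hypothetical convention)
    log2_fold_changes : per-gene effect of group 1 relative to group 0
    """
    rng = random.Random(seed)
    simulated = []
    for gene_counts, lfc in zip(counts, log2_fold_changes):
        keep_prob = 2.0 ** (-abs(lfc))
        # Thin the group that should have the lower expression.
        thinned_group = 0 if lfc > 0 else 1
        new_row = []
        for y, g in zip(gene_counts, groups):
            if lfc != 0 and g == thinned_group:
                new_row.append(binomial_thin(y, keep_prob, rng))
            else:
                new_row.append(y)  # null genes and the other group are unchanged
        simulated.append(new_row)
    return simulated


# Toy data (made up for illustration): 3 genes, 4 samples.
counts = [
    [100, 120, 90, 110],
    [50, 60, 55, 45],
    [200, 180, 220, 210],
]
groups = [0, 0, 1, 1]
log2_fold_changes = [1.0, 0.0, -1.0]

sim = add_two_group_signal(counts, groups, log2_fold_changes, seed=1)
for original_row, simulated_row in zip(counts, sim):
    print(original_row, "->", simulated_row)
```

Because thinning only removes reads from observed counts, everything else in the real data carries over into the simulated matrix, which is the property the abstract emphasizes. The sketch takes group labels from the caller; how labels are assigned is left outside its scope.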

Updated: 2020-05-24