Methylation data imputation performances under different representations and missingness patterns.
BMC Bioinformatics (IF 3), Pub Date: 2020-06-29, DOI: 10.1186/s12859-020-03592-5
Pietro Di Lena 1 , Claudia Sala 2 , Andrea Prodi 3 , Christine Nardini 4

High-throughput technologies enable cost-effective collection and analysis of DNA methylation data across the human genome. Such data inevitably contain missing values, and handling them can complicate the analysis. Several general and specific imputation methods are suitable for DNA methylation data. However, there are no detailed studies of their performance under different missing data mechanisms (completely at random, at random, or not at random) and different representations of DNA methylation levels (β- and M-values). We extensively analyze the performance of seven imputation methods on methylation data with simulated missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) values. We further compare imputation performance on the popular β- and M-value representations of methylation levels. Overall, β-values enable better imputation performance than M-values. Imputation is less accurate for mid-range β-values and generally more accurate for values at the extremes of the β-value range. The distribution of MAR values is, on average, denser in the mid-range than the expected β-value distribution. As a consequence, MAR values are on average harder to impute. The results of the analysis provide guidelines for choosing the most suitable imputation approach for DNA methylation data under different representations of DNA methylation levels and different missing data mechanisms.
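To make the two representations and the simplest missingness mechanism concrete, here is a minimal sketch, not the authors' pipeline. It assumes the commonly used logit relation M = log2(β / (1 − β)) between β- and M-values, generates synthetic β-values (the Beta(0.5, 0.5) distribution is only a stand-in for a bimodal methylation profile), masks entries completely at random (MCAR), and applies a per-row mean imputation as a naive baseline. The function names, the clipping constant `eps`, the 10% missing rate, and the choice of baseline are all illustrative assumptions; the abstract does not say which seven methods were evaluated.

```python
import numpy as np


def beta_to_m(beta, eps=1e-6):
    """Convert beta-values in [0, 1] to M-values, M = log2(beta / (1 - beta)).

    Values are clipped to [eps, 1 - eps] so that 0 and 1 do not map to
    infinities (eps is an illustrative choice).
    """
    b = np.clip(np.asarray(beta, dtype=float), eps, 1.0 - eps)
    return np.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    p = 2.0 ** np.asarray(m, dtype=float)
    return p / (p + 1.0)


def mcar_mask(shape, rate, rng):
    """MCAR: each entry is missing with the same probability `rate`,
    independently of any observed or unobserved value."""
    return rng.random(shape) < rate


def impute_row_mean(x):
    """Naive baseline: replace each NaN with the mean of the observed
    values in the same row (one row = one CpG site in this sketch)."""
    row_means = np.nanmean(x, axis=1, keepdims=True)
    return np.where(np.isnan(x), row_means, x)


def rmse(truth, estimate, mask):
    """Root mean squared error computed on the masked entries only."""
    diff = estimate[mask] - truth[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data: 200 sites x 20 samples (hypothetical sizes).
rng = np.random.default_rng(0)
beta = rng.beta(0.5, 0.5, size=(200, 20))
mask = mcar_mask(beta.shape, rate=0.1, rng=rng)

# Impute on the beta scale.
observed_beta = np.where(mask, np.nan, beta)
imputed_beta = impute_row_mean(observed_beta)

# Impute on the M scale, then map back to beta for a like-for-like error.
observed_m = np.where(mask, np.nan, beta_to_m(beta))
imputed_via_m = m_to_beta(impute_row_mean(observed_m))

print("RMSE, imputed on beta scale:", rmse(beta, imputed_beta, mask))
print("RMSE, imputed on M scale:   ", rmse(beta, imputed_via_m, mask))
```

Simulating MAR or MNAR would mean making the masking probability depend on observed covariates or on the hidden value itself, respectively. How the paper does this is not stated in the abstract, so it is left out of the sketch.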

Updated: 2020-06-29