Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software.
BMC Bioinformatics (IF 3). Pub Date: 2020-01-13. DOI: 10.1186/s12859-019-3307-2
Clémentine Decamps 1, Florian Privé 1, Raphael Bacher 1, Daniel Jost 1, Arthur Waguet 1, , Eugene Andres Houseman 2, Eugene Lurie 3, Pavlo Lutsik 4, Aleksandar Milosavljevic 3, Michael Scherer 5, Michael G B Blum 1, Magali Richard 1

BACKGROUND: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportions of the various cell types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to be associated with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of their performance is still lacking.

RESULTS: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify accounting for confounders, feature selection, and choosing the number of estimated cell types as critical steps for inferring cell-type proportions. We find that removing methylation probes correlated with confounder variables reduces the inference error by 30-35%, and that selecting cell-type-informative probes has a similar effect. We show that Cattell's rule based on the scree plot is a powerful tool for determining the number of cell types. Once the pre-processing steps are completed, the three deconvolution methods provide comparable results. The performance of all algorithms improves when inter-sample variation in cell-type proportions is large or when many samples are available. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research.

CONCLUSION: Based on the lessons learned, and to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.
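The three pre-processing steps named in the abstract (removing confounder-correlated probes, selecting informative probes, and choosing the number of cell types with Cattell's rule) can be illustrated with a small NumPy sketch on simulated data, where methylation D (probes x samples) is modeled as cell-type profiles T times proportions A plus noise. This is not the medepir API or the paper's exact procedure: the function names, the correlation threshold of 0.5, the use of probe variance as a proxy for "cell-type informative", and the automated log-scale elbow heuristic (Cattell's rule is usually applied by reading the scree plot) are all hypothetical choices made for illustration.

```python
import numpy as np


def remove_confounded_probes(D, confounders, threshold=0.5):
    """Drop probes (rows of D) whose |Pearson r| with any confounder exceeds `threshold`.

    D           : probes x samples matrix of methylation values.
    confounders : n_confounders x samples matrix (e.g. age, sex coded 0/1).
    threshold   : hypothetical cut-off chosen for illustration only.
    Returns the filtered matrix and the boolean mask of kept probes.
    """
    Dc = D - D.mean(axis=1, keepdims=True)
    Cc = confounders - confounders.mean(axis=1, keepdims=True)
    num = Dc @ Cc.T  # probes x confounders
    denom = np.outer(np.linalg.norm(Dc, axis=1), np.linalg.norm(Cc, axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.where(denom > 0, num / denom, 0.0)  # constant rows get r = 0
    keep = np.all(np.abs(r) <= threshold, axis=1)
    return D[keep], keep


def select_variable_probes(D, n_keep):
    """Keep the n_keep probes with the largest variance across samples
    (an assumed stand-in for selecting cell-type-informative probes)."""
    order = np.argsort(D.var(axis=1))[::-1]
    return D[np.sort(order[:n_keep])]


def cattell_n_cell_types(D, max_pcs=10):
    """Estimate the number of cell types K from the scree plot of D.

    Because each sample's proportions sum to 1, centering each probe across
    samples leaves at most K - 1 informative principal components. The elbow
    is located as the largest drop between consecutive eigenvalues on a log
    scale, a heuristic substitute for reading the plot by eye.
    """
    X = D - D.mean(axis=1, keepdims=True)
    eig = np.linalg.svd(X, compute_uv=False)[:max_pcs] ** 2  # descending
    log_eig = np.log(eig + 1e-12)
    n_pcs = int(np.argmax(log_eig[:-1] - log_eig[1:])) + 1
    return n_pcs + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_probes, n_samples, k_true = 500, 60, 4
    T = rng.uniform(0, 1, size=(n_probes, k_true))        # simulated cell-type profiles
    A = rng.dirichlet(np.ones(k_true), size=n_samples).T  # proportions; columns sum to 1
    D = T @ A + rng.normal(0, 0.01, size=(n_probes, n_samples))
    age = rng.uniform(20, 80, n_samples)
    D[:20] += 0.3 * (age - age.mean()) / age.std()        # 20 simulated age-driven probes
    D_clean, keep = remove_confounded_probes(D, age[None, :])
    D_sel = select_variable_probes(D_clean, 300)
    print("probes kept:", int(keep.sum()), "estimated K:", cattell_n_cell_types(D_sel))
```

The +1 in `cattell_n_cell_types` reflects the sum-to-one constraint: K mixed cell types span only K - 1 directions once the data are centered. The actual deconvolution (MeDeCom, EDec, or RefFreeEWAS) would then run on the filtered matrix with the chosen K; it is not reproduced here.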

Updated: 2020-01-13