DNA methylation-based sex classifier to predict sex and identify sex chromosome aneuploidy
bioRxiv - Bioinformatics. Pub Date: 2020-10-19, DOI: 10.1101/2020.10.19.345090
Yucheng Wang, Eilis Hannon, Olivia A Grant, Tyler J Gorrie-Stone, Meena Kumari, Jonathan Mill, Xiaojun Zhai, Klaus D McDonald-Maier, Leonard C Schalkwyk

Sex is an important covariate in epigenome-wide association studies because of its strong influence on DNA methylation patterns at numerous genomic positions. Nevertheless, many samples on the Gene Expression Omnibus (GEO) lack a sex annotation or are incorrectly labelled. Given this influence, methods for filtering poor samples and checking sex assignment need to be accurate and widely applicable. In this paper, we present a novel method to predict sex using only DNA methylation density signals, which can be readily applied to almost all DNA methylation datasets uploaded to GEO, whatever their format (raw IDATs or text files with only density signals). We identified 4345 significantly (p < 0.01) sex-associated CpG sites present on both 450K and EPIC arrays, and constructed a sex classifier based on the two first principal components from PCAs of the two sex chromosomes. The proposed method was constructed using whole blood samples and exhibits good performance across a wide range of tissues. We further demonstrate that our method can be used to identify samples with sex chromosome aneuploidy; this function was validated with five Turner syndrome cases and one Klinefelter syndrome case. The proposed method has been integrated into the wateRmelon Bioconductor package.
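
The classifier idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation and not the wateRmelon API. It assumes that "the two first components" means the first principal component of one PCA per sex chromosome, and it runs on simulated beta values. The function names (`oriented_pc1`, `split_high`, `classify`, `simulate`), the methylation levels, the noise level, and the midpoint cut-off are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical label scheme: (two-X-like X signal, Y signal present) -> call.
LABELS = {
    (True, False): "female",
    (False, True): "male",
    (True, True): "XXY-like",    # pattern expected for Klinefelter syndrome
    (False, False): "X0-like",   # pattern expected for Turner syndrome
}


def oriented_pc1(betas):
    """Scores of samples (rows) on the first principal component, oriented
    so that a higher score means higher average methylation."""
    centered = betas - betas.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    # The sign of a principal component is arbitrary, so fix it explicitly.
    if np.dot(scores, betas.mean(axis=1) - betas.mean()) < 0:
        scores = -scores
    return scores


def split_high(scores):
    """Toy cut-off (an assumption): midpoint of the observed score range."""
    return scores > (scores.min() + scores.max()) / 2.0


def classify(x_betas, y_betas):
    """Combine one PCA per sex chromosome into a sex / aneuploidy call."""
    x_high = split_high(oriented_pc1(x_betas))
    y_high = split_high(oriented_pc1(y_betas))
    return [LABELS[(bool(xh), bool(yh))] for xh, yh in zip(x_high, y_high)]


def simulate(seed=0, n_x=50, n_y=20):
    """Simulated beta values; the levels 0.5/0.2 and 0.7/0.3 are made up."""
    rng = np.random.default_rng(seed)
    truth = ["female"] * 20 + ["male"] * 20 + ["X0-like", "XXY-like"]
    two_x = np.array([t in ("female", "XXY-like") for t in truth])
    has_y = np.array([t in ("male", "XXY-like") for t in truth])
    x_mean = np.where(two_x, 0.5, 0.2)[:, None]
    y_mean = np.where(has_y, 0.7, 0.3)[:, None]
    x = x_mean + rng.normal(0.0, 0.03, size=(len(truth), n_x))
    y = y_mean + rng.normal(0.0, 0.03, size=(len(truth), n_y))
    return x, y, truth


if __name__ == "__main__":
    x, y, truth = simulate()
    calls = classify(x, y)
    print(sum(c == t for c, t in zip(calls, truth)), "of", len(truth), "correct")
```

A sample whose X-chromosome score looks female while its Y-chromosome score looks male (or the reverse) falls outside the two expected clusters, which is the intuition behind flagging possible sex chromosome aneuploidy. With real array data the cut-offs would have to come from a reference set, not from the midpoint rule used here.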

Updated: 2020-10-20