PopInf: An Approach for Reproducibly Visualizing and Assigning Population Affiliation in Genomic Samples of Uncertain Origin, Journal of Computational Biology



Journal of Computational Biology (IF 1.4), Pub Date: 2021-03-04, DOI: 10.1089/cmb.2019.0434
Angela M Taravella Oill, Anagha J Deshpande, Heini M Natri, Melissa A Wilson


Germline genetic variation contributes to cancer etiology, but self-reported race is not always consistent with genetic ancestry, and samples may lack identifying ancestry information altogether. In this study, we describe a flexible computational pipeline, PopInf, that visualizes principal component analysis output and assigns ancestry to samples of unknown genetic ancestry, given a reference population panel of known origins. PopInf is implemented as a reproducible workflow in Snakemake, with a tutorial on GitHub. We provide a preprocessed reference population panel that can be used quickly and efficiently in cancer genetics studies. We ran PopInf on The Cancer Genome Atlas (TCGA) liver cancer data and identified discrepancies between reported race and inferred genetic ancestry. The PopInf pipeline facilitates visualization and identification of genetic ancestry across samples, so that this ancestry can be accounted for in studies of disease risk.
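
The general idea of assigning ancestry from principal components, given a labeled reference panel, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PopInf's Snakemake workflow: the abstract does not say which assignment rule PopInf applies, so the rule used here (a sample is assigned to a population only if it lies within `n_sd` standard deviations of that population's mean on every retained component, and matches exactly one population) is an assumption. The function names `pca_project` and `assign_ancestry`, the `n_sd` parameter, and the simulated genotypes are hypothetical.

```python
import numpy as np


def pca_project(ref_geno, unk_geno, n_components=2):
    """Fit PCA on the reference panel only, then project both sample sets."""
    mean = ref_geno.mean(axis=0)
    # Rows of vt are the principal axes of the centered reference matrix.
    _, _, vt = np.linalg.svd(ref_geno - mean, full_matrices=False)
    axes = vt[:n_components].T
    return (ref_geno - mean) @ axes, (unk_geno - mean) @ axes


def assign_ancestry(ref_pcs, ref_labels, unk_pcs, n_sd=3.0):
    """Assumed rule: assign a sample to a population only if it falls within
    n_sd standard deviations of that population's mean on every component
    and no other population also matches; otherwise report 'unassigned'."""
    ref_labels = np.asarray(ref_labels)
    populations = sorted(set(ref_labels.tolist()))
    calls = []
    for sample in unk_pcs:
        hits = []
        for pop in populations:
            pcs = ref_pcs[ref_labels == pop]
            center = pcs.mean(axis=0)
            spread = pcs.std(axis=0, ddof=1)
            if np.all(np.abs(sample - center) <= n_sd * spread):
                hits.append(pop)
        calls.append(hits[0] if len(hits) == 1 else "unassigned")
    return calls


if __name__ == "__main__":
    # Simulated toy genotypes (0/1/2 allele counts); not real data.
    rng = np.random.default_rng(0)
    ref = np.vstack([
        rng.binomial(2, 0.1, size=(20, 200)),  # hypothetical population "A"
        rng.binomial(2, 0.9, size=(20, 200)),  # hypothetical population "B"
    ]).astype(float)
    labels = ["A"] * 20 + ["B"] * 20
    unknown = rng.binomial(2, 0.1, size=(3, 200)).astype(float)

    ref_pcs, unk_pcs = pca_project(ref, unknown)
    print(assign_ancestry(ref_pcs, labels, unk_pcs))
```

Fitting the components on the reference panel alone, and only then projecting the unknown samples, keeps the axes defined by populations of known origin. Reporting ambiguous or outlying samples as unassigned is a design choice of this sketch, intended to avoid forcing every sample into a reference population.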


Updated: 2021-03-05
