PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data.
Genome Research (IF 6.2), Pub Date: 2019-10-18, DOI: 10.1101/gr.234435.118
Salem Malikic 1, Farid Rashidi Mehrabadi 2,3, Simone Ciccolella 4,5, Md Khaledur Rahman 2, Camir Ricketts 5,6, Ehsan Haghshenas 1, Daniel Seidman 6, Faraz Hach 1,7,8, Iman Hajirasouliha 5,9, S Cenk Sahinalp 3

Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies, including frequent allele dropout and variable sequence coverage, may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem, which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and, as a first in tumor phylogeny reconstruction, a Boolean constraint satisfaction problem (CSP), and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA-violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
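
To make the objective described in the abstract more concrete, here is a minimal toy sketch in Python. It is not the PhISCS implementation: it replaces the ILP/CSP solvers with exhaustive search, it omits ISA-violating mutation elimination and the VAF-derived lineage constraints, and the function names (`has_conflict`, `best_correction`) and the weights are assumptions chosen for illustration. It relies only on the standard three-gametes rule: a binary cell-by-mutation matrix admits a perfect phylogeny exactly when no pair of mutation columns shows all three patterns (1,1), (1,0), and (0,1).

```python
from itertools import combinations, product


def has_conflict(matrix):
    """Return True if some pair of mutation columns violates the three-gametes rule."""
    if not matrix:
        return False
    n_mut = len(matrix[0])
    for p, q in combinations(range(n_mut), 2):
        patterns = {(row[p], row[q]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            return True
    return False


def best_correction(noisy, fn_weight=1.0, fp_weight=10.0):
    """Exhaustively search for the cheapest conflict-free matrix (toy sizes only).

    A 0 -> 1 flip is treated as correcting a presumed false negative,
    a 1 -> 0 flip as correcting a presumed false positive.
    The weights are illustrative assumptions, not values from the paper.
    """
    n_cells, n_mut = len(noisy), len(noisy[0])
    best_cost, best = float("inf"), None
    for bits in product((0, 1), repeat=n_cells * n_mut):
        cand = [list(bits[i * n_mut:(i + 1) * n_mut]) for i in range(n_cells)]
        cost = 0.0
        for i in range(n_cells):
            for j in range(n_mut):
                if noisy[i][j] == 0 and cand[i][j] == 1:
                    cost += fn_weight  # presumed false negative corrected
                elif noisy[i][j] == 1 and cand[i][j] == 0:
                    cost += fp_weight  # presumed false positive corrected
        if cost < best_cost and not has_conflict(cand):
            best_cost, best = cost, cand
    return best_cost, best


# Three cells, two mutations: the two columns conflict as observed.
noisy = [[1, 0],
         [0, 1],
         [1, 1]]
cost, corrected = best_correction(noisy)
print(cost, corrected)
```

The search space grows as 2^(cells × mutations), so this only works for tiny matrices; the abstract's point is that expressing the same kind of objective as an ILP or Boolean CSP lets off-the-shelf solvers find provably optimal solutions on realistic inputs.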

Updated: 2019-11-01