Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem.
Algorithms for Molecular Biology ( IF 1 ) Pub Date : 2020-05-15 , DOI: 10.1186/s13015-020-00170-5
Yutong Qiu 1 , Cong Ma 1 , Han Xie 1 , Carl Kingsford 1

Transcriptomic structural variants (TSVs), large-scale transcriptome sequence changes due to structural variation, are common in cancer. TSV detection from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks $$k$$ genome arrangements that maximize the number of reads that are concordant with at least one arrangement. This models a heterogeneous or diploid sample. We prove that MCAP is NP-complete and provide a $$\frac{1}{4}$$-approximation algorithm for $$k=1$$ and a $$\frac{3}{4}$$-approximation algorithm for the diploid case ($$k=2$$) assuming an oracle for $$k=1$$. Combining these, we obtain a $$\frac{3}{16}$$-approximation algorithm for MCAP when $$k=2$$ (without an oracle). We also present an integer linear programming formulation for general $$k$$. We characterize the conflict structures in the graph that require $$k>1$$ alleles to satisfy read concordance and show that such structures are prevalent. We show that the solution to MCAP accurately addresses sample heterogeneity during TSV detection. Our algorithms have improved performance on TCGA cancer samples and cancer cell line samples compared to a TSV calling tool, SQUID. The software is available at https://github.com/Kingsford-Group/diploidsquid.
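
To make the MCAP objective concrete, the following is a minimal brute-force sketch on a toy instance. It is not the diploidSQUID implementation, and the abstract does not spell out the concordance rule, so the rule used here is an assumption loosely modeled on a segment-graph formulation: an arrangement is an ordering of segments plus an orientation for each, and a weighted edge joining two segment ends counts as concordant when the earlier segment is attached at its right-facing end and the later segment at its left-facing end. All names (`arrangements`, `is_concordant`, `mcap_brute_force`, `EDGES`) are hypothetical.

```python
from itertools import combinations_with_replacement, permutations, product

def arrangements(n):
    """Yield every (order, orient) pair for n segments.
    orient[s] is True when segment s is kept in forward orientation."""
    for order in permutations(range(n)):
        for orient in product((True, False), repeat=n):
            yield order, orient

def is_concordant(edge, arrangement):
    """Assumed rule: the earlier segment must be attached at its
    right-facing end, the later segment at its left-facing end."""
    u, end_u, v, end_v, _weight = edge
    order, orient = arrangement
    pos = {seg: i for i, seg in enumerate(order)}
    if pos[u] > pos[v]:  # make u the segment that appears first
        u, end_u, v, end_v = v, end_v, u, end_u
    right_end_u = "t" if orient[u] else "h"  # forward: tail faces right
    left_end_v = "h" if orient[v] else "t"   # forward: head faces left
    return end_u == right_end_u and end_v == left_end_v

def mcap_brute_force(n, edges, k):
    """Try every multiset of k arrangements and return the best total
    weight of edges concordant with at least one of them.
    Exponential in n and k; intended for toy inputs only."""
    all_arr = list(arrangements(n))
    best_weight, best_set = -1, None
    for chosen in combinations_with_replacement(all_arr, k):
        weight = sum(
            e[4] for e in edges
            if any(is_concordant(e, a) for a in chosen)
        )
        if weight > best_weight:
            best_weight, best_set = weight, chosen
    return best_weight, best_set

# Toy graph: (segment u, end of u, segment v, end of v, read count).
# The first two edges cannot be concordant in the same arrangement.
EDGES = [
    (0, "t", 1, "h", 5),  # tail of 0 to head of 1
    (0, "t", 1, "t", 3),  # tail of 0 to tail of 1 (inversion-like)
    (1, "t", 2, "h", 4),  # tail of 1 to head of 2
]

print(mcap_brute_force(3, EDGES, k=1)[0])  # 9
print(mcap_brute_force(3, EDGES, k=2)[0])  # 12
```

In this toy case a single arrangement can explain at most 9 of the 12 reads, while two arrangements explain all 12. This is the kind of conflict structure that requires $$k>1$$. The approximation factors in the abstract are also numerically consistent: $$\frac{3}{4}\cdot\frac{1}{4}=\frac{3}{16}$$.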

Updated: 2020-05-15