Identification of Enriched Regions in ChIP-Seq Data via a Linear-Time Multi-Level Thresholding Algorithm,IEEE/ACM Transactions on Computational Biology and Bioinformatics


IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2021-08-16, DOI: 10.1109/tcbb.2021.3104734
Musab Naik ₁ , Luis Rueda ₁ , Akram Vasighizaker ₁


Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has emerged as a superior alternative to microarray technology, as it provides higher resolution, less noise, greater coverage, and a wider dynamic range. While ChIP-Seq enables probing of DNA-protein interactions across the entire genome, it requires sophisticated tools to recognize hidden patterns and extract meaningful data. Over the years, several algorithms have been developed that use different heuristics to determine the individual peaks corresponding to unique DNA-protein interactions. However, finding all the significant peaks with high accuracy in a reasonable time is still a challenge. In this work, we propose a multi-level thresholding algorithm, which we call LinMLTBS, to identify enriched regions in ChIP-Seq data. Although various suboptimal heuristics have been proposed for multi-level thresholding, we emphasize the use of an algorithm capable of obtaining an optimal solution while maintaining linear-time complexity. Testing several algorithms on various ENCODE project datasets shows that our approach attains higher accuracy than previously proposed peak finders while retaining a reasonable processing speed.
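To make the idea of optimal multi-level thresholding concrete, here is a minimal sketch. It is not the authors' LinMLTBS algorithm: it is a plain dynamic program that runs in quadratic time per threshold rather than linear time, and it assumes an Otsu-style criterion (minimum weighted within-class variance of bin positions), which the abstract does not specify. The function name `multilevel_thresholds` and the toy read-count histograms are hypothetical.

```python
from itertools import accumulate


def multilevel_thresholds(hist, k):
    """Split hist into k+1 contiguous classes; return the k cut indices.

    Assumed criterion (not taken from the paper): minimize the summed
    within-class variance of bin indices, weighted by the bin counts.
    """
    n = len(hist)
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < len(hist)")

    # Prefix sums of weight, weight*index and weight*index^2.
    w = [0] + list(accumulate(hist))
    s1 = [0] + list(accumulate(h * i for i, h in enumerate(hist)))
    s2 = [0] + list(accumulate(h * i * i for i, h in enumerate(hist)))

    def cost(a, b):
        """Weighted sum of squared deviations for bins a..b-1."""
        weight = w[b] - w[a]
        if weight == 0:
            return 0.0
        moment = s1[b] - s1[a]
        return (s2[b] - s2[a]) - moment * moment / weight

    inf = float("inf")
    # best[c][j]: minimum cost of splitting bins 0..j-1 into c classes.
    best = [[inf] * (n + 1) for _ in range(k + 2)]
    back = [[0] * (n + 1) for _ in range(k + 2)]
    best[0][0] = 0.0
    for c in range(1, k + 2):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    back[c][j] = i

    # Walk the back-pointers from the last class to recover the cuts.
    cuts = []
    j = n
    for c in range(k + 1, 1, -1):
        j = back[c][j]
        cuts.append(j)
    return cuts[::-1]


# Toy histogram: three read pile-ups separated by empty bins.
toy_hist = [4, 4, 0, 0, 5, 5, 0, 0, 6, 6]
toy_cuts = multilevel_thresholds(toy_hist, 2)
print(toy_cuts)
```

Each returned index marks the first bin of the next class, so the classes can be read off as contiguous candidate regions. Exhaustive dynamic programming of this kind guarantees an optimal partition under the chosen criterion, which is the property the abstract contrasts with suboptimal heuristics; the linear-time behavior claimed for LinMLTBS is not reproduced here.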


Updated: 2021-08-16
