Predicting TF-DNA Binding Motifs from ChIP-seq Datasets Using the Bag-Based Classifier Combined With a Multi-Fold Learning Scheme
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 4.5), Pub Date: 2020-09-18, DOI: 10.1109/tcbb.2020.3025007
Qinhu Zhang, Dailun Wang, Kyungsook Han, De-Shuang Huang

The rapid development of high-throughput sequencing technology provides unique opportunities for studying transcription factor binding sites, but also brings new computational challenges. Recently, a series of discriminative motif discovery (DMD) methods have been proposed that offer promising solutions to these challenges. However, because of the huge computational cost, most of them have to adopt approximate schemes that either sacrifice the accuracy of the motif representation or tune motif parameters indirectly. In this paper, we propose a bag-based classifier combined with a multi-fold learning scheme (BCMF) to discover motifs from ChIP-seq datasets. First, BCMF naturally formulates input sequences as a labeled bag. Then, a bag-based classifier, combined with a bag feature extraction strategy, is applied to construct the objective function, and a multi-fold learning scheme is used to solve it. Compared with existing DMD tools, BCMF features three improvements: 1) learning the position weight matrix (PWM) directly in a continuous space; 2) representing a positive bag with a feature fused from its k "most positive" patterns; and 3) applying a more advanced learning scheme. The experimental results on 134 ChIP-seq datasets show that BCMF substantially outperforms existing DMD methods (including DREME, HOMER, XXmotif, motifRG, EDCOD and our previous work).
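To make the bag idea concrete, here is a minimal sketch of scoring one sequence as a bag of overlapping windows under a PWM and summarizing it by its k highest-scoring windows. It is an illustration only, not the BCMF implementation: the abstract does not say how the k "most positive" patterns are fused, so averaging their scores is an assumption, and the names `window_score`, `top_k_scores`, `bag_score`, and `toy_pwm` are hypothetical. The learning step (the objective function and the multi-fold scheme) is not shown.

```python
from typing import Dict, List

# A PWM here is one dict per motif position, mapping each base to a weight.
PWM = List[Dict[str, float]]


def window_score(pwm: PWM, window: str) -> float:
    """Sum the PWM weights of the bases in one window (one instance of the bag)."""
    return sum(column[base] for column, base in zip(pwm, window))


def top_k_scores(pwm: PWM, sequence: str, k: int) -> List[float]:
    """Score every window of motif width and return the k highest scores."""
    width = len(pwm)
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif width")
    scores = [
        window_score(pwm, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]
    return sorted(scores, reverse=True)[:k]


def bag_score(pwm: PWM, sequence: str, k: int) -> float:
    """Assumed fusion rule: average the scores of the k best windows."""
    best = top_k_scores(pwm, sequence, k)
    return sum(best) / len(best)


def toy_pwm(motif: str, match: float = 1.0, mismatch: float = -1.0) -> PWM:
    """Hypothetical helper: build a simple PWM that rewards one exact motif."""
    return [
        {base: (match if base == target else mismatch) for base in "ACGT"}
        for target in motif
    ]


pwm = toy_pwm("TGA")
positive = bag_score(pwm, "ACTGACGTTGAA", k=2)  # contains TGA twice
negative = bag_score(pwm, "CCCCCCCC", k=2)      # contains no TGA
print(positive, negative)
```

In this toy case the sequence containing the motif receives a higher bag score than the one without it, which is the separation a discriminative method would try to maximize by adjusting the PWM weights.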

Updated: 2020-09-18