MTTFsite: cross-cell type TF binding site prediction by using multi-task learning
Bioinformatics (IF 4.4), Pub Date: 2019-06-04, DOI: 10.1093/bioinformatics/btz451
Jiyun Zhou 1,2, Qin Lu 2, Lin Gui 3, Ruifeng Xu 1, Yunfei Long 2, Hongpeng Wang 1

Motivation
The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches to TFBS prediction require large amounts of labeled data. However, for many TFs in certain cell types, labeled data are either insufficient or missing entirely.
Results
This paper proposes MTTFsite, a multi-task learning framework that addresses the shortage of labeled data by leveraging labeled data available from other cell types. MTTFsite contains a shared CNN that learns features common to all cell types and a private CNN for each cell type that learns features specific to it. The common features are intended to help predict TFBSs for all cell types, especially those that lack labeled data. MTTFsite is evaluated on 241 cell type-TF pairs and compared with a baseline method that uses no multi-task learning and with a fully shared multi-task model that uses only a shared CNN and no private CNNs. For cell types with insufficient labeled data, MTTFsite outperforms both the baseline method and the fully shared model on more than 89% of pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model on more than 80% and 93% of pairs, respectively. A novel gene expression prediction method, TFChrome, which uses both MTTFsite predictions and histone modification features, is also presented. Results show that TFBSs predicted by MTTFsite alone achieve good performance, and combining them with histone modification features yields a significant 5.7% performance improvement.
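The shared-plus-private arrangement described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SharedPrivateModel`, the cell type labels `cell_A` and `cell_B`, the filter counts and widths, and the random (untrained) weights are all assumptions chosen for illustration. It only shows how one set of convolution filters can be reused for every cell type while a second set is kept per cell type, with the two feature vectors concatenated before a per-cell-type output unit.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv_max_pool(x, filters):
    """1D convolution, ReLU, then global max pooling: one feature per filter."""
    feats = []
    for f in filters:                          # each filter has shape (width, 4)
        width = len(f)
        best = 0.0                             # starting at 0 acts as the ReLU floor
        for start in range(len(x) - width + 1):
            act = sum(f[i][j] * x[start + i][j]
                      for i in range(width) for j in range(4))
            best = max(best, act)
        feats.append(best)
    return feats

def random_filters(rng, n_filters, width):
    """Untrained filters; a real model would learn these from labeled data."""
    return [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(width)]
            for _ in range(n_filters)]

class SharedPrivateModel:
    """Hypothetical sketch: one shared filter bank plus one private bank per cell type."""

    def __init__(self, cell_types, n_filters=4, width=5, seed=0):
        rng = random.Random(seed)
        self.n_filters = n_filters
        self.shared = random_filters(rng, n_filters, width)
        self.private = {c: random_filters(rng, n_filters, width)
                        for c in cell_types}
        # One logistic output unit per cell type over [shared | private] features.
        self.out_w = {c: [rng.uniform(-1, 1) for _ in range(2 * n_filters)]
                      for c in cell_types}

    def features(self, seq, cell_type):
        x = one_hot(seq)
        shared = conv_max_pool(x, self.shared)               # same for all cell types
        private = conv_max_pool(x, self.private[cell_type])  # cell-type specific
        return shared + private

    def predict(self, seq, cell_type):
        """Return a binding score in (0, 1) for the sequence in this cell type."""
        feats = self.features(seq, cell_type)
        z = sum(w * f for w, f in zip(self.out_w[cell_type], feats))
        return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    model = SharedPrivateModel(["cell_A", "cell_B"])
    seq = "ACGTACGTTTGACCA"
    for cell in ("cell_A", "cell_B"):
        print(cell, round(model.predict(seq, cell), 3))
```

In this sketch the first half of each feature vector is identical across cell types, which is what would let a cell type with little or no labeled data benefit from filters fitted on the others. The fully shared comparison model in the abstract would correspond to dropping the private banks.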
Availability and implementation
The resource and executable code are freely available at http://hlt.hitsz.edu.cn/MTTFsite/ and http://www.hitsz-hlt.com:8080/MTTFsite/.
Supplementary information
Supplementary data are available at Bioinformatics online.


Updated: 2020-01-13