Abstract
The DNA methyltransferases (DNMTs) DNMT3A, DNMT3B, and DNMT3L are primarily responsible for establishing genomic locus-specific DNA methylation patterns, which play an important role in gene regulation and animal development. However, the binding mechanism of this important protein family, i.e., how and where the DNMTs bind to the genome, remains uncharacterized in most tissues and cell lines. This motivates us to explore the cooperation between DNMTs and transcription factors (TFs) and to develop a network-regularized logistic regression model, GuidingNet, that predicts DNMTs’ genome-wide binding by integrating gene expression, chromatin accessibility, sequence, and protein-protein interaction data. GuidingNet accurately predicted DNMT binding validated by experimental methylation data, outperformed single-data-source and sparsity-regularized methods, and performed well in within-tissue and cross-tissue prediction for several DNMTs in both human and mouse. Importantly, GuidingNet can reveal transcription co-factors that assist DNMTs in establishing methylation. This provides biological insight into the DNMTs’ binding specificity in different tissues and demonstrates the advantage of network regularization. In addition, GuidingNet achieves good performance in predicting the binding of chromatin regulators other than DNMTs and serves as a useful method for studying chromatin regulator binding and function. GuidingNet is freely available at https://github.com/AMSSwanglab/GuidingNet.
Author summary
DNA methyltransferases (DNMTs) add methyl groups to cytosine residues by binding to specific DNA regions. However, DNMTs do not have DNA-binding domains that recognize specific DNA sequences, and a pressing question is how DNMTs recognize their binding sites in the genome in different tissues. Here, we propose a network-regularized logistic regression model, GuidingNet, for predicting DNMTs’ genome-wide binding by integrating gene expression, chromatin accessibility, sequence, and protein-protein interaction data. Our main contribution is the hypothesis that DNMTs interact with transcription factors (TFs) and are guided by the TF network to bind to and methylate DNA. GuidingNet captures the mechanism of DNMT binding in different tissue contexts and predicts DNMTs’ binding well.