An Information Gain-based Method for Evaluating the Classification Power of Features Towards Identifying Enhancers
Current Bioinformatics (IF 4) Pub Date: 2020-06-30, DOI: 10.2174/1574893614666191120141032
Tianjiao Zhang 1, Rongjie Wang 1, Qinghua Jiang 2, Yadong Wang 1

Background: Enhancers are cis-regulatory DNA sequences that enhance gene expression. Because most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers carry a variety of features that can aid enhancer recognition.

Objective: The classification power of features differs significantly, and the performance of existing methods that use one or a few features to identify enhancers varies greatly. Evaluating the classification power of each feature can therefore help improve enhancer prediction.

Methods: We present an evaluation method based on Information Gain (IG) that captures how much each feature changes the entropy of enhancer recognition. To validate the method, we conducted experiments measuring the Single Feature Prediction Accuracy (SFPA) of each feature.
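
As an illustration of the IG idea described above, the following is a minimal sketch and not the authors' implementation. It computes IG as the label entropy minus the label entropy conditioned on a discretized feature. The equal-width binning, the `n_bins` parameter, and the function names are assumptions made for this example; the abstract does not say how continuous features were discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels, n_bins=2):
    """IG of one feature: H(labels) - H(labels | binned feature).

    Assumption: the feature is discretized into equal-width bins.
    """
    lo, hi = min(feature_values), max(feature_values)
    # Fall back to a width of 1.0 when the feature is constant.
    width = (hi - lo) / n_bins or 1.0
    bins = [min(int((v - lo) / width), n_bins - 1) for v in feature_values]

    total = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for bb, y in zip(bins, labels) if bb == b]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Toy data: 1 = enhancer, 0 = non-enhancer (hypothetical values).
    labels = [0, 0, 1, 1]
    separating = [0.1, 0.2, 0.9, 0.8]      # splits the classes cleanly
    uninformative = [0.1, 0.9, 0.1, 0.9]   # unrelated to the classes
    print(information_gain(separating, labels))
    print(information_gain(uninformative, labels))
```

With balanced binary labels, a feature that separates the two classes cleanly reaches the maximum IG of 1 bit, while a feature unrelated to the labels has an IG of 0.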

Results: The average IG values of the sequence, transcriptional, and epigenetic features are 0.068, 0.213, and 0.299, respectively. Under SFPA, the corresponding average AUC values are 0.534, 0.605, and 0.647. These verification results are consistent with the IG-based evaluation.
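
To make the AUC comparison concrete, here is a hedged sketch of how a single-feature AUC could be computed. It assumes the raw feature value is used directly as the prediction score and computes AUC with the rank-based (Mann-Whitney) definition; the abstract does not specify which classifier SFPA uses, so this is an illustrative stand-in. The helper `rank_order` is likewise hypothetical and only checks whether two lists order the feature groups the same way.

```python
def auc_single_feature(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumption: labels are 1 (enhancer) and 0 (non-enhancer).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_order(values):
    """Indices of values sorted from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    # Toy scores for one feature (hypothetical values).
    print(auc_single_feature([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))

    # Averages reported in the abstract, in the order
    # sequence, transcriptional, epigenetic.
    avg_ig = [0.068, 0.213, 0.299]
    avg_auc = [0.534, 0.605, 0.647]
    print(rank_order(avg_ig) == rank_order(avg_auc))
```

Applied to the averages reported above, both lists rank the feature groups in the same order (sequence lowest, epigenetic highest), which is the sense in which the two evaluations agree.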

Conclusion: This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.



Updated: 2020-06-30