An Approach of Epistasis Detection Using Integer Linear Programming Optimizing Bayesian Network
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2021-06-28, DOI: 10.1109/tcbb.2021.3092719
Xuan Yang, Chen Yang, Jimeng Lei, Jianxiao Liu

Developing more effective and accurate methods for detecting epistatic loci in large-scale genomic data is important for applications such as crop quality improvement and disease treatment. Because Bayesian networks (BNs) offer high accuracy and can model non-linear relationships, they have been widely used to construct networks of SNPs and phenotype traits and thereby to mine epistatic loci. However, BN learning easily falls into local optima and cannot handle large numbers of SNPs. In this work, we transform the problem of learning a Bayesian network into an integer linear programming (ILP) optimization problem. We use branch-and-bound and cutting-plane algorithms to obtain the globally optimal Bayesian network (ILPBN), and from it the epistatic loci that influence specific phenotype traits. To handle large numbers of SNP loci and to further improve efficiency, we use a Markov blanket optimization method to reduce the number of candidate parent nodes for each node. In addition, we use α-BIC, which is suited to epistasis mining, to calculate the BN score. We use four properties of decomposable BN scoring functions to further reduce the number of candidate parent sets for each node. Experimental results show that ILPBN can handle not only 2-locus and 3-locus epistasis mining but also multi-locus epistasis detection. Finally, we compare ILPBN with several popular epistasis mining algorithms on simulated data and a real age-related macular disease (AMD) dataset. Experimental results show that ILPBN achieves better epistasis detection accuracy, F1-score, and false positive rate than the other methods while maintaining efficiency. Availability: Code and dataset are available at http://122.205.95.139/ILPBN/.
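
To make the optimization problem in the abstract concrete, the following is a minimal sketch, not the authors' ILPBN code. With a decomposable score, learning a BN amounts to choosing exactly one candidate parent set per node so that the summed local scores are maximal and the resulting graph is acyclic; an ILP solver searches that space with branch-and-bound and cutting planes. Here a brute-force enumeration stands in for the solver, which is only feasible for toy inputs. The function names, node names, and score values are all hypothetical and chosen purely for illustration; they are not α-BIC values from the paper.

```python
from itertools import product


def is_acyclic(parents_of):
    """Return True if the graph {node: set_of_parents} has no directed cycle."""
    remaining = {v: set(ps) for v, ps in parents_of.items()}
    while remaining:
        # Nodes with no unresolved parents can be placed next in a topological order.
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False  # every remaining node still waits on another node
        for v in roots:
            del remaining[v]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True


def best_network(local_scores):
    """Pick one parent set per node, maximizing the total score over acyclic graphs.

    local_scores maps node -> {frozenset(parents): local score}.
    Brute force replaces the ILP solver here (an assumption for illustration).
    """
    nodes = sorted(local_scores)
    best_score, best_graph = float("-inf"), None
    for choice in product(*(local_scores[v].items() for v in nodes)):
        score = sum(s for _, s in choice)
        graph = {v: set(ps) for v, (ps, _) in zip(nodes, choice)}
        if score > best_score and is_acyclic(graph):
            best_score, best_graph = score, graph
    return best_score, best_graph


# Hypothetical local scores (higher is better) for two SNPs and one trait.
toy_scores = {
    "snp1": {frozenset(): -10.0, frozenset({"trait"}): -9.0},
    "snp2": {frozenset(): -12.0, frozenset({"trait"}): -11.5},
    "trait": {
        frozenset(): -20.0,
        frozenset({"snp1"}): -18.0,
        frozenset({"snp2"}): -19.0,
        frozenset({"snp1", "snp2"}): -14.0,
    },
}

score, graph = best_network(toy_scores)
print(score, graph)
```

With these toy scores, taking the best parent set for each node independently would create a cycle between the SNPs and the trait, so the acyclicity requirement forces a trade-off; the selected network makes both SNPs parents of the trait, which is the kind of structure one would read as a candidate 2-locus interaction. Pruning candidate parent sets, as the abstract describes, shrinks the inner dictionaries before any search begins.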

Updated: 2021-06-28