Worry less about the algorithm, more about the sequence of events
Mathematical Biosciences and Engineering (IF 2.6), Pub Date: 2020-09-28, DOI: 10.3934/mbe.2020342
Farrokh Alemi

Background: Many algorithms exist for learning network structure and parameters from data. Examples include Grow-Shrink, Incremental Association Markov Blanket (IAMB), Fast IAMB, Interleaved IAMB, Hill Climbing, Restricted Maximization, and Maximum-Minimum Hill Climbing. These algorithms optimize the fit to the data while ignoring the order in which the variables in the network structure occur.

Objective: This paper examines whether sequence information (i.e., one variable occurs before another) can make algorithms for learning directed acyclic graph networks more accurate.

Methods: A 13-variable network was simulated, in which the sequence of occurrence of some of the variables was assumed to be known. In each simulation, 10,000 observations were generated. Four conditional-dependency and four search-and-score algorithms used these observations to discover the network from the simulated data. The partial sequence was used to prohibit a directed arc from a later variable to an earlier one. The Area under the Receiver Operating Curve (AROC) was used to compare the accuracy of the sequence-constrained and unconstrained algorithms in predicting the last node in the network. In addition, we examined the performance of sequence-constrained algorithms in a real data set. We analyzed 1.3 million disability assessments done on 296,051 residents in Veterans Affairs nursing homes, where the sequence of occurrence of variables was inferred from the average age at which the disabilities occurred. We constructed three networks using the Grow-Shrink algorithm: one without sequence constraints and two using two permutations of the observed sequence. The fit of these three models to the data was examined using the Bayesian Information Criterion (BIC).

Results: In simulated data, algorithms that used sequence constraints (AROC = 0.94, confidence interval, C.I. = 0.86 to 1) were significantly more accurate than the same algorithms without sequence constraints (AROC = 0.74, C.I. = 0.65 to 0.83). The agreement between discovered and observed networks improved from a range of 0.54 to 0.97 to a range of 0.88 to 1. In the real data set, the Bayesian network constructed with the use of sequence had 6% lower BIC scores.

Conclusions: Sequence information improved the accuracy of all eight learning algorithms and should be routinely examined when learning network structure from data.
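
The sequence constraint described in the Methods (no directed arc from a later variable to an earlier one) can be illustrated with a short sketch. This is not the author's code; the abstract does not say what software was used. The function names, the variable names A to D, and their ranks are hypothetical, and the sketch assumes the partial sequence is given as a rank per variable, with tied or unranked variables left unconstrained.

```python
from itertools import permutations


def sequence_blacklist(order_rank):
    """Return the arcs (from_node, to_node) prohibited by a partial sequence.

    order_rank maps a variable name to its rank in time (smaller = earlier).
    Variables missing from the mapping have unknown timing, so no arc
    involving them is prohibited. Variables with equal rank do not
    constrain each other.
    """
    return sorted(
        (later, earlier)
        for later, earlier in permutations(order_rank, 2)
        if order_rank[later] > order_rank[earlier]
    )


def violates_sequence(arcs, blacklist):
    """Return the arcs of a learned network that point backwards in time."""
    banned = set(blacklist)
    return [arc for arc in arcs if arc in banned]


# Hypothetical partial sequence: A occurs first, B and C are tied, D is last.
rank = {"A": 1, "B": 2, "C": 2, "D": 3}
blacklist = sequence_blacklist(rank)

# Hypothetical learned arcs: ("D", "A") runs from a later to an earlier variable.
learned_arcs = [("A", "B"), ("B", "D"), ("D", "A")]
backward_arcs = violates_sequence(learned_arcs, blacklist)
```

A list of prohibited arcs like `blacklist` is the kind of input that structure-learning tools commonly accept as a set of forbidden edges, so the same constraint can be applied to both conditional-dependency and search-and-score algorithms without changing the algorithms themselves.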

Updated: 2020-09-28