Grafting for combinatorial binary model using frequent itemset mining
Data Mining and Knowledge Discovery (IF 2.8). Pub Date: 2019-10-28. DOI: 10.1007/s10618-019-00657-9
Taito Lee, Shin Matsushima, Kenji Yamanishi

We consider the class of linear predictors over all logical conjunctions of binary attributes, which we refer to as the class of combinatorial binary models (CBMs) in this paper. CBMs offer high knowledge interpretability, but naïve learning of them from labeled data requires a computational cost that grows exponentially with the length of the conjunctions. On the other hand, for large-scale datasets, long conjunctions are effective for learning predictors. To overcome this computational difficulty, we propose an algorithm, GRAfting for Binary datasets (GRAB), which efficiently learns CBMs within the \(L_1\)-regularized loss minimization framework. The key idea of GRAB is to adopt weighted frequent itemset mining for the most time-consuming step of the grafting algorithm, which is designed to solve large-scale \(L_1\)-regularized empirical risk minimization (\(L_1\)-RERM) problems by an iterative approach. Furthermore, we experimentally show that linear predictors of CBMs are effective in terms of prediction accuracy and knowledge discovery.
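To illustrate the mechanics described in the abstract, the following Python sketch shows a grafting loop for an \(L_1\)-regularized linear model over conjunctions of binary attributes, where the bottleneck step of finding the conjunction with the largest gradient magnitude is solved by a pruned search over itemsets. This is only a minimal sketch under simplifying assumptions, not the authors' implementation: a small branch-and-bound with an anti-monotone bound stands in for a full weighted frequent itemset miner, the active weights are refit with plain proximal gradient steps, and all names (grab_sketch, best_conjunction) and parameters (lam, max_len, max_features) are hypothetical.

```python
# Illustrative sketch only (not the authors' code): grafting for an L1-regularized
# linear model over conjunctions of binary attributes. The feature-selection step
# searches conjunctions with an anti-monotone pruning bound, standing in for the
# weighted frequent itemset mining used by GRAB. Names and parameters are hypothetical.

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conj_column(X, conj):
    """Indicator feature: 1 for rows where every attribute in `conj` equals 1."""
    return X[:, list(conj)].all(axis=1).astype(float)


def predict(X, weights):
    f = np.zeros(X.shape[0])
    for conj, w in weights.items():
        f += w * conj_column(X, conj)
    return f


def gradient_weights(X, y, weights):
    """Per-example weights r_i = -y_i * sigmoid(-y_i f(x_i)) / n, so the gradient
    of the mean logistic loss w.r.t. a conjunction feature z is the dot product z . r."""
    f = predict(X, weights)
    return -y * sigmoid(-y * f) / X.shape[0]


def best_conjunction(X, r, max_len=3):
    """Find the conjunction maximizing |sum of r_i over its support|, pruning with
    the bound max(sum of positive r_i, -sum of negative r_i) over the current support."""
    n, d = X.shape
    best_score, best_conj = 0.0, None

    def recurse(conj, support):
        nonlocal best_score, best_conj
        rs = r[support]
        score = rs.sum()
        if abs(score) > best_score:
            best_score, best_conj = abs(score), tuple(conj)
        if len(conj) == max_len:
            return
        # Any superset is supported by a subset of `support`, so its score is
        # bounded by the signed mass available here (anti-monotonicity).
        bound = max(rs[rs > 0].sum(), -rs[rs < 0].sum(), 0.0)
        if bound <= best_score:
            return
        start = conj[-1] + 1 if conj else 0
        for j in range(start, d):
            sub = support[X[support, j] == 1]
            if sub.size:
                recurse(conj + [j], sub)

    recurse([], np.arange(n))
    return best_conj, best_score


def fit_active(X, y, weights, lam, iters=200, lr=0.1):
    """Proximal gradient (ISTA) steps for the L1-regularized logistic loss,
    restricted to the currently active conjunctions."""
    conjs = list(weights)
    Z = np.column_stack([conj_column(X, c) for c in conjs])
    w = np.array([weights[c] for c in conjs])
    n = X.shape[0]
    for _ in range(iters):
        grad = Z.T @ (-y * sigmoid(-y * (Z @ w))) / n
        u = w - lr * grad
        w = np.sign(u) * np.maximum(np.abs(u) - lr * lam, 0.0)
    return dict(zip(conjs, w))


def grab_sketch(X, y, lam=0.05, max_len=3, max_features=20):
    """Grafting loop: repeatedly add the conjunction whose gradient magnitude
    exceeds the regularization strength, then re-optimize the active weights."""
    weights = {}
    for _ in range(max_features):
        r = gradient_weights(X, y, weights)
        conj, score = best_conjunction(X, r, max_len)
        if conj is None or score <= lam:  # optimality condition when all other weights are zero
            break
        weights.setdefault(conj, 0.0)
        weights = fit_active(X, y, weights, lam)
        weights = {c: w for c, w in weights.items() if w != 0.0}
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 10))
    y = np.where(X[:, 0] & X[:, 3], 1, -1)  # label depends on the conjunction (x0 AND x3)
    print(grab_sketch(X, y))
```

The pruning bound exploits the fact that any superset of a conjunction is supported by a subset of its transactions, which is the same anti-monotonicity property that lets a frequent-itemset-style search with per-example weights scale to long conjunctions.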
