Regularizing conjunctive features for classification
Journal of Computer and System Sciences (IF 1.1), Pub Date: 2021-02-17, DOI: 10.1016/j.jcss.2021.01.003
Pablo Barceló, Alexander Baumgartner, Victor Dalmau, Benny Kimelfeld

We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and we want to find feature queries that linearly separate the two sets of examples. We focus on conjunctive feature queries and explore two problems: (a) deciding whether separating feature queries exist (separability), and (b) generating such queries when they exist. To restrict the complexity of the generated classifiers, we explore various ways of regularizing them: limiting their dimension, the number of joins in feature queries, and their generalized hypertreewidth (ghw). We show that the separability problem is tractable for bounded ghw; the generation problem, however, is not, because the feature queries might be too large. We therefore explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can classify efficiently without explicitly generating such queries.
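To make the setting in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy database, two hand-written unary conjunctive queries, 0/1 feature values, and a plain perceptron with an epoch cap as a stand-in for a separability check. All names (`DB`, `LABELS`, `Q_AUTHOR`, `Q_CITED`, `holds`, `try_separate`) are hypothetical. Hitting the cap is only evidence of non-separability, not a proof.

```python
# Toy database: relation name -> set of facts (all names are hypothetical).
DB = {
    "Author": {("ann", "p1"), ("bob", "p2"), ("cat", "p3")},
    "Cites": {("p2", "p1"), ("p3", "p1")},
}
LABELS = {"ann": 1, "bob": -1, "cat": -1}  # +1 positive, -1 negative example

# A unary conjunctive query is a list of atoms. "?x" is the free variable;
# other "?"-prefixed terms are existentially quantified variables.
Q_AUTHOR = [("Author", ("?x", "?p"))]                           # x wrote a paper
Q_CITED = [("Author", ("?x", "?p")), ("Cites", ("?q", "?p"))]   # ... that is cited


def match(term, value, binding):
    """Unify one query term with one database value, updating the binding."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def holds(query, entity, db):
    """Brute-force check that the query holds when ?x is bound to entity."""
    def extend(atoms, binding):
        if not atoms:
            return True
        (rel, terms), rest = atoms[0], atoms[1:]
        for fact in db.get(rel, ()):
            new = dict(binding)  # fresh copy, so failed matches leave no trace
            if all(match(t, v, new) for t, v in zip(terms, fact)) and extend(rest, new):
                return True
        return False
    return extend(list(query), {"?x": entity})


def features(queries, entity, db):
    """Map an entity to a 0/1 vector with one coordinate per feature query."""
    return [1 if holds(q, entity, db) else 0 for q in queries]


def perceptron(samples, max_epochs=1000):
    """Return (weights, bias) that separate the samples, or None if the cap is hit."""
    w, b = [0] * len(samples[0][0]), 0
    for _ in range(max_epochs):
        updated = False
        for x, y in samples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                updated = True
        if not updated:
            return w, b
    return None


def try_separate(queries, labels, db):
    """Evaluate the feature queries on every labeled entity, then fit a separator."""
    samples = [(features(queries, e, db), y) for e, y in labels.items()]
    return perceptron(samples)
```

In this toy instance, `try_separate([Q_CITED], LABELS, DB)` finds a separator because only the positive entity wrote a cited paper, while `try_separate([Q_AUTHOR], LABELS, DB)` returns `None` because every entity gets the same feature value. The brute-force evaluation in `holds` is exponential in the query size in general; the tractability results in the abstract concern queries of bounded ghw, which this sketch does not exploit.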




Updated: 2021-02-19