Propositionalization and embeddings: two sides of the same coin
Machine Learning (IF 4.3), Pub Date: 2020-06-28, DOI: 10.1007/s10994-020-05890-8
Nada Lavrač, Blaž Škrlj, Marko Robnik-Šikonja
Affiliation
Nada Lavrač: Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia, and University of Nova Gorica, Glavni trg 8, 5271 Vipava, Slovenia. E-mail: nada.lavrac@ijs.si
Blaž Škrlj: Jožef Stefan International Postgraduate School and Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: blaz.skrlj@ijs.si
Marko Robnik-Šikonja: University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, 1000 Ljubljana, Slovenia. E-mail: marko.robnik@fri.uni-lj.si

Data preprocessing is an important component of machine learning pipelines, one that requires considerable time and resources. An integral part of preprocessing is data transformation into the format required by a given learning algorithm. This paper outlines some of the modern data processing techniques used in relational learning that enable data fusion from different input data types and formats into a single-table data representation, focusing on the propositionalization and embedding data transformation approaches. While both approaches aim to transform data into a tabular format, they use different terminology and task definitions, are perceived to address different goals, and are used in different contexts. This paper contributes a unifying framework that allows for improved understanding of these two data transformation techniques by presenting their unified definitions, and by explaining the similarities and differences between the two approaches as variants of a unified complex data transformation task. In addition to the unifying framework, the novelty of this paper is a unifying methodology combining propositionalization and embeddings, which benefits from the advantages of both in solving complex data transformation and learning tasks. We present two efficient implementations of the unifying methodology: an instance-based PropDRM approach, and a feature-based PropStar approach to data transformation and learning, together with their empirical evaluation on several relational problems. The results show that the new algorithms can outperform existing relational learners and can solve much larger problems.
Preprint: arXiv:2006.04410v1 [cs.LG], 8 Jun 2020.
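To make the two transformations concrete, the following minimal Python sketch illustrates the general idea on a toy example; it is not the paper's PropDRM or PropStar implementation, and the toy tables, column names, and SVD-based embedding step are assumptions made purely for illustration. It first propositionalizes a small relational dataset into a single binary feature table and then embeds the same examples into a dense low-dimensional space.

import numpy as np
import pandas as pd

# Target table: one row per learning example (here, a customer).
customers = pd.DataFrame({"customer_id": [1, 2, 3, 4],
                          "churned": [0, 1, 0, 1]})

# Related table: a one-to-many relation attached to the target entities.
purchases = pd.DataFrame({"customer_id": [1, 1, 2, 3, 4, 4],
                          "category": ["books", "games", "books",
                                       "music", "games", "music"]})

# Propositionalization: summarise the relation as boolean features
# "bought_<category>", yielding one fixed-length row per example,
# i.e. the single-table format required by propositional learners.
binary_features = (purchases.assign(value=1)
                   .pivot_table(index="customer_id", columns="category",
                                values="value", fill_value=0)
                   .add_prefix("bought_"))
table = customers.join(binary_features, on="customer_id").fillna(0)
print("Propositionalized (sparse, symbolic) table:")
print(table)

# Embedding: project the same binary feature matrix into a dense,
# low-dimensional numeric space (here via a truncated SVD), the kind
# of representation used by neural and distance-based learners.
X = table.drop(columns=["customer_id", "churned"]).to_numpy(dtype=float)
U, S, _ = np.linalg.svd(X, full_matrices=False)
embeddings = U[:, :2] * S[:2]  # 2-dimensional example embeddings
print("Dense 2-d embeddings of the same examples:")
print(embeddings)

Both outputs are single-table representations of the same relational examples, which is the sense in which propositionalization and embeddings can be viewed as variants of one underlying data transformation task.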

Updated: 2020-06-28