

Towards a Flexible Embedding Learning Framework
arXiv - CS - Information Retrieval. Pub Date: 2020-09-23, DOI: arxiv-2009.10989
Chin-Chia Michael Yeh, Dhruv Gelda, Zhongfang Zhuang, Yan Zheng, Liang Gou, Wei Zhang

Representation learning is a fundamental building block for analyzing entities in a database. While existing embedding learning methods are effective in various data mining problems, their applicability is often limited because they make pre-determined assumptions about the type of semantics captured by the learned embeddings, and these assumptions may not align well with specific downstream tasks. In this work, we propose an embedding learning framework that 1) uses an input format that is agnostic to the input data type, 2) is flexible in terms of the relationships that can be embedded into the learned representations, and 3) provides an intuitive pathway for incorporating domain knowledge into the embedding learning process. Our proposed framework takes as input a set of entity-relation-matrices, which quantify the affinities among different entities in the database. Moreover, a sampling mechanism is carefully designed to establish a direct connection between the input and the information captured by the output embeddings. To complete the representation learning toolbox, we also outline a simple yet effective post-processing technique for properly visualizing the learned embeddings. Our empirical results demonstrate that the proposed framework, in conjunction with a set of relevant entity-relation-matrices, outperforms existing state-of-the-art approaches in various data mining tasks.
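
The abstract does not describe the sampling mechanism in detail. As a rough illustration only, the sketch below assumes one plausible reading: an entity-relation-matrix holds non-negative affinities between two sets of entities, and training pairs are drawn with probability proportional to those affinities. The function name `sample_pairs`, its parameters, and the toy customer and merchant data are hypothetical and are not taken from the paper.

```python
import random


def sample_pairs(matrix, row_entities, col_entities, n_samples, seed=0):
    """Draw (row_entity, col_entity) pairs with probability proportional
    to the affinity stored in `matrix` (an assumption, not the paper's method).

    matrix[i][j] is the affinity between row_entities[i] and col_entities[j].
    """
    pairs = []
    weights = []
    for i, row_entity in enumerate(row_entities):
        for j, col_entity in enumerate(col_entities):
            affinity = matrix[i][j]
            if affinity < 0:
                raise ValueError("affinities must be non-negative")
            pairs.append((row_entity, col_entity))
            weights.append(affinity)
    if sum(weights) <= 0:
        raise ValueError("matrix must contain at least one positive affinity")

    # A seeded generator keeps the draw reproducible.
    rng = random.Random(seed)
    return rng.choices(pairs, weights=weights, k=n_samples)


# Hypothetical toy input: how often each customer visited each merchant.
customers = ["c1", "c2"]
merchants = ["m1", "m2", "m3"]
visits = [
    [5, 0, 1],  # c1
    [0, 2, 2],  # c2
]

training_pairs = sample_pairs(visits, customers, merchants, n_samples=10)
```

Under this assumption, pairs with zero affinity are never drawn and pairs with higher affinity appear more often, so the choice of matrix directly controls which relationships an embedding model trained on the sampled pairs would see.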


Updated: 2020-09-24
