EmbNum+: Effective, Efficient, and Robust Semantic Labeling for Numerical Values
New Generation Computing ( IF 2.0 ) Pub Date : 2019-11-04 , DOI: 10.1007/s00354-019-00076-w
Phuc Nguyen , Khai Nguyen , Ryutaro Ichise , Hideaki Takeda

In recent years, there has been increasing interest in numerical semantic labeling, in which an unknown numerical column is assigned the label of the most relevant columns in predefined knowledge bases. Previous methods used the p-value of a statistical hypothesis test to estimate relevance and thus depend strongly on the distribution and data domain. In other words, they are unstable in general cases, where such knowledge is undefined. Our goal is to solve semantic labeling without using such information while guaranteeing high accuracy. We propose EmbNum+, a neural numerical embedding for learning both discriminant representations and a similarity metric from numerical columns. EmbNum+ maps the list of numerical values in each column to a feature vector in an embedding space, and a similarity metric can be calculated directly on these feature vectors. Evaluations on many datasets from various domains confirmed that EmbNum+ consistently outperformed other state-of-the-art approaches in terms of accuracy. The compact embedding representations also made EmbNum+ significantly faster than the other approaches and enabled large-scale semantic labeling. Furthermore, attribute augmentation can be used to enhance the robustness and unlock the portability of EmbNum+, so that a model trained on one domain can be applied to many different domains.
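The labeling pipeline described in the abstract (embed each column, then compare embeddings directly) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' model: EmbNum+ learns its embedding with a neural network, whereas the sketch substitutes a fixed quantile-based embedding and Euclidean distance so that it runs without training. The function names `embed_column`, `build_index`, and `label_column`, and the toy knowledge base, are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def embed_column(values: Sequence[float], dim: int = 8) -> List[float]:
    """Map a numerical column to a fixed-length vector.

    ASSUMPTION: a stand-in for the learned embedding in the paper.
    It uses `dim` evenly spaced empirical quantiles of the column.
    """
    if not values:
        raise ValueError("column must contain at least one value")
    ordered = sorted(values)
    n = len(ordered)
    vec = []
    for i in range(dim):
        # Fractional position of the i-th quantile in the sorted column.
        pos = (i + 0.5) / dim * (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring values.
        vec.append(ordered[lo] * (1 - frac) + ordered[hi] * frac)
    return vec


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance computed directly on the feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(knowledge_base: Dict[str, Sequence[float]],
                dim: int = 8) -> Dict[str, List[float]]:
    """Embed every labeled column once, so queries only compare vectors."""
    return {label: embed_column(col, dim) for label, col in knowledge_base.items()}


def label_column(unknown: Sequence[float],
                 index: Dict[str, List[float]],
                 dim: int = 8) -> str:
    """Return the label of the nearest labeled column in embedding space."""
    query = embed_column(unknown, dim)
    return min(index, key=lambda label: euclidean(query, index[label]))


# Toy knowledge base (hypothetical data, for illustration only).
index = build_index({
    "age": [23, 35, 41, 58, 62, 19, 30],
    "year": [1990, 2001, 2015, 1985, 2010],
})
print(label_column([27, 44, 51, 33], index))
```

Because the labeled columns are embedded once in `build_index`, each query costs one embedding plus one vector comparison per label, which is the kind of saving the abstract attributes to compact embeddings.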

Updated: 2019-11-04