MEET-LM: A method for embeddings evaluation for taxonomic data in the labour market, Computers in Industry



Computers in Industry ( IF 8.2 ) Pub Date : 2020-11-23 , DOI: 10.1016/j.compind.2020.103341
Lorenzo Malandri , Fabio Mercorio , Mario Mezzanzanica , Navid Nobani

Taxonomies are the mainstay of the semantic web, as they organise knowledge into concepts linked by IS-A relationships. However, keeping such hierarchies up to date and representative of the domain from which they were drawn remains a time-consuming, costly and error-prone activity. Word embeddings have proven effective at capturing lexical and semantic similarities, which can be used to enrich taxonomies from text data. This, in turn, requires evaluating the generated embeddings to estimate the extent to which they encode the semantic similarity derived from the hierarchy itself. In this paper, we propose and implement MEET-LM, a methodology that generates and evaluates embeddings from a text corpus while preserving the co-hyponymy relations synthesised from a domain-specific taxonomy. We apply MEET-LM to a real-life dataset of 2M+ vacancies for ICT jobs, framed within the research activities of an EU project that collects millions of Online Job Vacancies and classifies them within the European standard hierarchy ESCO. To show that MEET-LM is useful in practice, we also train a neural network to classify co-hyponym relations using the selected embeddings as features. Our experiments reach an accuracy of 99.4% and an F1-score of 86.5%.
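The core idea in the abstract, checking whether embeddings preserve co-hyponymy relations from a taxonomy, can be illustrated with a minimal sketch. This is not the MEET-LM procedure itself, whose details the abstract does not give. It assumes a toy taxonomy, invented 3-dimensional vectors, and a simple score (the mean cosine similarity of co-hyponym pairs minus that of all other pairs); the names `PARENT`, `EMBEDDING`, `are_co_hyponyms` and `similarity_gap` are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy taxonomy (not taken from ESCO): term -> parent concept (IS-A).
PARENT = {
    "java developer": "software developers",
    "python developer": "software developers",
    "network engineer": "network professionals",
    "network administrator": "network professionals",
}

# Hypothetical 3-dimensional embeddings, invented purely for illustration.
EMBEDDING = {
    "java developer": (0.9, 0.1, 0.0),
    "python developer": (0.8, 0.2, 0.1),
    "network engineer": (0.1, 0.9, 0.2),
    "network administrator": (0.0, 0.8, 0.3),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def are_co_hyponyms(a, b, parent=PARENT):
    """Two terms are co-hyponyms when they share the same direct parent."""
    return parent[a] == parent[b]


def similarity_gap(embedding=EMBEDDING, parent=PARENT):
    """Mean cosine similarity of co-hyponym pairs minus that of other pairs.

    A clearly positive gap suggests the embedding space keeps siblings
    in the taxonomy closer together than unrelated terms.
    """
    same, other = [], []
    for a, b in combinations(sorted(embedding), 2):
        score = cosine(embedding[a], embedding[b])
        (same if are_co_hyponyms(a, b, parent) else other).append(score)
    return sum(same) / len(same) - sum(other) / len(other)


if __name__ == "__main__":
    print(f"similarity gap: {similarity_gap():.3f}")
```

In practice the vectors would come from a model trained on the vacancy corpus, and several candidate embedding models could be compared by such a score before choosing one as input features for a downstream classifier.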


Updated: 2020-11-25
