A domain categorisation of vocabularies based on a deep learning classifier, Journal of Information Science


Journal of Information Science (IF 1.8), Pub Date: 2021-05-23, DOI: 10.1177/01655515211018170
Alberto Nogales₁, Miguel-Angel Sicilia₂, Álvaro J García-Tejedor₁

The publication of large amounts of open data is an increasing trend. This is a consequence of initiatives such as Linked Open Data (LOD), which aims to publish and link data sets on the World Wide Web. Linked Data publishers are expected to follow a set of principles, described in a 2011 document that treats the reuse of vocabularies as key. The Linked Open Vocabularies (LOV) project attempts to collect the vocabularies and ontologies commonly used in LOD. These ontologies have been classified by domain according to the criteria of LOV members, which has the disadvantage of introducing personal biases. This article presents an automatic classifier of ontologies based on the main categories appearing in Wikipedia. For that purpose, word-embedding models are used in combination with deep learning techniques. Results show that a hybrid model combining regular Deep Neural Networks (DNNs), a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN) classifies ontologies with an accuracy of 93.57%. A further evaluation of the domain matchings between LOV and the classifier finds possible matches in 79.8% of cases.
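As a rough illustration of the final evaluation step, the sketch below computes the share of vocabularies whose predicted Wikipedia category is compatible with at least one of their LOV tags. The abstract does not say how a "possible matching" was defined, so the tag-to-category mapping, the vocabulary names, the toy data and the `possible_match_rate` helper are all hypothetical and stand in for whatever procedure the authors actually used.

```python
from typing import Dict, Set

# Hypothetical mapping from LOV tags to compatible Wikipedia main categories.
# This is an assumption made for illustration, not the authors' mapping.
TAG_TO_CATEGORIES: Dict[str, Set[str]] = {
    "Geography": {"Geography"},
    "People": {"People", "Society"},
    "Biology": {"Nature", "Science"},
}


def possible_match_rate(
    lov_tags: Dict[str, Set[str]], predicted: Dict[str, str]
) -> float:
    """Return the fraction of vocabularies whose predicted category is
    compatible with at least one of their LOV tags."""
    if not predicted:
        return 0.0
    hits = 0
    for vocab, category in predicted.items():
        # Collect every category that any of this vocabulary's tags allows.
        compatible: Set[str] = set()
        for tag in lov_tags.get(vocab, set()):
            compatible |= TAG_TO_CATEGORIES.get(tag, set())
        if category in compatible:
            hits += 1
    return hits / len(predicted)


# Toy data, invented for illustration only.
lov_tags = {
    "vocab_a": {"Geography"},
    "vocab_b": {"People"},
    "vocab_c": {"Biology"},
    "vocab_d": {"People"},
}
predicted = {
    "vocab_a": "Geography",
    "vocab_b": "Society",
    "vocab_c": "Technology",
    "vocab_d": "People",
}
rate = possible_match_rate(lov_tags, predicted)
print(f"{rate:.1%}")
```

Under this reading, a prediction counts as a possible match when it falls anywhere in the set of categories allowed by the vocabulary's tags, which is more lenient than requiring an exact label match.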


Updated: 2021-05-24
