Knowledge base enrichment by relation learning from social tagging data, Information Sciences


Knowledge base enrichment by relation learning from social tagging data
Information Sciences. Pub Date: 2020-04-06, DOI: 10.1016/j.ins.2020.04.002
Hang Dong, Wei Wang, Frans Coenen, Kaizhu Huang

There has been considerable interest in transforming unstructured social tagging data into structured knowledge for semantic-based retrieval and recommendation. Research in this line mostly exploits data co-occurrence and often overlooks the complex and ambiguous meanings of tags. Furthermore, there have been few comprehensive evaluation studies regarding the quality of the discovered knowledge. We propose a supervised learning method to discover subsumption relations from tags. The key to this method is quantifying the probabilistic association among tags to better characterise their relations. We further develop an algorithm to organise tags into hierarchies based on the learned relations. Experiments were conducted using a large, publicly available dataset, Bibsonomy, and three popular, human-engineered or data-driven knowledge bases: DBpedia, Microsoft Concept Graph, and ACM Computing Classification System. We performed a comprehensive evaluation using different strategies: relation-level, ontology-level, and knowledge base enrichment based evaluation. The results clearly show that the proposed method can extract knowledge of better quality than the existing methods against the gold standard knowledge bases. The proposed approach can also enrich knowledge bases with new subsumption relations, having the potential to significantly reduce time and human effort for knowledge base maintenance and ontology evolution.
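The abstract does not spell out how tags are organised into hierarchies, so the following is only a minimal sketch of one way such a step could work, not the authors' algorithm. It assumes a learned model has already produced a score for each candidate (child, parent) pair; the function name `build_hierarchy`, the `threshold` parameter, and the example tags and scores are all hypothetical. The sketch greedily accepts the highest-scoring pairs, gives each tag at most one parent, and rejects any pair that would create a cycle.

```python
from typing import Dict, Optional, Tuple


def build_hierarchy(
    scores: Dict[Tuple[str, str], float],
    threshold: float = 0.5,
) -> Dict[str, Optional[str]]:
    """Greedy sketch: map each tag to one parent tag, or None for a root.

    scores[(child, parent)] is an assumed probability that `parent`
    subsumes `child`, e.g. the output of a trained classifier.
    """
    tags = sorted({tag for pair in scores for tag in pair})
    parent: Dict[str, Optional[str]] = {tag: None for tag in tags}

    def creates_cycle(child: str, new_parent: str) -> bool:
        # Walk up from the proposed parent; reaching `child` means a loop.
        node: Optional[str] = new_parent
        while node is not None:
            if node == child:
                return True
            node = parent[node]
        return False

    # Consider candidate relations from most to least confident.
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    for (child, candidate), score in ranked:
        if score < threshold:
            break  # all remaining pairs score below the cut-off
        if parent[child] is None and not creates_cycle(child, candidate):
            parent[child] = candidate
    return parent


# Hypothetical scores, for illustration only (not taken from the paper).
scores = {
    ("machine_learning", "artificial_intelligence"): 0.90,
    ("deep_learning", "machine_learning"): 0.80,
    ("deep_learning", "artificial_intelligence"): 0.60,
    ("artificial_intelligence", "deep_learning"): 0.55,
    ("python", "machine_learning"): 0.30,
}
hierarchy = build_hierarchy(scores)
for tag, par in hierarchy.items():
    print(f"{tag} -> {par}")
```

In this toy run the pair (artificial_intelligence, deep_learning) is rejected because accepting it would close a loop, and the pair involving python falls below the threshold, so python stays a root. A single-parent tree is a simplifying assumption; a real knowledge base may allow several parents per concept.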


Updated: 2020-04-06
