Methodology for refining subject terms and supporting subject indexing with taxonomy: A case study of the APO digital repository
Decision Support Systems (IF 6.7), Pub Date: 2021-03-13, DOI: 10.1016/j.dss.2021.113542
Yong-Bin Kang, Jihoon Woo, Les Kneebone, Timos Sellis

In digital repositories, it is crucial to refine existing subject terms and to exploit a taxonomy of those terms in order to support information retrieval tasks such as indexing, cataloging and searching of digital documents. In this paper, we address how to refine an existing set of subject terms used to index digital documents, a set that often contains irrelevant or noisy terms. Further, we present how to automatically induce a subject term taxonomy to capture and utilise the semantic relations among subject terms. Most related work has paid little attention to these problems, focusing mostly on creating subject terms or building a taxonomy of key terms from text documents. We propose a methodology for refining an existing set of subject terms in a digital repository by identifying their semantics, and for inducing a taxonomy of subject terms by analysing their mutual usage and maximising their semantic relatedness. Then, we present a case study using the Analysis & Policy Observatory (APO) digital repository to analyse the proposed methodology and demonstrate its applicability. Further, to validate the generalisability of the proposed taxonomy induction method, we evaluate it against a gold-standard taxonomy in the life sciences, Medical Subject Headings (MeSH), in comparison with the state-of-the-art taxonomy induction method, TaxoFinder. Our evaluation shows that our methodology has high potential for refining an existing set of subject terms and capturing their semantic relationships by inducing a subject term taxonomy.




Updated: 2021-05-15