Approximate matching-based unsupervised document indexing approach: application to biomedical domain
Scientometrics (IF 3.5), Pub Date: 2020-05-07, DOI: 10.1007/s11192-020-03474-w
Kabil Boukhari , Mohamed Nazih Omri

Document indexing is considered a crucial phase in information retrieval because the volume of textual information is constantly increasing. As documents accumulate, satisfying user needs becomes increasingly complex, and several information retrieval systems have therefore been designed to respond to user requests. The main contribution of this work is a novel hybrid approach to biomedical document indexing. We improve the estimation of the correspondence between a document and a given concept using two methods: the vector space model (VSM) and description logics (DL). VSM performs partial matching between documents and the terms of an external resource, while DL represents knowledge in a way that supports better matching. The proposed approach reduces the limitations of exact matching: it indexes documents by exploiting the Medical Subject Headings (MeSH) thesaurus with approximate matching, which partially matches document terms against biomedical vocabularies to extract other morphological variants present in that resource. Approximate matching, however, also generates irrelevant concepts. A filtering step addresses this problem by exploiting the knowledge provided by MeSH to select the most important concepts. Experiments carried out on different corpora show encouraging results, with an improvement of around 25% in average accuracy compared with other approaches studied in the literature.
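The abstract does not specify the exact term weighting, matching window, or filtering criteria, so the following is only a minimal sketch of VSM-style partial matching between document text and thesaurus entry terms, not the authors' method. Assumptions: `CONCEPTS` is a tiny hypothetical thesaurus (not real MeSH descriptors), vectors use raw term frequencies, `normalize` is a crude plural stripper standing in for real morphological analysis, and a plain score `threshold` stands in for the paper's knowledge-based filtering step.

```python
import math
import re
from collections import Counter

# Hypothetical toy thesaurus: concept ID -> entry terms.
# Illustrative only; these are NOT real MeSH descriptors or IDs.
CONCEPTS = {
    "C1": ["breast neoplasms", "breast cancer", "breast tumors"],
    "C2": ["myocardial infarction", "heart attack"],
    "C3": ["neoplasm metastasis", "tumor metastasis"],
}


def normalize(token):
    """Crude plural stripping (an assumed stand-in for real morphological analysis)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def tokenize(text):
    """Lowercase, split on non-letters, and normalize each token."""
    return [normalize(t) for t in re.split(r"[^a-z]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sum(v * v for v in a.values())
    nb = sum(v * v for v in b.values())
    return dot / math.sqrt(na * nb) if na and nb else 0.0


def match_concepts(text, concepts, threshold=0.5):
    """Score each concept by the best cosine between one of its entry terms and a
    document window of the same length; keep concepts scoring >= threshold."""
    doc = tokenize(text)
    scores = {}
    for cid, entries in concepts.items():
        best = 0.0
        for entry in entries:
            term = tokenize(entry)
            n = len(term)
            for i in range(max(1, len(doc) - n + 1)):
                window = doc[i:i + n]
                best = max(best, cosine(Counter(window), Counter(term)))
        if best >= threshold:
            scores[cid] = best
    # Highest-scoring concepts first.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


if __name__ == "__main__":
    doc = "Patients with metastatic breast tumors were studied."
    print(match_concepts(doc, CONCEPTS))                  # {'C1': 1.0, 'C3': 0.5}
    print(match_concepts(doc, CONCEPTS, threshold=0.75))  # {'C1': 1.0}
```

In this toy run, partial matching recovers "breast tumors" as a variant of concept `C1` but also admits `C3` on the single shared word "tumor", which mirrors the irrelevant-concept problem the abstract describes. Raising the threshold removes it here; the paper instead relies on MeSH knowledge for this filtering, which the sketch does not reproduce.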

Updated: 2020-05-07