A document-structure-based complex network model for extracting text keywords
Scientometrics (IF 3.5), Pub Date: 2020-06-17, DOI: 10.1007/s11192-020-03542-1
YiJun Liu, Li Zhang, Xiaoli Lian

Keywords, which serve as a dense summary of a document, are widely used in search engines and libraries for information retrieval, content classification, speech recognition, and automated text summarization. However, a massive number of documents lack keywords, and the large amount of content generated every day makes human annotation very time-consuming. Many studies show that network-based approaches perform remarkably well at extracting text keywords. Traditionally, words are connected based on their occurrence in documents. One recent work shows that sentences have a significant influence on keyword extraction, beyond traditional methods that consider only words. In addition to words and sentences, chapters are essential parts that organize the higher-level semantic logic of a document. Inspired by this idea, we assume that chapters should contribute to keyword extraction too. We add the chapter factor to build a three-layer network model and propose a Word-Sentence-Chapter network-based approach for keyword extraction. Two experiments, on Chinese and English documents respectively, indicate that our approach outperforms the state of the art.
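
To make the three-layer idea concrete, the following is a minimal sketch of a word-sentence-chapter graph, not the authors' implementation. The abstract does not specify edge weights or the ranking scheme, so everything here is an assumption: the function names (build_wsc_graph, pagerank, top_keywords) are hypothetical, all edges have unit weight, words in the same sentence are linked to each other, and plain PageRank stands in for whatever scoring the paper actually uses.

```python
from collections import defaultdict
from itertools import combinations


def build_wsc_graph(chapters):
    """Build an undirected weighted graph from a tokenized document.

    chapters: list of chapters; each chapter is a list of sentences;
    each sentence is a list of word tokens.
    Nodes are ("w", word), ("s", chapter_idx, sentence_idx), ("c", chapter_idx).
    Unit edge weights are an assumption, not taken from the paper.
    """
    graph = defaultdict(lambda: defaultdict(float))

    def link(a, b, weight=1.0):
        graph[a][b] += weight
        graph[b][a] += weight

    for ci, chapter in enumerate(chapters):
        for si, sentence in enumerate(chapter):
            link(("c", ci), ("s", ci, si))          # chapter-sentence layer
            words = set(sentence)
            for w in words:
                link(("s", ci, si), ("w", w))       # sentence-word layer
            for a, b in combinations(sorted(words), 2):
                link(("w", a), ("w", b))            # word-word links within a sentence
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain weighted PageRank by power iteration (a stand-in ranking scheme)."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            incoming = sum(score[m] * graph[m][n] / out_weight[m] for m in graph[n])
            new_score[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


def top_keywords(chapters, k=3):
    """Return the k highest-scoring word nodes (ties broken alphabetically)."""
    score = pagerank(build_wsc_graph(chapters))
    words = [(node[1], s) for node, s in score.items() if node[0] == "w"]
    return [w for w, _ in sorted(words, key=lambda item: (-item[1], item[0]))[:k]]
```

In this sketch a word gains score both from the words it shares sentences with and, through the sentence and chapter nodes, from the structure of the document around it. Real use would also need tokenization, stop-word removal, and a chapter segmentation step, none of which are shown.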

Updated: 2020-06-17