Open information extraction as an intermediate semantic structure for Persian text summarization
International Journal on Digital Libraries, Pub Date: 2018-06-28, DOI: 10.1007/s00799-018-0244-z
Mahmoud Rahat, Alireza Talebpour

Semantic applications typically exploit structures such as dependency parse trees, phrase chunking, semantic role labeling, or open information extraction. In this paper, we introduce a novel application of Open IE as an intermediate layer for text summarization. Text summarization is an important method for providing relevant information in large digital libraries. Open IE refers to the process of extracting machine-understandable structured propositions from text. We use these propositions as building blocks to shorten sentences and generate a summary of the text. The proposed system offers a new form of summarization that can break the structure of a sentence and extract its most significant sub-sentential elements. Other advantages include the ability to identify and eliminate less important parts of a sentence (such as adverbs, adjectives, appositions, or dependent clauses) as well as duplicated sentence fragments, which in turn frees space for including more sentences in the summary and improves its coverage and coherence. The proposed system is localized for the Persian language; however, it can be adapted to other languages. Experiments performed on the standard data set “Pasokh” with a standard comparison tool showed promising results for the proposed approach. We used summaries produced by the system in a real-world application in the virtual library of Shahid Beheshti University and received positive feedback from users.
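The abstract describes the pipeline only at a high level. As a rough illustration of using propositions rather than whole sentences as the unit of a summary, the minimal Python sketch below assumes Open IE has already produced `(arg1, relation, arg2)` triples for each sentence. The frequency-based scoring, the subset-based redundancy check, and the word budget are illustrative assumptions, not the authors' method, and the names `summarize`, `tokens`, and `Proposition` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical type: an (arg1, relation, arg2) triple assumed to come from
# some Open IE system; the extraction step itself is not shown here.
Proposition = Tuple[str, str, str]


def tokens(text: str) -> List[str]:
    """Lowercased whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def summarize(doc: List[List[Proposition]], budget: int) -> List[str]:
    """Select high-scoring, non-redundant propositions within a word budget.

    doc[i] holds the propositions extracted from sentence i; budget is the
    maximum number of words allowed in the summary. Scoring by average term
    frequency is an assumed stand-in, not the paper's scoring function.
    """
    # Document-level term frequencies act as a crude salience signal.
    freq = Counter(
        tok for sent in doc for prop in sent for part in prop for tok in tokens(part)
    )

    candidates = []
    for s_idx, sent in enumerate(doc):
        for p_idx, prop in enumerate(sent):
            toks = [t for part in prop for t in tokens(part)]
            if not toks:
                continue
            score = sum(freq[t] for t in toks) / len(toks)
            candidates.append((score, s_idx, p_idx, prop, toks))

    # Highest score first; ties broken by original position in the document.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    chosen = []  # (sentence index, proposition index, proposition)
    seen = []    # token sets of chosen propositions, for redundancy checks
    used = 0
    for _score, s_idx, p_idx, prop, toks in candidates:
        tok_set = set(toks)
        # Drop a proposition whose words are all covered by one already chosen,
        # which removes duplicated fragments and frees budget for new content.
        if any(tok_set <= other for other in seen):
            continue
        if used + len(toks) > budget:
            continue
        chosen.append((s_idx, p_idx, prop))
        seen.append(tok_set)
        used += len(toks)

    # Restore document order so the summary reads in the original sequence.
    chosen.sort(key=lambda c: (c[0], c[1]))
    return [" ".join(prop) for _, _, prop in chosen]
```

Because each unit is a short proposition rather than a full sentence, a duplicated fact costs nothing once its first copy is selected, and the saved words can go to propositions from other sentences. Whitespace tokenization is only a placeholder; a real Persian system would need proper tokenization and its own salience model.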

Updated: 2018-06-28