Candidate sentence selection for extractive text summarization
Information Processing & Management (IF 8.6), Pub Date: 2020-08-12, DOI: 10.1016/j.ipm.2020.102359
Begum Mutlu, Ebru A. Sezer, M. Ali Akcayol

Text summarization is the process of generating a brief version of a document while preserving as much of its fundamental information as possible. Although most text summarization research has focused on supervised learning solutions, few datasets have been created specifically for summarization tasks, and most existing summarization datasets lack human-generated target summaries, which are vital for both summary generation and evaluation. Therefore, this study presents a new dataset for abstractive and extractive summarization tasks. The dataset contains academic publications, the abstracts written by their authors, and extracts in two sizes, which were produced by human readers as part of this research. The resulting extracts were then evaluated to ensure the validity of the human extract production process. Moreover, the extractive summarization problem was reinvestigated on the proposed dataset. The main focus was to analyze the feature vector in order to generate more informative summaries. To that end, a comprehensive syntactic feature space was generated for the proposed dataset, and the impact of these features on the informativeness of the resulting summary was investigated. In addition, the summarization capability of semantic features was examined using GloVe and word2vec embeddings. Finally, the use of an ensemble feature space, which corresponds to the joint use of syntactic and semantic features, was proposed with a long short-term memory (LSTM)-based neural network model. The model summaries were evaluated with ROUGE metrics, and the results showed that the proposed ensemble feature space remarkably improved on the use of syntactic or semantic features alone. Additionally, the summaries produced by the proposed approach with ensemble features clearly outperformed, or performed comparably to, summaries obtained by state-of-the-art models for extractive summarization.
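The abstract names ROUGE as the evaluation metric without giving details. As a rough illustration only, the following minimal Python sketch computes ROUGE-N precision, recall, and F1 from clipped n-gram overlap between a candidate summary and a single reference. The function names `ngrams` and `rouge_n`, the lowercase whitespace tokenization, and the example sentences are assumptions made for this sketch; they are not taken from the paper, whose exact ROUGE configuration is not stated here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N against a single reference, using clipped n-gram overlap.

    Assumption: lowercase whitespace tokenization, with no stemming or
    stopword removal. Real evaluations often use an established toolkit.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # which is the "clipping" step of ROUGE.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical sentences, for illustration only.
    model_summary = "the cat sat on the mat"
    human_extract = "the cat lay on the mat"
    print(rouge_n(model_summary, human_extract, n=1))
    print(rouge_n(model_summary, human_extract, n=2))
```

In a setting like the one described, the reference would be one of the human-generated extracts and the candidate would be the set of sentences selected by the model.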




Updated: 2020-08-14