Exploiting pivot words to classify and summarize discourse facets of scientific papers
Scientometrics (IF 3.9), Pub Date: 2020-06-13, DOI: 10.1007/s11192-020-03532-3
Moreno La Quatra, Luca Cagliero, Elena Baralis

The ever-increasing number of published scientific articles has prompted the need for automated, data-driven approaches to summarizing their content. The Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2019) has recently fostered the study and development of new text mining and machine learning solutions to the summarization problem, customized to the academic domain. In CL-SciSumm, a Reference Paper (RP) is associated with a set of Citing Papers (CPs), all of which contain citations to the RP. In each CP, the text spans (i.e., citances) that pertain to a particular citation to the RP have been identified. The task of identifying the text spans in the RP that most accurately reflect a citance is addressed using supervised approaches. This paper proposes a new, more effective solution to the CL-SciSumm discourse facet classification task, which entails identifying, for each cited text span, which facet of the paper it belongs to from a predefined set of facets. It also proposes extending the set of traditional CL-SciSumm tasks with a new one, namely the discourse facet summarization task. The underlying idea is to extract facet-specific descriptions of each RP, each consisting of a fixed-length collection of the RP's text spans. To tackle both the standard and the new tasks, we propose machine-learning-supported solutions based on the extraction of a selection of discriminating words, called pivot words. Predictive features based on pivot words are shown to be of great importance in rating the pertinence and relevance of a text span to a given facet. The newly proposed facet classification method performs significantly better than the best-performing CL-SciSumm 2019 participant (i.e., classification accuracy increased by 8%), whereas regression methods achieved promising results for the newly proposed summarization task.
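
The abstract does not specify how pivot words are selected or how the features are computed, so the following is only a minimal illustrative sketch of the general idea: pick words that are more frequent inside a facet than outside it, then count their occurrences in a text span. The scoring rule (a smoothed log frequency ratio), the function names `tokenize`, `pivot_words`, and `pivot_features`, and the toy facet labels are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict
import math


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, trim punctuation.
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]


def pivot_words(labeled_spans, top_k=3):
    """Select up to top_k discriminating words per facet.

    labeled_spans: list of (text, facet) pairs.
    The score is an assumed, add-one smoothed log ratio between a word's
    relative frequency inside the facet and outside it.
    """
    facet_counts = defaultdict(Counter)
    total = Counter()
    for text, facet in labeled_spans:
        tokens = tokenize(text)
        facet_counts[facet].update(tokens)
        total.update(tokens)
    n_all = sum(total.values())

    pivots = {}
    for facet, counts in facet_counts.items():
        n_in = sum(counts.values())
        n_out = n_all - n_in

        def score(word):
            inside = (counts[word] + 1) / (n_in + 1)
            outside = (total[word] - counts[word] + 1) / (n_out + 1)
            return math.log(inside / outside)

        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(counts, key=lambda w: (-score(w), w))
        pivots[facet] = ranked[:top_k]
    return pivots


def pivot_features(text, pivots):
    # One feature per facet: how many tokens of the span are pivot words of it.
    tokens = tokenize(text)
    return {facet: sum(t in words for t in tokens)
            for facet, words in pivots.items()}


# Toy data with made-up facet labels, for illustration only.
training = [
    ("We propose a new algorithm based on clustering", "Method"),
    ("The algorithm uses a greedy search", "Method"),
    ("Accuracy improved by five points", "Result"),
    ("The accuracy results outperform the baseline", "Result"),
]
pivots = pivot_words(training)
features = pivot_features("The algorithm is fast", pivots)
```

In a real pipeline such counts would be passed, together with other features, to a trained classifier or regressor; on a corpus this small, function words can also surface as pivots, so stop-word filtering would likely be needed.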

Updated: 2020-06-13