Summarization of biomedical articles using domain-specific word embeddings and graph ranking.
Journal of Biomedical Informatics (IF 4.5), Pub Date: 2020-05-19, DOI: 10.1016/j.jbi.2020.103452
Milad Moradi 1 , Maedeh Dashti 2 , Matthias Samwald 1

Text summarization tools can help biomedical researchers and clinicians reduce the time and effort needed for acquiring important information from numerous documents. It has been shown that the input text can be modeled as a graph, and important sentences can be selected by identifying central nodes within the graph. However, the effective representation of documents, quantifying the relatedness of sentences, and selecting the most informative sentences are the main challenges that need to be addressed in graph-based summarization. In this paper, we address these challenges in the context of biomedical text summarization. We evaluate the efficacy of a graph-based summarizer using different types of context-free and contextualized embeddings. The word representations are produced by pre-training neural language models on large corpora of biomedical texts. The summarizer models the input text as a graph in which the strength of relations between sentences is measured using the domain-specific vector representations. We also assess the usefulness of different graph ranking techniques in the sentence selection step of our summarization method. Using the common Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, we evaluate the performance of our summarizer against various comparison methods. The results show that when the summarizer utilizes proper combinations of context-free and contextualized embeddings, along with an effective ranking method, it can outperform the other methods. We demonstrate that the best settings of our graph-based summarizer can efficiently improve the informative content of summaries and decrease the redundancy.
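
The pipeline outlined in the abstract (embed sentences, connect them in a similarity graph, rank the nodes, keep the top-ranked sentences) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding models, the similarity measure, or the ranking algorithms, so the sketch assumes averaged context-free word vectors, cosine similarity as edge weight, and weighted PageRank as one possible graph ranking technique. All function names, parameters, and the toy vectors in the usage note are hypothetical.

```python
import math
import re


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of the tokens that have an embedding (assumed scheme)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known word: zero vector, similarity 0 to everything
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(vectors, threshold=0.0):
    """Symmetric weight matrix: an edge exists where similarity exceeds threshold."""
    n = len(vectors)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if sim > threshold:
                weights[i][j] = weights[j][i] = sim
    return weights


def pagerank(weights, damping=0.85, iters=100, tol=1e-9):
    """Weighted PageRank by power iteration; scores sum to 1."""
    n = len(weights)
    if n == 0:
        return []
    strength = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Nodes without edges spread their score uniformly over all nodes.
        dangling = sum(scores[j] for j in range(n) if strength[j] == 0)
        new = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / strength[j]
                for j in range(n) if strength[j] > 0
            )
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences, embeddings, dim, k=2):
    """Return the k highest-ranked sentences in their original order."""
    vectors = [
        sentence_vector(re.findall(r"[a-z0-9]+", s.lower()), embeddings, dim)
        for s in sentences
    ]
    scores = pagerank(build_graph(vectors))
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

A call such as `summarize(sentences, embeddings, dim=2, k=2)` expects `embeddings` to be a dict mapping lowercase tokens to lists of `dim` floats. In practice the vectors would come from a model pre-trained on biomedical text, and contextualized embeddings would replace the simple averaging step.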

Updated: 2020-05-19