Mitigating Data Scarceness through Data Synthesis, Augmentation and Curriculum for Abstractive Summarization, arXiv - CS - Computation and Language

arXiv - CS - Computation and Language. Pub Date: 2021-09-17, DOI: arxiv-2109.08569
Ahmed Magooda, Diane Litman

This paper explores three simple data manipulation techniques (synthesis, augmentation, curriculum) for improving abstractive summarization models without the need for any additional data. We introduce a method of data synthesis with paraphrasing, a data augmentation technique with sample mixing, and curriculum learning with two new difficulty metrics based on specificity and abstractiveness. We conduct experiments to show that these three techniques can help improve abstractive summarization across two summarization models and two different small datasets. Furthermore, we show that these techniques can improve performance when applied in isolation and when combined.
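As a rough illustration of the curriculum idea, the sketch below orders training pairs from easy to hard using an abstractiveness proxy. It is not the authors' metric: the abstract does not define how abstractiveness is computed, so the novel-unigram ratio used here, along with the names `abstractiveness`, `curriculum_order`, and `toy_pairs`, are assumptions made purely for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source document, reference summary)


def abstractiveness(source: str, summary: str) -> float:
    """Assumed proxy: fraction of summary tokens that never occur in the source."""
    source_vocab = set(source.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    novel = sum(1 for tok in summary_tokens if tok not in source_vocab)
    return novel / len(summary_tokens)


def curriculum_order(pairs: List[Pair]) -> List[Pair]:
    """Sort pairs so the least abstractive (assumed easiest) come first."""
    return sorted(pairs, key=lambda pair: abstractiveness(pair[0], pair[1]))


# Hypothetical toy data, not taken from the paper's datasets.
toy_pairs: List[Pair] = [
    ("the cat sat on the mat", "a feline rested"),
    ("the cat sat on the mat", "the cat sat"),
]
ordered = curriculum_order(toy_pairs)
```

A training loop could then consume `ordered` front to back, or in progressively larger prefixes. Whitespace tokenization is used only to keep the sketch dependency-free.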

Updated: 2021-09-20
