Deep Learning Based Multi-Label Text Classification of UNGA Resolutions, arXiv - CS - Information Retrieval


Deep Learning Based Multi-Label Text Classification of UNGA Resolutions
arXiv - CS - Information Retrieval, Pub Date: 2020-04-01, DOI: arxiv-2004.03455
Francesco Sovrano, Monica Palmirani, Fabio Vitali

The main goal of this research is to produce useful software for the United Nations (UN) that could help speed up the process of classifying UN documents according to the Sustainable Development Goals (SDGs), in order to monitor worldwide progress in the fight against poverty, discrimination, and climate change. Human labeling of UN documents would be a daunting task given the size of the corpus involved. Thus, automatic labeling must be adopted at least as the first step of a multi-phase process to reduce the overall effort of cataloguing and classifying. Deep Learning (DL) is nowadays one of the most powerful state-of-the-art (SOTA) AI tools for this task, but it very often comes at the cost of an expensive and error-prone preparation of a training set. In the case of multi-label classification of domain-specific text, it would seem that DL cannot be adopted effectively without a sufficiently large domain-specific training set. In this paper, we show that this is not always true. We propose a novel method that uses statistics such as TF-IDF to exploit pre-trained SOTA DL models (such as the Universal Sentence Encoder) without any need for traditional transfer learning or any other expensive training procedure. We show the effectiveness of our method in a legal context by classifying UN Resolutions according to their most related SDGs.
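
The abstract does not say how TF-IDF and the pre-trained encoder are combined, so the following is only a minimal sketch of one possible reading, not the authors' method. It scores a document against short SDG descriptions with TF-IDF cosine similarity and keeps every label above a threshold, which makes the output multi-label without any training. The function names, the `embed` hook (a stand-in for a sentence encoder such as the Universal Sentence Encoder), the simple averaging of the two similarities, and the threshold value are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(counts)
    # Document frequency: in how many texts each term occurs.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF, so a term found in every text keeps a positive weight.
    idf = {term: math.log((1 + n) / (1 + k)) + 1.0 for term, k in df.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


def sparse_cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(document, label_descriptions, threshold=0.1, embed=None):
    """Return {label: score} for every label scoring at least `threshold`.

    `embed` is a hypothetical hook: a callable mapping a string to a dense
    vector. When given, its cosine similarity is averaged with the TF-IDF
    similarity. The averaging rule and the default threshold are assumptions.
    """
    labels = list(label_descriptions)
    vectors = tfidf_vectors([label_descriptions[l] for l in labels] + [document])
    doc_vec = vectors[-1]
    doc_emb = embed(document) if embed is not None else None
    scores = {}
    for label, vec in zip(labels, vectors):
        score = sparse_cosine(doc_vec, vec)
        if embed is not None:
            dense = dense_cosine(doc_emb, embed(label_descriptions[label]))
            score = (score + dense) / 2.0
        scores[label] = score
    return {label: s for label, s in scores.items() if s >= threshold}


# Illustrative input only: two SDG headline descriptions and a made-up sentence.
SDG_DESCRIPTIONS = {
    "SDG 1": "End poverty in all its forms everywhere",
    "SDG 13": "Take urgent action to combat climate change and its impacts",
}
resolution = "The General Assembly calls for urgent action on climate change"
predicted = classify(resolution, SDG_DESCRIPTIONS)
```

Passing a real sentence encoder as `embed` would let the score reflect semantic similarity as well as shared vocabulary; with `embed=None` the sketch relies on word overlap alone.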


Updated: 2020-04-08
