Improving Robustness and Generality of NLP Models Using Disentangled Representations
arXiv - CS - Computation and Language. Pub Date: 2020-09-21, DOI: arxiv-2009.09587
Jiawei Wu, Xiaoya Li, Xiang Ao, Yuxian Meng, Fei Wu and Jiwei Li

Supervised neural networks, which first map an input $x$ to a single representation $z$ and then map $z$ to the output label $y$, have achieved remarkable success in a wide range of natural language processing (NLP) tasks. Despite this success, neural models lack both robustness and generality: small perturbations to inputs can produce completely different outputs, and the performance of a model trained on one domain drops drastically when it is tested on another. In this paper, we present methods to improve the robustness and generality of NLP models from the standpoint of disentangled representation learning. Instead of mapping $x$ to a single representation $z$, the proposed strategy maps $x$ to a set of representations $\{z_1, z_2, ..., z_K\}$ while forcing them to be disentangled. These representations are then mapped to different logits $l_1, l_2, ..., l_K$, whose ensemble is used to make the final prediction $y$. We propose different ways to incorporate this idea into currently widely used models, including adding an $L_2$ regularizer on the $z$s or adding Total Correlation (TC) under the framework of the variational information bottleneck (VIB). We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
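
The following PyTorch sketch illustrates one plausible reading of this architecture: $x$'s encoding is projected to $K$ representations, each scored by its own head, and the $K$ logits are averaged. It is not the authors' released implementation; all names and hyperparameters are illustrative, and the pairwise-similarity penalty is just one simple interpretation of the "$L_2$ regularizer on the $z$s".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledClassifier(nn.Module):
    """Maps an input encoding to K representations z_1..z_K, scores each
    with its own head, and averages the K logits for the final prediction."""
    def __init__(self, input_dim: int, z_dim: int, num_labels: int, K: int = 4):
        super().__init__()
        self.K = K
        # One projection per representation z_k
        self.projections = nn.ModuleList(
            [nn.Linear(input_dim, z_dim) for _ in range(K)]
        )
        # One classification head per z_k; their logits are ensembled
        self.heads = nn.ModuleList(
            [nn.Linear(z_dim, num_labels) for _ in range(K)]
        )

    def forward(self, h: torch.Tensor):
        # h: (batch, input_dim), e.g. a [CLS] vector from a pretrained encoder
        zs = [proj(h) for proj in self.projections]            # K x (batch, z_dim)
        logits = torch.stack([head(z) for z, head in zip(zs, self.heads)])
        return logits.mean(dim=0), zs                          # ensembled logits

def disentangle_penalty(zs):
    """Push the z_k apart by penalizing pairwise cosine similarity
    (an assumed, simple form of the disentanglement regularizer)."""
    loss = 0.0
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            loss = loss + F.cosine_similarity(zs[i], zs[j], dim=-1).pow(2).mean()
    return loss

# Usage: total loss = cross-entropy on ensembled logits + lambda * penalty
model = DisentangledClassifier(input_dim=768, z_dim=64, num_labels=2, K=4)
h = torch.randn(8, 768)                  # stand-in for encoder outputs
y = torch.randint(0, 2, (8,))
logits, zs = model(h)
loss = F.cross_entropy(logits, y) + 0.1 * disentangle_penalty(zs)
loss.backward()
```

Under this reading, each head depends only on its own $z_k$, so a perturbation that corrupts one factor need not flip the ensembled prediction; the TC-under-VIB variant mentioned in the abstract would replace this penalty with a term estimating the total correlation among the $z_k$.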

Updated: 2020-09-22