Mode recovery in neural autoregressive sequence modeling
arXiv - CS - Machine Learning. Pub Date: 2021-06-10, DOI: arxiv-2106.05459
Ilia Kulikov, Sean Welleck, Kyunghyun Cho

Despite its wide use, recent studies have revealed unexpected and undesirable properties of neural autoregressive sequence models trained with maximum likelihood, such as an unreasonably high affinity to short sequences after training and to infinitely long sequences at decoding time. We propose to study these phenomena by investigating how the modes, or local maxima, of a distribution are maintained throughout the full learning chain of the ground-truth, empirical, learned and decoding-induced distributions, via the newly proposed mode recovery cost. We design a tractable testbed where we build three types of ground-truth distributions: (1) an LSTM based structured distribution, (2) an unstructured distribution where probability of a sequence does not depend on its content, and (3) a product of these two which we call a semi-structured distribution. Our study reveals both expected and unexpected findings. First, starting with data collection, mode recovery cost strongly relies on the ground-truth distribution and is most costly with the semi-structured distribution. Second, after learning, mode recovery cost from the ground-truth distribution may increase or decrease compared to data collection, with the largest cost degradation occurring with the semi-structured ground-truth distribution. Finally, the ability of the decoding-induced distribution to recover modes from the learned distribution is highly impacted by the choices made earlier in the learning chain. We conclude that future research must consider the entire learning chain in order to fully understand the potentials and perils and to further improve neural autoregressive sequence models.
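To make the notion of a mode recovery cost concrete, here is a minimal illustrative sketch, not the paper's exact definition: it treats the modes of a distribution as its top-k highest-probability sequences and measures the fraction of one distribution's modes that the next distribution in the learning chain fails to recover. The function names, the toy distributions, and the choice of k are all assumptions made for illustration.

```python
# Illustrative sketch of a "mode recovery cost" between two
# distributions over sequences (dicts mapping sequence -> probability).
# This is an assumption-laden toy, not the paper's formal definition.

def top_k_modes(dist, k):
    """Return the set of the k highest-probability sequences in dist."""
    return set(sorted(dist, key=dist.get, reverse=True)[:k])

def mode_recovery_cost(p, q, k):
    """Fraction of p's top-k modes that are absent from q's top-k modes."""
    return len(top_k_modes(p, k) - top_k_modes(q, k)) / k

# Toy "ground-truth" vs. "learned" distributions over short sequences.
p = {"ab": 0.4, "abc": 0.3, "a": 0.2, "abcd": 0.1}
q = {"a": 0.5, "ab": 0.3, "b": 0.15, "abc": 0.05}

print(mode_recovery_cost(p, q, k=2))  # prints 0.5: "abc" is not recovered
```

Chaining such pairwise costs across the ground-truth, empirical, learned, and decoding-induced distributions gives a toy analogue of tracking mode maintenance through the full learning chain studied in the paper.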

Updated: 2021-06-11