Coupled Generation
Journal of the American Statistical Association ( IF 3.0 ) Pub Date : 2021-01-04 , DOI: 10.1080/01621459.2020.1844719
Ben Dai 1 , Xiaotong Shen 1 , Wing Wong 2

Abstract

Instance generation creates representative examples to interpret a learning model, as in regression and classification. For example, representative sentences of a topic of interest describe the topic specifically for sentence categorization. In such a situation, a large number of unlabeled observations may be available in addition to labeled data; for example, many unclassified text corpora (unlabeled instances) are available with only a few classified sentences (labeled instances). In this article, we introduce a novel generative method, called a coupled generator, which produces instances given a specific learning outcome, based on indirect and direct generators. The indirect generator uses the inverse principle to yield the corresponding inverse probability, which makes it possible to generate instances by leveraging unlabeled data. The direct generator learns the distribution of an instance given its learning outcome. The coupled generator then selects the better of the indirect and direct generators, and is designed to enjoy the benefits of both and deliver higher generation accuracy. For sentence generation given a topic, we develop an embedding-based regression/classification in conjunction with an unconditional recurrent neural network for the indirect generator, whereas a conditional recurrent neural network is natural for the corresponding direct generator. Moreover, we derive finite-sample generation error bounds for the indirect and direct generators to reveal the generative aspects of both methods, thus explaining the benefits of the coupled generator. Finally, we apply the proposed methods to a real benchmark of abstract classification and demonstrate that the coupled generator composes reasonably good sentences from a dictionary to describe a specific topic of interest. Supplementary materials for this article are available online.
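The interplay of the indirect, direct, and coupled generators can be illustrated on a toy discrete instance space. The sketch below is not the article's method: it replaces the recurrent networks with simple count-based estimates, reads the "inverse principle" as Bayes' rule, p(x | y) ∝ p(y | x) p(x), and picks between the two generators by held-out log-likelihood. The function names, the add-one smoothing, the selection rule, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def indirect_generator(labeled, unlabeled, support, y, smoothing=1.0):
    """Estimate p(x | y) as proportional to p(y | x) * p(x) (Bayes' rule).

    p(x) uses labeled and unlabeled instances; p(y | x) uses labeled pairs only.
    """
    all_x = [x for x, _ in labeled] + list(unlabeled)
    marginal = Counter(all_x)
    labels = {lab for _, lab in labeled}
    joint = Counter(labeled)
    x_count = Counter(x for x, _ in labeled)
    scores = {}
    for x in support:
        p_y_given_x = (joint[(x, y)] + smoothing) / (
            x_count[x] + smoothing * len(labels)
        )
        p_x = (marginal[x] + smoothing) / (len(all_x) + smoothing * len(support))
        scores[x] = p_y_given_x * p_x
    total = sum(scores.values())
    return {x: s / total for x, s in scores.items()}


def direct_generator(labeled, support, y, smoothing=1.0):
    """Estimate p(x | y) directly from labeled instances carrying label y."""
    counts = Counter(x for x, lab in labeled if lab == y)
    n = sum(counts.values())
    return {x: (counts[x] + smoothing) / (n + smoothing * len(support)) for x in support}


def neg_log_lik(dist, validation, y):
    """Average negative log-likelihood of held-out instances with label y."""
    held = [x for x, lab in validation if lab == y]
    return -sum(math.log(dist[x]) for x in held) / len(held)


def coupled_generator(labeled, unlabeled, validation, support, y):
    """Return (name, distribution) of the generator with lower held-out loss."""
    candidates = {
        "indirect": indirect_generator(labeled, unlabeled, support, y),
        "direct": direct_generator(labeled, support, y),
    }
    name = min(candidates, key=lambda k: neg_log_lik(candidates[k], validation, y))
    return name, candidates[name]


# Toy data (hypothetical): instances are tokens, labels are topics 0 and 1.
support = ["a", "b", "c"]
labeled = [("a", 1), ("b", 0), ("a", 1), ("c", 0)]
unlabeled = ["a", "b", "c", "c", "b", "a", "a"]
validation = [("a", 1), ("b", 0)]

chosen, dist = coupled_generator(labeled, unlabeled, validation, support, y=1)
print(chosen, dist)
```

In this sketch the unlabeled instances enter only through the marginal p(x), so only the indirect generator benefits from them, while the direct generator relies on the few labeled pairs.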




Updated: 2021-01-04