Enhancing Out-Of-Domain Utterance Detection with Data Augmentation Based on Word Embeddings,arXiv - CS - Computation and Language

arXiv - CS - Computation and Language. Pub Date: 2019-11-24, DOI: arxiv-1911.10439
Yueqi Feng, Jiali Lin

For most intelligent assistant systems, a mechanism that automatically detects out-of-domain (OOD) utterances is essential for handling noisy input properly. One typical approach adds a separate class to the classifier, trained on OOD utterance examples alongside the in-domain text samples. However, because OOD utterances are usually absent from the training datasets, detection performance depends largely on the quality of the attached OOD text data, whose sample size is restricted by computing limits. In this paper, we study how augmented OOD data based on sampling affects OOD utterance detection with a small sample size. We hypothesize that randomly chosen OOD utterance samples increase coverage of the unknown OOD utterance space and enhance detection accuracy when they are more dispersed. Experiments show that, given the same dataset and the same OOD sample size, OOD utterance detection performance improves when the OOD samples are more spread out.
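The abstract does not say how dispersion is measured or how the more spread-out samples are obtained, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each candidate OOD utterance is already represented as a fixed-length embedding vector, uses random Gaussian vectors as stand-ins for real embeddings, and swaps in greedy farthest-point sampling as one hypothetical way to pick a dispersed subset. The names `farthest_point_sample` and `mean_pairwise_distance` are invented for this example.

```python
import math
import random


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs (a simple spread score)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(points[i], points[j])
    return total / (n * (n - 1) / 2)


def farthest_point_sample(points, k, seed=0):
    """Greedily pick k indices; each new pick is the point farthest from
    its nearest already-chosen point. Assumed heuristic, not from the paper."""
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # min_dist[i] = distance from point i to its nearest chosen point so far
    min_dist = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Stand-in "utterance embeddings": random vectors, NOT real word embeddings.
rng = random.Random(42)
pool = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(300)]

k = 20  # same OOD sample size for both selection strategies
random_idx = rng.sample(range(len(pool)), k)
dispersed_idx = farthest_point_sample(pool, k)

random_spread = mean_pairwise_distance([pool[i] for i in random_idx])
dispersed_spread = mean_pairwise_distance([pool[i] for i in dispersed_idx])
print(f"random: {random_spread:.3f}  dispersed: {dispersed_spread:.3f}")
```

Under these assumptions, both subsets have the same size and differ only in how spread out they are, which mirrors the comparison described in the abstract. Whether the more dispersed subset actually improves detection would still have to be checked by training a classifier on each.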


Updated: 2020-03-30
