Towards Textual Out-of-Domain Detection Without In-Domain Labels
IEEE/ACM Transactions on Audio, Speech, and Language Processing ( IF 4.1 ) Pub Date : 2022-03-24 , DOI: 10.1109/taslp.2022.3162081
Di Jin 1 , Shuyang Gao 1 , Seokhwan Kim 1 , Yang Liu 1 , Dilek Hakkani-Tur 1

In many real-world settings, machine learning models need to identify user inputs that are out-of-domain (OOD) so as to avoid performing wrong actions. This work focuses on a challenging case of OOD detection, where no labels for in-domain data are accessible (e.g., no intent labels for the intent classification task). To this end, we first evaluate different language-model-based approaches that predict the likelihood of a sequence of tokens. Furthermore, we propose a novel representation-learning-based method that combines unsupervised clustering and contrastive learning so that better data representations for OOD detection can be learned. Through extensive experiments, we demonstrate that this method significantly outperforms likelihood-based methods and is even competitive with state-of-the-art supervised approaches that use label information.
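The likelihood-based baseline described in the abstract can be illustrated with a minimal sketch: fit a language model on unlabeled in-domain text, then flag inputs whose average negative log-likelihood (i.e., perplexity) exceeds a threshold as OOD. The sketch below uses a toy add-one-smoothed unigram model for self-containment; the paper evaluates neural language models, and the scoring function and corpus here are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Fit unigram token counts on unlabeled in-domain sentences."""
    counts = Counter(tok for sent in corpus for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for unseen tokens under add-one smoothing
    return counts, total, vocab

def avg_neg_log_likelihood(sentence, counts, total, vocab):
    """Average per-token NLL; higher means less in-domain-like."""
    toks = sentence.split()
    nll = sum(-math.log((counts[t] + 1) / (total + vocab)) for t in toks)
    return nll / max(len(toks), 1)

# Toy unlabeled in-domain corpus (hypothetical example data)
in_domain = [
    "book a flight to boston",
    "book a hotel in boston",
    "find a flight tomorrow",
]
counts, total, vocab = train_unigram(in_domain)

score_id = avg_neg_log_likelihood("book a flight tomorrow", counts, total, vocab)
score_ood = avg_neg_log_likelihood("what is the capital of france", counts, total, vocab)
# The OOD query scores a higher average NLL than the in-domain one,
# so a threshold between the two separates them.
print(score_id < score_ood)
```

In practice one would replace the unigram model with a pretrained causal language model's token likelihoods and calibrate the threshold on held-out in-domain data; the paper's proposed method instead learns representations via clustering plus contrastive learning and scores OOD inputs in that embedding space.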

Updated: 2022-03-24