Multi-class Text Classification using BERT-based Active Learning
arXiv - CS - Information Retrieval. Pub Date: 2021-04-27, DOI: arxiv-2104.14289
Sumanth Prabhu, Moosa Mohamed, Hemant Misra

Text Classification has useful applications in the pickup and delivery services industry, where customers require one or more items to be picked up from a location and delivered to a certain destination. Classifying these customer transactions into multiple categories helps in understanding the market needs of different customer segments. Each transaction is accompanied by a text description, provided by the customer, of the products being picked up and delivered, and this description can be used to classify the transaction. BERT-based models have proven to perform well in Natural Language Understanding. However, the product descriptions provided by customers tend to be short, incoherent, and code-mixed (Hindi-English), so such models must be fine-tuned with manually labelled data to achieve high accuracy. Collecting this labelled data can be expensive. In this paper, we explore Active Learning strategies to label transaction descriptions cost-effectively while using BERT to train a transaction classification model. On TREC-6, AG's News Corpus, and an internal dataset, we benchmark the performance of BERT across different Active Learning strategies in Multi-Class Text Classification.
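
To illustrate what an Active Learning selection step can look like, here is a minimal sketch of entropy-based uncertainty sampling, one common strategy. The abstract does not say which strategies the paper benchmarks, so this is an illustrative assumption and not the authors' method. The function names and the `pool_probs` values are hypothetical; in practice the probabilities would be the softmax outputs of a fine-tuned BERT classifier over the unlabelled pool.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labelling(
    pool_probs: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return the indices of the `batch_size` most uncertain pool items."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]


# Hypothetical class probabilities for four unlabelled descriptions
# over three classes (assumed values, not taken from the paper).
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform: most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two items to send to human annotators next.
selected = select_for_labelling(pool_probs, batch_size=2)
```

In a full loop, the selected items would be labelled, added to the training set, and the model fine-tuned again before the next selection round.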

Updated: 2021-04-30