Sequential Targeting: an incremental learning approach for data imbalance in text classification
arXiv - CS - Computation and Language. Pub Date: 2020-11-20, DOI: arxiv-2011.10216
Joel Jang, Yoonjeon Kim, Kyoungho Choi, Sungho Suh

Classification tasks require a balanced distribution of data to ensure that the learner is trained to generalize over all classes. In real-world datasets, however, the number of instances varies substantially among classes. This typically biases the learner toward the majority class, whose sheer size dominates training. Methods for handling imbalanced datasets are therefore crucial for alleviating distributional skew and fully utilizing the under-represented data, especially in text classification. Most existing approaches to imbalance in text data apply sampling methods to the numerical representation of the data, so their effectiveness is limited by the quality of that representation. We propose a novel training method, Sequential Targeting (ST), that is independent of the representation method: it enforces an incremental learning setting by splitting the data into mutually exclusive subsets and training the learner adaptively on each in turn. To address problems that arise in incremental learning, such as catastrophic forgetting, we apply elastic weight consolidation (EWC). We demonstrate the effectiveness of our method through experiments on simulated benchmark datasets (IMDB) and data collected from NAVER.
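The abstract's core recipe can be sketched in a few lines: partition an imbalanced dataset into mutually exclusive subsets, train on each subset sequentially, and add an EWC quadratic penalty so later stages do not overwrite what was learned earlier. The following is a minimal, self-contained illustration using a toy 2-D logistic-regression problem — the data, subset split, and hyperparameters are all illustrative assumptions, not the paper's IMDB/NAVER setup or its exact training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced binary dataset standing in for text features:
# 90% majority class (centered at -1), 10% minority (centered at +1).
n_maj, n_min = 900, 100
X = np.vstack([rng.normal(-1, 1, (n_maj, 2)), rng.normal(1, 1, (n_min, 2))])
y = np.concatenate([np.zeros(n_maj), np.ones(n_min)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, Xb, yb, w_star=None, fisher=None, lam=0.0):
    # Logistic-loss gradient, plus the EWC quadratic penalty that
    # anchors weights consolidated after earlier subsets.
    g = Xb.T @ (sigmoid(Xb @ w) - yb) / len(yb)
    if w_star is not None:
        g += lam * fisher * (w - w_star)
    return g

def fisher_diag(w, Xb, yb):
    # Diagonal empirical Fisher information estimate at w.
    per_example = Xb * (sigmoid(Xb @ w) - yb)[:, None]
    return (per_example ** 2).mean(axis=0)

# Sequential Targeting: mutually exclusive subsets whose class
# ratios move from heavily skewed toward balanced.
maj_idx = np.where(y == 0)[0]
min_idx = np.where(y == 1)[0]
subsets = [
    np.concatenate([maj_idx[:600], min_idx[:20]]),   # skewed stage
    np.concatenate([maj_idx[600:], min_idx[20:]]),   # more balanced stage
]

w = np.zeros(2)
w_star, fisher = None, None
for idx in subsets:
    Xb, yb = X[idx], y[idx]
    for _ in range(500):  # plain gradient descent per stage
        w -= 0.5 * grad(w, Xb, yb, w_star, fisher, lam=10.0)
    w_star, fisher = w.copy(), fisher_diag(w, Xb, yb)  # consolidate

acc = ((sigmoid(X @ w) > 0.5) == y).mean()
minority_recall = (sigmoid(X[min_idx] @ w) > 0.5).mean()
print(f"accuracy={acc:.2f} minority_recall={minority_recall:.2f}")
```

The design point the sketch makes concrete: without the `lam * fisher * (w - w_star)` term, the second stage would freely pull the weights away from the first-stage solution; the Fisher-weighted anchor is what lets the staged training behave as incremental learning rather than simple retraining.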

Updated: 2020-11-23