Learning from data streams and class imbalance
Connection Science ( IF 3.2 ) Pub Date : 2019-04-03 , DOI: 10.1080/09540091.2019.1572975
Shuo Wang 1 , Leandro L. Minku 2 , Nitesh Chawla 3 , Xin Yao 4

With the wide application of machine learning algorithms to the real world, class imbalance and concept drift have become crucial learning issues. Applications in various domains such as risk management, anomaly detection, fraud detection, software engineering, social media mining, and recommender systems are affected by both class imbalance and concept drift. Class imbalance happens when the data categories are not equally represented, i.e., at least one category is a minority compared to other categories. It can cause learning bias towards the majority class and poor generalisation. Concept drift is a change in the underlying distribution of the problem and is a significant issue especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Class imbalance and concept drift can significantly hinder predictive performance. The problem becomes particularly challenging when they occur simultaneously, because one problem can affect the treatment of the other. This special issue is composed of four high-quality papers studying the challenges and proposing new solutions for learning from data effectively and efficiently in the presence of class imbalance or concept drift.
Resampling is the most popular set of techniques to overcome class imbalance in data. One-class learning methods can also be effective on imbalanced data. In the paper “Learning in Presence of Class Imbalance and Class Overlapping by Using One-class SVM and Undersampling Technique”, the authors proposed to use both undersampling and one-class learning to preprocess data. One-class SVM is used to detect overlapping regions, and Tomek-link is used to further clean up the overlapping data and balance the data set. The proposed method is compared with six other state-of-the-art methods over seven binary and two multi-class data sets, showing better accuracy on minority classes without harming majority-class accuracy.
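To make the Tomek-link cleaning step concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the one-class SVM stage is omitted, and the function names and toy data are assumptions made for illustration. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; removing the majority-class member of each link thins out the overlapping region.

```python
from math import dist  # Euclidean distance, Python 3.8+


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two samples are mutual nearest
    neighbours and carry different class labels.
    """
    def nearest(i):
        # Index of the closest other sample (brute force, O(n) per query).
        others = (j for j in range(len(points)) if j != i)
        return min(others, key=lambda j: dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def drop_majority_in_links(points, labels, majority):
    """Undersample by removing the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == majority}
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Hypothetical toy data: class 0 is the majority, class 1 the minority.
points = [(0.0, 0.0), (0.2, 0.0), (2.0, 2.0), (2.1, 2.0), (4.0, 4.0), (0.0, 0.3)]
labels = [0, 0, 0, 1, 1, 0]

links = tomek_links(points, labels)
clean_points, clean_labels = drop_majority_in_links(points, labels, majority=0)
print(links)
print(clean_labels)
```

In this toy set, only the samples at indices 2 and 3 form a link, so the majority sample at index 2 is the one removed. The brute-force neighbour search is quadratic in the number of samples, so a real data set would call for a spatial index or an existing library implementation.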
Different from data-level techniques and cost-sensitive methods, the paper “A Weighted Pattern Matching Approach for Classification of Imbalanced Data with a Fireworks Based Algorithm for Feature Selection” proposed a novel weighted pattern matching method to classify imbalanced data, combined with fireworks algorithms to select the best set of features for learning. The experiments were performed on 44 binary and 15 multi-class data sets with class imbalance difficulty. The proposed method showed competitive performance in comparison with other state-of-the-art methods.
As mentioned earlier, the class imbalance issue exists in many real-world applications. In the paper “Semantic Segmentation of High Resolution Remote Sensing Images Using Fully Convolutional Network with Adaptive Threshold”, the class imbalance issue in semantic segmentation is studied. Semantic segmentation is a multi-class classification problem in remote sensing. To overcome class imbalance, a fully convolutional neural network with an adaptive threshold of the Jaccard index is proposed. The experimental results showed superior classification performance on remote sensing images.
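The Jaccard index mentioned above is the ratio of intersection to union between the predicted and true pixel sets of a class. The sketch below computes it per class on flattened label lists. It only illustrates the metric and why it is informative under class imbalance; it does not reproduce the paper's adaptive thresholding, and the function name and sample labels are assumptions.

```python
def jaccard_per_class(y_true, y_pred, classes):
    """Per-class Jaccard index: |pred AND true| / |pred OR true|."""
    scores = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        inter = sum(1 for t, p in pairs if t == c and p == c)
        union = sum(1 for t, p in pairs if t == c or p == c)
        # A class absent from both inputs has no defined score.
        scores[c] = inter / union if union else float("nan")
    return scores


# Hypothetical flattened segmentation labels: class 1 is rare.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = jaccard_per_class(y_true, y_pred, classes=[0, 1])
print(accuracy)  # 0.75
print(scores)
```

Here the overall pixel accuracy is 0.75, while the Jaccard index is 5/7 for the majority class and only 1/3 for the minority class, which shows how a per-class overlap measure exposes weak minority-class performance that accuracy hides.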

Updated: 2019-04-03