A fuzzy data augmentation technique to improve regularisation
International Journal of Intelligent Systems (IF 7), Pub Date: 2021-11-08, DOI: 10.1002/int.22731
Rukshima Dabare 1, 2 , Kok Wai Wong 1 , Mohd Fairuz Shiratuddin 1 , Polychronis Koutsakis 1

Deep learning (DL) has achieved strong classification performance in many applications because it can extract features directly from the data. However, this success comes with a risk of overfitting: a model biased towards the data it saw during training generalises poorly. One remedy is to provide enough training data for the classifier to become invariant to many data patterns. In the literature, data augmentation has been used as a form of regularisation to reduce the chance of overfitting. However, most of the relevant work focuses on image, sound or text data. Little work addresses numerical data augmentation, although many real-world problems involve numerical data. In this paper, we propose a technique based on Fuzzy C-Means clustering and fuzzy membership grades. Fuzzy techniques are used to address the variance problem by generating new data items from fuzzy numbers and from each data item's membership in different fuzzy clusters. This data augmentation technique is used to improve the generalisation of a Deep Neural Network suited to numerical data. By combining the proposed fuzzy data augmentation technique with Dropout regularisation, we balance the classification model's bias-variance tradeoff. The proposed technique is evaluated on four popular data sets and is shown to provide better regularisation and higher classification accuracy than popular regularisation approaches.
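
To make the idea of membership-driven augmentation concrete, the following is a minimal sketch in plain Python. The `fuzzy_c_means` function follows the standard Fuzzy C-Means update equations. The abstract does not spell out how new data items are generated, so the `augment` function is a hypothetical placeholder rule and is not the paper's method: it moves each sample a random fraction of the way towards the membership-weighted blend of cluster centers. The names `fuzzy_c_means`, `augment`, `strength` and the default fuzzifier `m=2.0` are assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means on a list of feature vectors.

    Returns (centers, U), where U[i][k] is the membership grade of
    sample i in cluster k and every row of U sums to 1.
    """
    rng = random.Random(seed)
    dim = len(X[0])
    # Random initial membership grades, normalised per sample.
    U = []
    for _ in X:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])

    centers = []
    for _ in range(n_iter):
        # Centers: mean of the samples weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [U[i][k] ** m for i in range(len(X))]
            w_sum = sum(w)
            centers.append([sum(wi * x[j] for wi, x in zip(w, X)) / w_sum
                            for j in range(dim)])
        # Memberships: closer centers receive higher grades.
        shift = 0.0
        for i, x in enumerate(X):
            inv = [max(math.dist(x, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, U


def augment(X, y, centers, U, strength=0.3, seed=0):
    """Hypothetical generation rule (NOT taken from the paper).

    Each new sample is the original moved a random fraction (at most
    `strength`) towards the membership-weighted blend of cluster
    centers. The new sample keeps the original's label.
    """
    rng = random.Random(seed)
    X_new = []
    for x, grades in zip(X, U):
        blend = [sum(g * c[j] for g, c in zip(grades, centers))
                 for j in range(len(x))]
        step = strength * rng.random()
        X_new.append([xj + step * (bj - xj) for xj, bj in zip(x, blend)])
    return X + X_new, list(y) + list(y)
```

Calling `centers, U = fuzzy_c_means(X, n_clusters=3)` followed by `X_aug, y_aug = augment(X, y, centers, U)` doubles the training set. The enlarged set could then be fed to a network that also uses Dropout, in the spirit of the combination the abstract describes.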


