Simultaneous feature and instance selection in big noisy data using memetic variable neighborhood search, Applied Soft Computing

Applied Soft Computing (IF 7.2), Pub Date: 2021-08-27, DOI: 10.1016/j.asoc.2021.107855
Chun-Cheng Lin, Jia-Rong Kang, Yu-Lin Liang, Chih-Chi Kuo

In smart factories, the data collected by Internet-of-Things sensors is enormous and contains a great deal of noise and many missing values. To address this big data problem, metaheuristics are one of the main approaches to data preprocessing, i.e., instance selection or feature selection before the model is trained. Previous metaheuristic approaches rarely considered instance selection and feature selection simultaneously, and rarely focused on big noisy data. Consequently, this work proposes a hybrid memetic algorithm (MA) with variable neighborhood search (VNS) to select instances and features simultaneously, combining MA, which performs well in data selection, with VNS, which has been shown to perform well in local search. To evaluate the proposed algorithm, this work creates simulation data by combining UCI datasets with noisy data. The proposed algorithm is used to reduce the simulation data, and the reduced data is then used to train a predictive model whose test performance is evaluated. Compared with other metaheuristics, the proposed algorithm achieves a balance between exploration and exploitation. Additionally, the results show that the proposed algorithm is more robust than other feature selection methods.
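The abstract does not specify the encoding, fitness function, or operators the authors use. As an illustration only, the following is a minimal Python sketch of the general idea of a memetic algorithm with VNS for joint feature and instance selection. It assumes a single binary mask whose first bits select features and whose remaining bits select training instances, a hypothetical fitness that weights 1-nearest-neighbour validation accuracy against the reduction rate (`alpha = 0.9`), tournament selection, uniform crossover, bit-flip mutation, and a reduced VNS (shake-and-accept, with no inner descent) applied to the best child of each generation. The toy `noisy_dataset` generator only loosely mimics the UCI-plus-noise setup. All function names and parameters are hypothetical, and this is not the authors' implementation.

```python
# Hypothetical sketch of MA + VNS for joint feature/instance selection; not the authors' code.
import random

def knn1_accuracy(tr_X, tr_y, va_X, va_y, feats):
    """Validation accuracy of a 1-nearest-neighbour classifier restricted to `feats`."""
    if not tr_X or not feats:
        return 0.0  # an empty instance or feature subset cannot classify anything
    correct = 0
    for x, label in zip(va_X, va_y):
        nearest = min(range(len(tr_X)),
                      key=lambda i: sum((x[j] - tr_X[i][j]) ** 2 for j in feats))
        correct += tr_y[nearest] == label
    return correct / len(va_y)

def make_fitness(X, y, va_X, va_y, alpha=0.9):
    """Assumed fitness: alpha * accuracy + (1 - alpha) * reduction rate, cached per mask.
    The first len(X[0]) bits of a mask select features; the remaining bits select instances."""
    n_feat = len(X[0])
    cache = {}
    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            feats = [j for j in range(n_feat) if mask[j]]
            rows = [i for i in range(len(X)) if mask[n_feat + i]]
            acc = knn1_accuracy([X[i] for i in rows], [y[i] for i in rows], va_X, va_y, feats)
            reduction = 1 - sum(mask) / len(mask)
            cache[key] = alpha * acc + (1 - alpha) * reduction
        return cache[key]
    return fitness

def reduced_vns(mask, fitness, rng, k_max=3, max_iters=30):
    """Reduced VNS: shake by flipping k random bits; accept only improvements."""
    best, best_f, k = list(mask), fitness(mask), 1
    for _ in range(max_iters):
        cand = list(best)
        for pos in rng.sample(range(len(cand)), k):
            cand[pos] ^= 1
        f = fitness(cand)
        if f > best_f:
            best, best_f, k = cand, f, 1  # improvement: restart from the smallest neighbourhood
        else:
            k = k + 1 if k < k_max else 1  # no improvement: move to a larger neighbourhood
    return best, best_f

def memetic_vns(X, y, va_X, va_y, pop_size=10, generations=10, seed=0):
    """Genetic search over joint masks, with VNS refining the best child each generation."""
    rng = random.Random(seed)
    fitness = make_fitness(X, y, va_X, va_y)
    n = len(X[0]) + len(X)
    # Include the full mask so the result is never worse than using all data.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(n)]  # uniform crossover
            child = [bit ^ 1 if rng.random() < 1 / n else bit for bit in child]  # bit-flip mutation
            children.append(child)
        children.sort(key=fitness, reverse=True)
        children[0], _ = reduced_vns(children[0], fitness, rng)  # memetic local-search step
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]  # elitist replacement
    return pop[0], fitness(pop[0])

def noisy_dataset(n, n_noise_feats=3, label_noise=0.1, seed=1):
    """Toy data: the label depends only on features 0 and 1; the other features are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.uniform(-1, 1) for _ in range(2 + n_noise_feats)]
        label = int(row[0] + row[1] > 0)
        if rng.random() < label_noise:
            label ^= 1  # flip a fraction of labels to mimic noisy instances
        X.append(row)
        y.append(label)
    return X, y

X, y = noisy_dataset(40, seed=1)
va_X, va_y = noisy_dataset(20, label_noise=0.0, seed=2)
mask, fit = memetic_vns(X, y, va_X, va_y)
n_feat = len(X[0])
print("features kept:", [j for j in range(n_feat) if mask[j]])
print("instances kept:", sum(mask[n_feat:]), "of", len(X))
print("fitness:", round(fit, 3))
```

Seeding the population with the all-ones mask, together with elitist replacement, means the returned mask scores at least as well as keeping every feature and instance under this assumed fitness. Caching fitness by mask keeps the repeated lookups during tournament selection and sorting cheap.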

Updated: 2021-09-04
