Detection and Correction of Abnormal Data with Optimized Dirty Data: A New Data Cleaning Model
International Journal of Information Technology & Decision Making (IF 4.9), Pub Date: 2021-03-22, DOI: 10.1142/s0219622021500188
Kumar Rahul, Rohitash Kumar Banyal

Every business enterprise requires clean, noise-free data. Because a data warehouse continuously loads and refreshes large quantities of data from various sources, the amount of dirty data is likely to grow. Data cleaning is therefore vital in data-related projects to avoid wrong conclusions. This paper introduces a novel data cleaning technique for the effective removal of dirty data. The technique involves two steps: (i) dirty data detection and (ii) dirty data cleaning. Dirty data detection comprises data normalization, hashing, clustering, and identification of the suspected data. In the clustering step, the centroids are selected optimally by means of an optimization algorithm. Once detection is complete, dirty data cleaning begins. It comprises a leveling process, Huffman coding, and cleaning of the suspected data, the last of which is also performed by optimization. To solve these optimization problems, a new hybrid algorithm is proposed, the Firefly Update Enabled Rider Optimization Algorithm (FU-ROA), which hybridizes the Rider Optimization Algorithm (ROA) and the Firefly (FF) algorithm. Finally, the performance of the implemented data cleaning method is compared with that of conventional methods, namely Particle Swarm Optimization (PSO), FF, Grey Wolf Optimizer (GWO), and ROA, in terms of positive and negative measures. The results show that at iteration 12, the performance of the proposed FU-ROA model for test case 1 was 0.013%, 0.7%, 0.64%, and 0.29% better than that of the existing PSO, FF, GWO, and ROA models, respectively.
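The detection stage described in the abstract (normalization, clustering, flagging suspected records) can be illustrated with a rough sketch. This is not the paper's method: plain Lloyd's k-means stands in for the FU-ROA-optimized centroid selection, the hashing step is omitted because the abstract does not specify it, and the function names, the min-max scaling choice, and the distance `threshold` are all assumptions made for illustration.

```python
import random


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means (assumption: a stand-in for optimized centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def find_suspects(points, centroids, threshold):
    """Return indices of records farther than `threshold` from every centroid."""
    return [
        i for i, p in enumerate(points)
        if min(sq_dist(p, c) for c in centroids) ** 0.5 > threshold
    ]


if __name__ == "__main__":
    # Hypothetical records: two tight groups plus one record far from both.
    raw = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9],
           [8.0, 9.0], [8.1, 9.2], [7.9, 8.8],
           [4.5, 30.0]]
    data = min_max_normalize(raw)
    centers = k_means(data, k=2)
    print(find_suspects(data, centers, threshold=0.3))
```

Which records get flagged depends on the initial centroids and on the threshold, which is presumably part of why the paper treats centroid selection as an optimization problem rather than relying on random initialization.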

Updated: 2021-03-22