Toward semantic data imputation for a dengue dataset, Knowledge-Based Systems


Toward semantic data imputation for a dengue dataset
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-03-26, DOI: 10.1016/j.knosys.2020.105803
N. Kamkhad , K. Jampachaisri , P. Siriyasatien , K. Kesorn

Missing data are a major problem for data analysis techniques used in forecasting. Traditional methods that predict missing values with simple techniques, e.g., the mean or mode, perform poorly. In this paper, we present and discuss a novel method of imputing missing values semantically using an ontology model. We make three new contributions to the field. First, we improve the efficiency of predicting missing data by applying Particle Swarm Optimization (PSO) to the numerical data cleansing problem, and we enhance the performance of PSO by using K-means to help determine the fitness value. Second, we incorporate an ontology with PSO to narrow the search space, so that PSO predicts numerical missing values more accurately while converging quickly on the answer. Third, we provide a framework for substituting nominal data that are missing from the dataset, using the relationships between concepts and a reasoning mechanism over the knowledge-based model. The experimental results indicated that the proposed method could estimate missing data more efficiently and with lower error than conventional methods, as measured by the root mean square error (RMSE).
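For reference, the simple baselines the abstract mentions (mean imputation for numerical attributes, mode imputation for nominal ones) and the RMSE metric can be sketched in a few lines of standard-library Python. The function names `impute_mean`, `impute_mode`, and `rmse` and the toy temperature column are hypothetical illustrations, not code or data from the paper. One common evaluation protocol, assumed here rather than taken from the paper, is to mask known values, impute them, and compute RMSE over the masked cells only.

```python
import math
from statistics import mean, mode
from typing import List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Baseline: replace each None with the mean of the observed numerical values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values: Sequence[Optional[str]]) -> List[str]:
    """Baseline: replace each None with the most frequent observed nominal value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between true values and imputed values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    # Hypothetical toy column: index 1 was masked; its true value was 31.0.
    temps = [30.0, None, 28.0, 26.0]
    filled = impute_mean(temps)
    print(filled)                     # [30.0, 28.0, 28.0, 26.0]
    print(rmse([31.0], [filled[1]]))  # 3.0
    print(impute_mode(["dry", None, "rainy", "rainy"]))  # ['dry', 'rainy', 'rainy', 'rainy']
```

Because these baselines ignore the other attributes of the record, every missing cell in a column receives the same value, which is the weakness the proposed method targets.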


Updated: 2020-03-27
