Impacts of Fractional Hot-Deck Imputation on Learning and Prediction of Engineering Data
IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2020-12-01, DOI: 10.1109/tkde.2019.2922638
Ikkyun Song, Yicheng Yang, Jongho Im, Tong Tong, Halil Ceylan, In Ho Cho

In broad engineering fields, missing data is a common issue that often introduces undesired bias and sparseness, impeding rigorous data analysis. To tackle this problem, many imputation theories have been proposed and widely used. However, prior methods often require distributional assumptions and prior knowledge of the data, which can make them difficult to apply in engineering research. Fractional hot-deck imputation (FHDI), by contrast, is an assumption-free imputation method with broad applicability in engineering domains. FHDI's internal parameters and its impact on statistical and machine learning methods, however, remain poorly understood. This study therefore investigates the behavior and impacts of FHDI on four prediction methods (generalized additive model, support vector machine, extremely randomized trees, and artificial neural network) using four practical datasets (appliance energy, air quality, phenotypes, and weather). Results show that FHDI improves prediction accuracy more than a simple naive method that fills missing data with the mean value of each attribute, and that FHDI has an asymptotically positive effect on prediction accuracy as response rates decrease. As an optimal setting, 30 to 35 is recommended for FHDI's internal categorization number and 5 for the number of FHDI donors, which aligns with Rubin's recommendation.
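To make the two tuning parameters in the abstract concrete (the categorization number and the number of donors), here is a minimal Python sketch contrasting the naive mean-imputation baseline with a heavily simplified fractional hot-deck scheme. It is an assumption-laden illustration, not the paper's FHDI algorithm or the FHDI R package: it discretizes each column into `k` quantile categories, takes donors only from fully observed rows whose categories match the recipient on its observed columns, draws up to `M` donors at random, and gives each an equal fractional weight `1/M`. The published method instead estimates cell probabilities and handles empty cells differently. The function names `mean_impute`, `categorize`, and `fractional_hot_deck` are hypothetical.

```python
import numpy as np


def mean_impute(X):
    """Naive baseline: replace each NaN with its column's observed mean."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def categorize(X, k):
    """Discretize each column into k quantile-based categories (1..k); NaN -> 0."""
    Z = np.zeros(X.shape, dtype=int)
    for j in range(X.shape[1]):
        col = X[:, j]
        obs = ~np.isnan(col)
        # Interior cut points at the 1/k, 2/k, ... quantiles of the observed values.
        cuts = np.quantile(col[obs], np.linspace(0, 1, k + 1)[1:-1])
        Z[obs, j] = np.searchsorted(cuts, col[obs], side="right") + 1
    return Z


def fractional_hot_deck(X, k=3, M=5, seed=0):
    """Simplified fractional hot-deck imputation (illustrative assumption only).

    Returns a list of (row_index, imputed_row, fractional_weight) tuples; the
    weights for each original row sum to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    Z = categorize(X, k)
    complete = np.where(~np.isnan(X).any(axis=1))[0]
    if complete.size == 0:
        raise ValueError("this sketch needs at least one fully observed row as a donor")
    out = []
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            out.append((i, X[i].copy(), 1.0))  # fully observed row keeps weight 1
            continue
        # Donor pool: complete rows sharing the recipient's categories on its observed columns.
        match = complete[(Z[complete][:, ~miss] == Z[i, ~miss]).all(axis=1)]
        if match.size == 0:
            # Simplification: fall back to all complete rows (the real FHDI treats empty cells differently).
            match = complete
        chosen = rng.choice(match, size=min(M, match.size), replace=False)
        w = 1.0 / chosen.size  # equal fractional weights (an assumption of this sketch)
        for d in chosen:
            row = X[i].copy()
            row[miss] = X[d, miss]
            out.append((i, row, w))
    return out
```

Unlike mean imputation, which replaces every missing value in a column with the same constant, the fractional version keeps several plausible donor values per recipient. A downstream model or estimator can then use them through the fractional weights, for example by passing the weights as sample weights when training a predictor.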

Updated: 2020-12-01