A reliable KNN filling approach for incomplete interval-valued data, Engineering Applications of Artificial Intelligence


Engineering Applications of Artificial Intelligence (IF 7.5), Pub Date: 2021-01-29, DOI: 10.1016/j.engappai.2021.104175
Xiaobo Qi, Husheng Guo, Wenjian Wang

Interval-valued data (IVD) are data in which each feature is an interval, which embeds uncertainty and variability information. However, missing values (a missing lower bound, a missing upper bound, or both) may arise during data acquisition and transmission and can hinder data processing. To obtain good results, it is important to handle the missing values in IVD, usually by ignoring or filling them. A dataset that includes missing values is called an incomplete interval-valued (IIV) dataset here. Some ignoring and filling methods have been proposed for numeric or symbolic data, but they cannot be applied to IIV datasets directly. In this work, a reliable k-nearest neighbor approach (RKNN) for incomplete interval-valued data (IIVD) is proposed. A combining rule is designed to determine whether a datum that includes missing values should be ignored or filled. Samples that have a missing value in every feature are ignored directly; unlike existing ignoring methods, this does not require setting a percentage of missing entries. The remaining incomplete samples are filled according to their K complete nearest neighbors, which makes the filled values more reliable. In this way, RKNN excludes a small number of incomplete samples that may increase uncertainty and avoids repeated filled values (such as a median or a fixed constant). Experimental results on 12 synthetic datasets and 4 real-world datasets demonstrate that the proposed method processes incomplete interval-valued data effectively while achieving good classification performance.
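
The ignore-or-fill rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the data layout, the distance measure, or how the neighbors are combined, so everything below is an assumption made for illustration and not the authors' implementation: intervals are stored as [lower, upper] pairs with NaN marking a missing bound, distance is the root-mean-square difference over the bounds that the incomplete sample actually has, and a missing bound is replaced by the mean of that bound over the K nearest complete samples. The names `rknn_fill` and `k` are hypothetical.

```python
import numpy as np


def rknn_fill(X, k=3):
    """Illustrative ignore-or-fill step for interval-valued data.

    X has shape (n_samples, n_features, 2) holding [lower, upper];
    NaN marks a missing bound (an assumed encoding).
    Returns the kept samples with gaps filled, plus the boolean keep mask.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                    # (n, p, 2): which bounds are missing
    feature_missing = missing.any(axis=2)    # (n, p): feature lacks at least one bound
    ignored = feature_missing.all(axis=1)    # every feature incomplete: ignore sample
    complete = ~feature_missing.any(axis=1)  # no bound missing: usable as a neighbor

    donors = X[complete]
    if len(donors) == 0:
        raise ValueError("no complete samples available as neighbors")

    filled = X.copy()
    for i in np.flatnonzero(~ignored & ~complete):
        observed = ~missing[i]               # (p, 2): bounds present in sample i
        # Assumed distance: RMS difference over the observed bounds only.
        diff = donors[:, observed] - X[i, observed]
        dist = np.sqrt((diff ** 2).mean(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        # Assumed aggregation: mean of the k nearest complete samples.
        estimate = donors[nearest].mean(axis=0)
        filled[i, missing[i]] = estimate[missing[i]]

    keep = ~ignored
    return filled[keep], keep
```

Calling `rknn_fill(X, k=2)` returns the retained samples and a mask showing which rows of `X` were kept. The sketch does not check that a filled lower bound stays at or below its upper bound; a real implementation would need to handle that case.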


Updated: 2021-02-01
