Non-removal strategy for outliers in predictive models: The PAELLA algorithm case
Logic Journal of the IGPL (IF 0.6), Pub Date: 2019-12-09, DOI: 10.1093/jigpal/jzz052
Manuel Castejón-Limas 1, Hector Alaiz-Moreton 2, Laura Fernández-Robles 1, Javier Alfonso-Cendón 1, Camino Fernández-Llamas 1, Lidia Sánchez-González 1, Hilde Pérez 1

This paper reports the experience of using the PAELLA algorithm as a helper tool in robust regression, rather than for outlier identification and removal as originally intended. This novel usage takes advantage of the occurrence vector calculated by the algorithm to strengthen the effect of the more reliable samples and lessen the impact of those that would otherwise be considered outliers. To that end, a series of experiments is conducted to learn how best to use the information contained in the occurrence vector. In the first experiment, a reference predictive model is fit to the whole raw data of a contrived, deliberately difficult artificial data set. The second experiment reports the results of fitting a similar predictive model after discarding the samples marked as outliers by PAELLA. The third experiment uses the occurrence vector provided by PAELLA to classify the observations into multiple bins, and then fits every possible model obtained by varying which bins are used for fitting and which are discarded. The fourth experiment introduces a sampling process before fitting, in which the occurrence vector represents the likelihood of an observation being included in the training data set. The fifth experiment treats the sampling process as an internal step interleaved between the training epochs. The last experiment compares our approach, using weighted neural networks, with a state-of-the-art method.
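As a rough illustration of the non-removal idea described in the abstract, the following Python sketch shows two ways an occurrence vector could be used: as sampling probabilities for building a training set, and as per-sample weights in a fit. This is not the authors' implementation. It assumes the occurrence vector is a non-negative score per observation, with higher values marking more reliable samples. The function names are hypothetical, the occurrence values are made up for the example, and a weighted linear least-squares fit stands in for the weighted neural networks mentioned in the abstract.

```python
import numpy as np


def occurrence_to_probabilities(occurrence):
    """Normalize a non-negative occurrence vector into sampling probabilities.

    Assumption: higher occurrence means a more reliable sample.
    """
    occ = np.asarray(occurrence, dtype=float)
    if occ.ndim != 1 or occ.size == 0:
        raise ValueError("occurrence must be a non-empty 1-D vector")
    if np.any(occ < 0):
        raise ValueError("occurrence values must be non-negative")
    total = occ.sum()
    if total == 0:
        # No information available: fall back to uniform sampling.
        return np.full(occ.size, 1.0 / occ.size)
    return occ / total


def sample_training_indices(occurrence, n_samples, rng):
    """Draw training indices with probability proportional to occurrence."""
    p = occurrence_to_probabilities(occurrence)
    return rng.choice(p.size, size=n_samples, replace=True, p=p)


def weighted_least_squares(x, y, weights):
    """Fit y ~ b0 + b1 * x, weighting each sample's squared error.

    A linear stand-in for a weighted predictive model; returns [b0, b1].
    """
    design = np.column_stack([np.ones(len(x)), x])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef


# Toy data: y = 2x + 1 with one gross outlier at the last position.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
y[4] = 100.0

# Hypothetical occurrence vector: the outlier was never selected.
occurrence = np.array([3.0, 3.0, 3.0, 3.0, 0.0])

rng = np.random.default_rng(0)
train_idx = sample_training_indices(occurrence, n_samples=20, rng=rng)
coef_weighted = weighted_least_squares(x, y, occurrence)
coef_raw = weighted_least_squares(x, y, np.ones_like(x))
print("sampled indices:", train_idx)
print("weighted fit [b0, b1]:", coef_weighted)
print("unweighted fit [b0, b1]:", coef_raw)
```

In this toy setup the outlier keeps its place in the data set but, with an occurrence of zero, it is never sampled and contributes nothing to the weighted fit. Intermediate occurrence values would reduce a sample's influence without eliminating it, which is the behavior the abstract contrasts with outright removal.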

Updated: 2019-12-09