Transforming variables to central normality
Machine Learning (IF 7.5), Pub Date: 2021-03-21, DOI: 10.1007/s10994-021-05960-5
Jakob Raymaekers, Peter J. Rousseeuw

Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
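
To make the outlier sensitivity described in the abstract concrete, the following is a minimal sketch of the classical Box–Cox and Yeo–Johnson transformations, together with a plain grid-search maximum likelihood estimate of the Box–Cox parameter. It does not implement the authors' modified transformations or their robust estimator, which the abstract does not specify. The function names, the grid of candidate parameters, and the synthetic data (lognormal quantiles plus a single added outlier at 1000) are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def box_cox(x, lam):
    """Classical Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Classical Yeo-Johnson transform, defined for any real x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((1.0 + x) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def box_cox_profile_loglik(data, lam):
    """Gaussian profile log-likelihood of lam (additive constants dropped)."""
    n = len(data)
    t = [box_cox(x, lam) for x in data]
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in data)


def box_cox_mle(data, grid=None):
    """Grid-search MLE of lam; the default grid [-2, 2] is an illustrative choice."""
    if grid is None:
        grid = [k / 100 for k in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_profile_loglik(data, lam))


# Deterministic "clean" sample: exponentiated standard normal quantiles,
# so the log transform (lam = 0) makes it symmetric.
clean = [math.exp(NormalDist().inv_cdf(i / 100)) for i in range(1, 100)]

# Same sample with one far outlier appended (hypothetical contamination).
contaminated = clean + [1000.0]

lam_clean = box_cox_mle(clean)
lam_contaminated = box_cox_mle(contaminated)
print("lambda (clean):       ", lam_clean)
print("lambda (contaminated):", lam_contaminated)
```

On the clean sample the estimate should sit at or very near 0, the log transform. With the single outlier added, the classical estimate is pulled toward negative values, which shrinks the outlier but skews the central part of the data. This is the behavior the abstract attributes to the standard maximum likelihood estimator.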




Updated: 2021-03-22