Kick-one-out-based variable selection method for Euclidean distance-based classifier in high-dimensional settings
Journal of Multivariate Analysis (IF 1.6), Pub Date: 2021-03-26, DOI: 10.1016/j.jmva.2021.104756
Tomoyuki Nakagawa, Hiroki Watanabe, Masashi Hyodo

This paper presents a variable selection method for the Euclidean distance-based classifier in high-dimensional settings. The concern is that the expected probability of misclassification (EPMC) of the Euclidean distance-based classifier may increase with the dimension when redundant variables are included among the feature variables. First, we show that the Euclidean distance-based classifier using only the non-redundant variables attains a smaller asymptotic EPMC than the classifier using all variables. Next, we derive a kick-one-out-based variable selection method that helps reduce the EPMC, and we prove its selection consistency in the high-dimensional setting. Finally, we conduct a Monte Carlo simulation study to examine the finite-sample performance of the proposed selection method. The simulation results show that the method frequently selects the set of non-redundant variables, and that discrimination rules constructed from the selected variables attain a smaller EPMC than rules constructed from all variables.
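
To make the two ingredients of the abstract concrete, the sketch below implements a Euclidean distance-based discrimination rule and a simplified kick-one-out style screening step in Python. The per-variable criterion used here (squared sample mean difference minus a variance correction) and the zero threshold are illustrative assumptions only, not the statistic or cut-off developed in the paper.

```python
import numpy as np


def euclidean_classifier(x, mean1, mean2):
    """Assign x to the population whose sample mean vector is closer in
    Euclidean distance (ties go to population 1)."""
    d1 = np.sum((x - mean1) ** 2)
    d2 = np.sum((x - mean2) ** 2)
    return 1 if d1 <= d2 else 2


def kick_one_out_select(X1, X2, threshold=0.0):
    """Kick-one-out style screening (illustrative criterion, not the paper's).

    For each variable j, estimate its contribution to the squared Euclidean
    distance between the two population mean vectors, correcting for the
    sampling variability of the sample mean difference.  Variables whose
    estimated contribution is not above `threshold` are kicked out as redundant.
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    mean_diff = X1.mean(axis=0) - X2.mean(axis=0)
    # Per-variable variance of the mean difference (bias-correction term).
    correction = X1.var(axis=0, ddof=1) / n1 + X2.var(axis=0, ddof=1) / n2
    contribution = mean_diff ** 2 - correction
    return np.where(contribution > threshold)[0]


# Toy high-dimensional example: p = 200 variables, only the first 10 carry signal.
rng = np.random.default_rng(0)
p, n1, n2 = 200, 30, 30
mu2 = np.zeros(p)
mu2[:10] = 1.0
X1 = rng.normal(0.0, 1.0, size=(n1, p))
X2 = rng.normal(mu2, 1.0, size=(n2, p))

selected = kick_one_out_select(X1, X2)
m1, m2 = X1[:, selected].mean(axis=0), X2[:, selected].mean(axis=0)
x_new = rng.normal(mu2, 1.0, size=p)  # a new observation from population 2
label = euclidean_classifier(x_new[selected], m1, m2)
print("selected variables:", selected)
print("predicted population:", label)
```

Under these assumptions one would expect the screening step to retain (most of) the first ten coordinates and the rule built on them to misclassify less often than the rule built on all 200 coordinates, which is the qualitative behaviour the abstract reports for the proposed method.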



Updated: 2021-04-06