Robust variable selection for model-based learning in presence of adulteration
Computational Statistics & Data Analysis (IF 1.8) · Pub Date: 2021-01-26 · DOI: 10.1016/j.csda.2021.107186
Andrea Cappozzo, Francesca Greselin, Thomas Brendan Murphy

The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated, and several methods for variable selection in model-based classification have been proposed. The impact of outliers and wrongly labeled units on the determination of relevant predictors has instead received far less attention, with almost no dedicated methodologies available. Two robust variable selection approaches are introduced: one embeds a robust classifier within a greedy-forward selection procedure, while the other builds on the theory of maximum likelihood estimation and irrelevance. The former recasts feature identification as a model selection problem; the latter regards the relevant subset as a model parameter to be estimated. The benefits of the proposed methods, in contrast to non-robust solutions, are assessed via an experiment on synthetic data. An application to a high-dimensional classification problem on contaminated spectroscopic data is presented.
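To make the first approach concrete, the sketch below illustrates the general idea of wrapping a robust scoring rule inside a greedy-forward search. This is a minimal, hypothetical illustration, not the authors' estimator: it assumes diagonal Gaussian class-conditionals and uses impartial trimming (discarding the `alpha` fraction of worst-fitting units, which downweights outliers and mislabeled observations) as the robust criterion; the function names are invented for this example.

```python
import numpy as np

def _class_logliks(X, y, classes):
    """Log-density of each observation under each class's diagonal Gaussian fit."""
    ll = np.empty((len(X), len(classes)))
    for k, c in enumerate(classes):
        sub = X[y == c]
        mu, var = sub.mean(axis=0), sub.var(axis=0) + 1e-6  # small ridge for stability
        ll[:, k] = -0.5 * (((X - mu) ** 2 / var) + np.log(2 * np.pi * var)).sum(axis=1)
    return ll

def trimmed_posterior_score(X, y, alpha=0.1):
    """Mean log-posterior of the observed labels after trimming the alpha
    fraction of worst-fitting units (candidate outliers / label noise)."""
    classes = np.unique(y)
    ll = _class_logliks(X, y, classes)
    m = ll.max(axis=1, keepdims=True)
    log_post = ll - (m + np.log(np.exp(ll - m).sum(axis=1, keepdims=True)))
    obs = log_post[np.arange(len(y)), np.searchsorted(classes, y)]
    return np.sort(obs)[int(alpha * len(obs)):].mean()

def greedy_forward(X, y, n_select, alpha=0.1):
    """Greedy-forward variable selection: at each step, add the variable
    that most improves the robust (trimmed) classification score."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        best = max(remaining,
                   key=lambda j: trimmed_posterior_score(X[:, selected + [j]], y, alpha))
        selected.append(best)
        remaining.remove(best)
    return selected
```

On synthetic data with one shifted (discriminative) variable among pure-noise variables, the trimmed score selects the shifted one first even when a small fraction of labels is corrupted, which is the behavior the robust wrapper is meant to deliver.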




Updated: 2021-02-08