Data-driven sparse partial least squares
Statistical Analysis and Data Mining (IF 1.3), Pub Date: 2021-10-18, DOI: 10.1002/sam.11558
Hadrien Lorenzo 1, 2, 3 , Olivier Cloarec 2 , Rodolphe Thiébaut 1, 4, 5, 6 , Jérôme Saracco 1, 3

In supervised high-dimensional settings, with a large number of variables and a small number of individuals, variable selection allows simpler interpretation and more reliable predictions. This subspace selection is often handled with supervised tools when the real question is motivated by variable prediction. We propose a partial least squares (PLS) based method, called data-driven sparse PLS (ddsPLS), that allows variable selection in both the covariate and the response parts using a single hyperparameter per component. The subspace estimation is also performed by tuning a number of underlying parameters. The ddsPLS method is compared with existing methods, namely classical PLS and two well-established sparse PLS methods, through numerical simulations. The observed results are promising in terms of both variable selection and prediction performance. The methodology is based on new prediction quality descriptors associated with the classical R² and Q², and uses bootstrap sampling to tune parameters and select an optimal regression model.
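
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea: a single sparsity hyperparameter per component, tuned by bootstrap resampling against an out-of-bag Q². Everything in it is an assumption made for illustration and is not the authors' ddsPLS implementation: the function names (`soft_threshold`, `fit_sparse_component`, `bootstrap_q2`, `select_lambda`), the use of soft-thresholding on the X/Y correlation matrix, the restriction to one component, and the rule of picking the `lam` with the highest mean out-of-bag Q². `Y` is expected as a 2-D array (individuals by responses).

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink entries toward zero; entries with |a| <= lam become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def fit_sparse_component(X, Y, lam):
    """Fit a one-component sparse PLS-style model (illustrative, not ddsPLS itself).

    Returns (B, mu_x, sd_x, mu_y, sd_y); B maps standardized X to standardized Y.
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    sd_x, sd_y = X.std(axis=0), Y.std(axis=0)
    sd_x = np.where(sd_x > 0, sd_x, 1.0)  # guard against constant columns
    sd_y = np.where(sd_y > 0, sd_y, 1.0)
    Xs, Ys = (X - mu_x) / sd_x, (Y - mu_y) / sd_y

    # Soft-threshold the empirical X/Y correlation matrix (p x q).
    C = soft_threshold(Xs.T @ Ys / X.shape[0], lam)
    B = np.zeros((X.shape[1], Y.shape[1]))
    if np.any(C):
        v = np.linalg.svd(C, full_matrices=False)[2][0]  # leading right singular vector
        u = C @ v  # X weights; all-zero rows of C give exact zeros here
        u /= np.linalg.norm(u)
        t = Xs @ u  # component scores
        tt = t @ t
        if tt > 0:
            keep_y = np.any(C != 0, axis=0)  # responses that survived thresholding
            c = np.where(keep_y, Ys.T @ t / tt, 0.0)
            B = np.outer(u, c)
    return B, mu_x, sd_x, mu_y, sd_y


def predict(model, X_new):
    """Predict Y on the original scale from a fitted model tuple."""
    B, mu_x, sd_x, mu_y, sd_y = model
    return mu_y + ((X_new - mu_x) / sd_x) @ B * sd_y


def bootstrap_q2(X, Y, lam, n_boot=50, seed=0):
    """Mean out-of-bag Q^2 over bootstrap resamples for a given lam."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # in-bag rows, drawn with replacement
        oob = np.setdiff1d(np.arange(n), idx)  # rows never drawn
        if oob.size == 0:
            continue
        model = fit_sparse_component(X[idx], Y[idx], lam)
        resid = Y[oob] - predict(model, X[oob])
        baseline = Y[oob] - Y[idx].mean(axis=0)  # error of predicting the in-bag mean
        scores.append(1.0 - (resid ** 2).sum() / (baseline ** 2).sum())
    return float(np.mean(scores))


def select_lambda(X, Y, grid, n_boot=50, seed=0):
    """Return the lam in grid with the highest mean out-of-bag Q^2, plus all scores."""
    q2 = [bootstrap_q2(X, Y, lam, n_boot, seed) for lam in grid]
    return grid[int(np.argmax(q2))], q2
```

Calling `select_lambda(X, Y, grid=[0.0, 0.2, 0.4, 0.6])` returns the chosen threshold and the Q² score for each grid value. Covariates whose rows in `B` are all zero, and responses whose columns are all zero, are the ones dropped by the thresholding step. A threshold large enough to remove every variable reduces the model to the in-bag mean, which gives Q² = 0 and serves as a natural reference point.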

Updated: 2021-10-18