A comparative study between PCR, PLSR, and LW-PLS on the predictive performance at different data splitting ratios
Chemical Engineering Communications (IF 2.5), Pub Date: 2021-07-26, DOI: 10.1080/00986445.2021.1957853
Teck Fu Thien, Wan Sieng Yeo

Abstract

Principal component regression (PCR), partial least squares regression (PLSR), and locally weighted partial least squares (LW-PLS) are supervised learning methods: a labeled dataset is used to train the model. These models are normally developed with split-sample validation, in which the dataset is divided into a training set used to build the model and a testing set used to evaluate it. However, few studies have evaluated how the predictive performance of PCR, PLSR, and LW-PLS changes with the data splitting ratio. To address this gap, this work investigates the predictive performance of these regression models at different split-sample ratios. The study also determines the optimal splitting ratio for each of the PCR, PLSR, and LW-PLS models using a simple data splitting method in which at least 50% of the entire dataset is allocated to training. The optimal split is determined by evaluating the root mean squared error (RMSE), coefficient of determination (R²), and error of approximation (Ea) across five case studies. Of the three models, LW-PLS performed best in most of the case studies because it copes better with nonlinear data. For the best model in each case study, split-sample ratios allocating more than 70% of the data to training yielded major improvements in predictive performance compared with the base scenarios, which have the largest Ea values.
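As a rough illustration of the experimental design described in the abstract, the Python sketch below splits a dataset at training ratios from 50% to 90% and scores PCR, PLSR, and LW-PLS on the held-out samples with RMSE and R². It is not the authors' code. Assumptions: NumPy is available; each split is a single random shuffle; the synthetic nonlinear dataset, the number of latent components (k = 2), and the LW-PLS localization parameter (phi = 0.5) are hypothetical choices; the LW-PLS similarity weight exp(-d / (σ_d·φ)) follows one common formulation from the LW-PLS literature. Ea is omitted because the abstract does not define it.

```python
import numpy as np


def split_by_ratio(X, y, train_ratio, seed=0):
    """Shuffle once (assumed), then allocate `train_ratio` of the samples to training."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train = int(round(train_ratio * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r2(y, y_hat):
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


def fit_pcr(X, y, k):
    """PCR: regress centered y on the scores of the first k principal components of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = vt[:k].T  # principal loadings, shape (d, k)
    b, *_ = np.linalg.lstsq((X - x_mean) @ V, y - y_mean, rcond=None)
    return lambda Xq: (Xq - x_mean) @ V @ b + y_mean


def fit_weighted_pls1(X, y, k, omega=None):
    """NIPALS PLS1 with optional sample weights; omega=None gives ordinary PLSR."""
    omega = np.ones(len(y)) if omega is None else omega
    x_mean = omega @ X / omega.sum()  # weighted means used for centering
    y_mean = omega @ y / omega.sum()
    Xr, yr = X - x_mean, y - y_mean
    comps = []
    for _ in range(k):
        w = Xr.T @ (omega * yr)  # weight vector, proportional to weighted cov(X, y)
        w /= np.linalg.norm(w)
        t = Xr @ w  # latent scores
        tt = t @ (omega * t)
        p = Xr.T @ (omega * t) / tt  # X loadings
        q = yr @ (omega * t) / tt  # y loading
        Xr, yr = Xr - np.outer(t, p), yr - q * t  # deflate before the next component
        comps.append((w, p, q))

    def predict(Xq):
        Xq = np.atleast_2d(Xq) - x_mean
        y_hat = np.full(len(Xq), y_mean)
        for w, p, q in comps:  # replay the same deflation on the query samples
            t = Xq @ w
            y_hat += q * t
            Xq = Xq - np.outer(t, p)
        return y_hat

    return predict


def predict_lwpls(X, y, Xq, k, phi=0.5):
    """LW-PLS: fit a weighted PLS1 model around each query; nearer samples weigh more."""
    preds = []
    for xq in Xq:
        d = np.linalg.norm(X - xq, axis=1)  # Euclidean distance to every training sample
        omega = np.exp(-d / (d.std() * phi))  # phi: localization parameter (hypothetical value)
        preds.append(fit_weighted_pls1(X, y, k, omega)(xq)[0])
    return np.array(preds)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.uniform(-2, 2, size=(200, 3))
    # Hypothetical synthetic nonlinear data, for illustration only.
    y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
    for ratio in (0.5, 0.6, 0.7, 0.8, 0.9):  # at least 50% allocated to training
        Xtr, ytr, Xte, yte = split_by_ratio(X, y, ratio)
        preds = {
            "PCR": fit_pcr(Xtr, ytr, 2)(Xte),
            "PLSR": fit_weighted_pls1(Xtr, ytr, 2)(Xte),
            "LW-PLS": predict_lwpls(Xtr, ytr, Xte, 2),
        }
        for name, y_hat in preds.items():
            print(f"{ratio:.0%} {name:6s} RMSE={rmse(yte, y_hat):.3f} R2={r2(yte, y_hat):.3f}")
```

Calling `fit_weighted_pls1` with uniform weights reduces to ordinary NIPALS PLS1, so the same routine serves both as the global PLSR model and as the local model inside LW-PLS; only the sample weights differ. This per-query refitting is also why LW-PLS can follow nonlinear structure that a single global linear model misses, at the cost of one model fit per prediction.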




Updated: 2021-07-26