Consistency-Enhanced Evolution for Variable Selection Can Identify Key Chemical Information from Spectroscopic Data
Industrial & Engineering Chemistry Research ( IF 3.8 ) Pub Date : 2020-02-13 , DOI: 10.1021/acs.iecr.9b06049
Jangwon Lee 1 , Jesus Flores-Cerrillo 2 , Jin Wang 1 , Q. Peter He 1

In the last few decades, spectroscopic techniques such as near-infrared (NIR) spectroscopy have found wide application in several industries, such as the pharmaceutical, agricultural, oil, and gas industries. As a result, various soft sensors have been developed to predict sample properties from spectroscopic readings. Because the spectroscopic readings at different wavelengths, especially at adjacent wavelengths, are highly correlated, it has been shown that variable selection can significantly improve a soft sensor's prediction performance while reducing model complexity. To improve prediction performance, most variable selection methods focus on identifying the variables (i.e., wavelengths or wavelength segments) that are strongly correlated with the dependent variable. Although many successful applications have been reported, these variable selection methods have their limitations. Specifically, the selected wavelengths sometimes show little connection to the chemical bonds or functional groups present in the sample. In addition, the selected variables can be quite sensitive to the choice of training samples. In this work, we address these limitations from a different perspective: if a variable selection algorithm can identify the truly relevant input variables, it should consistently identify the same subset of variables regardless of the choice of training samples. Therefore, we propose a variable selection method that aims to improve the consistency of the variable selections obtained from different training samples. The new algorithm is termed consistency-enhanced evolution for variable selection (CEEVS). To demonstrate the performance and robustness of CEEVS, we compare the proposed method with three representative variable selection methods using five published NIR data sets.
These case studies clearly demonstrate that by improving the variable selection consistency, we can not only achieve improved prediction performance, but also identify key chemical information from spectroscopic data.
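
The notion of selection consistency described above can be illustrated with a minimal sketch. This is not the CEEVS algorithm, whose details the abstract does not give. It assumes a hypothetical stand-in selector (`select_top_k`, which ranks variables by absolute correlation with the response) and a hypothetical helper (`selection_frequency`) that reruns the selector on random training subsets and records how often each variable is chosen. The data are synthetic.

```python
import numpy as np

def select_top_k(X, y, k):
    """Hypothetical stand-in selector (not CEEVS): return the indices of
    the k columns of X with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    denom = np.where(denom == 0, 1.0, denom)  # guard against constant columns
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[-k:]

def selection_frequency(X, y, k, n_runs=50, frac=0.7, seed=0):
    """Fraction of random training subsets in which each variable is selected.
    A value near 1 means the variable is chosen regardless of the subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = max(2, int(frac * n))          # size of each training subset
    counts = np.zeros(p)
    for _ in range(n_runs):
        idx = rng.choice(n, size=m, replace=False)
        counts[select_top_k(X[idx], y[idx], k)] += 1
    return counts / n_runs

# Synthetic data (assumption): only columns 4 and 11 drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 4] - 3.0 * X[:, 11] + 0.1 * rng.normal(size=100)

freq = selection_frequency(X, y, k=2)
print(np.round(freq, 2))
```

In this sketch, a selector that finds the truly relevant variables yields frequencies close to 1 for those variables and close to 0 elsewhere, while a selector sensitive to the training samples spreads its frequencies across many variables.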

Updated: 2020-02-14