Modified PCA and PLS: Towards a better classification in Raman spectroscopy–based biological applications
Journal of Chemometrics (IF 2.4) Pub Date: 2020-04-01, DOI: 10.1002/cem.3202
Shuxia Guo 1,2, Petra Rösch 1,2, Jürgen Popp 1,2, Thomas Bocklitz 1,2

Raman spectra of biological samples often exhibit variations originating from changes of spectrometers, measurement conditions, and cultivation conditions. Such unwanted variations make classification extremely challenging, especially if they are larger than the differences between the groups to be separated. A classifier is prone to picking up such unwanted variations (i.e., intragroup variations) and can fail to learn the patterns that help separate different groups (i.e., intergroup differences). This often leads to poor generalization performance and degraded transferability of the trained model. A natural solution is to separate the intragroup variations from the intergroup differences and build the classifier on the latter alone, for example, by a well-designed feature extraction. This is the idea behind this contribution. Herein, we modified two commonly applied feature extraction approaches, principal component analysis (PCA) and partial least squares (PLS), in order to extract only the features representing the intergroup differences. Both methods were verified with two Raman spectral datasets measured from bacterial cultures and colon tissues of mice, respectively. In comparison to ordinary PCA and PLS, the modified PCA was able to improve the prediction on testing data that differs significantly from the training data, while the modified PLS could help avoid overfitting and lead to a more stable classification.
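
The abstract does not spell out how the modified PCA and PLS are constructed, so the following is only a minimal sketch of the general idea of extracting features from intergroup differences alone. It uses between-group PCA (PCA fitted on the group mean spectra), which is a generic technique swapped in for illustration and not necessarily the authors' method. The function names `between_group_pca` and `transform`, and the toy data, are hypothetical.

```python
import numpy as np

def between_group_pca(X, y, n_components):
    """Fit PCA on the group means only (between-group PCA).

    Illustrative assumption: this is NOT the modified PCA of the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand_mean = X.mean(axis=0)
    # One centered mean spectrum per group; intragroup scatter is averaged out.
    means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)]) - grand_mean
    # Right singular vectors of the centered means span the intergroup directions.
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    return grand_mean, vt[:n_components]

def transform(X, grand_mean, components):
    """Project spectra onto the intergroup directions."""
    return (np.asarray(X, dtype=float) - grand_mean) @ components.T

# Toy data: the groups differ along feature 0 (small shift), while a much
# larger intragroup variation lives along feature 1.
offsets = np.array([-5.0, 5.0, -3.0, 3.0])
group_a = np.column_stack([np.full(4, -1.0), offsets, np.zeros(4)])
group_b = np.column_stack([np.full(4, 1.0), offsets, np.zeros(4)])
X = np.vstack([group_a, group_b])
y = np.array([0] * 4 + [1] * 4)

grand_mean, components = between_group_pca(X, y, n_components=1)
scores = transform(X, grand_mean, components)

# Ordinary PCA on the same data, for comparison: its first component follows
# the direction of largest total variance, which here is the intragroup one.
_, _, vt_all = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pca_first = vt_all[0]
```

In this toy case the between-group component aligns with feature 0, where the groups actually differ, whereas the first ordinary principal component aligns with feature 1. With real spectra the number of useful between-group components is limited by the number of groups, so a sketch like this would typically be combined with further modeling.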

Updated: 2020-04-01