Improving discrimination of Raman spectra by optimising preprocessing strategies on the basis of the ability to refine the relationship between variance components
Chemometrics and Intelligent Laboratory Systems (IF 3.9), Pub Date: 2020-07-01, DOI: 10.1016/j.chemolab.2020.104029
Agnieszka Martyna, Alicja Menżyk, Alessandro Damin, Aleksandra Michalska, Gianmario Martra, Eugenio Alladio, Grzegorz Zadora

Abstract Discriminating samples into predefined groups is a common task in many fields, such as medicine and environmental and forensic studies. Its success depends strongly on how effectively the groups are separated. Separation is optimal when the group means lie much further apart than the data within the groups, i.e. when the variation of the group means is greater than the variation of the data averaged over all groups. The task is particularly demanding for signals (e.g. spectra), which take considerable effort to prepare so that interesting features are uncovered and turned into information that better suits the purpose of the data analysis. This can be handled with preprocessing strategies, which should highlight the features relevant for further analysis (e.g. discrimination) by removing unwanted variation and deteriorating effects, such as noise or baseline drift, and by standardising the signals. The aim of the research was to develop an automated procedure for optimising the choice of preprocessing strategy so that it is best suited to discrimination. The authors propose a novel concept for assessing the goodness of a preprocessing strategy: the ratio of the between-groups to within-groups variance on the first latent variable derived from regularised MANOVA, which is capable of exposing group differences in highly multidimensional data. The search for the best preprocessing strategy was carried out using a grid search and a much more efficient genetic algorithm. The adequacy of this concept, which remarkably supports discrimination analysis, was verified by assessing its capability to solve two forensic comparison problems described by Raman spectra, namely discrimination between differently aged bloodstains and between various car paints, using the likelihood ratio framework, a recommended tool for discriminating samples in forensics.
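
The criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It does not derive the latent variable with regularised MANOVA; a hypothetical `project` callable (here, the intensity at one channel) stands in for that step. The preprocessing steps (`subtract_min`, `moving_average`), the toy spectra and the function names are all assumptions made for illustration. The sketch scores every combination of steps by the between-groups to within-groups variance ratio and keeps the best one, which is the grid-search idea; the genetic algorithm mentioned in the abstract is not shown.

```python
from itertools import product
from statistics import mean


def variance_ratio(scores, labels):
    """Between-groups over within-groups variance for 1-D scores."""
    groups = {}
    for s, g in zip(scores, labels):
        groups.setdefault(g, []).append(s)
    grand = mean(scores)
    k, n = len(groups), len(scores)
    between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values()) / (k - 1)
    within = sum((s - mean(v)) ** 2 for v in groups.values() for s in v) / (n - k)
    return between / within  # undefined if the within-groups variance is zero


# Hypothetical preprocessing steps (assumptions, not taken from the paper).
def identity(x):
    return list(x)


def subtract_min(x):
    """Crude baseline removal: subtract the smallest intensity."""
    m = min(x)
    return [v - m for v in x]


def moving_average(x):
    """3-point moving average with a shrinking window at the edges."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def grid_search(spectra, labels, steps, project):
    """Try every combination of steps; return (step names, best ratio)."""
    best = None
    for combo in product(*steps):
        processed = spectra
        for _, fn in combo:
            processed = [fn(x) for x in processed]
        ratio = variance_ratio([project(x) for x in processed], labels)
        names = tuple(name for name, _ in combo)
        if best is None or ratio > best[1]:
            best = (names, ratio)
    return best


# Toy spectra: one peak at channel 2 on top of a varying baseline offset.
def toy(offset, peak):
    return [offset, offset, offset + peak, offset, offset]


spectra = [toy(0, 1.0), toy(5, 1.2), toy(10, 0.8),
           toy(0, 2.0), toy(5, 2.2), toy(10, 1.8)]
labels = ["a", "a", "a", "b", "b", "b"]
steps = [
    [("none", identity), ("subtract_min", subtract_min)],      # baseline
    [("none", identity), ("moving_average", moving_average)],  # smoothing
]

print(grid_search(spectra, labels, steps, project=lambda x: x[2]))
```

In this toy data the baseline offset dominates the within-groups variance, so combinations that include baseline removal score far higher than those that do not. An exhaustive search like this grows multiplicatively with the number of options per step, which is presumably why the abstract describes the genetic algorithm as much more efficient.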


