Detecting outlying variables in multigroup data: A comparison of different loading similarity coefficients, Journal of Chemometrics



Detecting outlying variables in multigroup data: A comparison of different loading similarity coefficients
Journal of Chemometrics (IF 1.9), Pub Date: 2020-03-20, DOI: 10.1002/cem.3233
Sopiko Gvaladze ₁, Kim De Roover ₁,₂, Francis Tuerlinckx ₁, Eva Ceulemans ₁


Multivariate multigroup data are collected in many fields of science, where the so‐called groups pertain to, for instance, experimental groups or countries the participants are nested in. To summarize the main information in such data, principal component analysis (PCA) is highly popular. PCA reduces the variables to a few components that are linear combinations of the original variables. Researchers usually assume those components to be the same across the groups and aim to apply a simultaneous component analysis. To investigate whether this assumption is reasonable, one often analyzes the groups separately and computes a similarity index between the group‐specific component loadings of the variables. In many cases, however, most variables have highly similar loadings across the groups, but a few variables, which we will call “outlying variables,” behave differently, indicating that a simultaneous analysis is not warranted. In such cases, the outlying variables should be removed before proceeding with the simultaneous analysis. To do so, the variables are ranked according to their relative outlyingness. Although some procedures have been proposed that yield such an outlyingness ranking, they might not be optimal, because they all rely on the same choice of similarity coefficient without evaluating other alternatives. In this paper, we give an overview of other options and report extensive simulations in which we investigate how this choice affects the correctness of the outlyingness ranking. We also illustrate the added value of the outlying variable approach by means of sensometric data on different bread samples.
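As a rough illustration of the idea described above, the following Python sketch compares the loadings of two groups on a single component with Tucker's congruence coefficient, one widely used loading similarity index, and ranks variables by how much the congruence improves when each variable is left out. This is a minimal sketch under stated assumptions, not the procedure evaluated in the paper: the choice of coefficient, the leave-one-variable-out heuristic, the function names, and the toy loadings are all hypothetical.

```python
import numpy as np


def tucker_congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def outlyingness_ranking(load_g1, load_g2):
    """Rank variables by the congruence gained when each one is dropped.

    Hypothetical leave-one-variable-out heuristic, for illustration only.
    Returns (order, gains): variable indices from most to least outlying,
    and the congruence gain obtained by dropping each variable.
    """
    load_g1 = np.asarray(load_g1, dtype=float)
    load_g2 = np.asarray(load_g2, dtype=float)
    # Absolute value: a component's sign is arbitrary in each group.
    base = abs(tucker_congruence(load_g1, load_g2))
    gains = np.empty(load_g1.size)
    for j in range(load_g1.size):
        keep = np.arange(load_g1.size) != j
        reduced = abs(tucker_congruence(load_g1[keep], load_g2[keep]))
        gains[j] = reduced - base
    order = np.argsort(-gains)  # largest gain first
    return order, gains


# Toy loadings (made up): five variables, one component, two groups.
# The last variable loads very differently in the second group.
g1 = np.array([0.8, 0.7, 0.75, 0.6, 0.7])
g2 = np.array([0.8, 0.7, 0.75, 0.6, -0.2])

order, gains = outlyingness_ranking(g1, g2)
print("most outlying variable index:", order[0])
```

With several components, the group-specific loading matrices would first have to be aligned (for example by a rotation towards a common target), because components are only determined up to rotation and reflection; the sketch sidesteps this by using one component and the absolute congruence.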


Update date: 2020-03-20
