Compositional Data Analysis
Annual Review of Statistics and Its Application (IF 7.4), Pub Date: 2021-03-08, DOI: 10.1146/annurev-statistics-042720-124436
Michael Greenacre

Compositional data are nonnegative data carrying relative, rather than absolute, information. These are often data with a constant-sum constraint on the sample values, for example, proportions or percentages summing to 1 or 100%, respectively. Ratios between components of a composition are important since they are unaffected by the particular set of components chosen. Logarithms of ratios (logratios) are the fundamental transformation in the ratio approach to compositional data analysis. All data thus need to be strictly positive, so zero values present a major problem. Components that group together based on domain knowledge can be amalgamated (i.e., summed) to create new components, and this can alleviate the problem of data zeros. Once compositional data are transformed to logratios, regular univariate and multivariate statistical analysis can be performed, such as dimension reduction and clustering, as well as modeling. Alternative methodologies that come close to the ideals of the logratio approach are also considered, especially those that avoid the problem of data zeros, which is particularly acute in large bioinformatic data sets.
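The ideas in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the sample values, the helper names `close`, `clr`, and `amalgamate`, and the choice of the centered logratio (one common logratio transformation among several) are all assumptions made for illustration. It shows that a ratio between two parts is unchanged when only a subset of components is kept, that a zero blocks the logratio transform, and that amalgamating parts can remove the zero.

```python
import math

def close(parts, total=1.0):
    """Rescale nonnegative parts so they sum to `total` (closure)."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centered logratio: log of each part minus the mean of the logs.

    Assumed here as one common logratio transform; it requires
    strictly positive parts because log(0) is undefined.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("logratios require strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def amalgamate(parts, groups):
    """Sum the parts in each index group to create new components."""
    return [sum(parts[i] for i in g) for g in groups]

# Hypothetical 4-part composition in percent, containing one zero.
sample = [50.0, 30.0, 20.0, 0.0]

# Keep only the first two parts and re-close them to 100%:
# the ratio between them does not depend on which parts were kept.
sub = close(sample[:2], total=100.0)
ratio_full = sample[0] / sample[1]
ratio_sub = sub[0] / sub[1]

# Amalgamating the last two parts removes the zero,
# after which the logratio transform is defined.
merged = amalgamate(sample, [[0], [1], [2, 3]])
clr_merged = clr(merged)
```

The grouping passed to `amalgamate` stands in for the domain knowledge the abstract mentions; in practice the choice of which components to sum would come from the application, not from the data alone.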


Updated: 2021-03-09