Statistics in Proteomics: A Meta-analysis of 100 Proteomics Papers Published in 2019.
Journal of the American Society for Mass Spectrometry (IF 3.1), Pub Date: 2020-05-01, DOI: 10.1021/jasms.9b00142
David C. L. Handler, Paul A. Haynes

We randomly selected 100 journal articles published in five proteomics journals in 2019 and manually examined each of them against a set of 13 criteria concerning the statistical analyses used, all of which were based on items mentioned in the journals' instructions to authors. These criteria included questions such as whether a pilot study was conducted and whether false discovery rate calculation was employed at either the quantitation or identification stage. The answers were then transformed to binary inputs, analyzed via machine learning algorithms, and classified accordingly, with the aim of determining whether clusters of data existed for specific journals or whether certain statistical measures correlated with each other. We applied a variety of methods, including principal component analysis decomposition, agglomerative clustering, and multinomial and Bernoulli naïve Bayes classification, and found that none of these could readily determine journal identity from the extracted statistical features. Logistic regression was useful in identifying strong correlations between statistical features, such as false discovery rate criteria and multiple testing correction methods, but was similarly ineffective at finding correlations between statistical features and specific journals. This meta-analysis highlights that a very wide variety of approaches is being used in the statistical analysis of proteomics data, many of which do not conform to published journal guidelines, and that, contrary to implicit assumptions in the field, there are no clear correlations between statistical methods and specific journals.
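
As a rough illustration of one step described above (Bernoulli naïve Bayes on binary yes/no features), the following is a minimal sketch and not the authors' code. The sizes (100 papers, 13 criteria, 5 journals) come from the abstract; the data are synthetic random values, and the function names, the 70/30 split, and the smoothing constant `alpha` are assumptions made for illustration only.

```python
import math
import random

# Sizes taken from the abstract; everything else here is illustrative.
N_PAPERS, N_FEATURES, N_JOURNALS = 100, 13, 5


def make_synthetic(seed=0):
    """Synthetic stand-in data: random yes/no answers and random journal labels."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(N_PAPERS)]
    y = [rng.randrange(N_JOURNALS) for _ in range(N_PAPERS)]
    return X, y


def fit_bernoulli_nb(X, y, n_classes, alpha=1.0):
    """Estimate class log-priors and per-feature Bernoulli parameters (Laplace-smoothed)."""
    n_features = len(X[0])
    counts = [0] * n_classes
    ones = [[0] * n_features for _ in range(n_classes)]
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            ones[label][j] += value
    log_prior = [
        math.log((counts[c] + alpha) / (len(y) + alpha * n_classes))
        for c in range(n_classes)
    ]
    theta = [
        [(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(n_features)]
        for c in range(n_classes)
    ]
    return log_prior, theta


def predict(model, row):
    """Return the class with the highest joint log-probability for one binary row."""
    log_prior, theta = model
    scores = []
    for c in range(len(log_prior)):
        score = log_prior[c]
        for value, p in zip(row, theta[c]):
            score += math.log(p if value else 1.0 - p)
        scores.append(score)
    return max(range(len(scores)), key=scores.__getitem__)


def holdout_accuracy(X, y, n_train=70):
    """Fit on the first n_train rows and score on the remaining rows."""
    model = fit_bernoulli_nb(X[:n_train], y[:n_train], N_JOURNALS)
    preds = [predict(model, row) for row in X[n_train:]]
    hits = sum(p == t for p, t in zip(preds, y[n_train:]))
    return hits / len(preds), preds


X, y = make_synthetic()
accuracy, preds = holdout_accuracy(X, y)
print(f"held-out accuracy: {accuracy:.2f} (chance is about {1 / N_JOURNALS:.2f})")
```

Because the synthetic labels are independent of the features, any held-out accuracy this sketch reports reflects chance alone; it only shows the mechanics of asking whether journal identity can be recovered from binary statistical features.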

Updated: 2020-05-01