Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies
Statistical Applications in Genetics and Molecular Biology (IF 0.9), Pub Date: 2019-05-11, DOI: 10.1515/sagmb-2018-0039
Yulan Liang, Adam Kelemen, Arpad Kelemen

Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis has been a key challenge due to a multitude of factors. The heterogeneity of the limited sample, various biological factors such as environmental confounders, and the inherent experimental and technical noise, compounded by the inadequacy of statistical tools, can lead to misinterpretation of results and, subsequently, to very different biological conclusions. In this paper, we use mass spectrometry proteomic ovarian tumor data to investigate biomarker reproducibility issues potentially caused by differences among statistical methods with varied distributional assumptions or marker selection criteria. We examine the relationships among effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in the limited heterogeneous sample. We compare the markers identified by statistical single-feature selection approaches with those identified by machine learning wrapper methods. The results reveal marked differences among the protein markers selected by the different methods, with potential selection biases and false discoveries; these differences may be due to small effects, different distributional assumptions, and the use of p value type criteria versus prediction accuracy. Alternative solutions and other related issues are discussed in support of reproducible findings for clinically actionable outcomes.
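To make the quantities named in the abstract concrete, the sketch below computes a per-feature effect size, a p value, a False Discovery Rate adjusted p value, and rank fractions on synthetic data. It is an illustration under stated assumptions, not the authors' pipeline: the effect size is taken to be Cohen's d, the p value comes from a permutation test swapped in for simplicity, the FDR adjustment is the Benjamini-Hochberg procedure, and "rank fraction" is read as rank divided by the number of features. Cauchy p values are left out because the abstract does not define them. All function names and the simulated data are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p value for a difference in group means."""
    rng = random.Random(seed)
    na = len(a)
    observed = abs(sum(a) / na - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        left, right = pooled[:na], pooled[na:]
        if abs(sum(left) / len(left) - sum(right) / len(right)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_fraction(scores, descending=False):
    """Rank of each feature divided by the feature count (1/m is the top rank)."""
    m = len(scores)
    order = sorted(range(m), key=lambda i: scores[i], reverse=descending)
    fractions = [0.0] * m
    for rank, i in enumerate(order, start=1):
        fractions[i] = rank / m
    return fractions


def demo():
    # Synthetic stand-in for protein intensities: 6 features, 10 samples per group.
    rng = random.Random(42)
    shifts = [1.5, 1.0, 0.5, 0.0, 0.0, 0.0]  # hypothetical true group differences
    d_vals, p_vals = [], []
    for shift in shifts:
        cases = [rng.gauss(shift, 1.0) for _ in range(10)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(10)]
        d_vals.append(cohens_d(cases, controls))
        p_vals.append(permutation_p(cases, controls))
    q_vals = benjamini_hochberg(p_vals)
    by_effect = rank_fraction([abs(d) for d in d_vals], descending=True)
    by_p = rank_fraction(p_vals)
    for j in range(len(shifts)):
        print(f"feature {j}: d={d_vals[j]:+.2f} p={p_vals[j]:.4f} "
              f"q={q_vals[j]:.4f} rank_d={by_effect[j]:.2f} rank_p={by_p[j]:.2f}")


if __name__ == "__main__":
    demo()
```

Comparing the `rank_d` and `rank_p` columns for a given feature shows how two selection criteria can order the same candidates differently, which is the kind of disagreement the abstract describes; with small samples the orderings can also shift from one random seed to the next.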

Updated: 2019-05-11