EnsMOD: A Software Program for Omics Sample Outlier Detection.
Journal of Computational Biology (IF 1.4), Pub Date: 2023-04-12, DOI: 10.1089/cmb.2022.0243
Nathan P Manes, Jian Song, Aleksandra Nita-Lazar

Detection of omics sample outliers is important for preventing erroneous biological conclusions, developing robust experimental protocols, and discovering rare biological states. Two recent publications describe robust algorithms for detecting transcriptomic sample outliers, but neither algorithm had been incorporated into a software tool for scientists. Here we describe Ensemble Methods for Outlier Detection (EnsMOD) which incorporates both algorithms. EnsMOD calculates how closely the quantitation variation follows a normal distribution, plots the density curves of each sample to visualize anomalies, performs hierarchical cluster analyses to calculate how closely the samples cluster with each other, and performs robust principal component analyses to statistically test if any sample is an outlier. The probabilistic threshold parameters can be easily adjusted to tighten or loosen the outlier detection stringency. EnsMOD can be used to analyze any omics dataset with normally distributed variance. Here it was used to analyze a simulated proteomics dataset, a multiomic (proteome and transcriptome) dataset, a single-cell proteomics dataset, and a phosphoproteomics dataset. EnsMOD successfully identified all of the simulated outliers, and subsequent removal of a detected outlier improved data quality for downstream statistical analyses.
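The idea of scoring how closely samples agree with each other, then applying an adjustable threshold, can be illustrated with a small sketch. This is not EnsMOD's implementation, which according to the abstract combines hierarchical cluster analysis with robust principal component analysis. As a simplified stand-in, the Python code below scores each sample by its mean Pearson correlation with the other samples and flags samples whose score falls far below the median, measured in robust (median absolute deviation) units. The function names, the `threshold` parameter, and the example data are all hypothetical.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def flag_outlier_samples(samples, threshold=3.0):
    """Return names of samples that correlate unusually poorly with the rest.

    samples: dict mapping sample name -> list of feature quantitations.
    threshold: hypothetical robust z-score cutoff; lower values are stricter.
    """
    names = list(samples)
    # Score each sample by its mean correlation with every other sample.
    score = {
        n: mean(pearson(samples[n], samples[m]) for m in names if m != n)
        for n in names
    }
    med = median(score.values())
    # Median absolute deviation, scaled by 1.4826 so it is comparable
    # to a standard deviation when the scores are normally distributed.
    mad = 1.4826 * median(abs(s - med) for s in score.values())
    if mad == 0:
        return []  # scores are too uniform to define a robust scale
    # One-sided test: only unusually LOW agreement is flagged.
    return [n for n in names if (med - score[n]) / mad > threshold]


# Made-up data: five samples sharing one profile, one sample that does not.
example = {
    "s1": [1.0, 2.1, 2.9, 4.0, 5.2, 5.9, 7.0, 8.1],
    "s2": [1.1, 1.9, 3.0, 4.1, 4.9, 6.0, 7.1, 7.9],
    "s3": [0.9, 2.0, 3.1, 3.9, 5.0, 6.1, 6.9, 8.0],
    "s4": [1.0, 2.0, 3.0, 4.2, 5.1, 6.0, 7.2, 8.0],
    "s5": [1.2, 2.0, 2.8, 4.0, 5.0, 6.2, 7.0, 7.8],
    "out": [8.0, 1.0, 6.0, 3.0, 2.0, 7.0, 4.0, 5.0],
}
flagged = flag_outlier_samples(example)
print(flagged)
```

Raising `threshold` loosens the detection and lowering it tightens it, which mirrors the adjustable stringency described above. The correlation score is used only because it needs no external libraries.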

Updated: 2023-04-12