Identification of city specific important bacterial signature for the MetaSUB CAMDA challenge microbiome data.
Biology Direct (IF 5.5), Pub Date: 2019-07-24, DOI: 10.1186/s13062-019-0243-z
Alejandro R Walker 1, 2 , Susmita Datta 1

BACKGROUND Metagenomic whole genome sequencing (WGS) data from samples collected in several cities around the globe may reveal city-specific microbial signatures. Illumina MiSeq sequencing data were provided from 12 cities in 7 countries as part of the 2018 CAMDA "MetaSUB Forensic Challenge", together with samples from three mystery sets. We applied machine learning techniques to this large dataset to identify the geographical provenance of the "mystery" samples. Additionally, we pursued compositional data analysis to develop accurate inferential techniques for such microbiome data. The current data are of higher quality and greater sequence depth than the CAMDA 2017 MetaSUB challenge data; combined with improved analytical techniques, they are expected to yield more robust and useful results for forensic analysis.

RESULTS A preliminary quality screening revealed a much better dataset in terms of Phred quality score (hereafter Phred score), with larger paired-end MiSeq reads and a more balanced experimental design, although the number of samples still differed across cities. Principal component analysis (PCA) showed interesting clusters of samples, and the first three components explained a large share of the variability in the data (~70%). The classification analysis was consistent across both testing mystery sets, with a similar percentage of samples correctly predicted (up to 90%). The analysis of the relative abundance of bacterial "species" showed that some "species" are specific to some regions and can play important roles in prediction. These results were corroborated by the variable importance assigned to the "species" during the internal cross-validation (CV) run with Random Forest (RF).
CONCLUSIONS The unsupervised analysis (PCA and two-way heatmaps) of the log2-cpm normalized data, together with the differential analysis of relative abundance, suggested that the bacterial signature of common "species" was distinctive across the cities. The variable importance results supported this finding. The prediction of the city for mystery sets 1 and 3 showed convincing results, with high classification accuracy and consistency. The current MetaSUB data and the analytical tools used here can help forensic science, metagenomics, and related fields predict the city of provenance of metagenomic samples. Additionally, the pairwise analysis of relative abundance identified "species" that were consistent and comparable with the important variables from the classification.

REVIEWERS This article was reviewed by Manuela Oliveira, Dimitar Vassilev, and Patrick Lee.
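
The abstract refers to relative abundance and to log2-cpm normalized data without giving formulas. The sketch below illustrates one common way to compute both for a single sample. It assumes the voom-style convention log2((count + 0.5) / (library size + 1) × 10^6); the abstract does not state which variant the authors used, and the function names and taxon labels are hypothetical.

```python
import math


def relative_abundance(counts):
    """Return each taxon's share of the sample's total read count."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful proportions; report zeros.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def log2_cpm(counts, prior=0.5):
    """Return log2 counts-per-million for one sample.

    Assumption: voom-style offsets (prior added to each count and
    2 * prior added to the library size) so that zero counts stay finite.
    """
    lib_size = sum(counts.values())
    return {
        taxon: math.log2((n + prior) / (lib_size + 2 * prior) * 1e6)
        for taxon, n in counts.items()
    }


# Hypothetical read counts for three taxa in one sample.
sample = {"taxon_a": 3, "taxon_b": 1, "taxon_c": 0}
abundance = relative_abundance(sample)
normalized = log2_cpm(sample)
```

Applying these functions per sample gives a taxa-by-sample matrix of the kind that PCA, heatmaps, and a classifier such as Random Forest could then take as input.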

Updated: 2020-04-22