Unsupervised biodiversity estimation using proteomic fingerprints from MALDI‐TOF MS data
Limnology and Oceanography: Methods (IF 2.1), Pub Date: 2020-05-05, DOI: 10.1002/lom3.10358
Sven Rossel 1 , Pedro Martínez Arbizu 1, 2

Species identification using matrix assisted laser desorption/ionization time‐of‐flight mass spectrometry (MALDI‐TOF MS) data strongly relies on reference libraries to differentiate species. Because comprehensive reference libraries, especially for metazoans, are rare, we explored the accuracy of unsupervised diversity estimations of communities using MALDI‐TOF MS data in the absence of reference libraries to provide a method for future application in ecological research. To discover the best analysis strategy providing high congruence with true community structures, we carried out a simulation with more than 30,000 analyses using different combinations of data transformations, dimensionality reductions, and cluster algorithms. Species profile, Hellinger, and presence/absence transformations were applied to raw data and dimensions were reduced using principal component analysis (PCA), t‐distributed stochastic neighbor embedding, and uniform manifold approximation and projection. To estimate biodiversity, data were clustered making use of partitioning around medoids, model‐based clustering, and K‐means clustering. The analyses were carried out on published mass spectrometry data of harpacticoid copepods. Most successful combinations (Hellinger transformation + PCA or raw data + partitioning around medoids) returned good values even for difficult species distributions containing numerous singleton species. Nevertheless, errors occurred most frequently because of such singleton taxa. Hence, replicative sampling in wide sampling areas for analysis is emphasized to increase the minimum number of specimens per species, thus reducing putative sources of errors. Our results demonstrate that MALDI‐TOF MS data can be used to accurately estimate the biodiversity of unknown communities using unsupervised learning methods. The provided approach allows the biodiversity comparison of sampled regions for which no reference libraries are available. 
Hence, especially data on groups which demand a time‐consuming identification or are highly abundant can be analyzed within short working time, accelerating ecological studies.
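As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch in plain Python of the three data transformations (species profile, Hellinger, presence/absence) followed by a medoid-based clustering. It is not the authors' code. The toy intensity matrix, the function names, the farthest-point initialization, and the fixed number of clusters `k` are all assumptions made for the example. The clustering is a simplified alternating k-medoids rather than the full partitioning around medoids procedure, and the dimensionality reduction step (PCA, t-SNE, UMAP) is omitted.

```python
import math


def species_profile(row):
    """Relative intensities: each peak divided by the spectrum's total."""
    total = sum(row)
    return [x / total if total else 0.0 for x in row]


def hellinger(row):
    """Hellinger transformation: square root of the relative intensities."""
    return [math.sqrt(x) for x in species_profile(row)]


def presence_absence(row):
    """1.0 where a peak is present, 0.0 where it is absent."""
    return [1.0 if x > 0 else 0.0 for x in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(dist, medoids):
    """Label every point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(points, k, n_iter=100):
    """Simplified alternating k-medoids (assumption: not the full PAM swap)."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Deterministic start: the most central point, then farthest-point picks.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(n_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(dist, medoids), medoids


# Hypothetical peak-intensity matrix: rows are specimens, columns are mass bins.
spectra = [
    [10, 0, 5, 0],
    [22, 0, 9, 0],
    [8, 1, 4, 0],
    [0, 12, 0, 6],
    [0, 7, 0, 3],
    [1, 10, 0, 5],
]

transformed = [hellinger(row) for row in spectra]
labels, medoids = k_medoids(transformed, k=2)
print(labels)
```

In this sketch the number of clusters is supplied by hand. An actual diversity estimate would also need a criterion for choosing `k`, which the abstract does not specify.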

Updated: 2020-05-05