A biosynthetically informed distance measure to compare secondary metabolite profiles.
Chemoecology (IF 1.6), Pub Date: 2017-11-27, DOI: 10.1007/s00049-017-0250-4
Robert R Junker

Secondary metabolite profiles are one of the most diverse phenotypes of organisms and can consist of a large number of compounds originating from a limited number of biosynthetic pathways. The statistical treatment of such profiles is often complicated by their diversity as well as by the intra- and interspecific variability in the quantitative and qualitative composition of secondary metabolites. Most importantly, the assumption that the presence/absence and the quantity of compounds are independent is violated because many compounds share a biosynthetic origin. Therefore, I propose a biosynthetically informed pairwise distance measure that fully considers the biosynthesis of the compounds and thus quantifies the similarity in the enzymatic equipment of two samples. The biosynthetic similarity of compounds is calculated based on the proportion of shared enzymes that are required for their biosynthesis. Using this information (provided as a dendrogram structure) and the quantitative composition of the samples, generalized UniFrac distances are calculated; these measure the fraction of the dendrogram (i.e., of the branch lengths) that is unique to either of the samples rather than shared by both. To allow a straightforward cross-platform application of the approach, I provide functions for the statistical software R and sample data sets. A hypothetical and a real-world example show the feasibility of the biosynthetically informed distances d_{A,B} and highlight the differences from conventional distance measures. The advantages of this approach and potential fields of application are discussed.
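
The two steps named in the abstract (compound similarity from shared enzymes, then a generalized UniFrac distance over a dendrogram) can be illustrated with a minimal Python sketch. This is not the author's R code. The compound names (c1 to c3), enzyme names (E1 to E6), the hand-written dendrogram, and the function names are all hypothetical. Reading "proportion of shared enzymes" as a Jaccard index is an assumption, and the distance follows the commonly used generalized UniFrac definition with a weighting exponent alpha.

```python
# Hypothetical data: each compound maps to the set of enzymes required to make it.
ENZYMES = {
    "c1": {"E1", "E2", "E3"},
    "c2": {"E1", "E2", "E4"},
    "c3": {"E5", "E6"},
}


def enzyme_distance(a, b):
    """1 - (shared enzymes / all enzymes), i.e. a Jaccard distance (assumed reading)."""
    ea, eb = ENZYMES[a], ENZYMES[b]
    return 1.0 - len(ea & eb) / len(ea | eb)


# Hypothetical dendrogram, written by hand rather than computed by clustering.
# Each branch is (branch length, set of compounds below that branch).
BRANCHES = [
    (0.25, {"c1"}),
    (0.25, {"c2"}),
    (0.25, {"c1", "c2"}),  # internal branch shared by the two related compounds
    (0.50, {"c3"}),
]


def generalized_unifrac(sample_a, sample_b, branches, alpha=0.5):
    """Generalized UniFrac distance between two samples.

    Samples are dicts {compound: amount}. For every branch, p is the
    proportion of a sample's total amount found in compounds below it.
    """
    total_a, total_b = sum(sample_a.values()), sum(sample_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both samples need a positive total amount")
    numerator = denominator = 0.0
    for length, leaves in branches:
        pa = sum(sample_a.get(c, 0.0) for c in leaves) / total_a
        pb = sum(sample_b.get(c, 0.0) for c in leaves) / total_b
        if pa + pb == 0:
            continue  # branch absent from both samples contributes nothing
        weight = length * (pa + pb) ** alpha
        numerator += weight * abs(pa - pb) / (pa + pb)
        denominator += weight
    return numerator / denominator if denominator else 0.0


sample_a = {"c1": 1.0}
sample_b = {"c2": 1.0}
sample_c = {"c3": 1.0}
print(generalized_unifrac(sample_a, sample_b, BRANCHES, alpha=1.0))
print(generalized_unifrac(sample_a, sample_c, BRANCHES, alpha=1.0))
```

In this toy setting the three samples have no compound in common, so a conventional presence/absence or abundance-based dissimilarity would treat all pairs as maximally different. The tree-based distance instead places the samples containing c1 and c2 closer together, because those compounds share part of their hypothetical enzymatic pathway.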

Updated: 2017-11-27