Correlation for tree-shaped datasets and its Bayesian estimation
Computational Statistics & Data Analysis (IF 1.8), Pub Date: 2021-07-05, DOI: 10.1016/j.csda.2021.107307
Shanjun Mao 1, 2 , Xiaodan Fan 2 , Jie Hu 3

Tree-shaped datasets arise in various research and industrial fields, such as gene expression data measured on a cell lineage tree and information spreading along tree-shaped paths. A correlation measure between two tree-shaped datasets, i.e., a measure of how the values increase or decrease together along corresponding paths of the two trees, is desirable, but the tree topology prohibits the use of classical vector-based correlation measures such as the Pearson correlation coefficient. To this end, a statistical framework for measuring such tree correlation is proposed. As a specific model within this framework, a parametric model based on bivariate Gaussian distributions is provided, and a Bayesian approach to parameter estimation is introduced. The model allows the degree of coupling between corresponding nodes to change with the depth of the tree. It provides an intuitive mapping from the trend similarity of the values along two trees to the classical Pearson correlation. A Metropolis-within-Gibbs algorithm is used to obtain the posterior estimates. Extensive simulations and in-depth sensitivity analyses are performed to demonstrate the validity and robustness of the method. Furthermore, an application to embryonic gene expression datasets shows that this tree similarity measure aligns well with the biological properties.
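To make the sampling idea concrete, here is a minimal Metropolis-within-Gibbs sketch. It is not the paper's tree model: it assumes a toy flat setting of zero-mean paired values with covariance sigma2 * [[1, rho], [rho, 1]], an inverse-gamma prior on sigma2, and a uniform prior on rho. The function names (`simulate_pairs`, `log_lik`, `metropolis_within_gibbs`) and all tuning values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_pairs(n, rho, seed=0):
    """Draw n zero-mean, unit-variance pairs with correlation rho (toy data)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x.append(z1)
        y.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return x, y


def log_lik(rho, sigma2, sxx, sxy, syy, n):
    """Log-likelihood (up to a constant) of n pairs under the toy model."""
    quad = (sxx - 2.0 * rho * sxy + syy) / (1.0 - rho * rho)
    return (-n * math.log(sigma2)
            - 0.5 * n * math.log(1.0 - rho * rho)
            - quad / (2.0 * sigma2))


def metropolis_within_gibbs(x, y, n_iter=3000, burn_in=1000,
                            step=0.05, a=1.0, b=1.0, seed=0):
    """Return posterior draws of rho (assumed priors: sigma2 ~ InvGamma(a, b),
    rho ~ Uniform(-1, 1))."""
    rng = random.Random(seed)
    n = len(x)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(u * v for u, v in zip(x, y))
    rho, sigma2 = 0.0, 1.0
    draws = []
    for it in range(n_iter):
        # Gibbs step: sigma2 | rho is InvGamma(a + n, b + quad / 2),
        # sampled as rate / Gamma(shape, scale=1).
        quad = (sxx - 2.0 * rho * sxy + syy) / (1.0 - rho * rho)
        sigma2 = (b + 0.5 * quad) / rng.gammavariate(a + n, 1.0)
        # Metropolis step: random-walk proposal for rho; proposals outside
        # (-1, 1) have zero prior density and are rejected.
        prop = rho + rng.gauss(0.0, step)
        if abs(prop) < 1.0:
            log_ratio = (log_lik(prop, sigma2, sxx, sxy, syy, n)
                         - log_lik(rho, sigma2, sxx, sxy, syy, n))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                rho = prop
        if it >= burn_in:
            draws.append(rho)
    return draws
```

Calling `metropolis_within_gibbs(*simulate_pairs(500, 0.7))` returns post-burn-in draws of rho whose average can serve as a point estimate. The conjugate Gibbs update is used where the conditional is available in closed form, and the Metropolis update is used for rho, whose conditional is not a standard distribution.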




Updated: 2021-07-13