Distribution-Free Multisample Tests Based on Optimal Matchings With Applications to Single Cell Genomics
Journal of the American Statistical Association (IF 3.7), Pub Date: 2020-08-18, DOI: 10.1080/01621459.2020.1791131
Somabha Mukherjee, Divyansh Agarwal, Nancy R. Zhang, Bhaswar B. Bhattacharya

Abstract

In this article, we propose a nonparametric graphical test based on optimal matching, for assessing the equality of multiple unknown multivariate probability distributions. Our procedure pools the data from the different classes to create a graph based on the minimum non-bipartite matching, and then utilizes the number of edges connecting data points from different classes to examine the closeness between the distributions. The proposed test is exactly distribution-free (the null distribution does not depend on the distribution of the data) and can be efficiently applied to multivariate as well as non-Euclidean data, whenever the inter-point distances are well-defined. We show that the test is universally consistent, and prove a distributional limit theorem for the test statistic under general alternatives. Through simulation studies, we demonstrate its superior performance against other common and well-known multisample tests. The method is applied to single cell transcriptomics data obtained from the peripheral blood, cancer tissue, and tumor-adjacent normal tissue of human subjects with hepatocellular carcinoma and non-small-cell lung cancer. Our method unveils patterns in how biochemical metabolic pathways are altered across immune cells in a cancer setting, depending on the tissue location. All of the methods described herein are implemented in the R package multicross. Supplementary materials for this article are available online.
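The procedure described in the abstract (pool the samples, build a minimum non-bipartite matching on the pooled points, count edges that join different classes) can be illustrated with a minimal sketch. This is not the API of the multicross package and not the authors' test statistic. The function names are hypothetical. The matching is found by an exact bitmask dynamic program that is only practical for about 20 points or fewer, standing in for the polynomial-time matching algorithms used in practice. The total number of cross-class edges is used as a simple summary, which is an assumption. The permutation step relies on the fact that the matching depends only on the pooled points, so under the null hypothesis the labels are exchangeable over a fixed matching.

```python
import math
import random
from functools import lru_cache


def min_weight_perfect_matching(dist):
    """Exact minimum-weight perfect matching via bitmask DP.

    Brute-force stand-in for a real matching solver; needs an even,
    small number of points.
    """
    n = len(dist)
    if n % 2:
        raise ValueError("need an even number of points")

    @lru_cache(maxsize=None)
    def best(mask):
        # mask encodes the set of points that are still unmatched
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest unmatched index
        rest = mask & ~(1 << i)
        best_cost, best_pairs = math.inf, ()
        candidates = rest
        while candidates:
            j = (candidates & -candidates).bit_length() - 1
            candidates &= candidates - 1  # drop j from the candidates
            cost, pairs = best(rest & ~(1 << j))
            cost += dist[i][j]
            if cost < best_cost:
                best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return list(best((1 << n) - 1)[1])


def match_pooled(points):
    """Match the pooled points using Euclidean inter-point distances."""
    dist = [[math.dist(p, q) for q in points] for p in points]
    return min_weight_perfect_matching(dist)


def cross_counts(pairs, labels):
    """Count matched pairs for each unordered pair of class labels."""
    counts = {}
    for i, j in pairs:
        key = tuple(sorted((labels[i], labels[j])))
        counts[key] = counts.get(key, 0) + 1
    return counts


def permutation_p_value(points, labels, n_perm=999, seed=0):
    """Permutation p-value for the total number of cross-class edges.

    Few cross-class edges suggest the distributions differ, so small
    counts are treated as extreme (an assumed, simplified statistic).
    """
    pairs = match_pooled(points)

    def cross_total(lab):
        return sum(lab[i] != lab[j] for i, j in pairs)

    observed = cross_total(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # relabel points; the matching stays fixed
        hits += cross_total(shuffled) <= observed
    return (1 + hits) / (1 + n_perm)
```

Any distance function can replace `math.dist` in `match_pooled`, which reflects the abstract's point that the test only requires well-defined inter-point distances.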




Updated: 2020-08-18