
Statistica Sinica 32 (2022), 391-415

GRAPH-BASED TWO-SAMPLE TESTS FOR DATA
WITH REPEATED OBSERVATIONS

Jingru Zhang and Hao Chen

University of Pennsylvania and University of California, Davis

Abstract: For two-sample comparisons, tests based on graphs constructed using the similarity information between observations are gaining attention, owing to their flexibility and good performance for high-dimensional/non-Euclidean data. However, when there are repeated observations, these graph-based tests can be problematic, because they are influenced by the choice of the similarity graph. We propose extended graph-based test statistics to resolve this problem. We also study the asymptotic properties of these extended statistics, and provide analytic formulae to approximate the p-values of the tests under finite samples, facilitating the application of the new tests in practice. The proposed tests are applied to analyze a phone-call network data set. All tests are implemented in the R package gTests.

Key words and phrases: High-dimensional data, network data, non-Euclidean data, nonparametric test, similarity graph, ties in distance.
