Abstract
We propose a novel strategy for incorporating hierarchical supervised label information into nonlinear dimensionality reduction techniques. Specifically, we extend t-SNE, UMAP, and PHATE to include known or predicted class labels and demonstrate the efficacy of our approach on multiple single-cell RNA sequencing datasets. Our approach, “Haisu,” is applicable across domains and across methods of nonlinear dimensionality reduction. Mathematically, the effect of Haisu can be summarized as a variable perturbation of the high-dimensional space in which the original data are observed. We thereby preserve the core characteristics of the visualization method and change the manifold only to respect known or assumed class labels, when these are provided. Our strategy is designed to aid the discovery and understanding of underlying patterns in datasets that are heavily influenced by parent-child relationships. We show that our approach also helps in semi-supervised settings where labels are known for only some data points (for instance, when only a fraction of the cells is labeled). In summary, Haisu extends popular existing visualization methods so that a user can incorporate known, relevant relationships via a user-defined hierarchical distancing factor.
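To make the idea of a label-driven perturbation of high-dimensional distances concrete, the following is a minimal sketch, not the Haisu implementation. It assumes that each label is given as a root-to-node path in the class hierarchy, that the penalty grows linearly with the tree distance between two labels, and that pairs involving an unlabeled point are left unchanged. The function names, the `strength` parameter, and the toy data are all hypothetical and chosen for illustration only.

```python
import math


def tree_distance(a, b):
    """Number of hierarchy edges between two labels given as root-to-node paths."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return len(a) + len(b) - 2 * shared


def perturbed_distances(points, labels, strength=0.5):
    """Scale Euclidean distances by an assumed hierarchy-based factor.

    Assumed form (illustrative only): d'_ij = d_ij * (1 + strength * h_ij / h_max),
    where h_ij is the tree distance between the labels of points i and j.
    Pairs with a missing label (None) keep their original distance.
    """
    known = [lab for lab in labels if lab is not None]
    h_max = max((tree_distance(a, b) for a in known for b in known), default=0)
    n = len(points)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(points[i], points[j])
            if labels[i] is None or labels[j] is None or h_max == 0:
                factor = 1.0  # semi-supervised case: no perturbation
            else:
                factor = 1.0 + strength * tree_distance(labels[i], labels[j]) / h_max
            out[i][j] = d * factor
    return out


# Toy data: three labeled points and one unlabeled point (hypothetical labels).
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (3.0, 0.0)]
labels = [
    ("immune", "T cell"),
    ("immune", "B cell"),
    ("stromal", "fibroblast"),
    None,
]
D = perturbed_distances(points, labels, strength=0.5)
```

In this sketch, points whose labels share a parent are pushed apart less than points from different branches, and setting `strength=0` recovers the unmodified distances. A matrix of this kind could then be handed to any embedding method that accepts precomputed pairwise distances.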
Availability: github.com/Cobanoglu-Lab/Haisu
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
{kevin.vanhorn{at}utsouthwestern.edu, muratcan.cobanoglu{at}utsouthwestern.edu}
This revision generalizes the implementation of H-tSNE, now termed "Haisu," so that it applies more widely across nonlinear dimensionality reduction methods. We introduce additional datasets, demonstrate the efficacy of this hierarchical supervised approach on UMAP, PHATE, and t-SNE, and discuss further applications.