GPGPU Linear Complexity t-SNE Optimization

IEEE Transactions on Visualization and Computer Graphics (IF 4.7), Pub Date: 2019-08-23, DOI: 10.1109/tvcg.2019.2934307
Nicola Pezzotti, Julian Thijssen, Alexander Mordvintsev, Thomas Höllt, Baldur Van Lew, Boudewijn P.F. Lelieveldt, Elmar Eisemann, Anna Vilanova

In recent years, the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has become one of the most widely used and insightful techniques for exploratory analysis of high-dimensional data. It reveals clusters of high-dimensional data points at different scales while requiring only minimal tuning of its parameters. However, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several variants of t-SNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the t-SNE embedding for large datasets. In this work, we present a novel approach to the minimization of the t-SNE objective function that relies heavily on graphics hardware and has linear computational complexity. Our technique decreases the computational cost of running t-SNE by orders of magnitude and retains or improves on the accuracy of previous approximation techniques. We propose to approximate the repulsive forces between data points by splatting a kernel texture for each data point. This approximation allows us to reformulate the t-SNE minimization problem as a series of tensor operations that can be executed efficiently on the graphics card. An efficient implementation of our technique is integrated into the widely used Google TensorFlow.js library and is also available as an open-source C++ library.
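
The splatting idea can be illustrated on the CPU. In standard t-SNE, with t_ij = 1 / (1 + ||y_i - y_j||^2) and Z = sum over i != j of t_ij, the repulsive force on embedding point y_i is (1/Z) * sum_j t_ij^2 (y_i - y_j). If every point adds its kernel to a scalar field S(p) = sum_i t(p, y_i) and a vector field V(p) = sum_i t(p, y_i)^2 (p - y_i), then Z is approximately sum_i (S(y_i) - 1) and the force on y_i is approximately V(y_i) / Z. The following is a minimal pure-Python sketch under stated assumptions, not the authors' GPU or TensorFlow.js implementation: it evaluates each point's kernel on every cell of a fixed uniform grid (standing in for the kernel textures), samples the fields with bilinear interpolation, and compares against the exact O(N^2) sum. The function names, grid resolution, padding, and bilinear sampling are hypothetical illustrative choices.

```python
import random


def student_t(d2):
    """Low-dimensional t-SNE kernel t = 1 / (1 + d^2), given a squared distance."""
    return 1.0 / (1.0 + d2)


def exact_repulsion(points):
    """O(N^2) reference: Z = sum_{i != j} t_ij and F_i = (1/Z) sum_j t_ij^2 (y_i - y_j)."""
    n = len(points)
    z = 0.0
    raw = [[0.0, 0.0] for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            t = student_t((xi - xj) ** 2 + (yi - yj) ** 2)
            z += t
            raw[i][0] += t * t * (xi - xj)
            raw[i][1] += t * t * (yi - yj)
    return z, [(fx / z, fy / z) for fx, fy in raw]


def field_repulsion(points, resolution=96, pad=1.0):
    """Hypothetical field approximation: splat each point's kernels onto a grid, then sample.

    S(p) = sum_i t(p, y_i) and V(p) = sum_i t(p, y_i)^2 (p - y_i). For a fixed
    resolution the cost is O(N * resolution^2), i.e. linear in the number of points.
    """
    x0 = min(x for x, _ in points) - pad
    y0 = min(y for _, y in points) - pad
    x1 = max(x for x, _ in points) + pad
    y1 = max(y for _, y in points) + pad
    hx = (x1 - x0) / (resolution - 1)
    hy = (y1 - y0) / (resolution - 1)
    s = [[0.0] * resolution for _ in range(resolution)]
    vx = [[0.0] * resolution for _ in range(resolution)]
    vy = [[0.0] * resolution for _ in range(resolution)]

    # Splat pass: accumulate every point's kernel contribution into each grid cell.
    for px, py in points:
        for r in range(resolution):
            dy = y0 + r * hy - py
            s_row, vx_row, vy_row = s[r], vx[r], vy[r]
            for c in range(resolution):
                dx = x0 + c * hx - px
                t = student_t(dx * dx + dy * dy)
                s_row[c] += t
                vx_row[c] += t * t * dx
                vy_row[c] += t * t * dy

    def sample(field, x, y):
        # Bilinear interpolation of a grid field at position (x, y).
        u, v = (x - x0) / hx, (y - y0) / hy
        c, r = min(int(u), resolution - 2), min(int(v), resolution - 2)
        fu, fv = u - c, v - r
        return ((1 - fu) * (1 - fv) * field[r][c] + fu * (1 - fv) * field[r][c + 1]
                + (1 - fu) * fv * field[r + 1][c] + fu * fv * field[r + 1][c + 1])

    # S(y_i) includes the point's own kernel (t = 1 at distance 0), hence the "- 1".
    z = sum(sample(s, x, y) - 1.0 for x, y in points)
    forces = [(sample(vx, x, y) / z, sample(vy, x, y) / z) for x, y in points]
    return z, forces


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(60)]
    z_exact, _ = exact_repulsion(pts)
    z_approx, _ = field_repulsion(pts)
    print(f"Z exact={z_exact:.3f} approx={z_approx:.3f}")
```

Because the grid size is fixed, the splat pass grows linearly with the number of points while the reference grows quadratically; the trade-off is an interpolation error that shrinks as the grid resolution increases.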

Updated: 2019-11-01