Geometric anomaly detection in data
Proceedings of the National Academy of Sciences of the United States of America (IF 11.1), Pub Date: 2020-08-18, DOI: 10.1073/pnas.2001741117
Bernadette J Stolz¹, Jared Tanner¹,², Heather A Harrington¹,², Vidit Nanda²,³

The quest for low-dimensional models which approximate high-dimensional data is pervasive across the physical, natural, and social sciences. The dominant paradigm underlying most standard modeling techniques assumes that the data are concentrated near a single unknown manifold of relatively small intrinsic dimension. Here, we present a systematic framework for detecting interfaces and related anomalies in data which may fail to satisfy the manifold hypothesis. By computing the local topology of small regions around each data point, we are able to partition a given dataset into disjoint classes, each of which can be individually approximated by a single manifold. Since these manifolds may have different intrinsic dimensions, local topology discovers singular regions in data even when none of the points have been sampled precisely from the singularities. We showcase this method by identifying the intersection of two surfaces in the 24-dimensional space of cyclo-octane conformations and by locating all of the self-intersections of a Henneberg minimal surface immersed in 3-dimensional space. Due to the local nature of the topological computations, the algorithmic burden of performing such data stratification is readily distributable across several processors.
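
The abstract does not spell out how the local topology is computed. As a rough illustration only, the Python sketch below probes a toy dataset shaped like a "+" (samples from two crossing line segments, offset so that no sample sits exactly on the crossing point) by counting the connected components of the samples that fall in a thin annulus around each point. This is a deliberately simplified stand-in, not the authors' procedure: it looks only at connected components (degree-0 homology) at a single linking scale, and the names `sample_cross`, `annulus_components`, `local_h0_profile` and the parameters `r_inner`, `r_outer`, `eps` are hypothetical choices made for this example. On a single smooth curve the annulus meets the data in two short arcs, so interior points score 2; near the crossing four arms appear, and near a free end only one.

```python
import math


def sample_cross(step=0.02, half_length=1.0):
    """Hypothetical toy data: samples along the x- and y-axes on a grid offset
    by step/2, so that no sample lands exactly on the crossing point (0, 0)."""
    n = int(round(half_length / step))
    ts = [step / 2 + step * i for i in range(-n, n)]
    return [(t, 0.0) for t in ts] + [(0.0, t) for t in ts]


def annulus_components(points, p, r_inner, r_outer, eps):
    """Number of connected components among the samples q with
    r_inner < |q - p| < r_outer, linking two samples when they are closer
    than eps (degree-0 homology at one fixed scale, computed by union-find)."""
    ring = [q for q in points if r_inner < math.dist(p, q) < r_outer]
    parent = list(range(len(ring)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(ring)):
        for j in range(i + 1, len(ring)):
            if math.dist(ring[i], ring[j]) < eps:
                parent[find(i)] = find(j)  # merge the two components
    return len({find(i) for i in range(len(ring))})


def local_h0_profile(points, r_inner=0.1, r_outer=0.2, eps=0.05):
    """Annulus component count for every sample; each entry depends only on
    its own point and the data, never on the other entries."""
    return [annulus_components(points, p, r_inner, r_outer, eps) for p in points]


pts = sample_cross()
counts = local_h0_profile(pts)
# A point on one smooth curve sees two arcs in its annulus; any other count
# (free ends, the region around the crossing) marks a candidate anomaly.
flagged = [p for p, c in zip(pts, counts) if c != 2]
```

Points scoring 4 appear only around the crossing, even though the origin itself was never sampled, which mirrors the abstract's remark that singular regions are found without sampling the singularity exactly; the points scoring 1 sit near the free ends. Because each count depends only on its own point and the data, the list comprehension in `local_h0_profile` could be split across worker processes (for example with `concurrent.futures`) without any coordination between them.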

Updated: 2020-08-19