Probabilistic modelling of general noisy multi-manifold data sets
Artificial Intelligence (IF 5.1), Pub Date: 2021-08-31, DOI: 10.1016/j.artint.2021.103579
M. Canducci 1 , P. Tiño 1 , M. Mastropietro 2

The intrinsic nature of noisy and complex data sets is often concealed in low-dimensional structures embedded in a higher-dimensional space. A number of methodologies have been developed to extract and represent such structures in the form of manifolds (i.e. geometric structures that locally resemble continuously deformable intervals of $\mathbb{R}^j$). Usually, a priori knowledge of the manifold's intrinsic dimensionality is required. Additionally, the performance of these methods can often be hampered by the presence of significant high-dimensional noise aligned along the low-dimensional core manifold. In real-world applications, the data can contain several low-dimensional structures of different dimensionalities. We propose a framework for dimensionality estimation and reconstruction of multiple noisy manifolds embedded in a noisy environment. To the best of our knowledge, this work represents the first attempt at detection and modelling of a set of coexisting general noisy manifolds by uniting two aspects of multi-manifold learning: the recovery and approximation of core noiseless manifolds and the construction of their probabilistic models. The easy-to-understand hyper-parameters can be manipulated to obtain an emerging picture of the multi-manifold structure of the data. We demonstrate the workings of the framework on two synthetic data sets that present challenging features for state-of-the-art techniques in multi-manifold learning. The first data set consists of multiple sampled noisy manifolds of different intrinsic dimensionalities, such as a Möbius strip, a toroid and a spiral arm. The second is a topologically complex set of three interlocked toroids. Given the absence of such unified methodologies in the literature, the comparison with existing techniques is organized along the two separate aspects of our approach mentioned above, namely manifold approximation and probabilistic modelling.
The framework is then applied to a complex data set containing simulated gas volume particles from a particle simulation of a dwarf galaxy interacting with its host galaxy cluster. Detailed analysis of the recovered 1D and 2D manifolds can help us to understand the nature of star formation in such complex systems.
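
To make the idea of noisy manifolds with different intrinsic dimensionalities concrete, the following is a minimal sketch and not the authors' framework. It samples a noisy Möbius strip (intrinsically 2D) and a noisy spiral arm (intrinsically 1D) in three dimensions, then estimates a local dimensionality with a generic local-PCA heuristic that is swapped in here purely for illustration. The function names, sample sizes, noise level, neighbourhood size and variance threshold are all assumptions chosen for this example.

```python
import numpy as np


def sample_noisy_mobius(n, noise_std, rng):
    """Sample n points from a Mobius strip in R^3 and add isotropic Gaussian noise."""
    u = rng.uniform(0.0, 2.0 * np.pi, n)  # angle around the strip
    v = rng.uniform(-0.5, 0.5, n)         # position across the strip
    radius = 1.0 + 0.5 * v * np.cos(u / 2.0)
    pts = np.column_stack([radius * np.cos(u),
                           radius * np.sin(u),
                           0.5 * v * np.sin(u / 2.0)])
    return pts + rng.normal(0.0, noise_std, pts.shape)


def sample_noisy_spiral(n, noise_std, rng):
    """Sample n points from one turn of a planar spiral arm embedded in R^3, plus noise."""
    t = rng.uniform(np.pi, 3.0 * np.pi, n)  # one full turn, away from the centre
    r = t / np.pi                           # radius grows from 1 to 3
    pts = np.column_stack([r * np.cos(t), r * np.sin(t), np.zeros(n)])
    return pts + rng.normal(0.0, noise_std, pts.shape)


def local_pca_dimension(points, n_queries=200, k=20, var_threshold=0.9, rng=None):
    """Median over query points of the number of local principal components
    needed to explain var_threshold of the neighbourhood variance."""
    rng = np.random.default_rng() if rng is None else rng
    queries = rng.choice(len(points), size=min(n_queries, len(points)), replace=False)
    dims = []
    for i in queries:
        dist2 = np.sum((points - points[i]) ** 2, axis=1)
        neighbours = points[np.argsort(dist2)[:k]]  # k nearest points, query included
        eigvals = np.linalg.eigvalsh(np.cov(neighbours, rowvar=False))[::-1]
        explained = np.cumsum(eigvals) / np.sum(eigvals)
        dims.append(int(np.searchsorted(explained, var_threshold)) + 1)
    return int(np.median(dims))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mobius = sample_noisy_mobius(2000, 0.005, rng)  # assumed noise level
    spiral = sample_noisy_spiral(1000, 0.005, rng)
    print("Mobius strip, estimated local dimension:", local_pca_dimension(mobius, rng=rng))
    print("Spiral arm, estimated local dimension:", local_pca_dimension(spiral, rng=rng))
```

A heuristic of this kind is applied to each manifold separately and degrades as the noise grows relative to the neighbourhood size. The abstract names that regime, together with several coexisting manifolds, as the one the proposed framework targets.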



Updated: 2021-09-06