Probabilistic cosmic web classification using fast-generated training data, Monthly Notices of the Royal Astronomical Society



Probabilistic cosmic web classification using fast-generated training data
Monthly Notices of the Royal Astronomical Society (IF 4.7), Pub Date: 2020-07-13, DOI: 10.1093/mnras/staa2008
Brandon Buncher$^{1}$, Matias Carrasco Kind$^{2,3}$


We present a novel method of robust probabilistic cosmic web particle classification in three dimensions using a supervised machine learning algorithm. Training data were generated using a simplified $\Lambda$CDM toy model with pre-determined algorithms for generating halos, filaments, and voids. While this framework is not constrained by physical modeling, it can be generated substantially more quickly than an N-body simulation without loss in classification accuracy. For each particle in this dataset, measurements were taken of the local density field magnitude and directionality. These measurements were used to train a random forest algorithm, which was then used to assign class probabilities to each particle in a $\Lambda$CDM, dark matter-only N-body simulation with $256^3$ particles, as well as in a second toy model dataset. By comparing the trends in the ROC curves and other statistical metrics of the classes assigned to particles in each dataset using different feature sets, we demonstrate that the combination of measurements of the local density field magnitude and directionality enables accurate and consistent classification of halo, filament, and void particles in varied environments. We also show that this combination of training features ensures that the construction of our toy model does not affect classification. The use of a fully supervised algorithm allows greater control over the information deemed important for classification, preventing issues arising from hyperparameters and mode collapse in deep learning models. Due to the speed of training data generation, our method is highly scalable, making it particularly suited for classifying large datasets, including observed data.
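The workflow described in the abstract (train a random forest on per-particle features, then assign class probabilities and inspect ROC-based metrics) can be illustrated with a minimal sketch. This is not the authors' pipeline: the two features, the class centres, and the helper `make_toy_features` are hypothetical stand-ins for the density-magnitude and directionality measurements, and scikit-learn is assumed as the random forest implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

CLASSES = ("halo", "filament", "void")


def make_toy_features(n_per_class, rng):
    """Draw hypothetical (log-density, directionality) features per class.

    The class centres below are illustrative assumptions, not values
    taken from the paper: halos are dense, filaments are moderately
    dense and strongly directional, voids are underdense.
    """
    centres = {"halo": (2.0, 0.2), "filament": (0.8, 0.8), "void": (-1.0, 0.3)}
    features, labels = [], []
    for label, name in enumerate(CLASSES):
        mean = np.array(centres[name])
        features.append(rng.normal(mean, [0.5, 0.15], size=(n_per_class, 2)))
        labels.append(np.full(n_per_class, label))
    return np.vstack(features), np.concatenate(labels)


rng = np.random.default_rng(0)
X_train, y_train = make_toy_features(2000, rng)  # fast-generated training set
X_test, y_test = make_toy_features(500, rng)     # stand-in for the target data

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# One probability per class for every particle, rather than a hard label.
proba = forest.predict_proba(X_test)

# One-vs-rest ROC AUC, averaged over the three classes.
auc = roc_auc_score(y_test, proba, multi_class="ovr")
print(f"macro one-vs-rest ROC AUC: {auc:.3f}")
```

Each row of `proba` sums to one, so a particle near a halo-filament boundary can carry comparable probability for both classes instead of being forced into one.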

Updated: 2020-07-13
