Photometric redshift estimation using ExtraTreesRegressor: Galaxies and quasars from low to very high redshifts
Astrophysics and Space Science (IF 1.9), Pub Date: 2020-03-01, DOI: 10.1007/s10509-020-03758-w
Moonzarin Reza, Mohammad Ariful Haque

Although photometric redshift estimation using machine learning (ML) methods has been gaining popularity in recent times, almost all previous work has focused on estimating the redshift z for z < 1. A few projects employing state-of-the-art deep learning techniques have worked on a greater range of redshift values. However, the rigorous demand for a large dataset has limited the deep learning approach to redshift values less than 4, since samples at higher values are highly sparse in the available dataset. The main challenge here is to train a model on a highly imbalanced dataset. This paper proposes a method for obtaining photometric redshifts of both galaxies and quasars spanning the entire redshift range (0 < z < 7) in the SDSS catalogue, using ExtraTreesRegressor, a conventional ML method. Data Releases 12, 13 and 14 have been used to increase the number of training samples at high redshifts. The redshift values are first transformed to a logarithmic domain to counteract the class imbalance. Besides the five photometric magnitudes (u, g, r, i, z) typically employed for such tasks, we use a total of 30 features including morphological parameters, band overlap magnitudes and adjacent filters' mean magnitudes, and the less contributing features are eliminated using recursive feature elimination. In this work, we propose a custom scoring metric, redshift-weighted mean squared error, to penalize the errors at higher redshift values more heavily. Postprocessing methods are used to compensate for the model's bias at high redshift values. A uniform test set is used to evaluate the performance in all subranges of labels by splitting the entire range into seven bins of equal width. A large number of other established ML models have also been used for comparison, against which the proposed method proves substantially superior in terms of various performance metrics. The mean squared error obtained for the uniform test set is 0.66.
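The pipeline the abstract describes can be sketched as follows. This is not the authors' code: the log-domain target transform, the recursive feature elimination step, and the ExtraTreesRegressor are named in the abstract, but the exact weighting form of the redshift-weighted MSE, the feature values, and all hyperparameters below are illustrative assumptions.

```python
# Hedged sketch of the described approach using scikit-learn:
# train ExtraTreesRegressor on log-transformed redshifts, prune
# low-contribution features with recursive feature elimination (RFE),
# and score with an assumed redshift-weighted MSE.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import RFE


def redshift_weighted_mse(z_true, z_pred):
    """Assumed weighting: squared errors scaled by (1 + z_true),
    so mistakes at high redshift are penalized more."""
    w = 1.0 + z_true
    return np.average((z_true - z_pred) ** 2, weights=w)


# Synthetic stand-in for the 30 photometric/morphological features
# and redshift labels spanning 0 < z < 7 (not real SDSS data).
rng = np.random.default_rng(0)
n_samples = 2000
X = rng.normal(size=(n_samples, 30))
z = rng.uniform(0.0, 7.0, size=n_samples)

# Train in log(1 + z) space to reduce the dominance of low-z samples.
y = np.log1p(z)

base_model = ExtraTreesRegressor(n_estimators=200, random_state=0)
# RFE drops the least important features (count here is an assumption).
selector = RFE(base_model, n_features_to_select=20)
selector.fit(X, y)

# Map predictions back from the logarithmic domain to redshift.
z_pred = np.expm1(selector.predict(X))
score = redshift_weighted_mse(z, z_pred)
print(f"redshift-weighted MSE on training data: {score:.3f}")
```

Because the trees are trained on log(1 + z), each high-redshift decade no longer needs to outnumber the low-redshift samples to influence the splits, which is the class-imbalance mitigation the abstract refers to.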

Updated: 2020-03-01