Astronomy and Computing (IF 1.9), Pub Date: 2021-06-12, DOI: 10.1016/j.ascom.2021.100483
Z.-Z. Li, L. Li, Z. Shao
Gaussian process (GP) regression can be severely biased when the data are contaminated by outliers. This paper presents a new robust GP regression algorithm that iteratively trims the most extreme data points. The new algorithm retains the attractive properties of the standard GP as a flexible, nonparametric regression method, while greatly improving model accuracy for contaminated data, even in the presence of extreme or abundant outliers. It is also easier to implement than previous robust GP variants that rely on approximate inference. Across a wide range of experiments with different contamination levels, the proposed method significantly outperforms both the standard GP and the popular robust GP variant with the Student-t likelihood in most test cases. In addition, as a practical example from astrophysics, we show that this method can precisely determine the main-sequence ridge line in the color–magnitude diagram of star clusters.
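The abstract does not spell out the steps of the algorithm, so the following is only a minimal sketch of the general idea of iterative trimming, not the authors' method. It assumes a one-dimensional input, an RBF kernel with fixed hyperparameters, and a fixed fraction of points to keep; the names `rbf_kernel`, `gp_predict`, `trimmed_gp`, and `keep_fraction` are hypothetical and chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=0.1, length_scale=1.0):
    # Standard GP posterior mean and predictive variance (zero prior mean,
    # fixed hyperparameters; an assumption made to keep the sketch short).
    K = rbf_kernel(x_train, x_train, length_scale)
    K += noise**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    prior_var = rbf_kernel(x_test, x_test, length_scale).diagonal()
    var = prior_var - np.sum(v**2, axis=0) + noise**2
    return mean, np.maximum(var, 1e-12)

def trimmed_gp(x, y, keep_fraction=0.8, n_iter=5, noise=0.1, length_scale=1.0):
    # Assumed trimming rule: refit on the retained points, score every point
    # by its squared standardized residual, keep the best-fitting fraction.
    n_keep = int(round(keep_fraction * len(x)))
    keep = np.ones(len(x), dtype=bool)
    for _ in range(n_iter):
        mean, var = gp_predict(x[keep], y[keep], x, noise, length_scale)
        score = (y - mean) ** 2 / var
        new_keep = np.zeros(len(x), dtype=bool)
        new_keep[np.argsort(score)[:n_keep]] = True
        if np.array_equal(new_keep, keep):
            break  # retained set is stable
        keep = new_keep
    return keep

# Synthetic example: a sine curve with six isolated, strongly shifted points.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 60)
truth = np.sin(x)
y = truth + 0.1 * rng.standard_normal(x.size)
outlier_idx = np.array([5, 14, 23, 32, 41, 50])
y[outlier_idx] += 5.0

keep = trimmed_gp(x, y)
mean_standard, _ = gp_predict(x, y, x)
mean_robust, _ = gp_predict(x[keep], y[keep], x)
rmse_standard = np.sqrt(np.mean((mean_standard - truth) ** 2))
rmse_robust = np.sqrt(np.mean((mean_robust - truth) ** 2))
print(f"standard GP RMSE: {rmse_standard:.3f}, trimmed GP RMSE: {rmse_robust:.3f}")
```

Keeping a fixed fraction is the simplest possible trimming rule; a practical implementation would also need to optimize the kernel hyperparameters at each refit and correct the noise estimate for the bias introduced by discarding the largest residuals.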
Title: Robust Gaussian process regression based on iterative trimming