Pulsar candidate identification using semi-supervised generative adversarial networks, Monthly Notices of the Royal Astronomical Society



Monthly Notices of the Royal Astronomical Society (IF 4.7), Pub Date: 2021-05-06, DOI: 10.1093/mnras/stab1308
Vishnu Balakrishnan ₁ , David Champion ₁ , Ewan Barr ₁ , Michael Kramer ₁ , Rahul Sengar ₂ , Matthew Bailes ₂


Machine learning methods are increasingly helping astronomers identify new radio pulsars. However, they require a large amount of labelled data, which is time-consuming to produce and biased. Here, we describe a semi-supervised generative adversarial network, which achieves better classification performance than standard supervised algorithms while training on data sets that are mostly unlabelled. Trained on only 100 labelled candidates and 5000 unlabelled candidates, our model achieved an accuracy and mean F-score of 94.9 per cent, compared with our standard supervised baseline, which scored 81.1 per cent and 82.7 per cent, respectively. Our final model, trained on a much larger labelled data set, achieved an accuracy and mean F-score of 99.2 per cent and a recall of 99.7 per cent. This technique allows for high-quality classification during the early stages of pulsar surveys on new instruments, when limited labelled data are available. We open-source our work along with a new pulsar-candidate data set produced from the High Time Resolution Universe – South Low Latitude Survey. This data set has the largest number of pulsar detections of any public data set, and we hope it will be a valuable tool for benchmarking future machine learning models.
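
The abstract quotes accuracy, recall, and a mean F-score without defining them. The following is a minimal sketch of how such figures could be computed for a two-class pulsar/non-pulsar problem. It assumes that "mean F-score" means the unweighted average of the per-class F1 scores, which the abstract does not confirm. The function names and the toy labels are hypothetical and are not taken from the authors' released code or data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred):
    """Return accuracy, pulsar recall, and the mean of the two per-class F1 scores.

    Assumption: label 1 = pulsar, label 0 = non-pulsar, and the mean F-score
    is the unweighted (macro) average over both classes.
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=1)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1_pulsar = f1_score(tp, fp, fn)
    # With non-pulsar as the positive class, TN plays the role of TP
    # and the FP/FN counts swap.
    f1_non_pulsar = f1_score(tn, fn, fp)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "mean_f_score": (f1_pulsar + f1_non_pulsar) / 2,
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not survey data.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 0, 0, 1]
    print(evaluate(truth, preds))
```

Because pulsar candidates are typically far outnumbered by non-pulsar candidates, accuracy alone can look high even when many pulsars are missed, which is one reason to report recall and a class-averaged F-score alongside it.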


Updated: 2021-05-06
