Classification of Continuous Sky Brightness Data Using Random Forest
Advances in Astronomy (IF 1.6), Pub Date: 2020-04-01, DOI: 10.1155/2020/5102065
Rhorom Priyatikanto¹, Lidia Mayangsari², Rudi A. Prihandoko³, Agustinus G. Admiranto¹

Measuring and monitoring sky brightness are required to mitigate the negative effects of light pollution, a byproduct of modern civilization. Properly handling a large volume of sky brightness data involves evaluating and classifying the data according to its quality and characteristics so that further analysis and inference are sound. This study aims to develop a classification model based on the Random Forest algorithm and to evaluate its performance. Using sky brightness data from 1250 nights at one-minute temporal resolution, acquired at eight different stations in Indonesia, datasets consisting of 15 features were created to train and test the model. These features were extracted from the observation time, the global statistics of nightly sky brightness, or the light curve characteristics. Among them, 10 are considered the most important for the classification task. The model was trained to classify the data into six classes (1: peculiar data, 2: overcast, 3: cloudy, 4: clear, 5: moonlit-cloudy, and 6: moonlit-clear) and, when tested, achieved high accuracy (92%) and scores (F-score = 84% and G-mean = 84%). Some misclassifications exist, but the classification results are quite good, as indicated by the posterior distributions of sky brightness for each class. Data classified as class 4 have a sharp distribution with a typical full width at half maximum of 1.5 mag/arcsec², while the distributions of classes 2 and 3 are left-skewed, the latter having a lighter tail. Due to moonlight, the distributions of class 5 and 6 data are more smeared, with a larger spread. These results demonstrate that the established classification model is reasonably good and consistent.
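The abstract does not list the 15 features individually. As a rough illustration of what "global statistics of nightly sky brightness" and "light curve characteristics" could look like, the sketch below computes a few candidate features from one night of minute-cadence readings. The function name `nightly_features` and every feature in it (including the `roughness` measure) are hypothetical choices, not the paper's feature set.

```python
import statistics


def nightly_features(mags):
    """Hypothetical per-night features from minute-cadence sky brightness.

    mags: list of at least two readings in mag/arcsec^2, in time order.
    """
    mean = statistics.fmean(mags)
    sd = statistics.pstdev(mags)
    # Population skewness (third standardized moment); negative means a tail
    # toward smaller magnitudes, i.e. toward brighter readings.
    skew = sum((m - mean) ** 3 for m in mags) / len(mags) / sd**3 if sd > 0 else 0.0
    # Hypothetical light-curve characteristic: mean absolute minute-to-minute change.
    roughness = statistics.fmean([abs(b - a) for a, b in zip(mags, mags[1:])])
    return {
        "mean": mean,
        "median": statistics.median(mags),
        "stdev": sd,
        "min": min(mags),
        "max": max(mags),
        "skewness": skew,
        "roughness": roughness,
    }


# Made-up night: steady dark sky with one brighter (smaller-magnitude) minute.
print(nightly_features([21.0, 21.0, 21.0, 18.0]))
```

Because brighter sky means a numerically smaller magnitude, a few bright minutes in an otherwise dark night pull the skewness negative in this sketch.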
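The Random Forest idea (decision trees grown on bootstrap resamples, a random subset of features tried at each split, and a majority vote across trees) can be sketched with the Python standard library alone. This is a minimal, unoptimized illustration under assumed settings (Gini impurity, `n_trees=25`, `max_depth=6`, and √(number of features) candidate features per split). The abstract does not state the paper's implementation, library, or hyperparameters, and in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used. The toy data are synthetic.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_features, max_depth, rng, depth=0):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Random subspace: only a random subset of features is tried at this node.
    feature_ids = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in li], [y[i] for i in li], max_features, max_depth, rng, depth + 1),
        build_tree([X[i] for i in ri], [y[i] for i in ri], max_features, max_depth, rng, depth + 1),
    )


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=6, seed=0):
    rng = random.Random(seed)
    max_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_features, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Synthetic toy data (invented): [mean mag/arcsec^2, stdev] per night.
X = [[21.0, 0.10], [21.2, 0.20], [21.4, 0.15], [21.1, 0.25], [21.3, 0.30], [21.6, 0.12],
     [17.5, 0.60], [17.9, 0.80], [18.2, 0.55], [17.7, 0.70], [18.0, 0.90], [18.1, 0.65]]
y = [4] * 6 + [2] * 6  # class labels from the abstract: 4 = clear, 2 = overcast
forest = fit_forest(X, y)
print(predict_forest(forest, [22.0, 0.05]), predict_forest(forest, [17.0, 1.00]))
```

Bootstrapping and per-split feature sampling decorrelate the trees, so their majority vote is usually more stable than any single tree.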
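The abstract reports accuracy, F-score, and G-mean without defining their multi-class forms. A common convention, assumed here rather than taken from the paper, is the macro-averaged F-score (the mean of per-class F1 scores) and the G-mean as the geometric mean of per-class recalls. The sketch computes all three from a confusion matrix. `metrics_from_confusion` is a hypothetical helper name, and the counts are made up.

```python
import math


def metrics_from_confusion(cm):
    """Accuracy, macro F-score and G-mean from a square confusion matrix.

    cm[i][j] = number of samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    f_scores, recalls = [], []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum
        actual = sum(cm[i])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f_scores.append(f1)
        recalls.append(recall)
    macro_f = sum(f_scores) / k
    g_mean = math.prod(recalls) ** (1.0 / k)
    return accuracy, macro_f, g_mean


# Toy two-class example with made-up counts.
print(metrics_from_confusion([[8, 2], [1, 9]]))
```

Under these definitions, a single class with zero recall drives the G-mean to zero, so the G-mean and macro F-score are more sensitive than accuracy to small or poorly recognized classes. A gap like the one between 92% accuracy and 84% F-score/G-mean is typical of that situation.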

Updated: 2020-04-01