Solar activity classification based on Mg II spectra: Towards classification on compressed data
Astronomy and Computing (IF 1.9), Pub Date: 2021-06-02, DOI: 10.1016/j.ascom.2021.100473
S. Ivanov, M. Tsizh, D. Ullmann, B. Panos, S. Voloshynovskiy

Although large volumes of solar data are available for investigation and study, the vast majority of these data remain unlabeled and are therefore not amenable to modern supervised machine learning methods. A way to accurately and automatically classify spectra into categories related to the degree of solar activity is highly desirable and would assist and speed up future research efforts in solar physics. At the same time, the large volume of raw observational data is a serious bottleneck for machine learning, requiring powerful computational means that are not at the disposal of many laboratories. Additionally, transmitting raw data restricts real-time observation and requires considerable bandwidth and energy on board solar observation systems. To address these issues, we propose a framework to classify solar activity on compressed data. To this end, we used a labeling scheme from a pre-existing vector quantization technique in conjunction with several machine learning algorithms to categorize spectra of singly ionized magnesium (Mg II), measured by NASA's Interface Region Imaging Spectrograph (IRIS) small explorer satellite, into several groups characterizing solar activity. Our training dataset is a human-annotated list of 85 IRIS observations containing 29097 frames in total, or equivalently 9 million Mg II spectra. The annotated types of solar activity are active region, pre-flare activity, solar flare, sunspot, and quiet Sun. We used vector quantization to compress these data and reduce their complexity before training classifiers. From a host of classifiers, we found that the XGBoost classifier produced the most accurate results on the compressed data, yielding a prediction rate of over 95% and outperforming other ML methods such as convolutional neural networks, K-nearest neighbors, naive Bayes classifiers, and support vector machines. A principal finding of this research is that the classification performance on compressed and uncompressed data is comparable under our particular architecture, implying the possibility of large compression rates for relatively low degrees of information loss.
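
The compress-then-classify idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the spectra are synthetic, plain k-means stands in for the vector quantization step, and a per-code majority vote stands in for the classifier (the abstract reports XGBoost as the best performer, which is not used here). The function names `quantize`, `fit_codebook`, and `fit_code_labels` are hypothetical and do not come from the paper.

```python
import numpy as np

def quantize(spectra, centroids):
    """Return, for each spectrum, the index of its nearest centroid."""
    # Squared Euclidean distance between every spectrum and every centroid.
    d2 = ((spectra[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_codebook(spectra, k, n_iter=20):
    """Plain Lloyd's k-means, initialised from evenly spaced samples."""
    start = np.linspace(0, len(spectra) - 1, k).astype(int)
    centroids = spectra[start].copy()
    for _ in range(n_iter):
        codes = quantize(spectra, centroids)
        for j in range(k):
            members = spectra[codes == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def fit_code_labels(codes, labels, k, n_classes):
    """Majority class per codebook entry (assumed stand-in classifier)."""
    counts = np.zeros((k, n_classes), dtype=int)
    np.add.at(counts, (codes, labels), 1)  # count labels seen for each code
    return counts.argmax(axis=1)

# Synthetic stand-in data (not IRIS data): two classes of 50-point profiles
# that differ only in peak amplitude, with small Gaussian noise added.
rng = np.random.default_rng(0)
wavelength = np.linspace(-1.0, 1.0, 50)
profile = np.exp(-wavelength**2 / 0.1)
labels = np.repeat([0, 1], 200)
amplitude = np.where(labels == 0, 1.0, 3.0)
spectra = amplitude[:, None] * profile[None, :] + rng.normal(0.0, 0.05, (400, 50))

k = 4
codebook = fit_codebook(spectra, k)
codes = quantize(spectra, codebook).astype(np.uint8)  # compressed representation

code_to_label = fit_code_labels(codes, labels, k, n_classes=2)
predicted = code_to_label[codes]
accuracy = (predicted == labels).mean()  # measured on the training data itself

# Storage: raw float64 spectra versus codebook plus one byte per spectrum.
ratio = spectra.nbytes / (codebook.nbytes + codes.nbytes)
print(f"accuracy on compressed codes: {accuracy:.3f}, compression ratio: {ratio:.1f}")
```

In this toy setting each spectrum is stored as a single codebook index, so the classifier only ever sees the compressed representation. The accuracy printed here is computed on the same synthetic data used for fitting and says nothing about the results reported in the paper.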



Updated: 2021-06-18