Machine learning for automatic identification of new minor species
Journal of Quantitative Spectroscopy and Radiative Transfer (IF 2.3), Pub Date: 2020-09-29, DOI: 10.1016/j.jqsrt.2020.107361
Frédéric Schmidt, Guillaume Cruz Mermy, Justin Erwin, Séverine Robert, Lori Neary, Ian R. Thomas, Frank Daerden, Bojan Ristic, Manish R. Patel, Giancarlo Bellucci, Jose-Juan Lopez-Moreno, Ann-Carine Vandaele

One of the main difficulties in analyzing modern spectroscopic datasets is the sheer volume of data. For example, in atmospheric transmittance spectroscopy, the solar occultation (SO) channel of the NOMAD instrument onboard the ESA ExoMars2016 satellite, the Trace Gas Orbiter (TGO), produced about 10 million spectra in about 20,000 acquisition sequences between the beginning of the mission in April 2018 and 15 January 2020. Other datasets are even larger, on the order of billions of spectra for OMEGA onboard Mars Express or CRISM onboard Mars Reconnaissance Orbiter. Usually, new lines are discovered after a long iterative process of model fitting and manual residual analysis. Here we propose a new method, based on unsupervised machine learning, to automatically detect new minor species. Although precise quantification is out of scope, this tool can also be used to quickly summarize the dataset by giving a few endmembers ("sources") and their abundances.

The methodology is as follows: we propose a way to approximate the non-linearity of the dataset by a linear mixture of abundances and source spectra (endmembers). We use unsupervised source separation, in the form of non-negative matrix factorization, to estimate those quantities. Several methods are tested on synthetic and simulated data. Our approach is dedicated to detecting minor species spectra rather than precisely quantifying them. On a synthetic example, this approach is able to detect chemical compounds present in the form of 100 hidden spectra out of 10^4, at 1.5 times the noise level. Results on simulated NOMAD-SO spectra targeting CH4 show that detection limits are in the range of 100–500 ppt in favorable conditions. Results on real Martian data from NOMAD-SO show that CO2 and H2O are present, as expected, but CH4 is absent. Nevertheless, we confirm a set of new unexpected lines in the database, attributed by the ACS instrument team to the CO2 magnetic dipole.
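
To make the linear mixture idea concrete, here is a minimal sketch of non-negative matrix factorization on toy data. It is not the authors' pipeline: the abstract does not say which NMF variant or which linearization of the transmittance they use, so this example assumes the data matrix X already holds a non-negative, approximately linear quantity (for instance an absorption depth), and it swaps in the classic Lee-Seung multiplicative updates. The names nmf, make_synthetic and relative_error, the Gaussian line shapes, and all parameter values are hypothetical.

```python
import numpy as np


def nmf(X, n_sources, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (n_spectra x n_channels) as A @ S.

    A (n_spectra x n_sources) plays the role of the abundances and
    S (n_sources x n_channels) the role of the source spectra (endmembers).
    Uses Lee-Seung multiplicative updates for the squared Frobenius error;
    this is an assumed stand-in, not the variant used in the paper.
    """
    rng = np.random.default_rng(seed)
    n_spectra, n_channels = X.shape
    A = rng.random((n_spectra, n_sources)) + eps
    S = rng.random((n_sources, n_channels)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so A and S
        # stay non-negative as long as X is non-negative.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S


def make_synthetic(n_spectra=200, n_channels=120, noise=0.005, seed=1):
    """Build a toy dataset: two hypothetical sources made of Gaussian lines."""
    rng = np.random.default_rng(seed)
    grid = np.arange(n_channels)

    def lines(centers, width=2.0):
        return sum(np.exp(-0.5 * ((grid - c) / width) ** 2) for c in centers)

    S_true = np.vstack([lines([30, 80]), lines([55])])  # shape (2, n_channels)
    A_true = rng.random((n_spectra, 2))                 # random abundances
    X = A_true @ S_true + noise * rng.standard_normal((n_spectra, n_channels))
    # Clip so the non-negativity assumption of NMF holds despite the noise.
    return np.clip(X, 0.0, None), A_true, S_true


def relative_error(X, A, S):
    """Relative Frobenius reconstruction error of the factorization."""
    return np.linalg.norm(X - A @ S) / np.linalg.norm(X)


if __name__ == "__main__":
    X, A_true, S_true = make_synthetic()
    A, S = nmf(X, n_sources=2)
    print("relative reconstruction error:", relative_error(X, A, S))
```

The rows of S can then be inspected for line patterns that no expected species explains. Note that NMF recovers sources only up to permutation and scaling, which is consistent with the abstract's emphasis on detection rather than precise quantification.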




Updated: 2020-11-09