HiSS-Cube: A scalable framework for Hierarchical Semi-Sparse Cubes preserving uncertainties
Astronomy and Computing (IF 2.5), Pub Date: 2021-04-01, DOI: 10.1016/j.ascom.2021.100463
J. Nádvorník, P. Škoda, P. Tvrdík

A wide variety of approaches are available for big data cube visualization and analysis. However, few exploit the power of array databases and none preserve the scientific uncertainties in measurements when constructing lower resolutions. In machine learning applications, we often need to rapidly search data for regions of interest and then focus on these areas without having to retrain the model every time we change the resolution. Reliable verification of these areas also requires details of the accuracy of the measured values. In this study, we developed a new software infrastructure called Hierarchical Semi-Sparse Cube (HiSS-Cube), based on Hierarchical Data Format version 5 (HDF5). HiSS-Cube enables visualization and machine learning on combined heterogeneous data and was designed to be scalable for big data. It allows data from multiple domains (imaging, spectral, and time-series data) to be combined into a multi-resolution semi-sparse data cube that preserves the uncertainties of scientific measurements at all resolutions. The functionality of HiSS-Cube was verified on a subset of the Sloan Digital Sky Survey Stripe 82 survey. We compared the times and volumes for visualizations and machine learning data exported to HiSS-Cube and to the original format (FITS). Using these data, we demonstrated that HiSS-Cube is faster by several orders of magnitude. HiSS-Cube supports export to the VOTable format and is compatible with common Virtual Observatory tools. The source code for our prototype HiSS-Cube is available from GitHub and the data are available from Zenodo.
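To make the idea of a multi-resolution cube that keeps uncertainties concrete, here is a minimal sketch in pure Python. It is not taken from the HiSS-Cube source code: the function names `downsample_with_uncertainty` and `build_pyramid` are hypothetical, and the sketch assumes independent Gaussian errors per pixel, so that the uncertainty of a block mean is the root of the summed variances divided by the number of pixels. The abstract does not state which propagation rule HiSS-Cube actually uses.

```python
import math


def downsample_with_uncertainty(values, sigmas, factor=2):
    """Average non-overlapping factor x factor blocks and propagate 1-sigma errors.

    ASSUMPTION (illustrative only): pixel errors are independent and Gaussian,
    so sigma_mean = sqrt(sum(sigma_i ** 2)) / n for a block of n pixels.
    """
    rows, cols = len(values), len(values[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    n = factor * factor
    out_values, out_sigmas = [], []
    for r in range(0, rows, factor):
        value_row, sigma_row = [], []
        for c in range(0, cols, factor):
            block_sum = 0.0
            variance_sum = 0.0
            for dr in range(factor):
                for dc in range(factor):
                    block_sum += values[r + dr][c + dc]
                    variance_sum += sigmas[r + dr][c + dc] ** 2
            value_row.append(block_sum / n)
            sigma_row.append(math.sqrt(variance_sum) / n)
        out_values.append(value_row)
        out_sigmas.append(sigma_row)
    return out_values, out_sigmas


def build_pyramid(values, sigmas, levels, factor=2):
    """Return [(values, sigmas), ...] from full resolution down through `levels` reductions."""
    pyramid = [(values, sigmas)]
    for _ in range(levels):
        values, sigmas = downsample_with_uncertainty(values, sigmas, factor)
        pyramid.append((values, sigmas))
    return pyramid
```

Each level of the pyramid carries its own uncertainty array, so a region of interest found at low resolution can be checked against the error estimate at that same resolution before the full-resolution data are read.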




Updated: 2021-05-14