Comparison of data storage and analysis throughput in the light of high energy physics experiment MACE
Astronomy and Computing (IF 2.5), Pub Date: 2020-08-20, DOI: 10.1016/j.ascom.2020.100409
D. Sarkar, Mahesh P., Padmini S., N. Chouhan, C. Borwankar, A.K. Bhattacharya, A.K. Tickoo, R.C. Rannot

High Energy Physics (HEP) experiments produce large amounts of data, typically in the range of terabytes to petabytes. This explosion of data poses challenges in data capture, storage, integrity, searching, querying, visualization and analysis. It has led to the development of domain-specific file formats such as FITS and HDF5, analysis frameworks such as ROOT, storage architectures such as relational and NoSQL databases, and parallel and distributed data handling methodologies. In this paper, we compare the read–write performance of the HEP domain-specific framework ROOT and the NoSQL database Berkeley DB in the context of a gamma-ray Cerenkov experiment, with the aim of meeting the requirements of real-time data analysis. The Major Atmospheric Cerenkov Experiment (MACE) is a 21 m gamma-ray telescope set up by BARC at Hanle, India. It will generate a few hundred gigabytes of data per observational night. Aiming at real-time analysis of the data, we have developed a dynamic reading mechanism by implementing a binary type provider for data retrieval from the Berkeley DB database. Data analysis queries were performed and compared on ROOT files using ROOT query methods and on Berkeley DB using Language Integrated Query (LINQ). Finally, a generic framework facilitating online analysis of the data is proposed.




Updated: 2020-08-20