Full length article
Development and application of an HDF5 schema for SKA-scale image cube visualization

https://doi.org/10.1016/j.ascom.2020.100389

Abstract

In this paper, we describe an HDF5 schema created to support the efficient visualization of the large image cubes that will be produced by SKA Phase 1 and precursor radio telescopes. We demonstrate how the “HDF5-IDIA” schema’s features can improve the performance of visualization software, using both low-level metrics and real-world tests of the schema’s implementation in CARTA, an image viewer that is being developed to replace the existing CyberSKA and CASA viewers.

Introduction

Data produced by radio telescopes such as MeerKAT (Jonas et al., 2018), as well as other Square Kilometer Array (SKA) pathfinders, is now being used to create multi-dimensional image cubes which are large enough to pose multiple challenges to visualization software, requiring an increase in storage space, memory and computation resources. In our ongoing effort to facilitate efficient access to these images, we realized that the way in which the data was represented on storage systems created a considerable limitation to the performance that we could achieve. In particular, we wanted the ability to store additional pre-calculated representations and access them through a well-defined hierarchy.

Most astronomical image files are currently packaged using the FITS standard (Wells and Greisen, 1979). However, the FITS standard is primarily used for image transport and archiving, and is not well-suited for storing or defining additional derived data products in a hierarchical structure. The HDF5 technology suite (Folk et al., 2011) provides a data model, file format, API, library, and tools to enable the creation of structured schemas for different applications. We will show how these can be beneficial in packaging of large radio astronomy image cubes for the purpose of visualization and visual analytics.

We initially attempted to use existing HDF5 schemas developed for image cubes, but found that they did not meet our needs. The LOFAR HDF5 schema (Anderson et al., 2010; Alexov et al., 2012) did not meet our performance requirements: each 2D image plane is stored in a separate group, so a separate group and dataset must be opened for each pixel when data is read along the third axis of the image cube.

The HDFITS schema (Price et al., 2015) serves as a starting point for an HDF5 schema that maintains round-trip compatibility with the FITS format, but lacks the additional structures required for precalculated and cached datasets. We have therefore created a new schema tailored to our application, with a hierarchy similar to that of HDFITS and with extensions that support a number of features required for efficient visualization of large datasets.

The rest of the paper is laid out as follows: Section 2 details the requirements we have for the new schema. Section 3 considers the types of workloads that datasets using this schema will commonly be used for, in the context of image cube visualization. Section 4 describes the optional datasets that are defined in the schema in order to accelerate these workloads, as well as an outline of the schema hierarchy and naming conventions. Section 5 details integration of support for the schema into CARTA: The Cube Analysis and Rendering Tool for Astronomy (Comrie et al., 2018), as of Version 1.2 of the software package. Section 6 shows performance metrics of the schema with low-level benchmarks, and compares its performance to that of FITS within CARTA.

Requirements

Our application is the use of client–server visualization tools to view large image cubes remotely, with the image cube remaining on the server and data that is currently being examined streamed to a browser-based viewer on an end user’s computer. We are currently working with data from the MeerKAT Large Survey Projects (LSPs) and the Atacama Large Millimeter/submillimeter Array (ALMA) (Wootten and Thompson, 2009), with the aim of developing a tool that will scale to support data produced by

Common workloads

Some commonly used workloads are described below, along with their associated read access patterns. The access pattern calculations are based on a contiguous data layout, and would be different if a tiled data layout, such as the HDF5 file format’s chunking approach (Folk et al., 2011) or the CASA file format (McMullin et al., 2007), were used.
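
As a rough illustration of why the access pattern depends on the layout, the sketch below counts the contiguous runs needed to read a rectangular sub-region from a cube stored contiguously with the x axis varying fastest. The function name `contiguous_runs`, the (z, y, x) axis order and the example cube size are assumptions made for this illustration, not values taken from the paper.

```python
def contiguous_runs(shape, region):
    """Return (number_of_runs, elements_per_run) for reading `region`.

    shape  -- (nz, ny, nx), stored contiguously with x varying fastest
    region -- ((z0, z1), (y0, y1), (x0, x1)), half-open index ranges
    """
    nz, ny, nx = shape
    (z0, z1), (y0, y1), (x0, x1) = region
    dz, dy, dx = z1 - z0, y1 - y0, x1 - x0

    if dx == nx and dy == ny:
        # Whole planes are adjacent in storage: one run covers everything.
        return 1, dz * dy * dx
    if dx == nx:
        # Full rows within a plane are adjacent: one run per plane.
        return dz, dy * dx
    # Partial rows: one run per selected row in each selected plane.
    return dz * dy, dx


cube = (1000, 4096, 4096)  # hypothetical (channels, height, width)

# A full image plane is a single large sequential read.
plane = contiguous_runs(cube, ((0, 1), (0, 4096), (0, 4096)))

# A single-pixel spectral profile needs one tiny read per channel,
# each separated by a full plane (ny * nx elements) in storage.
profile = contiguous_runs(cube, ((0, 1000), (7, 8), (42, 43)))

print(plane)    # (1, 16777216)
print(profile)  # (1000, 1)
```

Under these assumptions, the profile read touches the same number of elements as there are channels, but as that many separate, widely spaced reads. A tiled layout would change both counts.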

Optional datasets

To accelerate the workloads described in Section 3, several different types of datasets are required, each of which is discussed below. These datasets are defined in the schema hierarchy described in Section 4.2, but each dataset is optional, and applications attempting to read files adhering to the schema should not require any of the optional datasets to be included in a file.
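
A reader that treats these datasets as optional might be sketched as follows. The function `pick_dataset` and the dataset names `MAIN` and `PERMUTED` are hypothetical and chosen only for this example; they are not the names defined by the schema. The file is modelled as a plain dictionary, on the assumption that an opened HDF5 group can be tested for membership in the same way (h5py group objects support the `in` operator).

```python
def pick_dataset(group, optional_name, required_name):
    """Return the name of the dataset to read from `group`.

    `group` is any mapping from dataset names to datasets (a plain
    dict in this sketch). The optional dataset is used when present;
    otherwise the reader falls back to the required dataset instead
    of failing.
    """
    if optional_name in group:
        return optional_name
    return required_name


# Hypothetical dataset names, chosen only for this example.
with_extras = {"MAIN": [[1, 2], [3, 4]], "PERMUTED": [[1, 3], [2, 4]]}
minimal = {"MAIN": [[1, 2], [3, 4]]}

print(pick_dataset(with_extras, "PERMUTED", "MAIN"))  # PERMUTED
print(pick_dataset(minimal, "PERMUTED", "MAIN"))      # MAIN
```

The point of the fallback is that a file containing only the main dataset remains fully readable, only more slowly for the workloads the optional dataset would have accelerated.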

Integration into CARTA

We have integrated support for HDF5 files which use our schema into the CARTA software package, a viewer designed to provide performant access to very large astronomical images, and developed as an eventual replacement viewer for both the CASA astronomical software package (McMullin et al., 2007) and the CyberSKA portal (Kiddle et al., 2011).

CARTA has a client–server model: remotely stored images are viewed through a web interface. Portions of image data are read by the server, downsampled to

Standalone performance tests

We compared the execution times of the common imaging workloads described in Sections 3.1, 3.2 (single-pixel profiles) and 3.3 (region profiles and statistics) when data was read from the original dataset and from a permuted copy. Measurements were performed on three sets of synthetic images created using the Astropy package (Robitaille et al., 2013) and filled with Gaussian noise, with increasing square image area and increasing numbers of channels.
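
The idea behind reading from a permuted copy can be sketched with a toy cube held in plain Python lists. The snippet assumes that the permutation makes the spectral (z) axis vary fastest, which is an assumption of this example rather than a detail stated above, and all function names are illustrative. It is not the benchmark code used for the measurements.

```python
def permute_to_z_fastest(flat, nz, ny, nx):
    """Copy a (z, y, x) cube, x fastest, into (x, y, z) order, z fastest."""
    out = [0.0] * (nz * ny * nx)
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                out[(x * ny + y) * nz + z] = flat[(z * ny + y) * nx + x]
    return out


def profile_from_original(flat, nz, ny, nx, y, x):
    # One element per channel, each a full plane (ny * nx) apart.
    return [flat[(z * ny + y) * nx + x] for z in range(nz)]


def profile_from_permuted(permuted, nz, ny, nx, y, x):
    # All channels of one pixel are adjacent: a single contiguous slice.
    start = (x * ny + y) * nz
    return permuted[start:start + nz]


nz, ny, nx = 4, 3, 5
original = [float(i) for i in range(nz * ny * nx)]
permuted = permute_to_z_fastest(original, nz, ny, nx)

a = profile_from_original(original, nz, ny, nx, y=1, x=2)
b = profile_from_permuted(permuted, nz, ny, nx, y=1, x=2)
print(a == b)  # True
```

Both functions return the same profile, but the second reads it as one contiguous slice, which is the property that the permuted copy trades extra storage for.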

Summary

In this paper we have presented a new HDF5 schema for astronomical image data. We have explained our motivation for creating this schema to support our requirements for the visualization of large data from radio astronomy. We have provided an overview of the schema and the types of data access patterns that it supports. Tests of reading from a permuted dataset defined in the schema show significant benefits for commonly performed workloads, on HDD- and SSD-based file systems. Speedups in the

CRediT authorship contribution statement

A. Comrie: Conceptualization, Methodology, Software, Writing - original draft. A. Pińska: Software, Writing - review & editing, Visualization, Formal analysis. R. Simmonds: Writing - review & editing. A.R. Taylor: Supervision, Writing - review & editing, Resources.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (20)

  • Price D. et al., Astron. Comput. (2015)
  • Alexov A. et al., Astron. Data Anal. Softw. Syst. XXI (2012)
  • Anderson K. et al., PoS (2010)
  • Comrie A. et al., CARTA: The cube analysis and rendering tool for astronomy (2018)
  • Deane R.
  • Folk M. et al.
  • Hassan A. et al., Publ. Astron. Soc. Aust. (2011)
  • Heichler J., An Introduction to BeeGFS, Technical Report (2014)
  • Van der Hulst J. et al.
  • Jarvis M. et al.
There are more references available in the full text version of this article.
