Future Generation Computer Systems

Volume 112, November 2020, Pages 695-708

Designing a GPU-parallel algorithm for raw SAR data compression: A focus on parallel performance estimation

https://doi.org/10.1016/j.future.2020.06.027

Highlights

  • Raw SAR images can be compressed by GPUs on board satellites and aircraft.

  • No significant quality degradation is measured when focusing decompressed images.

  • Algorithm performance on GPU can be predicted by means of algorithmic overhead estimation.

Abstract

When a Synthetic Aperture Radar (SAR) acquires raw data using a satellite or airborne platform, the data must be transferred to the ground for further processing. For example, SAR raw data need so-called 'focusing' signal processing to be rendered into a visible image. Such processing is time- and compute-intensive, and it is commonly carried out in computing centres. Since the data transfer rate is a typical limitation when communicating with the ground station, compression is necessary to reduce transmission time. So far, this procedure has been implemented in application-specific hardware, but the recent adoption of avionic computational GPUs has opened new high-performance onboard perspectives. Due to the limited availability of avionic GPUs, we focused on parallel performance estimation starting from measurements taken on a similar off-the-shelf solution. In this paper, we present a GPU algorithm for raw SAR data compression, which uses 1-dimensional DCT transforms followed by quantisation and entropy coding. We evaluate results using ENVISAT (Environmental Satellite) ASAR Image Mode level 0 data by measuring compression rates, statistical parameters, and distortion on decompressed and then focused images. Moreover, by evaluating the Algorithmic Overhead induced by the parallelisation strategy, we predict the best thread-block configuration for the possible adoption of such a GPU algorithm on one of the most widely available avionic hardware platforms.

Introduction

Synthetic Aperture Radar (SAR) is an active microwave imaging technology that plays an important role in remote sensing and observation applications, such as environmental monitoring in all-day and all-weather contexts. Thanks to interferometric techniques, it makes it possible to monitor biophysical parameters such as vegetation canopy or atmospheric constituents [1].

By exploiting the motion of a plane or satellite, a SAR sensor creates a synthetic aperture much larger than the actual radar antenna carried onboard. This synthetic aperture is achieved by transmitting pulses as the platform moves, and by recording their echoes from the ground. When the platform moves along a straight direction, the sensor collects received data in a natural coordinate system with respect to the ground image. The direction following the sensor motion is called the azimuth direction, while the scanning towards the ground at each azimuth position occurs along the range direction (see Fig. 1).

SAR systems can acquire data on very long land swaths with high resolution. Such data consist of complex (in-phase and quadrature) matrices, but unlike data obtained with optical sensors, they require a post-processing procedure (focusing) to form a comprehensible final image [2].

As resolution increases, the fidelity of imaged areas along the azimuth and range directions increases, and so does the amount of data acquired and subsequently processed. To transfer these huge matrices efficiently, it is necessary to develop an effective and efficient compression algorithm. The limitations of onboard hardware impose some constraints on algorithm design, and an application-specific computing chip in SAR sensors is usually employed for data compression. However, in recent times, with the development of avionic specialised computing accelerators (GPUs), and thanks to the introduction of GPU computing in the realm of high-performance embedded computing [3], [4], new opportunities have arisen. Single-precision GFLOPS on a GPU cost an order of magnitude less than on a CPU and result in lower power consumption. While the study of GPU-parallel algorithms for real-time focusing has delivered some results [5], [6], [7], which could be useful for tactical reasoning, no significant literature has been produced about raw SAR data compression on GPUs.

Our idea is to exploit onboard avionic GPU computing resources from both strategic and tactical points of view. Indeed, we can imagine an onboard computing platform with multiple GPUs connected to both a SAR sensor and a ground transmitter [8] through GPUDirect RDMA [9] technology. In this setting, multiple GPU functionalities could be implemented and employed on request, with raw SAR data compression applied when ground service processing and consequent strategic planning are needed.

Final focused SAR images present significant structures which could be exploited during a compression stage, but these cannot be used if image formation is performed only later at the ground station. Consequently, onboard raw data compression implies the analysis of less exploitable structures with apparently higher entropy [10]. In-phase and quadrature components show quasi-independence as well as low inter-pixel correlation [11], and their histograms present a nearly Gaussian shape with identical variance, giving no hints about possible data structures. For this reason, conventional image compression techniques are ill-suited to this task. Nevertheless, since signal power changes slowly along both the azimuth and range directions, some compression strategies can be taken into consideration [12].
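
As an illustration of how such statistics can be verified on a given acquisition, the following host-side helper (plain C++, compilable alongside the CUDA code used later) computes the sample Pearson correlation between the i and q streams; the function name and its application to a whole data set are our own illustrative assumptions, not part of the cited analyses.

    #include <cmath>
    #include <cstddef>

    // Sample Pearson correlation between the in-phase and quadrature
    // streams: a quick empirical check of the quasi-independence
    // reported for raw SAR data (values near 0 support the claim).
    double iq_correlation(const float* i, const float* q, std::size_t n)
    {
        double mi = 0.0, mq = 0.0;
        for (std::size_t k = 0; k < n; ++k) { mi += i[k]; mq += q[k]; }
        mi /= n; mq /= n;
        double cov = 0.0, vi = 0.0, vq = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            double di = i[k] - mi, dq = q[k] - mq;
            cov += di * dq; vi += di * di; vq += dq * dq;
        }
        return cov / std::sqrt(vi * vq);
    }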

The first well-known and most widely recognised method of raw SAR data compression is block-adaptive quantisation (BAQ), developed for the NASA Magellan mission to Venus [11]. In BAQ, a scalar quantiser (usually a Max-Lloyd quantiser [13]) processes blocks of raw SAR data with a set of parameters updated from block to block and controlled by the statistics of the raw data. In this way, data are quantised with fewer bits than required by the Shannon entropy, since over a small time interval the entropy of a data block is lower than that of the whole data set. Within each block, in-phase and quadrature components are usually processed separately, but in [14] data are converted into polar coordinates before performing BAQ. Other variants of scalar-quantisation techniques include entropy-constrained BAQ (ECBAQ) [15], which employs a uniform quantiser followed by an entropy coder, such as a Huffman coder or an adaptive arithmetic coder.
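
To make the block-adaptive idea concrete, here is a minimal CUDA sketch of a 2-bit BAQ step: one thread block handles one data block, estimates its standard deviation with a shared-memory reduction, and quantises each sample against the Max-Lloyd threshold for a unit-variance Gaussian. The block size, the 2-bit rate, and the hard-coded Lloyd-Max values are illustrative assumptions, not the parameters used in the paper.

    #include <cuda_runtime.h>

    #define BAQ_BLOCK 256   // samples per BAQ block (illustrative)

    // 2-bit BAQ: per-block sigma estimation + Max-Lloyd quantisation.
    // For a unit-variance Gaussian, the 4-level Lloyd-Max quantiser
    // has decision threshold 0.9816 and levels {0.4528, 1.510}; the
    // decoder reconstructs level * sigma using the stored sigma.
    __global__ void baq2bit(const float* in, unsigned char* codes,
                            float* sigma)
    {
        __shared__ float acc[BAQ_BLOCK];
        int i = blockIdx.x * BAQ_BLOCK + threadIdx.x;
        float x = in[i];            // input length assumed a multiple
                                    // of BAQ_BLOCK in this sketch
        acc[threadIdx.x] = x * x;   // power reduction for sigma
        __syncthreads();
        for (int s = BAQ_BLOCK / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s) acc[threadIdx.x] += acc[threadIdx.x + s];
            __syncthreads();
        }
        float sd = sqrtf(acc[0] / BAQ_BLOCK);
        if (threadIdx.x == 0) sigma[blockIdx.x] = sd;  // side information

        float u = (sd > 0.f) ? x / sd : 0.f;           // normalise
        unsigned char mag = (fabsf(u) > 0.9816f) ? 1 : 0;
        unsigned char sgn = (u < 0.f) ? 1 : 0;
        codes[i] = (unsigned char)((sgn << 1) | mag);  // 2 bits/sample
    }

Decompression simply multiplies the reconstruction level selected by each 2-bit code by the block's stored sigma.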

The exploitation of vector quantisation has also been explored in conjunction with BAQ, as it theoretically always achieves better performance than scalar coding, albeit at the cost of added computational complexity [16]. Block adaptive vector quantisation (BAVQ) first compresses the raw SAR data using BAQ and then uses these data as input to a vector quantiser [17].

The algorithms based on BAQ were formulated for data compression in the time domain. However, they can also be used in the transform domain if the transformed values are also Gaussian distributed [18]. Combining frequency-domain methods with BAQ increases the demand for onboard computational performance, but it also provides better compression performance and therefore looks promising in a GPU context. The application of ECBAQ in the frequency domain can produce even better results [15] if the blocks are large enough to obtain an accurate estimate of the standard deviation.

Evaluation of distortion on compressed images can be performed only after decompression and image focusing, making rate-distortion control more difficult. For this reason, many authors prefer to use parameters such as the Signal Difference to Noise Ratio, the Integrated Side Lobe Ratio [19], and the Peak to Side Lobe Ratio on unfocused images to evaluate their algorithms. Some authors apply image compression codecs to raw SAR data, taking advantage of specialised hardware implementations. A JPEG2000-based implementation compresses separate in-phase and quadrature components of the raw SAR data [20], but the distortion is computed on the raw SAR data itself rather than on the focused image, so the final image quality has not been assessed.
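
For reference, one plausible definition of the Signal Difference to Noise Ratio can be computed with a few lines of host-side code. The definition below (reference power over difference power, in dB) is our assumption for illustration; the exact formulas used in the cited works may differ, and side-lobe measures such as ISLR and PSLR additionally require an impulse-response analysis not shown here.

    #include <cmath>
    #include <cstddef>

    // Signal Difference to Noise Ratio (dB), here assumed to be the
    // power of the reference data over the power of the difference
    // between reference and compressed/decompressed data.
    double sdnr_db(const float* ref, const float* test, std::size_t n)
    {
        double sig = 0.0, dif = 0.0;
        for (std::size_t k = 0; k < n; ++k) {
            double d = (double)ref[k] - (double)test[k];
            sig += (double)ref[k] * (double)ref[k];
            dif += d * d;
        }
        return 10.0 * std::log10(sig / dif);  // higher = less distortion
    }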

The implementation of a GPU-parallel algorithm for raw SAR data compression is a new approach, and the challenge is to provide compression in real time. Due to the limited funding for this project, we had to deal with the lack of an avionic GPU.

Considering our results from previous experience in GPU computing, also based on special devices [21], [22], [23], we can assume that the algorithmic logic does not change between an avionic product and an off-the-shelf hardware solution. Hence, we decided to develop and test our algorithm on the latter. With this intent, and in order to exploit the massive parallelism of GPUs, we applied the approach proposed in [24].

In this paper, we present a GPU-parallel algorithm for raw SAR data compression based on transform coding and designed following the sequential version presented in [2]. After applying a Discrete Cosine Transform (DCT) to expose some data structure, the data are quantised using BAQ, and finally an Arithmetic Coding (AC) [25] step compresses the resulting structure. This approach exposes a degree of parallelism that fits a GPU architecture well.
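
As a sketch of the transform stage, the kernel below computes a naive O(N^2) orthonormal 1-D DCT-II over independent rows, one thread block per row and one thread per output coefficient. The row length, the kernel names, and the host-side outline are illustrative assumptions; a production version would use a fast DCT factorisation and append the quantisation and entropy-coding stages described above.

    #include <cuda_runtime.h>
    #include <math_constants.h>   // CUDART_PI_F

    #define ROW 512   // samples per range line (illustrative)

    // Naive O(N^2) orthonormal 1-D DCT-II: one block per row,
    // one thread per output coefficient k.
    __global__ void dct1d_rows(const float* in, float* out)
    {
        const float* row = in + blockIdx.x * ROW;
        int k = threadIdx.x;
        float s = 0.f;
        for (int n = 0; n < ROW; ++n)
            s += row[n] * __cosf(CUDART_PI_F * (n + 0.5f) * k / ROW);
        float scale = (k == 0) ? rsqrtf((float)ROW) : sqrtf(2.f / ROW);
        out[blockIdx.x * ROW + k] = scale * s;
    }

    // Host-side pipeline outline (error checking omitted):
    //   dct1d_rows<<<numRows, ROW>>>(d_raw, d_coef);  // 1. transform
    //   baq2bit<<<numBlocks, BAQ_BLOCK>>>(...);       // 2. quantisation
    //   // 3. arithmetic coding of the quantiser output, typically on
    //   //    independent chunks of symbols to expose parallelism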

One key point in designing a GPU-parallel algorithm and its related software is finding the thread-block configuration that achieves the shortest execution time on a specific piece of hardware. This analysis involves the architecture of the Streaming Multiprocessors, the memory available at different levels, and several execution batches to measure actual timings. Even following such a procedure, the best configuration often does not match expectations. In the case of long running times, this search can take longer than planned, especially when each kernel employs a different partitioning of data blocks, as happens in our case. To ease this task, we introduce the measure of Algorithmic Overhead to estimate the load balancing of the software on a specific GPU for different data partitionings and several thread-block configurations. Combined with actual measurements taken on a similar off-the-shelf GPU, this tool gives useful indications for estimating the execution time on the avionic computing board.
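
The Algorithmic Overhead measure itself is introduced in Section 3; the empirical side of the search it complements can be sketched as a simple sweep over candidate thread-block sizes, timed with CUDA events. The kernel and the problem size below are placeholders standing in for any of the pipeline stages.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void stage(float* d, int n)   // stand-in workload
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.f;
    }

    int main()
    {
        const int n = 1 << 22;
        float* d; cudaMalloc(&d, n * sizeof(float));
        cudaEvent_t t0, t1; cudaEventCreate(&t0); cudaEventCreate(&t1);

        // Sweep candidate thread-block sizes and time a batch of runs.
        for (int threads = 32; threads <= 1024; threads *= 2) {
            int blocks = (n + threads - 1) / threads;
            stage<<<blocks, threads>>>(d, n);        // warm-up
            cudaEventRecord(t0);
            for (int r = 0; r < 100; ++r)            // timed batch
                stage<<<blocks, threads>>>(d, n);
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);
            float ms; cudaEventElapsedTime(&ms, t0, t1);
            printf("%4d threads/block: %.3f ms/run\n", threads, ms / 100.f);
        }
        cudaFree(d);
        return 0;
    }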

By simulating compression using real ENVISAT [26] ASAR Image Mode level 0 [27] data on off-the-shelf computing components, we show that our GPU algorithm, when appropriately configured, fits within the time constraints imposed by the data link to the ground station.

To validate the results in terms of visible distortion, we evaluate some image quality parameters on focused images obtained from both original and processed data. Furthermore, to assess other distortions, we measure some statistical parameters on both uncompressed and compressed/decompressed data, as delineated in [28].

This paper is organised as follows. In the next section, we provide a brief background and an overview of existing raw SAR data compression approaches, focusing on transform coding. Section 3 presents our approach to the design of an efficient GPU-parallel algorithm with the aid of a mathematical framework for performance evaluation. Experimental testing is presented in Section 4, with details on possible configurations to achieve optimal performance on avionic hardware. We discuss our results and conclude in Section 5.

Section snippets

Transform coding for raw SAR data

The SAR acquisition process naturally defines a matrix representation of raw data. Each row contains samples in the range direction and thus corresponds to a single azimuth location (Fig. 2(a)). Each element of the matrix is a complex number representing the in-phase and quadrature (i/q) components of the SAR signal. In the ENVISAT Image Mode context, the sensor produces a 16-bit integer for each pixel, with 8 bits for the in-phase and 8 bits for the quadrature component.
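
For illustration, a kernel of the following shape can unpack such 16-bit pixels into complex single-precision samples on the GPU. The byte order (in-phase in the high byte) and the unsigned mid-range offset are assumptions of this sketch; the actual packing must be taken from the ENVISAT format specification.

    #include <cuda_runtime.h>

    // Unpack 16-bit raw pixels into complex float samples, assuming
    // the in-phase component in the high byte and an unsigned 8-bit
    // encoding centred at mid-range (both assumptions of this sketch).
    __global__ void unpack_iq(const unsigned short* raw, float2* iq, int n)
    {
        int k = blockIdx.x * blockDim.x + threadIdx.x;
        if (k >= n) return;
        unsigned short p = raw[k];
        float i = (float)((p >> 8) & 0xFF) - 127.5f;  // in-phase
        float q = (float)( p       & 0xFF) - 127.5f;  // quadrature
        iq[k] = make_float2(i, q);
    }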

Such configuration

Towards a GPU-parallel algorithm

Before considering possible GPU-parallel strategies, we should examine the memory requirements of the adopted sequential algorithm. Let us focus on the avionic platform where such a compression algorithm would be executed and on its data flux: when the sensor receives radar echoes, it produces a corresponding data stream. As soon as a sufficient amount of buffered raw data is available, a computing unit can immediately compress a package of data and send it to the ground station. If the

Software testing

We implemented the GPU-parallel algorithm in the CUDA 10 environment on a workstation equipped with an Intel Core i5-650 processor, 6 GB RAM, Ubuntu 18.04, and an NVIDIA GeForce GTX 780 OC (3071 MB global memory, 2304 cores [12 SMs × 192 SPs], 1058 MHz). This configuration is not optimal because avionic/space systems, thanks to the presence of ECC memories, ensure data consistency and are therefore more reliable. However, the chosen configuration has a GPU with CUDA Capability and underlying computing

Conclusion

The necessity of an efficient algorithm to compress raw SAR data is not new in the aerospace context. So far, this function has been implemented on special-purpose chips embedded in the sensors. Thanks to the production of avionic GPUs, we can now imagine such computational resources being used for several onboard purposes, one of which is raw SAR data compression.

In this work, we presented a convenient GPU-parallel algorithm, appropriately configured to run on an avionic system. By analysing

CRediT authorship contribution statement

Diego Romano: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft, Writing - review & editing. Marco Lapegna: Methodology, Software, Data curation, Writing - review & editing. Valeria Mele: Methodology, Formal analysis, Writing - original draft. Giuliano Laccetti: Methodology, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References

  • NVIDIA Corporation, Developing a Linux kernel module using RDMA for GPUDirect (2019)

  • J.C. Curlander et al., Synthetic Aperture Radar, vol. 396 (1991)

  • R. Kwok et al., Block adaptive quantization of Magellan SAR data, IEEE Trans. Geosci. Remote Sens. (1989)

  • A.E. Boustani et al., A review of current raw SAR data compression techniques

  • A. Gersho et al., Vector Quantization and Signal Compression, vol. 159 (2012)

  • P. Monet et al., Block adaptive quantization of images, IEEE Trans. Commun. (1993)

  • T. Algra, Compression of raw SAR data using entropy-constrained quantization

  • T.M. Cover et al., Elements of Information Theory (2012)

  • A. Moreira et al., Fusion of block adaptive and vector quantizer for efficient SAR data compression

  • U. Benz et al., A comparison of several algorithms for SAR raw data compression, IEEE Trans. Geosci. Remote Sens. (1995)

Diego Romano was awarded an M.S. in Mathematics in 2000, and a Ph.D. degree in Computational and Computer Sciences from the University of Naples Federico II, Italy, in 2012. He obtained a permanent position as a researcher at the Italian National Research Council (CNR) in 2008, where he is currently employed at the Institute for High Performance Computing and Networking (ICAR). His research interests include the performance and design of GPU computing algorithms. Within this field, he works, for instance, on the Global Illumination problem in Computer Graphics, and on a mathematical model for performance analysis.

Marco Lapegna received his master degree in Mathematics and his Ph.D. in Applied Mathematics and Computer Science from the University of Naples Federico II, where he is currently an associate professor in Computer Science. His main research interests are related to algorithms, data structures and environments for parallel, distributed and grid computing, with special regard to computational mathematics and scientific computing. He has been involved in several national and international projects funded by the EU. He has been organiser and chair of international workshops and a programme committee member of several international conferences, as well as guest editor of special issues of international journals.

Valeria Mele is currently a researcher at the University of Naples Federico II (Naples, Italy). She holds a degree in Informatics and a Ph.D. in Computational Science. Her research activity has mainly focused on the development and performance evaluation of parallel algorithms and software for heterogeneous, hybrid and multilevel parallel architectures, from multicore to GPU-enhanced machines, modern clusters and supercomputers. After attending the Argonne Training Program on Extreme-Scale Computing (ATPESC) and visiting Argonne National Laboratory (ANL, Chicago, Illinois, USA) several times, she is now mainly working on the design, implementation and performance prediction/evaluation of software with and for the PETSc library.

Giuliano Laccetti is a professor of computer science at the University of Naples Federico II, Italy. He received his Laurea degree (cum laude) in Physics from the University of Naples. His main research interests are Mathematical Software, High Performance Architectures for Scientific Computing, Distributed Computing, Grid and Cloud Computing, Algorithms on emerging hybrid architectures (CPU+GPU, …), and the Internet of Things. He has been organiser and chair of several workshops co-located with larger international conferences. He is author (or co-author) of about 100 papers published in refereed international journals, international books, and international conference proceedings.
