Elsevier

Big Data Research

Volume 25, 15 July 2021, 100235
OL-HeatMap: Effective Density Visualization of Multiple Overlapping Rectangles

https://doi.org/10.1016/j.bdr.2021.100235

Abstract

Visualization of the density of multiple overlapping axis-aligned objects is a challenging computational problem that can inform large-scale visual analytics in diverse domains. For example, in crowd simulations we care about constructing interaction maps, and in urban planning we care about the city areas most frequented by people. The primary objective of this research is, given a large set of axis-aligned two-dimensional (2D) objects, or simply rectangles, to devise efficient and effective data visualization methods that inform whether, where and how much these rectangles overlap. Currently, such visualizations rely on inefficient implementations for determining the size of the overlapping regions, which do not scale well and are difficult to carry out. Approximate methods have also been proposed in the literature. In contrast to these approaches, we address this problem by exploiting state-of-the-art computational geometry methods based on the sweep-line paradigm. These methods are fast and can determine the exact size of the overlap of multiple axis-aligned objects, and can therefore effectively inform the visualization method. To that end, we present OL-HeatMap, a novel type of heat-map visualization that can be used to represent and perceive the density of overlapping rectangles. Our experimental evaluation demonstrates the effectiveness of the proposed method in terms of both accuracy and running time on synthetic and real-world data-sets.

Introduction

Density-based information visualization methods are commonly employed in big data visual analytics. They provide powerful abstract representations of large data sets that can help one to quickly perceive areas of interest due to a large concentration of data points (or their absence). Amongst a plethora of visualization techniques for density, such as scatter plots or treemaps, we focus on one of the most commonly used density-based visualization methods, the heat-map. A heat-map is a graphical representation of data where data values are represented as colors. These colors depict the characteristics of the data based on problem-specific requirements. Typically, darker colors depict regions with higher amounts or concentrations of data values present, while the opposite is true for lighter colors. Variants of heat-maps have been used to show the density or distribution of data on a given region of interest. This technique provides a general view of numerical data, and it can be customized to suit statistical and categorical data variants. It can also be employed to show the results of clustering algorithms. As the rendered graphic is easy to understand, it is typically used to check the expected results versus the actual results of an algorithm.

A common tool employed in the construction of a heat-map visualization is the bounding volume, and specifically the bounding box. A bounding volume is a visual abstraction that is used to approximate complex objects and simplify the visualization process. Such visual abstractions introduce some flexibility to the problem, allowing for faster computation while avoiding significant losses in the information visualized. For different objects in real life, different bounding volumes such as rectangles, cuboids, spheres, and hyper-planes can be used. Furthermore, when the shapes used are rectangles or cuboids, they can be axis-aligned, meaning that their sides are parallel to the respective coordinate axes. In this work, we focus on 2-dimensional axis-aligned bounding boxes, which we refer to as rectangles for ease of understanding. Such rectangles have previously been used to approximate geographical objects [1], to construct spatial data structures [2], and in VLSI design [3], among other applications.
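As a small illustration of the axis-aligned bounding box described above, the following sketch computes the smallest enclosing rectangle of a set of 2D points. The `Rect` type and the `bounding_box` function are hypothetical names introduced here for illustration; they are not taken from the paper or its code.

```python
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float


def bounding_box(points: Iterable[tuple[float, float]]) -> Rect:
    """Return the smallest axis-aligned rectangle that encloses all points."""
    xs, ys = zip(*points)
    return Rect(min(xs), min(ys), max(xs), max(ys))


# A triangle (a "complex" object) approximated by its bounding box.
triangle = [(1.0, 1.0), (4.0, 2.0), (2.0, 5.0)]
box = bounding_box(triangle)
```

The box is fully described by four numbers, which is what makes overlap tests between such rectangles cheap compared with tests on the original shapes.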

We are interested in creating density-based visualizations that offer insights about the interactions (relationships) of these rectangles on a Cartesian plane. To that end, we need to identify and report the density value (i.e., the number of rectangles that overlap) of every point on the Cartesian plane. In addition, for each of these overlaps we want to determine its size and its location in the Cartesian plane. There are a handful of approaches to address this problem, with one of the most common being grid-based [4]. Grid-based methods first define a uniform grid that separates the observation space into equal-size grid cells. Then, the method determines the overlap of each grid cell with the input rectangles using well-established orthogonal range query methods [5], such as R-trees [6]. However, grid-based methods have several limitations. Constructing a spatial grid-based data structure and performing range queries for each grid cell is computationally expensive. Furthermore, the accuracy of the visualization results depends greatly on the size of the grid (grid granularity). This presents an interesting trade-off: a coarse grid with few cells is computationally more efficient but less accurate, whereas a fine grid with many cells provides a more accurate representation of the overlaps, but at the expense of running time. An illustrative example of this trade-off is shown in Fig. 1. We further elaborate on this trade-off in the methodology and experimental evaluation sections.
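The granularity trade-off can be seen in a minimal sketch of a grid-based density count. This is an assumption-laden illustration, not the baseline evaluated in the paper: the function name `grid_density` and its parameters are hypothetical, and the per-cell range query (e.g., via an R-tree) is replaced by directly marking the cells each rectangle touches, which yields the same per-cell counts.

```python
import numpy as np


def grid_density(rects, width, height, cells):
    """Count, for each cell of a cells x cells uniform grid over
    [0, width] x [0, height], how many rectangles overlap that cell."""
    counts = np.zeros((cells, cells), dtype=int)
    cw, ch = width / cells, height / cells  # cell width and height
    for x1, y1, x2, y2 in rects:
        # Half-open index ranges of the cells the rectangle touches.
        c0 = max(int(np.floor(x1 / cw)), 0)
        c1 = min(int(np.ceil(x2 / cw)), cells)
        r0 = max(int(np.floor(y1 / ch)), 0)
        r1 = min(int(np.ceil(y2 / ch)), cells)
        counts[r0:r1, c0:c1] += 1
    return counts


# Two rectangles that do NOT overlap each other.
rects = [(0, 0, 3, 3), (3.5, 3.5, 8, 8)]
coarse = grid_density(rects, width=8, height=8, cells=2)  # 4 cells
fine = grid_density(rects, width=8, height=8, cells=8)    # 64 cells
```

On the coarse grid both rectangles touch the same cell, so that cell reports a density of 2 even though the rectangles are disjoint; the finer grid reports a maximum density of 1, at the cost of sixteen times as many cells to maintain.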

A more desirable outcome would be to identify the exact location, density and size of any overlap among the available rectangles in the data-set directly. The simplest, brute-force approach to accomplish this is to compare every rectangle with every other rectangle pair-wise, then compare the overlap of every pair with every other rectangle to find triple overlaps, and so on. The computational cost of such a method is prohibitively high. Instead, an approach that is commonly used to answer such geometric object overlap problems efficiently is the algorithmic paradigm known as the sweep-line or plane-sweep algorithm [7]. Algorithms belonging to this category utilize a conceptual line that sweeps across the plane and quickly identifies overlapping objects.
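The sweep-line idea can be sketched for the simplest case, reporting all pairs of overlapping rectangles. This is a generic textbook-style sketch rather than the algorithm of [7] or [8]; the function name `overlapping_pairs` is hypothetical, and the set of active rectangles is scanned linearly here, whereas an interval tree or similar structure would typically be used to keep each step fast.

```python
def overlapping_pairs(rects):
    """Report index pairs (i, j), i < j, of rectangles that share positive area.

    Each rectangle is a tuple (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    # A vertical line sweeps left to right; each rectangle opens at x1
    # and closes at x2. Closing events (0) sort before opening events (1)
    # at the same x, so rectangles that only touch are not reported.
    events = []
    for i, (x1, _, x2, _) in enumerate(rects):
        events.append((x1, 1, i))
        events.append((x2, 0, i))
    events.sort()

    active = set()  # rectangles currently cut by the sweep line
    pairs = []
    for _, kind, i in events:
        if kind == 0:
            active.discard(i)
            continue
        _, y1, _, y2 = rects[i]
        for j in active:
            _, b1, _, b2 = rects[j]
            if y1 < b2 and b1 < y2:  # y-intervals overlap
                pairs.append((min(i, j), max(i, j)))
        active.add(i)
    return sorted(pairs)


rects = [(0, 0, 4, 4), (2, 2, 6, 6), (5, 0, 7, 3), (4, 0, 5, 1)]
pairs = overlapping_pairs(rects)
```

Only rectangles that are simultaneously cut by the sweep line are ever compared, which is what avoids the all-against-all comparison of the brute-force approach.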

In this work, we employ a recently proposed variation of the sweep-line algorithm that is able to determine the exact location, size and number of multiple overlaps of n-dimensional geometric objects [8]. That method uses a sweep line to construct an auxiliary data structure known as a region intersection graph, and it has the potential to significantly reduce the computation required for the effective visualization of the density of overlapping rectangles. Specifically, the main contributions of our work are as follows:

  • We present OL-HeatMap (OverLap HeatMap), a fast and exact density-based visualization method for effective representation of the overlaps of multiple axis-aligned rectangles, based on the sweep-line paradigm.

  • We introduce an evaluation metric that can be used to determine the accuracy of grid-based heat-map visualizations.

  • We conduct an extensive evaluation of the performance of OL-HeatMap which demonstrates that it significantly outperforms competitive grid-based methods, in terms of both running time and accuracy.

  • We build an interactive visualization system that demonstrates the effectiveness of OL-HeatMap in practice.

  • We make source code and data publicly available to encourage reproducibility of method and results.
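To make the notion of an intersection graph mentioned above more concrete, the following is a minimal sketch under stated assumptions; it is not the authors' released code. Nodes are rectangles and an edge joins two rectangles that share area. The helper names `overlap`, `intersection_graph` and `common_region` are hypothetical, and pairs are found by brute force here for brevity, whereas the method of [8] uses a sweep line.

```python
from itertools import combinations


def overlap(a, b):
    """Return the overlap rectangle of a and b, or None if they share no area."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def intersection_graph(rects):
    """Adjacency sets: i and j are neighbours when rectangles i and j overlap."""
    graph = {i: set() for i in range(len(rects))}
    for i, j in combinations(range(len(rects)), 2):
        if overlap(rects[i], rects[j]) is not None:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def common_region(rects, members):
    """Region shared by all rectangles listed in `members`, and its area."""
    region = rects[members[0]]
    for i in members[1:]:
        region = overlap(region, rects[i])
        if region is None:
            return None, 0.0
    x1, y1, x2, y2 = region
    return region, (x2 - x1) * (y2 - y1)


rects = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 1, 5, 5), (7, 7, 8, 8)]
graph = intersection_graph(rects)
region, area = common_region(rects, [0, 1, 2])
```

For axis-aligned rectangles, any set of pairwise overlapping rectangles also has a common overlap region, so a group of mutually adjacent nodes in the graph corresponds to a region whose density equals the size of the group. In the example, rectangles 0, 1 and 2 are mutually adjacent and share the region (3, 2, 4, 4).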

The remainder of this paper is organized as follows: Section 2 introduces notation and formally defines the problem of interest in this paper. Our proposed method, OL-HeatMap, along with the grid-based competitors and the overall computational framework are presented in Section 3. Section 4 presents a thorough experimental evaluation of the methods and algorithms. Section 5 describes a proof-of-concept demo system. After reviewing the related work in Section 6, we conclude in Section 7.

The problem

In this section, we introduce notation, provide preliminaries and formally define the problem of interest. A summary of the notation used is presented in Table 1.

Methodology

In this section, we present the steps required for the visualization of bounding box heat-maps using the different approaches mentioned in the previous section. We start by describing the grid-based technique and the data structures necessary for its implementation. We proceed by outlining the basic sweep-line algorithm concept and specifically the multiple overlap identification and the intersection graph data structure required for it. Finally, we introduce an evaluation metric that can be

Experimental evaluation

In this section we describe the design and execution of the experimental evaluation for the different methods mentioned. Details on the data-sets and computational environment used are provided, and a comparison of performance and accuracy is presented for the OL-HeatMap and baseline grid-based methods.

Proof-of-concept demo system

In this section, we discuss the demo dashboard of OL-HeatMap. We have designed our dashboard to have a client-server architecture which provides the functionality to effectively generate or load data-sets, find the overlaps of the bounding boxes, and visualize them accordingly.

Related work

The work in this paper is related to density-based visualization methods and methods for computing rectangle overlaps. Several key ideas have already been referenced throughout this paper, and here we present a more comprehensive view of existing work on these topics.

Conclusion

Density-based visualizations, such as heat-maps, constitute a popular approach to visualize and perceive large amounts of complex data points effectively. In this research, we focused on a heat-map-like representation for the case of overlapping rectangles. This is a visualization problem that can guide powerful big data visual analytics and inform several applications in diverse domains. However, current state-of-the-art approaches to the problem rely on ad hoc naive implementations or methods

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (38)

  • T. Matsuyama et al. A file organization for geographic information systems based on spatial proximity. Comput. Vis. Graph. Image Process. (1984)
  • R. Lubbe et al. Analysis of parallel spatial partitioning algorithms for GPU based DEM. Comput. Geotech. (2020)
  • D. Papadias et al. Spatial relations, minimum bounding rectangles, and spatial data structures. Int. J. Geogr. Inf. Sci. (1997)
  • J. Fang et al. A new fast constraint graph generation algorithm for VLSI layout compaction.
  • W.R. Franklin et al. Uniform grids: a technique for intersection detection on serial and parallel machines.
  • Y. Nekrich. A linear space data structure for orthogonal range reporting and emptiness queries. Int. J. Comput. Geom. Appl. (2009)
  • A. Guttman. R-trees: a dynamic index structure for spatial searching. SIGMOD Rec. (1984)
  • M.I. Shamos et al. Geometric intersection problems.
  • T. Pechlivanoglou et al. Efficient mining and exploration of multiple axis-aligned intersecting objects.
  • J.L. Bentley et al. Algorithms for reporting and counting geometric intersections. IEEE Trans. Comput. (1979)
  • D. Eppstein et al. Listing all maximal cliques in sparse graphs in near-optimal time.
  • C. Bron et al. Algorithm 457: finding all cliques of an undirected graph. Commun. ACM (1973)
  • National Centers for Environmental Information. Storm events database.
  • H. Wickham. Bin-summarise-smooth: a framework for visualising large data. (2013)
  • T.N. Dang et al. Stacking graphic elements to avoid over-plotting. IEEE Trans. Vis. Comput. Graph. (2010)
  • D.B. Carr et al. Scatterplot matrix techniques for large n. J. Am. Stat. Assoc. (1987)
  • J. Matejka et al. Dynamic opacity optimization for scatter plots.
  • G. Ellis et al. A taxonomy of clutter reduction for information visualisation. IEEE Trans. Vis. Comput. Graph. (2007)
  • E. Bertini et al. See what you know: analyzing data distribution to improve density map visualization.

1 These authors contributed equally.