Elsevier

Information Systems

Volume 95, January 2021, 101620

Collaborative filtering over evolution provenance data for interactive visual data exploration

https://doi.org/10.1016/j.is.2020.101620

Highlights

  • Proposal of a visual data exploration system with content and collaborative recommendations.

  • Proposal of several merge techniques to aggregate users’ exploration sessions in a multi-user graph.

  • Proposal of several optimizations to improve collaborative-filtering recommendation computation.

  • Quantitative evaluation of collaborative-filtering recommendation techniques.

  • Qualitative evaluation of users’ experiences when visually exploring data using our system.

Abstract

In interactive visual data exploration, users rely on recommendations on what data to explore next. EVLIN is a system that recommends queries to retrieve these data for the next exploration step, paired with suitable visualizations. This paper extends EVLIN by combining its content-based recommendations with recommendations leveraging collaborative filtering to improve the effectiveness of recommendation-based visual data exploration. The recommendations rely on evolution provenance, which tracks users’ interactions during interactive visual data exploration. As more users explore a dataset, the evolution provenance of individual user explorations is incrementally integrated into a multi-user graph, for which we present match and merge algorithms. To compute collaborative-filtering recommendations, we present a search algorithm and optimizations that efficiently search the multi-user graph for queries similar to a current user’s query and give preference to queries that have previously been explored in an exploration step succeeding those similar queries. Our experimental evaluation studies the efficiency and effectiveness of the solutions proposed in this paper and demonstrates that using the full system with both content-based and collaborative-filtering recommendations enabled allows for effective interactive visual data exploration.

Introduction

Interactive visual data exploration systems support users in investigating data by providing facilities to express queries and to visualize their results. In such systems, users interact with visualizations of query results, which triggers further queries to be executed and their results visualized back to the user. Clearly, interactive visual data exploration is an iterative process, where users transition from one data visualization to the next through interactions such as interactive selection or details-on-demand [1].

To help users transition from one interesting data visualization to the next, recent visual data exploration tools incorporate recommendations that guide users in querying and visualizing interesting regions of the data. Most tools rely on content-based recommendation techniques, which base their recommendations on data available in a user’s exploration session, such as prior queries of the same user [2], [3], [4]. Collaborative filtering, in contrast, leverages information from previous explorations made by multiple users to identify prior similar exploration queries worth recommending to the current user. A first work that focuses on query recommendations based on collaborative filtering for data exploration is [5]. However, to the best of our knowledge, no visual data exploration system blends the two aforementioned recommendation techniques.

This paper presents EVLIN++, a system that bridges this gap. Our motivation for combining both recommendation techniques is to improve the effectiveness of the visual data exploration experience, as recommendations then also take into account globally interesting trends instead of relying solely on the often limited local information underlying content-based recommendations. EVLIN++ extends our EVLIN system [6], [7], which relies on content-based recommendations, by integrating collaborative recommendations, as introduced in [8]. The collaborative-filtering recommendations rely on evolution provenance, which tracks user actions and selections throughout their exploration. Overall, this paper makes the following contributions:

System architecture. We present the overall architecture of EVLIN++, which, in addition to the components of the original EVLIN system, has components for storing, aggregating, and analyzing evolution provenance for the purpose of collaborative filtering. It further alters the component that quantifies the interestingness of individual recommendations, as it takes into account scores from both the content-based and the collaborative-filtering based recommendations.

Collaborative-filtering. While we have described the general idea of incorporating collaborative-filtering into EVLIN in a short paper [8], we neither discussed detailed algorithms nor optimizations. This paper describes the two steps underlying our collaborative-filtering based recommendations in detail.

In the first step, we merge a description of a user’s exploration session, i.e., its evolution provenance, into a global multi-user interaction graph based on a similarity matching between the local session graph and the global graph. For this, we discuss several techniques to incrementally merge evolution provenance graphs into a global graph. The different approaches trade off merge efficiency and effectiveness.

In the second step, we compute recommendations for next exploration steps by first searching exploration steps similar to a user’s current step in the global graph and then recommending and scoring steps adjacent to those in the global graph. This paper presents our baseline collaborative-filtering query recommendation algorithm and optimizations to improve its runtime.

Implementation and evaluation. We implement the full system and study the performance of both steps of the collaborative-filtering recommendations presented in this paper. We also evaluate the performance of the interactive visual data exploration achieved when combining our content-based and collaborative-filtering-based query recommendations on both synthetic and real data. The results show the efficiency and the effectiveness of our proposed optimized solutions for interactive visual data exploration.

Section 2 introduces the general architecture and processing of EVLIN++. Section 3 describes our evolution provenance model, providing necessary preliminaries and definitions. Merging evolution provenance graphs is discussed in Section 4. Section 5 covers collaborative-filtering recommendation computation. Section 6 discusses how we integrate collaborative-filtering recommendations and the content-based recommendations to obtain a final set of ranked recommendations. Section 7 presents our experimental evaluation. We discuss related work in Section 8 and conclude in Section 9.

Section snippets

System overview

This section introduces the general architecture of our provenance-based visual data exploration system EVLIN++ in Section 2.1. We further illustrate how the system behaves through a running example (Section 2.2).

Evolution provenance database

As we have seen, the iterative exploration process moves from one query, denoted Q, to another query Q′ at each exploration step. The queries, which we call exploration queries, are generally aggregation queries over data stored in a data warehouse D.

Definition 1 (Exploration Query)

Given a data warehouse D with a fact table T, a set of measures M in T, a set of dimension tables A = {R1, …, Rn}, n ≥ 1, and a set of aggregate functions F, an exploration query Q is a SQL query of the form SELECT a1, …, ag, f(m) FROM T, R1, …, Rk WHERE C1 AND C2 GROU

Merging of evolution provenance graphs

In this section, we discuss how the evolution provenance aggregator obtains a multi-user exploration graph. It takes as input a single exploration session graph G_XS, the multi-user graph G_MU, and a similarity threshold θ_sim. To merge G_XS into G_MU, it determines a one-to-one matching between the sets of nodes of both graphs (more precisely, the queries of exploration steps) and then merges matching nodes. Note that for simplicity, we slightly abuse the notation and consider that nodes of both graphs represent queries
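The match-then-merge idea described in this snippet can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the snippet does not specify the similarity measure or the matching strategy, so the sketch assumes a Jaccard similarity over hypothetical query feature sets and a simple greedy one-to-one matching. The names `jaccard`, `match_nodes`, and `merge`, as well as the edge-count representation of the multi-user graph, are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: each query is reduced to a set of feature tokens
# (e.g. attributes and predicates); the paper's representation may differ.
Query = FrozenSet[str]


def jaccard(a: Query, b: Query) -> float:
    """Stand-in query similarity (assumed, not the paper's measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_nodes(session: List[Query], multi_user: List[Query],
                theta_sim: float) -> Dict[int, int]:
    """Greedy one-to-one matching: session node index -> multi-user node index."""
    candidates = [
        (jaccard(q, p), i, j)
        for i, q in enumerate(session)
        for j, p in enumerate(multi_user)
    ]
    # Keep only pairs at or above the similarity threshold, best pairs first.
    candidates = [c for c in candidates if c[0] >= theta_sim]
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    matching: Dict[int, int] = {}
    used = set()
    for _, i, j in candidates:
        if i not in matching and j not in used:
            matching[i] = j
            used.add(j)
    return matching


def merge(session_nodes: List[Query], session_edges: List[Tuple[int, int]],
          mu_nodes: List[Query], mu_edges: Dict[Tuple[int, int], int],
          theta_sim: float) -> Dict[int, int]:
    """Merge a session graph into the multi-user graph in place."""
    mapping = match_nodes(session_nodes, mu_nodes, theta_sim)
    # Unmatched session nodes become new nodes of the multi-user graph.
    for i, q in enumerate(session_nodes):
        if i not in mapping:
            mu_nodes.append(q)
            mapping[i] = len(mu_nodes) - 1
    # Each session edge increments the traversal count of the mapped edge.
    for u, v in session_edges:
        key = (mapping[u], mapping[v])
        mu_edges[key] = mu_edges.get(key, 0) + 1
    return mapping
```

A greedy matching is cheap but may miss a better global assignment; this is one instance of the efficiency versus effectiveness trade-off that different merge techniques have to balance.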

Collaborative-filtering recommendation computation

The multi-user graph G_MU resulting from the algorithm presented in the previous section is maintained in the evolution provenance database. This database is accessed when computing collaborative recommendations. In this section, we first present a baseline recommendation approach that, given a current exploration step X_curr = {Q, V}, searches for the top-k similar exploration steps in G_MU and considers the children of those steps as interesting next exploration steps to be recommended. We further discuss
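As an illustration of the baseline idea (find the top-k steps most similar to the current one, then recommend their children), consider the following Python sketch. It is written under assumptions rather than taken from the paper: queries are hypothetical feature sets compared with Jaccard similarity, the graph is a dictionary of child traversal counts, and a child's score is the similarity of its parent weighted by how often that transition was taken. The names `jaccard` and `recommend` are illustrative.

```python
import heapq
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a query is represented as a set of feature tokens.
Query = FrozenSet[str]


def jaccard(a: Query, b: Query) -> float:
    """Stand-in query similarity (assumed, not the paper's measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def recommend(current: Query, nodes: List[Query],
              children: Dict[int, Dict[int, int]],
              k: int) -> List[Tuple[int, float]]:
    """Return (node index, score) pairs for candidate next steps, best first.

    children[p][c] is the number of times users moved from step p to step c.
    """
    # Linear scan over all nodes of the multi-user graph (the baseline).
    sims = [(jaccard(current, q), idx) for idx, q in enumerate(nodes)]
    top_k = heapq.nlargest(k, sims)
    scores: Dict[int, float] = defaultdict(float)
    for sim, idx in top_k:
        if sim == 0.0:
            continue  # a completely dissimilar step contributes nothing
        successors = children.get(idx, {})
        total = sum(successors.values())
        for child, count in successors.items():
            # Assumed scoring: parent similarity times transition frequency.
            scores[child] += sim * count / total
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The linear scan makes the cost of each recommendation grow with the size of the multi-user graph, which is the kind of cost that optimizations of the baseline aim to reduce.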

Integrating content-based and collaborative recommendations

This section describes how we integrate the two types of query recommendations in our system. As discussed in Section 2.2, we display our integrated recommendations in the form of an impact matrix. As a reminder, each row represents an interesting set of values Li of an attribute ai, and each column represents a query type. Together with the query Q of the current exploration step, this information allows us to construct the recommended queries underlying each cell. Cell colors translate
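One simple way to picture the integration of both score types per matrix cell is a weighted sum, sketched below in Python. This is an assumption made for illustration only: the snippet does not state the actual scoring function, and the weight `alpha`, the function name `combine_scores`, and the example row and column labels are all hypothetical.

```python
from typing import Dict, Tuple

# A cell is identified by (row label, column label), i.e. an interesting
# value set of an attribute and a query type. Labels here are hypothetical.
Cell = Tuple[str, str]


def combine_scores(content: Dict[Cell, float],
                   collaborative: Dict[Cell, float],
                   alpha: float = 0.5) -> Dict[Cell, float]:
    """Weighted sum of both scores per cell (assumed combination rule).

    A cell missing from one recommender is treated as scoring 0 there.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    cells = set(content) | set(collaborative)
    return {
        cell: alpha * content.get(cell, 0.0)
        + (1.0 - alpha) * collaborative.get(cell, 0.0)
        for cell in cells
    }
```

With `alpha = 1.0` the sketch reduces to purely content-based scores, and with `alpha = 0.0` to purely collaborative ones, which makes the contribution of each recommender easy to inspect.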

Implementation and evaluation

We implemented the methods presented in this paper in a system prototype, which we call EVLIN++. It extends our EVLIN system [6], [7] to integrate collaborative recommendations. We use our implementation to quantitatively evaluate the performance of (i) the evolution provenance aggregator, (ii) the collaborative recommender, and (iii) the recommendation scoring when combining both content-based and collaborative recommendations. In addition, based on a user study, we compare EVLIN++ with EVLIN

Related work

In the following, we briefly review the most relevant research areas related to our two main contributions: the merge of evolution provenance graphs and the collaborative recommendation.

Collaborative-filtering query recommendations. Collaborative-filtering recommendations have recently gained interest in the database community, especially to cope with the problem of querying and analyzing databases or data warehouses, e.g., [5], [20], [21]. Such systems are beneficial in the context of data

Conclusion

This paper extends our visual interactive data exploration system EVLIN with a novel collaborative-filtering recommendation framework that leverages evolution provenance collected from many previous users’ exploration sessions. In particular, it discussed how to merge graphs of individual exploration sessions into a multi-user graph. The latter is used to compute and rank query recommendations. Experiments validated the effectiveness and the efficiency of our proposed methods. Some points for

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) — Projektnummer 251654672 – TRR 161.

References (26)

  • R.W. Irving, Stable marriage and indifference, Discrete Appl. Math. (1994)
  • F. Psallidas, E. Wu, Provenance for interactive visualizations, in: Proceedings of the SIGMOD Workshop on...
  • B. Tang, S. Han, M.L. Yiu, R. Ding, D. Zhang, Extracting top-k insights from multi-dimensional data, in: Proceedings of...
  • M. Vartak et al., SeeDB: Efficient data-driven visualization recommendations to support visual analytics, Proc. VLDB Endow. (2015)
  • K. Wongsuphasawat, Z. Qu, D. Moritz, R. Chang, F. Ouk, A. Anand, J. Mackinlay, B. Howe, J. Heer, Voyager 2: Augmenting...
  • T. Milo, A. Somech, Next-step suggestions for modern interactive data analysis platforms, in: Proceedings of the ACM...
  • H. Ben Lahmar, M. Herschel, M. Blumenschein, D.A. Keim, Provenance-based visual data exploration with EVLIN, in:...
  • H. Ben Lahmar, M. Herschel, Provenance-based recommendations for visual data exploration, in: USENIX Workshop on Theory...
  • H. Ben Lahmar, M. Herschel, Towards integrating collaborative filtering in visual data exploration systems, in:...
  • A.G. Berná et al., New insights into the suitability of the third dimension for visualizing multivariate/multidimensional data: A study based on loss of quality quantification, Inf. Vis. (2016)
  • H. Piringer, R. Kosara, H. Hauser, Interactive focus+context visualization with linked 2D/3D scatterplots, in:...
  • S. Gratzl et al., From visual exploration to storytelling and back again, Comput. Graph. Forum (2016)
  • W. Fan et al., Graph homomorphism revisited for graph matching, Proc. VLDB Endow. (2010)