Collaborative filtering over evolution provenance data for interactive visual data exploration
Introduction
Interactive visual data exploration systems support users in investigating data by providing facilities to express queries and to visualize their results. In such systems, users interact with visualizations of query results, which triggers further queries to be executed and their results visualized back to the user. Clearly, interactive visual data exploration is an iterative process, where users transition from one data visualization to the next through interactions such as interactive selection or details-on-demand [1].
To guide users from one interesting data visualization to the next, recent visual data exploration tools incorporate recommendations that help users query and visualize interesting regions of the data. Most tools rely on content-based recommendation techniques, which base their recommendations on data available in a user’s exploration session, such as prior queries of the same user [2], [3], [4]. Collaborative filtering, in contrast, leverages information from previous explorations by multiple users to identify similar prior exploration queries worth recommending to the current user. One of the first works to focus on collaborative-filtering-based query recommendations for data exploration is [5]. However, to the best of our knowledge, no visual data exploration system combines these two recommendation techniques.
This paper presents EVLIN++, a system that bridges this gap. Our motivation for combining both recommendation techniques is to improve the quality or effectiveness of the visual data exploration experience: recommendations then also take into account globally interesting trends instead of solely relying on the often limited local information underlying content-based recommendations. EVLIN++ extends our EVLIN system [6], [7], which relies on content-based recommendations, by integrating collaborative recommendations as introduced in [8]. The collaborative-filtering recommendations rely on evolution provenance, which tracks user actions and selections throughout their exploration. Overall, this paper makes the following contributions:
System architecture. We present the overall architecture of EVLIN++, which, in addition to the components of the original EVLIN system, has components for storing, aggregating, and analyzing evolution provenance for the purpose of collaborative filtering. It further adapts the component that quantifies the interestingness of individual recommendations so that it takes into account scores from both the content-based and the collaborative-filtering-based recommendations.
Collaborative filtering. While we described the general idea of incorporating collaborative filtering into EVLIN in a short paper [8], we discussed neither detailed algorithms nor optimizations there. This paper describes the two steps underlying our collaborative-filtering-based recommendations in detail.
In the first step, we merge a description of a user’s exploration session, i.e., its evolution provenance, into a global multi-user interaction graph based on a similarity matching between the local session graph and the global graph. For this, we discuss several techniques to incrementally merge evolution provenance graphs into a global graph. The different approaches trade off merge efficiency and effectiveness.
In the second step, we compute recommendations for next exploration steps by first searching the global graph for exploration steps similar to a user’s current step and then recommending and scoring the steps adjacent to them. This paper presents our baseline collaborative-filtering query recommendation algorithm as well as optimizations that improve its runtime.
Implementation and evaluation. We implement the full system and study the performance of both steps of the collaborative-filtering recommendations presented in this paper. We also evaluate the performance of the interactive visual data exploration achieved when combining our content-based and collaborative-filtering-based query recommendations on both synthetic and real data. The results show the efficiency and the effectiveness of our proposed optimized solutions for interactive visual data exploration.
Section 2 introduces the general architecture and processing of EVLIN++. Section 3 describes our evolution provenance model, providing necessary preliminaries and definitions. Merging evolution provenance graphs is discussed in Section 4. Section 5 covers collaborative-filtering recommendation computation. Section 6 discusses how we integrate collaborative-filtering recommendations and the content-based recommendations to obtain a final set of ranked recommendations. Section 7 presents our experimental evaluation. We discuss related work in Section 8 and conclude in Section 9.
System overview
This section introduces the general architecture of our provenance-based visual data exploration system EVLIN++ in Section 2.1. We further illustrate how the system behaves through a running example (Section 2.2).
Evolution provenance database
As we have seen, the iterative exploration process moves from one query to another at each exploration step. These queries, which we call exploration queries, are generally aggregation queries over data stored in a data warehouse.
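To make the notion of evolution provenance more concrete, the following minimal Python sketch records a single exploration session as a graph: each node is an exploration step (here simply its query text) and each edge stores the interaction that led from one step to the next. Returning to an earlier step and continuing from there creates a branch. The class `SessionGraph`, its methods, and the interaction labels are hypothetical illustrations; the actual evolution provenance model of EVLIN++ is the one defined in Section 3.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionGraph:
    """Hypothetical evolution provenance of a single exploration session."""
    steps: list[str] = field(default_factory=list)  # query of each step
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (from, to, interaction)
    current: Optional[int] = None  # step the user is currently looking at

    def record(self, query: str, interaction: str = "unknown") -> int:
        """Add a new step reached from the current step via `interaction`."""
        self.steps.append(query)
        new_id = len(self.steps) - 1
        if self.current is not None:
            self.edges.append((self.current, new_id, interaction))
        self.current = new_id
        return new_id

    def revisit(self, step_id: int) -> None:
        """Jump back to an earlier step; the next recorded step branches from it."""
        if not 0 <= step_id < len(self.steps):
            raise IndexError(step_id)
        self.current = step_id
```

For instance, recording a step `q0`, selecting into `q1`, going back to `q0`, and drilling down into `q2` yields the two branches `(0, 1)` and `(0, 2)`.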
Definition 1 (Exploration Query). Given a data warehouse with a fact table, a set of measures in the fact table, a set of dimension tables, and a set of aggregate functions, an exploration query is a SQL query of the form
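Because the exact query form of Definition 1 is cut off here, the sketch below only assumes a common shape for aggregation queries over a star schema: group-by attributes from dimension tables, one aggregate function applied to one measure of the fact table, optional selection predicates, and equi-joins between the fact table and its dimension tables. The function name `exploration_query`, its parameters, and the join-key mapping are hypothetical. It builds SQL text by plain string concatenation, so it is meant for illustration with trusted identifiers only.

```python
from __future__ import annotations

AGGREGATES = {"SUM", "AVG", "COUNT", "MIN", "MAX"}


def exploration_query(
    fact: str,
    dims: dict[str, str],  # dimension table -> join key shared with the fact table
    group_by: list[str],   # dimension attributes to group by
    agg: str,              # aggregate function name, e.g. "sum"
    measure: str,          # measure column of the fact table
    where: list[str] | None = None,  # optional selection predicates (ANDed)
) -> str:
    """Build a star-schema aggregation query (assumed form, for illustration)."""
    if agg.upper() not in AGGREGATES:
        raise ValueError(f"unsupported aggregate function: {agg}")
    select = ", ".join(group_by + [f"{agg.upper()}({measure})"])
    parts = [f"SELECT {select}", f"FROM {fact}"]
    parts += [f"JOIN {table} USING ({key})" for table, key in dims.items()]
    if where:
        parts.append("WHERE " + " AND ".join(where))
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    return " ".join(parts)
```

For example, `exploration_query("sales", {"store": "store_id"}, ["region"], "sum", "amount", ["year = 2020"])` produces `SELECT region, SUM(amount) FROM sales JOIN store USING (store_id) WHERE year = 2020 GROUP BY region`.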
Merging of evolution provenance graphs
In this section, we discuss how the evolution provenance aggregator obtains a multi-user exploration graph. It takes as input a single exploration session graph, the current multi-user graph, and a similarity threshold. To merge the session graph into the multi-user graph, it determines a one-to-one matching between the node sets of both graphs (more precisely, between the queries of their exploration steps) and then merges matching nodes. Note that for simplicity, we slightly abuse the notation and consider that nodes of both graphs represent queries
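The paper compares several merge techniques that trade off efficiency and effectiveness; the sketch below illustrates just one simple possibility, a greedy one-to-one matching, and is not claimed to be any of them. It assumes, hypothetically, that each query is summarized by a set of features (such as referenced attributes and predicates) and that query similarity is the Jaccard similarity of these sets. Pairs are matched in decreasing order of similarity as long as the similarity reaches the threshold and neither node is matched yet; a matched session node increases the support of its multi-user counterpart (whose representative query is kept unchanged), unmatched session nodes become new nodes, and edge counts accumulate transitions. All names (`MultiUserGraph`, `greedy_matching`, `merge`, `support`) are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Query = frozenset  # a query summarized as a set of features (hypothetical)


def jaccard(a: Query, b: Query) -> float:
    """Similarity of two queries as overlap of their feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class MultiUserGraph:
    nodes: list[Query] = field(default_factory=list)  # representative query per node
    support: list[int] = field(default_factory=list)  # session nodes merged into it
    edges: dict[tuple[int, int], int] = field(default_factory=dict)  # transition counts


def greedy_matching(session: list[Query], graph: list[Query], threshold: float) -> dict[int, int]:
    """One-to-one matching that takes the most similar pairs first."""
    pairs = sorted(
        ((jaccard(s, g), i, j) for i, s in enumerate(session) for j, g in enumerate(graph)),
        reverse=True,
    )
    match: dict[int, int] = {}
    used: set[int] = set()
    for sim, i, j in pairs:
        if sim < threshold:
            break  # all remaining pairs are even less similar
        if i not in match and j not in used:
            match[i] = j
            used.add(j)
    return match


def merge(graph: MultiUserGraph, nodes: list[Query],
          edges: list[tuple[int, int]], threshold: float) -> dict[int, int]:
    """Merge one session graph into `graph` in place; return the node mapping."""
    mapping = greedy_matching(nodes, graph.nodes, threshold)
    for i, query in enumerate(nodes):
        if i in mapping:
            graph.support[mapping[i]] += 1
        else:  # no sufficiently similar step yet: add a new node
            graph.nodes.append(query)
            graph.support.append(1)
            mapping[i] = len(graph.nodes) - 1
    for u, v in edges:
        key = (mapping[u], mapping[v])
        graph.edges[key] = graph.edges.get(key, 0) + 1
    return mapping
```

This variant compares every session node with every multi-user node, so its cost grows with the product of both graph sizes; it favors simplicity over efficiency.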
Collaborative-filtering recommendation computation
The multi-user graph resulting from the algorithm presented in the previous section is maintained in the evolution provenance database, which is accessed during the computation of collaborative recommendations. In this section, we first present a baseline recommendation approach that, given a current exploration step, searches the multi-user graph for the top-ranked exploration steps most similar to it and considers their children as interesting next exploration steps to recommend. We further discuss
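A minimal Python sketch of this baseline follows, under stated assumptions: queries are again summarized as feature sets compared with Jaccard similarity, the multi-user graph is given as a node list plus a dictionary of transition counts, and a child is scored by summing, over each retrieved similar step, that step's similarity to the current step times the fraction of its outgoing transitions that lead to the child. The scoring formula and the names `recommend`, `k`, and `n` are hypothetical, not taken from the paper. The sketch scans all nodes and edges, so its cost grows with the multi-user graph; it does not attempt to reproduce the runtime optimizations the paper discusses.

```python
from __future__ import annotations

import heapq
from collections import defaultdict


def jaccard(a: frozenset, b: frozenset) -> float:
    """Hypothetical query similarity: overlap of feature sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def recommend(current: frozenset, nodes: list[frozenset],
              edges: dict[tuple[int, int], int], k: int = 3, n: int = 5):
    """Baseline: find the k steps most similar to `current`, score their children."""
    sim = {u: jaccard(current, q) for u, q in enumerate(nodes)}
    top_k = set(heapq.nlargest(k, sim, key=sim.get))

    out_total: dict[int, int] = defaultdict(int)  # outgoing transitions per node
    for (u, _), count in edges.items():
        out_total[u] += count

    scores: dict[int, float] = defaultdict(float)
    for (u, v), count in edges.items():
        if u in top_k:
            # weight by the parent's similarity and the transition's popularity
            scores[v] += sim[u] * count / out_total[u]
    # return the n best (child node, score) pairs
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])
```

With `k=2`, a current step `{"a", "b"}`, nodes `[{"a","b"}, {"a","c"}, {"d"}, {"a","b","e"}]`, and transitions `{(0, 2): 3, (0, 3): 1, (1, 2): 2}`, nodes 0 and 3 are retrieved, and the children of node 0 are returned as `[(2, 0.75), (3, 0.25)]`.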
Integrating content-based and collaborative recommendations
This section describes how we integrate the two types of query recommendations in our system. As discussed in Section 2.2, we display our integrated recommendations in the form of an impact matrix. As a reminder, each row represents an interesting set of values of an attribute, and each column represents a query type. Together with the query of the current exploration step, this information allows us to construct the recommended query underlying each cell. Cell colors translate
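How Section 6 actually combines the two scores is not shown in this excerpt. Purely as an illustration, the sketch below assumes per-cell scores keyed by (row, query type), min-max normalizes each source separately so that the two become comparable, and blends them with a weight `alpha`; the ranked cells could then be mapped to colors in the impact matrix. The normalization, the weighting scheme, and the names `normalize`, `impact_matrix`, and `alpha` are hypothetical.

```python
from __future__ import annotations


def normalize(scores: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Min-max normalize scores to [0, 1]; if all are equal, map them to 1."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cell: 1.0 for cell in scores}
    return {cell: (s - lo) / (hi - lo) for cell, s in scores.items()}


def impact_matrix(content: dict[tuple[str, str], float],
                  collab: dict[tuple[str, str], float],
                  alpha: float = 0.5) -> list[tuple[tuple[str, str], float]]:
    """Blend both recommendation scores per (row, query type) cell and rank cells.

    alpha weights the content-based score and 1 - alpha the collaborative one;
    a cell missing from one source contributes 0 from that source.
    """
    c, f = normalize(content), normalize(collab)
    combined = {cell: alpha * c.get(cell, 0.0) + (1 - alpha) * f.get(cell, 0.0)
                for cell in c.keys() | f.keys()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

Min-max normalization is only one simple choice: it sends the lowest-scored cell of each source to 0, so a cell that both sources rate highly rises to the top of the ranking.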
Implementation and evaluation
We implemented the methods presented in this paper in a system prototype, which we call EVLIN++. It extends our EVLIN system [6], [7] to integrate collaborative recommendations. We use our implementation to quantitatively evaluate the performance of (i) the evolution provenance aggregator, (ii) the collaborative recommender, and (iii) the recommendation scoring when combining both content-based and collaborative recommendations. In addition, based on a user study, we compare EVLIN++ with EVLIN
Related work
In the following, we briefly review the research areas most relevant to our two main contributions: merging evolution provenance graphs and collaborative recommendation.
Collaborative-filtering query recommendations. Collaborative-filtering recommendations have recently gained interest in the database community, especially to cope with the problem of querying and analyzing databases or data warehouses, e.g., [5], [20], [21]. Such systems are beneficial in the context of data
Conclusion
This paper extends our interactive visual data exploration system EVLIN with a novel collaborative-filtering recommendation framework that leverages evolution provenance collected from many previous users’ exploration sessions. In particular, it discusses how to merge graphs of individual exploration sessions into a multi-user graph, which is then used to compute and rank query recommendations. Experiments validated the effectiveness and the efficiency of our proposed methods. Some points for
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 251654672, TRR 161.
References (26)
- Stable marriage and indifference, Discrete Appl. Math. (1994).
- F. Psallidas, E. Wu, Provenance for interactive visualizations, in: Proceedings of the SIGMOD Workshop on...
- B. Tang, S. Han, M.L. Yiu, R. Ding, D. Zhang, Extracting top-k insights from multi-dimensional data, in: Proceedings of...
- et al., SeeDB: Efficient data-driven visualization recommendations to support visual analytics, Proc. VLDB Endow. (2015).
- K. Wongsuphasawat, Z. Qu, D. Moritz, R. Chang, F. Ouk, A. Anand, J. Mackinlay, B. Howe, J. Heer, Voyager 2: Augmenting...
- T. Milo, A. Somech, Next-step suggestions for modern interactive data analysis platforms, in: Proceedings of the ACM...
- H. Ben Lahmar, M. Herschel, M. Blumenschein, D.A. Keim, Provenance-based visual data exploration with EVLIN, in:...
- H. Ben Lahmar, M. Herschel, Provenance-based recommendations for visual data exploration, in: USENIX Workshop on Theory...
- H. Ben Lahmar, M. Herschel, Towards integrating collaborative filtering in visual data exploration systems, in:...
- et al., New insights into the suitability of the third dimension for visualizing multivariate/multidimensional data: A study based on loss of quality quantification, Inf. Vis. (2016).
- From visual exploration to storytelling and back again, Comput. Graph. Forum.
- Graph homomorphism revisited for graph matching, Proc. VLDB Endow.