Large-scale Data Exploration Using Explanatory Regression Functions
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2020-09-29, DOI: 10.1145/3410448
Fotis Savva 1 , Christos Anagnostopoulos 1 , Peter Triantafillou 2 , Kostas Kolomvatsos 3

Analysts exploring multivariate data spaces typically issue queries with selection operators, i.e., range or equality predicates, which define data subspaces of potential interest. They then apply aggregation functions, whose results determine how interesting a subspace is for further exploration and deeper analysis. However, Aggregate Query (AQ) results are scalars: they convey limited information about the queried subspaces and offer little explainability for exploratory analysis. Analysts have no way of identifying how these results are derived or how they change w.r.t. the query (input) parameter values. We address this shortcoming with a novel explanation mechanism, based on machine learning, that helps analysts explore and understand data subspaces. We explain AQ results using functions obtained by solving a three-fold joint optimization problem; these functions take the form of explainable piecewise-linear regression functions. A key feature of the proposed solution is that the explanation functions are estimated from previously executed queries. These queries provide a coarse-grained overview of the underlying aggregate function (the one generating the AQ results) to be learned. Explanations for future, previously unseen AQs can be computed without accessing the underlying data and can be used to further explore the queried data subspaces without issuing more queries to the backend analytics engine. We evaluate the explanation accuracy and efficiency through theoretically grounded metrics over real-world and synthetic datasets and query workloads.
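To make the idea of a piecewise-linear explanation function concrete, the following is a minimal illustrative sketch, not the paper's method. It assumes a single query parameter (a range radius), a hand-picked breakpoint, and a synthetic log of past queries. The three-fold joint optimization described in the abstract is replaced here by plain per-segment least squares, and all names (`fit_line`, `fit_explanation`, `explain`) are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on one segment.

    Assumes the segment holds at least two distinct x values.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b  # (intercept, slope)


def fit_explanation(radii, results, breakpoints):
    """Fit one local line per interval between the sorted breakpoints."""
    segments = [([], []) for _ in range(len(breakpoints) + 1)]
    for r, y in zip(radii, results):
        xs, ys = segments[bisect_right(breakpoints, r)]
        xs.append(r)
        ys.append(y)
    return [fit_line(xs, ys) for xs, ys in segments]


def explain(radius, model, breakpoints):
    """Estimate the aggregate for an unseen radius without touching the data."""
    a, b = model[bisect_right(breakpoints, radius)]
    return a + b * radius


# Synthetic (assumed) log of past queries: range radius -> COUNT result.
radii = [i / 20 for i in range(21)]
results = [100 * r if r < 0.5 else 50 + 20 * (r - 0.5) for r in radii]
breakpoints = [0.5]  # assumed known here; a real system would have to learn it

model = fit_explanation(radii, results, breakpoints)
estimate_small = explain(0.25, model, breakpoints)
estimate_large = explain(0.75, model, breakpoints)
```

In this sketch the slope of each segment plays the role of the explanation: it indicates how strongly the aggregate result reacts to a change in the query parameter within that region, and `explain` answers an unseen query from the fitted lines alone.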

Updated: 2020-09-29