Explaining data with descriptions
Information Systems (IF 3.7), Pub Date: 2020-05-04, DOI: 10.1016/j.is.2020.101549
Matteo Paganelli, Paolo Sottovia, Antonio Maccioni, Matteo Interlandi, Francesco Guerra

With the advent of Big Data, it is impossible for a human user to properly inspect and understand data at a glance. In this paper, we introduce the problem of generating data descriptions: a set of compact, readable and insightful formulas of boolean predicates that represents a set of data records. Unfortunately, finding the best description for a dataset is both NP-hard and task-specific. Therefore, we introduce a dynamic programming approach which, in concert with a set of heuristics, allows us not only to generate descriptions at interactive speed but also to accommodate diverse user needs—from anomaly detection to data exploration. Using real datasets, we evaluate our approach both quantitatively and qualitatively, and prove that descriptions are indeed a viable and powerful tool for supporting data enthusiasts and practitioners in gaining insights from data.
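
To make the notion of a description concrete, the following is a minimal sketch, not the authors' method: it only illustrates what a formula of boolean predicates over data records could look like, written here as a disjunction of conjunctions. The records, the attribute names (city, age), the example formula and the helper covers are all hypothetical and chosen for illustration. The dynamic programming search and the heuristics mentioned in the abstract are not reproduced.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]
# Assumed representation: a description is a list of clauses (OR),
# and each clause is a list of predicates that must all hold (AND).
Description = List[List[Predicate]]


def covers(description: Description, record: Record) -> bool:
    """Return True if the record satisfies at least one clause in full."""
    return any(all(pred(record) for pred in clause) for clause in description)


# Hypothetical toy records, not taken from the paper.
records: List[Record] = [
    {"city": "Rome", "age": 34},
    {"city": "Rome", "age": 71},
    {"city": "Oslo", "age": 29},
]

# Hypothetical description: (city = "Rome" AND age < 65) OR (city = "Oslo")
description: Description = [
    [lambda r: r["city"] == "Rome", lambda r: r["age"] < 65],
    [lambda r: r["city"] == "Oslo"],
]

# Records represented by the description.
matched = [r for r in records if covers(description, r)]
```

In this sketch the formula is given by hand and only evaluated; the task the abstract poses is the reverse one, finding a compact and readable formula for a given set of records.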




Updated: 2020-05-04