MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining
arXiv - CS - Databases Pub Date : 2020-06-16 , DOI: arxiv-2006.09085
Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato

We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds on the maximum deviation of sample means from their expectations. It can therefore be used in two settings: to find statistically significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and to approximate collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset. Handling both settings is a strong improvement over previously proposed solutions, which could only achieve one of the two. MCRapper uses upper bounds on the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop TFP-R, an algorithm for the task of True Frequent Pattern (TFP) mining. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state of the art for their respective tasks.
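
As a rough illustration of the quantity the abstract refers to, the sketch below computes an n-trial MCERA by brute force for a tiny itemset family. It assumes the usual form of the estimator: for each of n independent vectors of random signs, take the supremum over the family of the sign-weighted sample mean, then average over the n trials. The names `sup_discrepancy`, `mcera`, and `all_itemsets` and the toy dataset are hypothetical, chosen only for this example. The code enumerates every pattern and does not implement MCRapper's pruned exploration of the search space.

```python
import itertools
import random


def sup_discrepancy(sigma, transactions, patterns):
    """Supremum over `patterns` of the sigma-weighted sample mean.

    Each pattern p stands for the indicator function f_p(t) = 1 if p is a
    subset of transaction t, else 0 (an assumption made for this sketch).
    """
    m = len(transactions)
    return max(
        sum(s for s, t in zip(sigma, transactions) if p <= t) / m
        for p in patterns
    )


def mcera(transactions, patterns, n_trials, rng):
    """Brute-force n-trial MCERA: average of the suprema over n sign vectors."""
    m = len(transactions)
    total = 0.0
    for _ in range(n_trials):
        # One vector of independent, uniform +1/-1 (Rademacher) signs.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += sup_discrepancy(sigma, transactions, patterns)
    return total / n_trials


def all_itemsets(transactions):
    """Every non-empty itemset over the items that occur in the data."""
    items = sorted(set().union(*transactions))
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in itertools.combinations(items, size)
    ]


# Hypothetical toy dataset: six transactions over the items a, b, c.
transactions = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "a", "b")]
patterns = all_itemsets(transactions)
estimate = mcera(transactions, patterns, n_trials=100, rng=random.Random(0))
```

Exhaustive enumeration like this grows exponentially with the number of items, which is the cost that bounding the discrepancy of a pattern's descendants and pruning the search is meant to avoid.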

Updated: 2020-06-17