A sample decreasing threshold greedy-based algorithm for big data summarisation, Journal of Big Data


Journal of Big Data (IF 8.6), Pub Date: 2021-02-09, DOI: 10.1186/s40537-021-00416-y
Teng Li, Hyo-Sang Shin, Antonios Tsourdos

As the scale of datasets used for big data applications expands rapidly, there have been increased efforts to develop faster algorithms. This paper addresses big data summarisation problems using the submodular maximisation approach and proposes an efficient algorithm, named Sample Decreasing Threshold Greedy (SDTG), for maximising general non-negative submodular objective functions subject to k-extendible system constraints. SDTG combines a random sampling process with a decreasing threshold strategy. It obtains an expected approximation guarantee of \(\frac{1}{1+k}-\epsilon\) for maximising monotone submodular functions and of \(\frac{k}{(1+k)^2}-\epsilon\) in non-monotone cases, with an expected computational complexity of \(O\left(\frac{n}{(1+k)\epsilon}\ln \frac{r}{\epsilon}\right)\). Here, r is the largest size of a feasible solution, and \(\epsilon \in \left(0, \frac{1}{1+k}\right)\) is an adjustable design parameter that trades off the approximation ratio against the computational complexity. The performance of the proposed algorithm is validated and compared with that of benchmark algorithms through experiments with a movie recommendation system based on a real database.
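
The abstract names the two ingredients of SDTG (random sampling and a decreasing threshold) but does not give the procedure itself. The following is a minimal Python sketch of how such a scheme could look, not the authors' implementation. Several details are assumptions made for illustration: the sampling probability `p` is left as a caller-supplied parameter, the threshold starts at the largest singleton value `d` in the sample and shrinks by a factor of `1 - eps` until it falls below `(eps / r) * d`, and the constraint is supplied as a hypothetical feasibility oracle `is_feasible`. The helper `approximation_bounds` simply evaluates the two guarantees quoted in the abstract.

```python
import random
from typing import Callable, Hashable, Iterable, Set, Tuple

Element = Hashable


def approximation_bounds(k: int, eps: float) -> Tuple[float, float]:
    """Evaluate the guarantees quoted in the abstract: (monotone, non-monotone)."""
    return 1.0 / (1 + k) - eps, k / (1 + k) ** 2 - eps


def sample_decreasing_threshold_greedy(
    ground_set: Iterable[Element],
    f: Callable[[Set[Element]], float],
    is_feasible: Callable[[Set[Element]], bool],
    r: int,
    eps: float,
    p: float,
    rng: random.Random,
) -> Set[Element]:
    """Sketch of a sample-then-threshold greedy (schedule and p are assumptions)."""
    # Step 1 (random sampling): keep each element independently with probability p.
    sample = [e for e in ground_set if rng.random() < p]
    # Only elements that are feasible on their own can ever be selected.
    sample = [e for e in sample if is_feasible({e})]
    solution: Set[Element] = set()
    if not sample:
        return solution

    # Largest singleton value in the sample; used as the starting threshold.
    d = max(f({e}) for e in sample)
    if d <= 0:
        return solution

    current_value = f(solution)
    theta = d
    # Step 2 (decreasing threshold): sweep the sample once per threshold level.
    while theta >= (eps / r) * d:
        for e in sample:
            if e in solution:
                continue
            candidate = solution | {e}
            if not is_feasible(candidate):
                continue
            candidate_value = f(candidate)
            # Accept e if its marginal gain meets the current threshold.
            if candidate_value - current_value >= theta:
                solution, current_value = candidate, candidate_value
        theta *= 1.0 - eps
    return solution


# Toy instance (hypothetical data): set coverage under a cardinality limit of 2.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1}}


def coverage(chosen: Set[Element]) -> float:
    """Monotone submodular objective: number of items covered by the chosen sets."""
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return float(len(covered))


if __name__ == "__main__":
    picked = sample_decreasing_threshold_greedy(
        ["a", "b", "c", "d"], coverage, lambda s: len(s) <= 2,
        r=2, eps=0.1, p=0.5, rng=random.Random(0),
    )
    print(sorted(picked), coverage(picked))
    print(approximation_bounds(k=1, eps=0.1))
```

The feasibility oracle keeps the sketch independent of any particular constraint family; a cardinality limit is used in the toy instance only because it is easy to check by hand. With `p = 1.0` the sampling step keeps every element and the routine reduces to a plain decreasing-threshold greedy, which makes the effect of sampling easy to isolate when experimenting.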


Updated: 2021-02-09
