Near-Optimal Data Source Selection for Bayesian Learning
arXiv - CS - Information Theory. Pub Date: 2020-11-21, arXiv:2011.10712
Lintao Ye, Aritra Mitra, Shreyas Sundaram

We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. First, we show that the data source selection problem for Bayesian learning is NP-hard. We then show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees. Next, we propose a fast greedy algorithm that improves on the running time of the standard greedy algorithm, while achieving performance guarantees comparable to those of the standard greedy algorithm. We provide insights into the performance guarantees of the greedy algorithms by analyzing special classes of the problem. Finally, we validate the theoretical results using numerical examples, and show that the greedy algorithms work well in practice.
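The abstract's reduction suggests the classic cost-weighted greedy template for submodular set covering: repeatedly pick the source with the best marginal gain per unit cost until the performance target is met. The paper's actual objective is a Bayesian learning-performance metric that the abstract does not spell out, so the sketch below is a minimal stand-in that uses a simple coverage function as the monotone submodular objective; the function and instance names are illustrative, not from the paper.

```python
# Sketch of the greedy algorithm for cost-weighted submodular set covering,
# the template the data source selection problem is reduced to.
# Assumption: f is monotone submodular; here a toy coverage function stands
# in for the paper's Bayesian learning-performance objective.

def greedy_submodular_cover(sources, costs, f, target):
    """Select a subset S with f(S) >= target, greedily adding the source
    with the largest marginal gain per unit cost at each step."""
    selected = set()
    remaining = list(sources)  # list keeps tie-breaking deterministic
    while f(selected) < target and remaining:
        # Pick the source i maximizing (f(S ∪ {i}) - f(S)) / cost[i].
        best = max(
            remaining,
            key=lambda i: (f(selected | {i}) - f(selected)) / costs[i],
        )
        if f(selected | {best}) == f(selected):
            break  # no remaining source adds value; target unreachable
        selected.add(best)
        remaining.remove(best)
    return selected

# Toy instance: each "data source" covers a set of items; f counts coverage.
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 6},
}
costs = {"a": 2.0, "b": 1.0, "c": 2.0, "d": 1.5}
f = lambda S: len(set().union(*(coverage[i] for i in S))) if S else 0

S = greedy_submodular_cover(coverage.keys(), costs, f, target=5)
print(sorted(S), f(S))
```

The greedy choice here is by benefit-to-cost ratio, which is what yields the logarithmic-factor approximation guarantee for submodular set covering; the paper's fast variant additionally avoids re-evaluating every marginal gain in each round, which this sketch does not attempt to reproduce.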

Updated: 2020-11-25