Learning Based Distributed Tracking
arXiv - CS - Data Structures and Algorithms Pub Date : 2020-06-23 , DOI: arxiv-2006.12943
Hao Wu, Junhao Gan, Rui Zhang

Inspired by the great success of machine learning in the past decade, people have been thinking about the possibility of improving theoretical results by exploiting the data distribution. In this paper, we revisit a fundamental problem called Distributed Tracking (DT) under the assumption that the data follows a certain (known or unknown) distribution, and propose a number of data-dependent algorithms with improved theoretical bounds. Informally, in the DT problem, there is a coordinator and k players, where the coordinator holds a threshold N and each player has a counter. At each time stamp, at most one counter can be increased by one. The job of the coordinator is to capture the exact moment when the sum of all these k counters reaches N. The goal is to minimise the communication cost. While our first type of algorithms assumes that the concrete data distribution is known in advance, our second type of algorithms can learn the distribution on the fly. Both types of algorithms achieve a communication cost bounded by O(k log log N) with high probability, improving the state-of-the-art data-independent bound O(k log(N/k)). We further propose a number of implementation optimisation heuristics to improve both the efficiency and the robustness of the algorithms. Finally, we conduct extensive experiments on three real datasets and four synthetic datasets. The experimental results show that the communication cost of our algorithms is as low as 20% of that of the state-of-the-art algorithms.
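
To make the DT setting concrete, the following is a minimal Python sketch of a classic round-based, data-independent baseline of the kind that attains the O(k log(N/k)) bound. It is not the learning-based algorithms proposed in the paper. The function name `track_threshold`, the assumption that players know N and k up front, and the message accounting (one message per signal, 3k messages per poll) are illustrative assumptions, not details taken from the abstract.

```python
import random


def track_threshold(stream, k, N):
    """Simulate a round-based baseline for Distributed Tracking.

    stream: iterable of player indices in [0, k); one increment per time step.
    Returns (t, messages): the time step at which the sum of counters
    reaches N (None if it never does) and the number of messages sent.
    Hypothetical helper for illustration only.
    """
    counts = [0] * k            # true local counters, held by the players
    since_signal = [0] * k      # increments per player since its last signal
    messages = 0
    signals = 0                 # signals received in the current round
    remaining = N               # increments still needed, as known at round start
    slack = remaining // (2 * k)

    for t, player in enumerate(stream, start=1):
        counts[player] += 1
        if slack == 0:
            # Final phase (remaining < 2k): every increment is reported.
            messages += 1
            remaining -= 1
            if remaining == 0:
                return t, messages
            continue
        since_signal[player] += 1
        if since_signal[player] == slack:
            # The player has accumulated `slack` increments: send one signal.
            since_signal[player] = 0
            messages += 1
            signals += 1
            if signals == k:
                # Poll: k requests, k replies, k broadcasts of the new slack.
                # A round sees fewer than 2 * k * slack <= remaining increments,
                # so the threshold cannot have been passed unnoticed.
                messages += 3 * k
                remaining = N - sum(counts)
                slack = remaining // (2 * k)
                since_signal = [0] * k
                signals = 0
    return None, messages


if __name__ == "__main__":
    random.seed(1)
    k, N = 8, 10_000
    stream = (random.randrange(k) for _ in range(N))
    t, messages = track_threshold(stream, k, N)
    print(f"threshold reached at step {t} using {messages} messages")
```

Each completed round shrinks the remaining threshold by at least a quarter at a cost of O(k) messages, which is where the log(N/k) factor in the data-independent bound comes from under these assumptions.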

Updated: 2020-06-24