Follow the perturbed approximate leader for solving semi-bandit combinatorial optimization
Frontiers of Computer Science ( IF 3.4 ) Pub Date : 2021-07-16 , DOI: 10.1007/s11704-020-9519-9
Feidiao Yang 1, 2 , Jialin Zhang 1, 2 , Xiaoming Sun 1, 2 , Wei Chen 3

Combinatorial optimization in the face of uncertainty is a challenge in both operational research and machine learning. In this paper, we consider a special and important class called the adversarial online combinatorial optimization with semi-bandit feedback, in which a player makes combinatorial decisions and gets the corresponding feedback repeatedly. While existing algorithms focus on the regret guarantee or assume there exists an efficient offline oracle, it is still a challenge to solve this problem efficiently if the offline counterpart is NP-hard. We propose a variant of the Follow-the-Perturbed-Leader (FPL) algorithm to solve this problem. Unlike the existing FPL approach, our method employs an approximation algorithm as an offline oracle and perturbs the collected data by adding nonnegative random variables. Our approach is simple and computationally efficient. Moreover, it can guarantee a sublinear \((1 + \varepsilon)\)-scaled regret of order \(O(T^{\frac{2}{3}})\) for any small \(\varepsilon > 0\) for an important class of combinatorial optimization problems that admit an FPTAS (fully polynomial time approximation scheme), in which \(T\) is the number of rounds of the learning process. In addition to the theoretical analysis, we also conduct a series of experiments to demonstrate the performance of our algorithm.
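The abstract gives only the outline of the method: an approximation algorithm serves as the offline oracle, and the collected data are perturbed with nonnegative random variables. The Python sketch below illustrates that outline on a toy knapsack-style problem. It is not the authors' algorithm. The greedy oracle (a stand-in, not an FPTAS), the exponential perturbation, the forced-exploration rate `explore`, the importance-weighted estimates, and all function and parameter names are assumptions made for illustration.

```python
import random


def greedy_oracle(weights, sizes, capacity):
    """Stand-in approximate offline oracle (hypothetical): greedy knapsack
    by weight-to-size ratio. Returns the indices of the chosen items."""
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] / sizes[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        if used + sizes[i] <= capacity:
            chosen.append(i)
            used += sizes[i]
    return chosen


def perturbed_approx_leader(reward_fn, sizes, capacity, rounds,
                            eta=1.0, explore=0.1, seed=0):
    """Toy FPL-style loop with an approximate oracle.

    Assumptions (not from the abstract): rewards lie in [0, 1], every
    single item fits within `capacity`, and reward estimates are built
    only from uniformly random exploration rounds.
    """
    rng = random.Random(seed)
    n = len(sizes)
    estimates = [0.0] * n  # cumulative estimated reward per item
    total, history = 0.0, []
    for t in range(rounds):
        if rng.random() < explore:
            # Exploration: play one uniformly random item and observe it.
            i = rng.randrange(n)
            action = [i]
            reward = reward_fn(t, i)
            # Item i is explored with probability explore / n, so this
            # importance-weighted update is an unbiased estimate.
            estimates[i] += reward * n / explore
            total += reward
        else:
            # Exploitation: add nonnegative (exponential) noise to the
            # estimates, then call the approximate oracle on the result.
            perturbed = [estimates[i] + rng.expovariate(eta)
                         for i in range(n)]
            action = greedy_oracle(perturbed, sizes, capacity)
            # Semi-bandit feedback: only chosen items reveal rewards.
            total += sum(reward_fn(t, i) for i in action)
        history.append(action)
    return total, history
```

A call such as `perturbed_approx_leader(lambda t, i: [0.9, 0.1, 0.5][i], [1, 1, 1], 2, 200)` runs 200 rounds on three unit-size items with a capacity of 2 and returns the collected reward together with the list of played actions. Keeping the perturbation nonnegative means the oracle always receives nonnegative weights, which is the setting in which approximation algorithms for such problems are usually stated.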




Updated: 2021-07-16