An improved parametrization and analysis of the EXP3++ algorithm for stochastic and adversarial bandits

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Yevgeny Seldin
Gábor Lugosi

We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime, the regret guarantee remains unchanged.

Original language	English
Title of host publication	Proceedings of Conference on Learning Theory, 7-10 July 2017, Amsterdam, Netherlands
Editors	Satyen Kale, Ohad Shamir
Publisher	Proceedings of Machine Learning Research
Publication date	2017
Pages	1743-1759
Publication status	Published - 2017
Event	The 30th Annual Conference on Learning Theory (COLT), Amsterdam, Netherlands; duration: 7 Jul 2017 → 10 Jul 2017; conference number: 30; http://www.learningtheory.org/colt2017/

Conference

Conference	The 30th Annual Conference on Learning Theory (COLT)
Number	30
Country	Netherlands
City	Amsterdam
Period	07/07/2017 → 10/07/2017
Internet address	http://www.learningtheory.org/colt2017/

Series	Proceedings of Machine Learning Research
Volume	65
ISSN	1938-7228
