Factored Bandits

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

We introduce the factored bandits model, a framework for learning with limited (bandit) feedback in which actions can be decomposed into a Cartesian product of atomic actions. Factored bandits include rank-1 bandits as a special case but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds for the problem that match up to constants. Furthermore, we show that with a slight modification the proposed algorithm can be applied to utility-based dueling bandits. In that setting we obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms; these additive terms dominate up to time horizons that are exponential in the number of arms.
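To make the setting concrete, the following is a minimal sketch of a two-factor action space built as a Cartesian product of atomic action sets, using the rank-1 special case mentioned above, in which the mean reward of a joint action (i, j) is the product u_i * v_j. This is not the paper's algorithm, and it does not encode the weaker reward assumptions of the general factored model. The atomic means, the Bernoulli reward draw, and the helper names (`expected_reward`, `sample_reward`, `pseudo_regret`) are hypothetical choices made for illustration.

```python
import itertools
import random

# Hypothetical atomic action sets for L = 2 factors (rows and columns).
# The means below are illustrative assumptions, not values from the paper.
row_means = [0.2, 0.9, 0.5]        # u_i for each atomic row action
col_means = [0.6, 0.3, 0.8, 0.4]   # v_j for each atomic column action

# Factored action space: the Cartesian product of the atomic action sets.
actions = list(itertools.product(range(len(row_means)), range(len(col_means))))


def expected_reward(action):
    """Rank-1 special case: the mean reward of (i, j) is u_i * v_j."""
    i, j = action
    return row_means[i] * col_means[j]


def sample_reward(action, rng):
    """Bandit feedback: only a Bernoulli reward for the played joint action."""
    return 1 if rng.random() < expected_reward(action) else 0


def pseudo_regret(chosen, best_mean):
    """Cumulative pseudo-regret of a sequence of played joint actions."""
    return sum(best_mean - expected_reward(a) for a in chosen)


# The best joint action; with nonnegative rank-1 means it combines the
# best atomic action of each factor.
best = max(actions, key=expected_reward)

if __name__ == "__main__":
    rng = random.Random(0)
    print(len(actions), "joint actions from", len(row_means) + len(col_means), "atomic actions")
    print("best joint action:", best)
    print("one bandit observation:", sample_reward(best, rng))
    print("regret of [(0, 0), best]:", pseudo_regret([(0, 0), best], expected_reward(best)))
```

The example shows why the factored view matters: the number of joint actions grows multiplicatively (here 3 × 4 = 12) while the number of atomic actions grows additively (3 + 4 = 7), so an algorithm that reasons about each factor separately can avoid treating every joint action as an unrelated arm.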
Original language: English
Title of host publication: Proceedings of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada
Number of pages: 10
Publisher: NIPS Proceedings
Publication date: 2018
Publication status: Published - 2018
Event: 32nd Annual Conference on Neural Information Processing Systems, Montreal, Canada
Duration: 2 Dec 2018 to 8 Dec 2018
Conference number: 32


Series: Advances in Neural Information Processing Systems

ID: 225479776