An improved multileaving algorithm for online ranker evaluation
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
An improved multileaving algorithm for online ranker evaluation. / Brost, Brian; Cox, Ingemar Johansson; Seldin, Yevgeny; Lioma, Christina.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval: SIGIR '16. Association for Computing Machinery, 2016. p. 745-748.
RIS
TY - GEN
T1 - An improved multileaving algorithm for online ranker evaluation
AU - Brost, Brian
AU - Cox, Ingemar Johansson
AU - Seldin, Yevgeny
AU - Lioma, Christina
N1 - Conference code: 39
PY - 2016
Y1 - 2016
N2 - Online ranker evaluation is a key challenge in information retrieval. An important task in the online evaluation of rankers is using implicit user feedback for inferring preferences between rankers. Interleaving methods have been found to be efficient and sensitive, i.e. they can quickly detect even small differences in quality. It has recently been shown that multileaving methods exhibit similar sensitivity but can be more efficient than interleaving methods. This paper presents empirical results demonstrating that existing multileaving methods either do not scale well with the number of rankers, or, more problematically, can produce results which substantially differ from evaluation measures like NDCG. The latter problem is caused by the fact that they do not correctly account for the similarities that can occur between rankers being multileaved. We propose a new multileaving method for handling this problem and demonstrate that it substantially outperforms existing methods, in some cases reducing errors by as much as 50%.
AB - Online ranker evaluation is a key challenge in information retrieval. An important task in the online evaluation of rankers is using implicit user feedback for inferring preferences between rankers. Interleaving methods have been found to be efficient and sensitive, i.e. they can quickly detect even small differences in quality. It has recently been shown that multileaving methods exhibit similar sensitivity but can be more efficient than interleaving methods. This paper presents empirical results demonstrating that existing multileaving methods either do not scale well with the number of rankers, or, more problematically, can produce results which substantially differ from evaluation measures like NDCG. The latter problem is caused by the fact that they do not correctly account for the similarities that can occur between rankers being multileaved. We propose a new multileaving method for handling this problem and demonstrate that it substantially outperforms existing methods, in some cases reducing errors by as much as 50%.
U2 - 10.1145/2911451.2914706
DO - 10.1145/2911451.2914706
M3 - Article in proceedings
SN - 978-1-4503-4069-4
SP - 745
EP - 748
BT - Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery
Y2 - 17 July 2016 through 21 July 2016
ER -
ID: 164440333