Unbiased offline recommender evaluation for missing-not-at-random implicit feedback

Research output: Contribution to journal › Conference article › Research › peer-review

Standard

Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. / Yang, Longqi; Wang, Chenyang; Cui, Yin; Belongie, Serge; Xuan, Yuan; Estrin, Deborah.

In: RecSys 2018 - 12th ACM Conference on Recommender Systems, 27.09.2018, p. 279-287.

Harvard

Yang, L, Wang, C, Cui, Y, Belongie, S, Xuan, Y & Estrin, D 2018, 'Unbiased offline recommender evaluation for missing-not-at-random implicit feedback', RecSys 2018 - 12th ACM Conference on Recommender Systems, pp. 279-287. https://doi.org/10.1145/3240323.3240355

APA

Yang, L., Wang, C., Cui, Y., Belongie, S., Xuan, Y., & Estrin, D. (2018). Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. RecSys 2018 - 12th ACM Conference on Recommender Systems, 279-287. https://doi.org/10.1145/3240323.3240355

Vancouver

Yang L, Wang C, Cui Y, Belongie S, Xuan Y, Estrin D. Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. RecSys 2018 - 12th ACM Conference on Recommender Systems. 2018 Sep 27;279-287. https://doi.org/10.1145/3240323.3240355

Author

Yang, Longqi ; Wang, Chenyang ; Cui, Yin ; Belongie, Serge ; Xuan, Yuan ; Estrin, Deborah. / Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. In: RecSys 2018 - 12th ACM Conference on Recommender Systems. 2018 ; pp. 279-287.

Bibtex

@inproceedings{19bb2c3538494d1cbcb7a6dd20860daf,

title = "Unbiased offline recommender evaluation for missing-not-at-random implicit feedback",

abstract = "Implicit-feedback Recommenders (ImplicitRec) leverage positive only user-item interactions, such as clicks, to learn personalized user preferences. Recommenders are often evaluated and compared offline using datasets collected from online platforms. These platforms are subject to popularity bias (i.e., popular items are more likely to be presented and interacted with), and therefore logged ground truth data are Missing-Not-At-Random (MNAR). As a result, the widely used Average-Over-All (AOA) evaluator is biased toward accurately recommending trendy items. In this paper, we (a) investigate evaluation bias of AOA and (b) develop an unbiased and practical offline evaluator for implicit MNAR datasets using the Inverse-Propensity-Scoring (IPS) technique. Through extensive experiments using four real-world datasets and four widely used algorithms, we show that (a) popularity bias is widely manifested in item presentation and interaction; (b) evaluation bias due to MNAR data pervasively exists in most cases where AOA is used to evaluate ImplicitRec; and (c) the unbiased estimator significantly reduces the AOA evaluation bias by more than 30% in the Yahoo! music dataset in terms of the Mean Absolute Error (MAE).",

keywords = "Bias, Evaluation, Implicit feedback, Propensity, Recommendation",

author = "Longqi Yang and Chenyang Wang and Yin Cui and Serge Belongie and Yuan Xuan and Deborah Estrin",

note = "Publisher Copyright: {\textcopyright} 2018 Association for Computing Machinery.; 12th ACM Conference on Recommender Systems, RecSys 2018 ; Conference date: 02-10-2018 Through 07-10-2018",

year = "2018",

month = sep,

day = "27",

doi = "10.1145/3240323.3240355",

language = "English",

pages = "279--287",

booktitle = "RecSys 2018 - 12th ACM Conference on Recommender Systems",

}

RIS

TY - GEN

T1 - Unbiased offline recommender evaluation for missing-not-at-random implicit feedback

AU - Yang, Longqi

AU - Wang, Chenyang

AU - Cui, Yin

AU - Belongie, Serge

AU - Xuan, Yuan

AU - Estrin, Deborah

PY - 2018/9/27

Y1 - 2018/9/27

N2 - Implicit-feedback Recommenders (ImplicitRec) leverage positive only user-item interactions, such as clicks, to learn personalized user preferences. Recommenders are often evaluated and compared offline using datasets collected from online platforms. These platforms are subject to popularity bias (i.e., popular items are more likely to be presented and interacted with), and therefore logged ground truth data are Missing-Not-At-Random (MNAR). As a result, the widely used Average-Over-All (AOA) evaluator is biased toward accurately recommending trendy items. In this paper, we (a) investigate evaluation bias of AOA and (b) develop an unbiased and practical offline evaluator for implicit MNAR datasets using the Inverse-Propensity-Scoring (IPS) technique. Through extensive experiments using four real-world datasets and four widely used algorithms, we show that (a) popularity bias is widely manifested in item presentation and interaction; (b) evaluation bias due to MNAR data pervasively exists in most cases where AOA is used to evaluate ImplicitRec; and (c) the unbiased estimator significantly reduces the AOA evaluation bias by more than 30% in the Yahoo! music dataset in terms of the Mean Absolute Error (MAE).

AB - Implicit-feedback Recommenders (ImplicitRec) leverage positive only user-item interactions, such as clicks, to learn personalized user preferences. Recommenders are often evaluated and compared offline using datasets collected from online platforms. These platforms are subject to popularity bias (i.e., popular items are more likely to be presented and interacted with), and therefore logged ground truth data are Missing-Not-At-Random (MNAR). As a result, the widely used Average-Over-All (AOA) evaluator is biased toward accurately recommending trendy items. In this paper, we (a) investigate evaluation bias of AOA and (b) develop an unbiased and practical offline evaluator for implicit MNAR datasets using the Inverse-Propensity-Scoring (IPS) technique. Through extensive experiments using four real-world datasets and four widely used algorithms, we show that (a) popularity bias is widely manifested in item presentation and interaction; (b) evaluation bias due to MNAR data pervasively exists in most cases where AOA is used to evaluate ImplicitRec; and (c) the unbiased estimator significantly reduces the AOA evaluation bias by more than 30% in the Yahoo! music dataset in terms of the Mean Absolute Error (MAE).

KW - Bias

KW - Evaluation

KW - Implicit feedback

KW - Propensity

KW - Recommendation

UR - http://www.scopus.com/inward/record.url?scp=85056795110&partnerID=8YFLogxK

U2 - 10.1145/3240323.3240355

DO - 10.1145/3240323.3240355

M3 - Conference article

AN - SCOPUS:85056795110

SP - 279

EP - 287

JO - RecSys 2018 - 12th ACM Conference on Recommender Systems

JF - RecSys 2018 - 12th ACM Conference on Recommender Systems

T2 - 12th ACM Conference on Recommender Systems, RecSys 2018

Y2 - 2 October 2018 through 7 October 2018

ER -
