Adaptive distributional extensions to DFR ranking
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Adaptive distributional extensions to DFR ranking. / Petersen, Casper; Simonsen, Jakob Grue; Järvelin, Kalervo; Lioma, Christina.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery, 2016. p. 2005-2008.
RIS
TY - GEN
T1 - Adaptive distributional extensions to DFR ranking
AU - Petersen, Casper
AU - Simonsen, Jakob Grue
AU - Järvelin, Kalervo
AU - Lioma, Christina
N1 - Conference code: 25
PY - 2016
Y1 - 2016
N2 - Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood language model (LM).
AB - Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An informative term is then detected by measuring the divergence of its distribution from the distribution of non-informative terms. However, there is little empirical evidence that the distributions of non-informative terms used in DFR actually fit current datasets. Practically this risks providing a poor separation between informative and non-informative terms, thus compromising the discriminative power of the ranking model. We present a novel extension to DFR, which first detects the best-fitting distribution of non-informative terms in a collection, and then adapts the ranking computation to this best-fitting distribution. We call this model Adaptive Distributional Ranking (ADR) because it adapts the ranking to the statistics of the specific dataset being processed each time. Experiments on TREC data show ADR to outperform DFR models (and their extensions) and be comparable in performance to a query likelihood language model (LM).
KW - cs.IR
U2 - 10.1145/2983323.2983895
DO - 10.1145/2983323.2983895
M3 - Article in proceedings
SP - 2005
EP - 2008
BT - Proceedings of the 25th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 24 October 2016 through 28 October 2016
ER -
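The abstract describes ADR's two steps: fit candidate distributions (e.g. Poisson, geometric) to the per-document frequencies of terms, select the best-fitting one, and then score terms by their divergence from that non-informative model. The sketch below is a minimal, hypothetical illustration of that idea using maximum-likelihood fits and a log-likelihood comparison; it is not the authors' implementation, and the function names, the model-selection criterion, and the `informativeness` scoring are all assumptions for illustration only.

```python
import math

def poisson_logpmf(k, lam):
    # log P(K = k) for Poisson(lam); assumes lam > 0
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def geometric_logpmf(k, p):
    # log P(K = k) for a geometric on k = 0, 1, 2, ... with success prob p
    return k * math.log(1.0 - p) + math.log(p)

def best_fit(tfs):
    """Fit each candidate non-informative model by maximum likelihood to the
    observed per-document term frequencies `tfs`, and return the name and
    log-pmf of the model with the higher total log-likelihood."""
    mean = sum(tfs) / len(tfs)      # assumed > 0 (term occurs somewhere)
    lam = mean                      # Poisson MLE
    p = 1.0 / (1.0 + mean)         # geometric MLE for support k >= 0
    ll_pois = sum(poisson_logpmf(k, lam) for k in tfs)
    ll_geom = sum(geometric_logpmf(k, p) for k in tfs)
    if ll_pois >= ll_geom:
        return "poisson", lambda k: poisson_logpmf(k, lam)
    return "geometric", lambda k: geometric_logpmf(k, p)

def informativeness(tf, logpmf):
    # DFR-style information content: -log2 P(tf | non-informative model);
    # rare frequencies under the fitted model score as more informative.
    return -logpmf(tf) / math.log(2)
```

A ranking model would fit `best_fit` per collection (the "adaptive" step) and reuse the selected log-pmf when scoring query terms, rather than fixing one distribution a priori as classic DFR models do.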