Algorithms and estimators for summarization of unaggregated data streams
Research output: Contribution to journal › Journal article › Research › peer-review
Algorithms and estimators for summarization of unaggregated data streams. / Cohen, Edith; Duffield, Nick; Kaplan, Haim; Lund, Carsten; Thorup, Mikkel.
In: Journal of Computer and System Sciences, Vol. 80, No. 7, 2014, p. 1214-1244.
RIS
TY - JOUR
T1 - Algorithms and estimators for summarization of unaggregated data streams
AU - Cohen, Edith
AU - Duffield, Nick
AU - Kaplan, Haim
AU - Lund, Carsten
AU - Thorup, Mikkel
PY - 2014
Y1 - 2014
N2 - Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries possible given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaved IP flows. We develop sampling algorithms and unbiased estimators that address sources of inefficiency in current methods. First, we design tunable algorithms, whereas in current methods a single parameter (the sampling rate) controls the utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Second, we make better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.
AB - Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries possible given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaved IP flows. We develop sampling algorithms and unbiased estimators that address sources of inefficiency in current methods. First, we design tunable algorithms, whereas in current methods a single parameter (the sampling rate) controls the utilization of both memory and processing/access speed, which means that it has to be set according to the bottleneck resource. Second, we make better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.
KW - NetFlow
KW - Data streams
KW - Random sampling
KW - IP flows
KW - Subpopulation queries
KW - Flow size distribution
U2 - 10.1016/j.jcss.2014.04.009
DO - 10.1016/j.jcss.2014.04.009
M3 - Journal article
VL - 80
SP - 1214
EP - 1244
JO - Journal of Computer and System Sciences
JF - Journal of Computer and System Sciences
SN - 0022-0000
IS - 7
ER -
ID: 130285400
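The abstract's point that a single sampling rate governs both processing load and memory use can be illustrated with a minimal sketch of plain packet sampling. It assumes each packet is kept independently with probability `p` (deployed sampled NetFlow may instead use deterministic 1-in-N packet sampling), and the names `sampled_netflow` and `estimate_subpopulation` are hypothetical. This is a sketch of the kind of baseline the abstract describes, not the algorithms developed in the paper.

```python
import random
from collections import Counter


def sampled_netflow(packets, p, seed=None):
    """Baseline packet sampling (assumption: independent Bernoulli(p) per packet).

    packets: iterable of flow keys, one entry per packet.
    Returns (cache, processed): cache maps flow key -> number of sampled packets
    (the flow cache, i.e. memory); processed counts packets that reached the
    cache (i.e. processing/access cost). Both are controlled by the same p.
    """
    rng = random.Random(seed)
    cache = Counter()
    processed = 0
    for flow_key in packets:
        if rng.random() < p:  # keep this packet with probability p
            processed += 1
            cache[flow_key] += 1
    return cache, processed


def estimate_subpopulation(cache, p, predicate):
    """Inverse-probability estimate of the packet count of a subpopulation.

    Each sampled packet stands for 1/p original packets, so the estimate
    sum(sampled counts) / p is unbiased for any p > 0.
    """
    return sum(c for key, c in cache.items() if predicate(key)) / p


# Usage: two flows, estimate the size of flow "a" from a 10% sample.
packets = ["a"] * 1000 + ["b"] * 500
cache, processed = sampled_netflow(packets, 0.1, seed=7)
estimate = estimate_subpopulation(cache, 0.1, lambda key: key == "a")
```

Lowering `p` reduces `processed` and the expected number of cache entries at the same time, which is the coupling between processing and memory that the abstract identifies; the estimator stays unbiased but its variance grows as `p` shrinks.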