Algorithms and estimators for summarization of unaggregated data streams

Research output: Contribution to journal, Journal article, Research, peer-review

Standard

Algorithms and estimators for summarization of unaggregated data streams. / Cohen, Edith; Duffield, Nick; Kaplan, Haim; Lund, Carsten; Thorup, Mikkel.

In: Journal of Computer and System Sciences, Vol. 80, No. 7, 2014, p. 1214-1244.

Harvard

Cohen, E, Duffield, N, Kaplan, H, Lund, C & Thorup, M 2014, 'Algorithms and estimators for summarization of unaggregated data streams', Journal of Computer and System Sciences, vol. 80, no. 7, pp. 1214-1244. https://doi.org/10.1016/j.jcss.2014.04.009

APA

Cohen, E., Duffield, N., Kaplan, H., Lund, C., & Thorup, M. (2014). Algorithms and estimators for summarization of unaggregated data streams. Journal of Computer and System Sciences, 80(7), 1214-1244. https://doi.org/10.1016/j.jcss.2014.04.009

Vancouver

Cohen E, Duffield N, Kaplan H, Lund C, Thorup M. Algorithms and estimators for summarization of unaggregated data streams. Journal of Computer and System Sciences. 2014;80(7):1214-1244. https://doi.org/10.1016/j.jcss.2014.04.009

Author

Cohen, Edith; Duffield, Nick; Kaplan, Haim; Lund, Carsten; Thorup, Mikkel. / Algorithms and estimators for summarization of unaggregated data streams. In: Journal of Computer and System Sciences. 2014; Vol. 80, No. 7. pp. 1214-1244.

Bibtex

@article{e63bee21fa3b4a248cdaa9c82abeff76,
title = "Algorithms and estimators for summarization of unaggregated data streams",
abstract = "Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaving IP flows. We develop sampling algorithms and unbiased estimators which address sources of inefficiency in current methods. First, we design tunable algorithms whereas currently a single parameter (the sampling rate) controls utilization of both memory and processing/access speed (which means that it has to be set according to the bottleneck resource). Second, we make a better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.",
keywords = "NetFlow, Data streams, Random sampling, IP flows, Subpopulation queries, Flow size distribution",
author = "Edith Cohen and Nick Duffield and Haim Kaplan and Carsten Lund and Mikkel Thorup",
year = "2014",
doi = "10.1016/j.jcss.2014.04.009",
language = "English",
volume = "80",
pages = "1214--1244",
journal = "Journal of Computer and System Sciences",
issn = "0022-0000",
publisher = "Academic Press",
number = "7",

}

RIS

TY - JOUR

T1 - Algorithms and estimators for summarization of unaggregated data streams

AU - Cohen, Edith

AU - Duffield, Nick

AU - Kaplan, Haim

AU - Lund, Carsten

AU - Thorup, Mikkel

PY - 2014

Y1 - 2014

N2 - Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaving IP flows. We develop sampling algorithms and unbiased estimators which address sources of inefficiency in current methods. First, we design tunable algorithms whereas currently a single parameter (the sampling rate) controls utilization of both memory and processing/access speed (which means that it has to be set according to the bottleneck resource). Second, we make a better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.

AB - Statistical summaries of IP traffic are at the heart of network operation and are used to recover aggregate information on subpopulations of flows. It is therefore of great importance to collect the most accurate and informative summaries given the router's resource constraints. A summarization algorithm, such as Cisco's sampled NetFlow, is applied to IP packet streams that consist of multiple interleaving IP flows. We develop sampling algorithms and unbiased estimators which address sources of inefficiency in current methods. First, we design tunable algorithms whereas currently a single parameter (the sampling rate) controls utilization of both memory and processing/access speed (which means that it has to be set according to the bottleneck resource). Second, we make a better use of the memory hierarchy, which involves exporting partial summaries to slower storage during the measurement period.

KW - NetFlow

KW - Data streams

KW - Random sampling

KW - IP flows

KW - Subpopulation queries

KW - Flow size distribution

U2 - 10.1016/j.jcss.2014.04.009

DO - 10.1016/j.jcss.2014.04.009

M3 - Journal article

VL - 80

SP - 1214

EP - 1244

JO - Journal of Computer and System Sciences

JF - Journal of Computer and System Sciences

SN - 0022-0000

IS - 7

ER -

ID: 130285400