A lightweight in-place implementation for software thread-level speculation
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
A lightweight in-place implementation for software thread-level speculation. / Oancea, Cosmin Eugen; Mycroft, Alan; Harris, Tim.
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.: SPAA'09. ACM, 2009. p. 223-232.Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
APA
Vancouver
Author
Bibtex
}
RIS
TY - GEN
T1 - A lightweight in-place implementation for software thread-level speculation
AU - Oancea, Cosmin Eugen
AU - Mycroft, Alan
AU - Harris, Tim
N1 - @inproceedings{Oancea:2009:LII:1583991.1584050, author = {Oancea, Cosmin E. and Mycroft, Alan and Harris, Tim}, title = {A Lightweight In-place Implementation for Software Thread-level Speculation}, booktitle = {Proceedings of the Twenty-first Annual Symposium on Parallelism in Algorithms and Architectures}, series = {SPAA '09}, year = {2009}, isbn = {978-1-60558-606-9}, location = {Calgary, AB, Canada}, pages = {223--232}, numpages = {10}, url = {http://doi.acm.org/10.1145/1583991.1584050}, doi = {10.1145/1583991.1584050}, acmid = {1584050}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {roll-back, thread-level speculation (tls)}, }
PY - 2009
Y1 - 2009
N2 - Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, allowing multiple iterations of a loop to run in parallel if there are no conflicts between them.Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the "real" heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences).We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We allow speculative work to be performed in place, so we avoid buffering, and so that reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. We achieve an average 77% of the speed-up of manually-parallelized versions of the benchmarks for fully parallel loops. We achieve a maximum of a 5.8x speed-up on an 8-core machine.
AB - Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, allowing multiple iterations of a loop to run in parallel if there are no conflicts between them.Conventional software-TLS algorithms detect conflicts dynamically. They suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. TLS implementations can offer disappointing scalability, if threads can only commit speculative work back to the "real" heap sequentially. TLS implementations can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences).We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We allow speculative work to be performed in place, so we avoid buffering, and so that reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. We achieve an average 77% of the speed-up of manually-parallelized versions of the benchmarks for fully parallel loops. We achieve a maximum of a 5.8x speed-up on an 8-core machine.
U2 - 10.1145/1583991.1584050
DO - 10.1145/1583991.1584050
M3 - Article in proceedings
SN - 978-1-60558-606-9
SP - 223
EP - 232
BT - Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures.
PB - ACM
ER -
ID: 164443450