Modular Acceleration: Tricky Cases of Functional High-performance Computing
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Modular Acceleration: Tricky Cases of Functional High-performance Computing. / Henriksen, Troels; Elsman, Martin; Oancea, Cosmin E.
FHPC 2018 - Proceedings of the 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing, co-located with ICFP 2018. ed. / Mike Rainey; Kei Davis. New York, NY, USA : Association for Computing Machinery, 2018. p. 10-21.
RIS
TY - GEN
T1 - Modular Acceleration: Tricky Cases of Functional High-performance Computing
AU - Henriksen, Troels
AU - Elsman, Martin
AU - Oancea, Cosmin E.
PY - 2018
Y1 - 2018
N2 - This case study examines the data-parallel functional implementation of three algorithms: generation of quasi-random Sobol numbers, breadth-first search, and calibration of Heston market parameters via a least-squares procedure. We show that while all these problems permit elegant functional implementations, good performance depends on subtle issues that must be confronted in both the implementations of the algorithms themselves, as well as the compiler that is responsible for ultimately generating high-performance code. In particular, we demonstrate a modular technique for generating quasi-random Sobol numbers in an efficient manner, study the efficient implementation of an irregular graph algorithm without sacrificing parallelism, and argue for the utility of nested regular data parallelism in the context of nonlinear parameter calibration.
KW - Compilers
KW - GPU
KW - Parallelism
UR - http://www.scopus.com/inward/record.url?scp=85056766137&partnerID=8YFLogxK
U2 - 10.1145/3264738.3264740
DO - 10.1145/3264738.3264740
M3 - Article in proceedings
SN - 978-1-4503-5813-2
SP - 10
EP - 21
BT - FHPC 2018 - Proceedings of the 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing, co-located with ICFP 2018
A2 - Rainey, Mike
A2 - Davis, Kei
PB - Association for Computing Machinery
CY - New York, NY, USA
T2 - 7th ACM SIGPLAN International Workshop on Functional High-Performance Computing
Y2 - 29 September 2018 through 29 September 2018
ER -
ID: 204479272