OpenMP is a well-established standard for programming shared-memory parallel computers. Since version 4.0, released in 2013, OpenMP has made it possible to use heterogeneous computing systems made up of multicore CPUs and accelerators (such as GPUs and many-integrated-core coprocessors). We want to examine how typical numerical algorithms should be re-implemented using OpenMP-4, with the aim of using heterogeneous computing systems.
OpenMP 4.0 represents a large leap in shared-memory programming.
What you will do:
Learn to implement simple numerical applications using OpenMP-4 on regular shared-memory CPU-based systems, and learn how OpenMP-4 differs from earlier versions of OpenMP.
Extend the above implementations to a heterogeneous system with both multicore CPUs and GPUs.
Study the obtained performance by comparing it with that of OpenACC, a well-known set of compiler directives for accelerators.
Compare the obtained OpenMP-4 performance with that of a hand-coded CPU-GPU implementation.
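
To give a feel for the first two tasks, here is a minimal sketch in C. The kernel (the vector update y = a*x + y) and the function names axpy_cpu and axpy_target are assumptions chosen for illustration; the project text does not say which numerical algorithms will be studied. The first function uses the combined "parallel for simd" construct, which was introduced in OpenMP 4.0, on the host CPU. The second uses the OpenMP 4.0 "target" construct with "map" clauses to offload the same loop to an accelerator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: y <- a*x + y on the host CPU.
 * "parallel for simd" (new in OpenMP 4.0) spreads the loop over
 * threads and asks the compiler to vectorize each thread's chunk. */
void axpy_cpu(int n, double a, const double *x, double *y)
{
    assert(n > 0 && x != NULL && y != NULL);

    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Same kernel, offloaded with the OpenMP 4.0 device constructs.
 * map(to: ...) copies x to the device before the loop;
 * map(tofrom: ...) copies y to the device and back afterwards. */
void axpy_target(int n, double a, const double *x, double *y)
{
    assert(n > 0 && x != NULL && y != NULL);

    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions should produce the same result for the same input. When the code is compiled without OpenMP support, the pragmas are ignored and the loops run serially. With OpenMP enabled but no accelerator available, the target region typically falls back to running on the host. The explicit data mapping in the second version is the main new concern compared with earlier versions of OpenMP, and it is also where comparisons with OpenACC and hand-coded GPU implementations become interesting.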