Understanding the performance of scientific software on parallel architectures
The term "peak performance" of a computer is often misleading. For example, a 2.5 GHz Intel CPU core that can execute four floating-point operations per cycle is theoretically capable of 10 billion floating-point operations per second, but the actual performance of your application is nowhere near this peak. Why?
The memory hierarchy is of utmost importance for the performance of scientific software
One answer is that your application simply cannot run faster, because modern computers impose limitations other than the clock rate. The typical bottleneck is the memory system. Multi-core and many-core architectures offer even greater peak performance, which is however even more difficult to achieve. So when and how do you know that the performance of your application is as good as it should be?
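To make the memory bottleneck concrete, a simple roofline-style estimate can be sketched for a vector triad a[i] = b[i] + s*c[i]. The hardware numbers below are hypothetical assumptions (10 Gflop/s matches the 2.5 GHz core above at four flops per cycle; the 20 GB/s memory bandwidth is an illustrative figure, not a measured value):

```python
# Roofline-style estimate for the vector triad a[i] = b[i] + s*c[i].
# Hardware numbers are hypothetical assumptions for illustration.
peak_gflops = 10.0   # assumed peak floating-point rate (Gflop/s)
mem_bw_gbs = 20.0    # assumed main-memory bandwidth (GB/s)

flops_per_iter = 2        # one multiply and one add per loop iteration
bytes_per_iter = 3 * 8    # load b[i] and c[i], store a[i]; 8-byte doubles
intensity = flops_per_iter / bytes_per_iter   # arithmetic intensity (flop/byte)

# The kernel cannot exceed either the compute roof or the memory roof.
attainable = min(peak_gflops, intensity * mem_bw_gbs)
print(f"arithmetic intensity: {intensity:.3f} flop/byte")
print(f"attainable: {attainable:.2f} Gflop/s "
      f"({100 * attainable / peak_gflops:.0f}% of peak)")
```

Under these assumptions the triad is limited to about 1.7 Gflop/s, only 17% of the nominal peak, which illustrates why clock rate alone says little about delivered performance.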
- Aim of project: Simple yet sufficiently accurate performance prediction models for a set of numerical computations. The targeted hardware platforms are multi-core and many-core architectures.
- Approach: The master's student will first identify the most important hardware features with respect to performance. These include at least (1) data transfer rate between the L1 cache and registers, (2) bandwidth between main memory and the last-level caches, (3) sharing and contention of the last-level caches among several CPU cores, and (4) clock rate of the CPU cores. Then, performance prediction models are to be set up for several well-known numerical computation kernels, verifying the understanding gained so far. Thereafter, performance models are to be developed for a couple of real-world applications. On the basis of these performance models, the necessary performance improvements are to be carried out.
- Learning outcome: After finishing the thesis, the master's student is expected to have gained in-depth knowledge of state-of-the-art multi-core and many-core hardware, while becoming proficient in both theoretical performance analysis and hands-on performance optimization.
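The kind of prediction model the project would start from can be sketched as a simple bottleneck model: a kernel's run time is estimated as the larger of its compute time and its memory-traffic time. The hardware numbers and the example kernel below are illustrative assumptions, not results from the project:

```python
# Minimal bottleneck-style performance prediction model (a sketch, with
# hypothetical hardware numbers: 10 Gflop/s peak, 20 GB/s memory bandwidth).
PEAK_FLOPS = 10e9    # assumed peak floating-point rate (flop/s)
BANDWIDTH = 20e9     # assumed main-memory bandwidth (byte/s)

def predict_time(flops, bytes_moved):
    """Predicted run time in seconds: the slower resource dominates."""
    return max(flops / PEAK_FLOPS, bytes_moved / BANDWIDTH)

# Example kernel: dense matrix-vector product y = A x, n x n matrix of doubles.
n = 4096
flops = 2 * n * n          # one multiply and one add per matrix entry
bytes_moved = 8 * n * n    # A is streamed from memory once; x and y are small

t = predict_time(flops, bytes_moved)
bound = "memory" if bytes_moved / BANDWIDTH > flops / PEAK_FLOPS else "compute"
print(f"predicted time: {t * 1e3:.2f} ms ({bound}-bound)")
```

Comparing such predictions against measured run times is what verifies, or falsifies, the student's understanding of the hardware, and the gap between the two points to where optimization effort should go.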