Monday, Half Day, PM
Room A112

Title: Cache-based iterative algorithms

Presenters: Ulrich J. Ruede, Univeristy of Erlangen; Craig Douglas, Center for Computational Sciences, University of Kentucky

Level: 30% Introductory, 50% Intermediate, 20% Advanced


In order to mitigate the effect of the gap between the high execution speed of modern RISC CPUs and the comparatively poor main memory performance, computer architectures nowadays comprise several additional levels of smaller and faster cache memories which are located physically between the processor and main memory. Efficient program execution, i.e. high MFLOPS rates, can only be achieved if the codes respect this hierarchical memory design. Unfortunately, today's compilers are still far away from automatically performing code transformations like the ones we apply to achieve remarkable speedups. As a consequence, much of this optimization effort is left to the programmer.

In this tutorial, we will first discuss the underlying hardware properties and then present both data layout optimizations, like array padding for example, and data access optimizations, like e.g., loop blocking. The application of these techniques to iterative numerical schemes --- like Gauss-Seidel and multigrid --- can significantly enhance their cache performance and thus reduce their execution times on a variety of machines. We will consider both structured and unstructured grid computations.

These techniques have been implemented in our multigrid library DiMEPACK which is freely available on the web. The use of this library will also be discussed.