• Efficient Layouts for Hierarchical Memories (Thursday 1:30-3:00PM)
    Room A102/104/106
    Chair: Barbara Horner-Miller, Arctic Region Supercomputing Center

    • Title: A Ghost Cell Expansion Method for Reducing Communications in Solving PDE Problems
    • Authors:
      Chris Ding (Lawrence Berkeley National Laboratory)
      Yun He (Lawrence Berkeley National Laboratory)
    • Abstract:
      In solving partial differential equations, such as the barotropic equations in ocean models, on distributed-memory computers, finite difference methods are commonly used. Most often, processor subdomain boundaries must be updated at each time step. This boundary update process involves many small messages and therefore large communication overhead. Here we propose a new approach that expands the ghost cell layers and thus updates boundaries much less frequently, reducing total message volume and grouping small messages into bigger ones. Together with a technique for eliminating diagonal communications, the method speeds up communication substantially, by up to 170%. We explain the method and implementation in detail, and provide systematic timing results and performance analysis on the Cray T3E and IBM SP.
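      The trade-off the abstract describes can be sketched in a few lines: widening the ghost region from one layer to k layers lets each subdomain advance k time steps between boundary exchanges, at the cost of redundantly recomputing the ghost cells. The sketch below is a minimal illustrative version (plain NumPy, subdomains of a 1D three-point stencil simulated sequentially; all names are ours, not from the paper's code), not the paper's implementation.

```python
import numpy as np

def step(u):
    # One Jacobi-style sweep of a 1D three-point stencil;
    # the two endpoint values are held fixed (Dirichlet boundary).
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v

def run_with_ghost_width(u0, nsteps, ghost, nprocs=2):
    """Simulate `nprocs` subdomains of a 1D domain.

    With `ghost` ghost layers per side, boundaries are exchanged only
    every `ghost` time steps; between exchanges each subdomain advances
    independently, recomputing its ghost cells redundantly."""
    n = len(u0) // nprocs
    u = u0.copy()
    exchanges = 0
    t = 0
    while t < nsteps:
        block = min(ghost, nsteps - t)
        # "Exchange": each subdomain grabs `ghost` extra cells per side.
        segs = []
        for p in range(nprocs):
            lo = max(0, p * n - ghost)
            hi = min(len(u), (p + 1) * n + ghost)
            segs.append((lo, p * n, (p + 1) * n, u[lo:hi].copy()))
        exchanges += 1
        # Advance every subdomain `block` steps with no communication;
        # validity erodes one ghost layer per step, so owned cells stay exact.
        for _ in range(block):
            segs = [(lo, a, b, step(s)) for lo, a, b, s in segs]
        # Copy the still-valid owned cells back.
        for lo, a, b, s in segs:
            u[a:b] = s[a - lo:b - lo]
        t += block
    return u, exchanges
```

      With ghost=1 this degenerates to the usual one-exchange-per-step scheme; ghost=4 produces the same answer with a quarter of the exchanges. In a real MPI code, one exchange of a deep ghost region replaces several small per-step messages, which is the message grouping the abstract refers to.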

    • Title: Improving Parallel Irregular Reductions Using Partial Array Expansion
    • Authors:
      Eladio Gutierrez (University of Malaga, Spain)
      Oscar Plata (University of Malaga, Spain)
      Emilio L. Zapata (University of Malaga, Spain)
    • Abstract:
      Much effort has been devoted recently to parallelizing irregular reductions efficiently. In this paper, parallelizing techniques for these computations are analyzed in terms of three performance aspects: parallelism, data locality, and memory overhead. These aspects have a strong influence on the overall performance and scalability of the parallel code. We discuss how parallelization techniques usually optimize some of these aspects while missing the others. We show that by combining complementary techniques we can improve the overall performance and scalability of parallel irregular reductions, obtaining an effective solution for large problems on large machines. Specifically, we introduce partial array expansion, a combination of array expansion and a locality-oriented method (DWA-LIP). An implementation of the proposed method is discussed, showing that the transformation the compiler must apply to the irregular reduction code is not excessively complex. Finally, the method is analyzed and experimentally evaluated.
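      The array-expansion baseline the paper starts from is easy to illustrate. The sketch below (NumPy, threads simulated by a loop; all names are ours) shows full array expansion applied to an irregular reduction a[idx[i]] += w[i]: full parallelism with no synchronization, but memory overhead proportional to nthreads times the array size, plus a combining pass over every private copy. As we read the abstract, the paper's partial array expansion combines this with the locality-oriented DWA-LIP method so that only part of the array is expanded, reducing that overhead.

```python
import numpy as np

def irregular_reduction_expanded(n_nodes, idx, w, nthreads=4):
    """Irregular reduction a[idx[i]] += w[i] via array expansion:
    each (here simulated) thread accumulates its share of iterations
    into a private copy of the reduction array, race-free, and the
    private copies are combined in a final reduction phase."""
    private = np.zeros((nthreads, n_nodes))
    chunks = np.array_split(np.arange(len(idx)), nthreads)
    for t, chunk in enumerate(chunks):   # each t would run in parallel
        # np.add.at handles repeated indices within a chunk correctly.
        np.add.at(private[t], idx[chunk], w[chunk])
    return private.sum(axis=0)           # combine phase
```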

    • Title: Increasing Temporal Locality with Skewing and Recursive Blocking
    • Authors:
      Guohua Jin (Rice University)
      John Mellor-Crummey (Rice University)
      Robert Fowler (Rice University)
    • Abstract:
      We present a strategy, called recursive prismatic time skewing, that increases temporal reuse at all memory hierarchy levels, thus improving the performance of scientific codes that use iterative methods. Prismatic time skewing partitions the iteration space of multiple loops into skewed prisms with both spatial and temporal (or convergence) dimensions. Novel aspects of this work include: multi-dimensional loop skewing; handling carried data dependences in the skewed loops without additional storage; bi-directional skewing to accommodate periodic boundary conditions; and an analysis and transformation strategy that works inter-procedurally. We combine prismatic skewing with a recursive blocking strategy to boost reuse at all levels in a memory hierarchy. A preliminary evaluation of these techniques shows significant performance improvements over both the original codes and methods described previously in the literature. With an inter-procedural application of our techniques, we reduced the total primary cache misses of a large application code by 27% and secondary cache misses by 119%.
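      The core idea of time skewing can be shown on a 1D three-point stencil. The paper's prismatic skewing generalizes this to multiple dimensions, adds bi-directional skewing for periodic boundaries, and blocks recursively; the sketch below (NumPy, names ours) is only the minimal one-dimensional version of the idea, not the paper's algorithm. Tile j performs all T time steps on a band of cells that slides one cell left per step, so the band's data can stay in cache across the whole time loop.

```python
import numpy as np

def naive(u0, T):
    # Reference: T global sweeps of a 1D three-point stencil,
    # using ping-pong buffers; endpoints are fixed boundary values.
    a = [u0.copy(), u0.copy()]
    for t in range(T):
        src, dst = a[t % 2], a[(t + 1) % 2]
        dst[1:-1] = (src[:-2] + src[1:-1] + src[2:]) / 3.0
    return a[T % 2]

def time_skewed(u0, T, B=8):
    # Same computation, tiled over (time, space). The "- t" in the tile
    # bounds is the skew: it keeps the stencil's data dependences
    # satisfied even though the time loop now runs inside the tile loop.
    N = len(u0)
    a = [u0.copy(), u0.copy()]
    for j in range((N + T) // B + 2):
        for t in range(T):
            lo = max(1, j * B - t)
            hi = min(N - 1, (j + 1) * B - t)
            if lo >= hi:
                continue
            src, dst = a[t % 2], a[(t + 1) % 2]
            dst[lo:hi] = (src[lo-1:hi-1] + src[lo:hi] + src[lo+1:hi+1]) / 3.0
    return a[T % 2]
```

      With B chosen so a band fits in cache, each cell is touched T times while resident instead of once per global sweep, which is the temporal reuse the abstract targets.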