Hire an Illini

Shelby L. Lockhart

  • Advisor:
    • Luke Olson
  • Departments:
  • Areas of Expertise:
    • High Performance Computing
    • Scientific Computing
  • Thesis Title:
    • Reducing Communication Bottlenecks in Iterative Solvers
  • Thesis abstract:
    • This work focuses on reducing communication bottlenecks in iterative solvers, specifically those associated with modern high-performance computing architectures. These bottlenecks can result from irregular point-to-point communication or from high synchronization costs across all participating processes. There are two ways to address them: altering the parallel implementation of communication within a given operation, or changing the algorithm to perform mathematically equivalent operations that require less communication. A novel node-aware communication technique is presented that takes advantage of the high-bandwidth interconnects and high CPU core counts on modern supercomputers; it demonstrates improved performance for irregular point-to-point communication, in both distributed inter-CPU and inter-GPU settings, when communicating large data volumes. The technique is used in the design of a communication-efficient enlarged conjugate gradient method implemented within a CPU-based parallel solver. Additionally, the technique is used to reduce communication costs in the unstructured mesh boundary exchanges in MIRGE-Com, a GPU-based codebase for solving the compressible Navier-Stokes equations for viscous flows and the Euler equations for inviscid flows of reactive fluid mixtures within the Center for Exascale Enabled Scramjet Design. Recent research introduced low-synchronization orthogonalization routines within the context of generalized minimum residual methods, demonstrating the power of rewriting solvers to take advantage of mathematically equivalent operations that decompose into kernels better suited to distributed computing.
      A performance study of these low-synchronization orthogonalization routines extended to Anderson acceleration is presented, demonstrating a reduction in synchronization cost for the method in both distributed CPU and GPU computing environments. Furthermore, the work analyzes two recent Anderson acceleration variants, alternating Anderson acceleration and composite Anderson acceleration, and extends low-synchronization orthogonalization techniques to composite Anderson acceleration. It also discusses considerations for developing performant Anderson acceleration solvers on emerging supercomputer architectures. In sum, this work contributes strategies for reducing communication bottlenecks in iterative solvers that take into account the underlying architecture of high-performance computing systems. It presents techniques that use restructured message passing and updated mathematical algorithms requiring less costly communication.

Contact information:
sll2@illinois.edu