"Multigrain parallelism and application-level resource balancing: Improving performance and robustness of iterative methods"

Dr. Andreas Stathopoulos
College of William and Mary

The numerical solution of large, sparse eigenvalue problems and systems of linear equations is central to many scientific and engineering applications. Krylov-type iterative methods often provide the only algorithmic means for solving these problems, while parallel computing is a critical computational component in large-scale applications. Yet the performance of existing methods falls short of the capabilities of today's hardware platforms, for various reasons. Synchronization in traditional, fine-grain implementations of iterative methods becomes a scalability hurdle on massively parallel processors (MPPs) and on clusters of workstations (COWs) with high-overhead networks. In addition, COW environments are often heterogeneous and/or time-shared, in which case dynamic load balancing is central to achieving high performance.

There are two key observations in this research. First, today's large memory sizes enable small subgroups of the processors of MPPs or COWs to store the whole matrix. This is also true in many matrix-free applications. Each of these subgroups can apply the preconditioning step independently on a different block vector, thus mixing fine-grain and coarse-grain parallelism. Second, an advanced interaction between the algorithm and the runtime system is required to achieve load balancing.

We present a multigrain implementation of the Jacobi-Davidson eigenvalue solver, and we dynamically load balance its execution by letting every processor subgroup iterate on its preconditioning equation for a fixed amount of time. By design, this scheme adapts to variable external load. In addition, if the program detects memory thrashing on a node, it withdraws its preconditioning phase from that node, hopefully speeding the completion of competing jobs and hence the relinquishing of their resources.
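The fixed-time preconditioning idea above can be illustrated with a minimal sketch. The function below is a hypothetical stand-in for the inner solve each processor subgroup performs: it runs Jacobi sweeps on a small system until a wall-clock budget expires, so a subgroup on a heavily loaded node simply completes fewer sweeps rather than stalling its peers. The function name, the budget parameter, and the use of plain Jacobi sweeps are illustrative assumptions, not the solver's actual preconditioner.

```python
import time

def timed_jacobi(A, b, x, budget_s, max_sweeps=200):
    """Run Jacobi sweeps on A x = b until a wall-clock budget expires.

    Hypothetical stand-in for the fixed-time preconditioning phase of
    one processor subgroup. Returns the updated iterate and the number
    of sweeps actually completed, which varies with external load.
    """
    n = len(b)
    deadline = time.monotonic() + budget_s
    sweeps = 0
    while sweeps < max_sweeps and time.monotonic() < deadline:
        # One Jacobi sweep: x_i <- (b_i - sum_{j != i} A_ij x_j) / A_ii
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i))
             / A[i][i]
             for i in range(n)]
        sweeps += 1
    return x, sweeps

if __name__ == "__main__":
    # Small diagonally dominant system, so the sweeps converge.
    A = [[4.0, 1.0, 0.0],
         [1.0, 4.0, 1.0],
         [0.0, 1.0, 4.0]]
    b = [1.0, 2.0, 3.0]
    x, sweeps = timed_jacobi(A, b, [0.0, 0.0, 0.0], budget_s=1.0)
    print(sweeps, x)
```

Because the stopping rule is elapsed time rather than an iteration count or residual norm, all subgroups return at (roughly) the same moment, which is what keeps the outer Jacobi-Davidson iteration load-balanced under variable external load.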