Parallel Implicit Solvers for Radiation Transport Systems: Overview

We will apply parallel implicit solvers of Newton-Krylov-Schwarz (NKS) and full-approximation-scheme (FAS) multigrid type to prototypical unclassified nonlinear problems in radiation transport (RT). The goals are to customize these algorithms for RT applications and for the ASCI testbed computers, and to compare and combine the two classes of solvers in the ASCI application and architecture environment. NKS and multilevel methods span a range of near-optimal scaling behaviors, from optimal convergence with some sacrifice in parallel efficiency to optimal parallel efficiency with some sacrifice in convergence rate. As a result of these investigations, ASCI computational physicists will be equipped to design and tune solvers for classified RT codes.

Radiation transport is, of course, a subsystem in the multidisciplinary problems of ultimate interest to ASCI. In past work, we have demonstrated the effectiveness of our parallel implicit solvers on problems involving hydrodynamics alone. Once we have demonstrated their effectiveness on problems in radiation transport alone, the ultimate goal is to incorporate them into multiple-scale coupled problems in radiation hydrodynamics.

NKS and multilevel methods have been employed to date on structured and unstructured three-dimensional problems in aerodynamics, on grids of millions of vertices, and demonstrated on up to a thousand nodes of distributed-memory architectures in the ``fixed-memory-per-node'' limit of greatest interest to ASCI tera-scale computing. As the ASCI platforms are continually upgraded, they will be benchmarked on these familiar applications. This will provide useful feedback to the ASCI program from the outset, while simultaneously developing the RT applications agenda.

Radiation transport problems, whose workload dominates the budgets of many DP lab production supercomputers, are typically solved today through a process of operator splitting, field-by-field decoupling, and global relaxations/recurrences that limit their convergence rates and parallel scalability on large-scale problems. Most researchers expect that most of today's production codes will need to be rewritten for execution on the ASCI teraflop-scale machines. Indeed, their solvers will need to be replaced, but the most valuable application domain subroutines in these codes can be ``rescued'' for distributed-memory parallel execution with relatively little change. In particular, the discretization routines --- those that define the conservation law residual vectors in steady-state codes or the delta-quantities in evolution codes, and the associated ``left-hand side'' linearized operators --- can be extracted and called by a distributed solver library, from independently hosted computational processes, in a subdomain-by-subdomain manner with significant memory locality. The distributed solution algorithms are implemented to have the same domain-based memory locality as the routines that evaluate the discretized residuals.
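
As a rough illustration of the division of labor described above, the following Python sketch pairs an application-side residual routine with a small NKS-style solver that only calls that routine and performs subdomain-by-subdomain solves. It is not taken from the project's codes. The model problem (a one-dimensional equation -u'' + u^3 = f with zero Dirichlet boundaries), all function names, and all parameter values are assumptions chosen for brevity. The Krylov method is conjugate gradients, which is adequate only because this model Jacobian is symmetric positive definite; the subdomain blocks are written out analytically for the model problem; and the subdomain solves run serially in a loop where a distributed implementation would assign them to separate processes.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residual(u, f, h):
    """Application routine: F(u) for -u'' + u^3 = f, zero Dirichlet boundaries."""
    n = len(u)
    F = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        F[i] = (2.0 * u[i] - left - right) / h**2 + u[i] ** 3 - f[i]
    return F

def jac_vec(u, v, F0, f, h):
    """Matrix-free J(u) v from a forward difference of the residual routine."""
    vnorm = math.sqrt(dot(v, v))
    if vnorm == 0.0:
        return [0.0] * len(v)
    eps = 1e-7 * (1.0 + math.sqrt(dot(u, u))) / vnorm
    Fp = residual([ui + eps * vi for ui, vi in zip(u, v)], f, h)
    return [(a - b) / eps for a, b in zip(Fp, F0)]

def local_solve(u, r, lo, hi, h):
    """Thomas solve with the tridiagonal Jacobian block on indices [lo, hi)."""
    m = hi - lo
    off = -1.0 / h**2
    diag = [2.0 / h**2 + 3.0 * u[lo + k] ** 2 for k in range(m)]
    rhs = [r[lo + k] for k in range(m)]
    for k in range(1, m):                      # forward elimination
        w = off / diag[k - 1]
        diag[k] -= w * off
        rhs[k] -= w * rhs[k - 1]
    x = [0.0] * m
    x[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):             # back substitution
        x[k] = (rhs[k] - off * x[k + 1]) / diag[k]
    return x

def schwarz_apply(u, r, subdomains, h):
    """Additive Schwarz preconditioner: sum of independent subdomain solves."""
    z = [0.0] * len(r)
    for lo, hi in subdomains:                  # each pass touches only local data
        x = local_solve(u, r, lo, hi, h)
        for k in range(hi - lo):
            z[lo + k] += x[k]
    return z

def pcg(matvec, precond, b, rtol, maxit):
    """Preconditioned conjugate gradients for the Newton correction."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    bnorm = math.sqrt(dot(b, b))
    for _ in range(maxit):
        if math.sqrt(dot(r, r)) <= rtol * bnorm:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

def nks_solve(f, n_sub=4, overlap=2, tol=1e-8, max_newton=20):
    """Inexact Newton outer loop; returns the solution and residual-norm history."""
    n = len(f)
    h = 1.0 / (n + 1)
    size = n // n_sub
    subdomains = []
    for s in range(n_sub):                     # overlapping index ranges
        lo = max(0, s * size - overlap)
        hi = n if s == n_sub - 1 else min(n, (s + 1) * size + overlap)
        subdomains.append((lo, hi))
    u = [0.0] * n
    history = []
    for _ in range(max_newton):
        F0 = residual(u, f, h)
        history.append(math.sqrt(dot(F0, F0)))
        if history[-1] <= tol * history[0]:
            break
        matvec = lambda v: jac_vec(u, v, F0, f, h)
        precond = lambda r: schwarz_apply(u, r, subdomains, h)
        step = pcg(matvec, precond, [-Fi for Fi in F0], 1e-4, 200)
        u = [ui + si for ui, si in zip(u, step)]
    return u, history
```

In this sketch the solver never inspects the physics beyond calling `residual` and the subdomain block solve, so replacing the model equation leaves the outer Newton and Krylov loops unchanged. A call such as `nks_solve([10 * math.sin(math.pi * (i + 1) / 65) for i in range(64)])` returns the approximate solution together with the nonlinear residual norm at each Newton step.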

In the context of addressing ASCI problems, existing solver methodology will be extended in a number of ways. The most important of these are the refinement of data orderings and blockings for cache locality, the introduction of multiobjective domain partitioning, and the development of implicit algorithms with greater latency tolerance. The first of these topics is beneficial at any parallel granularity. The last two become crucial at the high parallel granularities that characterize the multi-teraflop/s ASCI systems.
