The machines being requisitioned and assembled for the Accelerated Strategic Computing Initiative (ASCI) are not the first to contain processors numbering in the tens of thousands. However, unlike the SIMD machines of the decade past, the individual processors are powerful, pipelined, multiple-issue floating-point engines with complex multilevel memories. A simple parallel performance model that assumes that computation costs fall in inverse proportion to the number of processors, while communication costs rise with some slowly growing, network-topology-dependent function of the number of processors, is inadequate to account for the performance discontinuities that are crossed as the local problem size suddenly pops into the next level of the memory hierarchy.
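The contrast drawn above can be illustrated with a minimal sketch. It is not taken from the talk: the function names, the logarithmic communication term, the two-level memory, and every cost constant are hypothetical choices made only to show how a capacity threshold produces a jump that the simple model cannot reproduce.

```python
import math

def simple_time(work, p, t_flop=1.0, t_msg=50.0):
    """Simple model: computation ~ work/p, communication ~ log2(p).

    All constants are illustrative assumptions, not measurements.
    """
    comm = t_msg * math.log2(p) if p > 1 else 0.0
    return t_flop * work / p + comm

def hierarchy_time(work, p, cache_capacity,
                   t_fast=1.0, t_slow=4.0, t_msg=50.0):
    """Same model, but the per-element cost depends on whether the
    local problem (work/p elements) fits in a hypothetical fast
    memory level of size cache_capacity."""
    local = work / p
    per_elem = t_fast if local <= cache_capacity else t_slow
    comm = t_msg * math.log2(p) if p > 1 else 0.0
    return per_elem * local + comm

if __name__ == "__main__":
    work, capacity = 1_000_000, 10_000  # assumed sizes, in elements
    for p in (16, 32, 64, 128, 256):
        print(p, simple_time(work, p), hierarchy_time(work, p, capacity))
```

With these assumed numbers, doubling the processor count from 64 to 128 moves the local problem under the capacity threshold, so the second model shows a better-than-twofold drop in time at that step, while the simple model shows a smooth, slightly sublinear improvement.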
The convergence theory for Schwarz methods developed in the decade past is useful for scalar linear problems of great variety: elliptic, parabolic, and hyperbolic, with uniform, anisotropic, and inhomogeneous coefficients. However, the problems slated for solution on teraflop-scale machines are multicomponent and nonlinear. A condition number estimate for a single linearization is inadequate to account for the evolving character of linear subiterations in a nonlinear solution process.
Software libraries for PDEs developed in the pre-object-oriented era are wonderful for packaging algorithmic and application-domain expertise for production users, and the ``natural'' role for libraries seems even greater on parallel production facilities. However, libraries (and some parallel language constructs) created on the presumption that their data structures (and parallel distributions thereof) can be targeted by the calling program are inadequate for multiphysics or multi(-computational-)phase problems in which computational work per data structure element is not uniform in time.
For all of these reasons, and more, it is difficult to combine performance results on a single application with convergence and parallel efficiency estimates to extrapolate the scalability of domain decomposition algorithms to many of the practical problems that have been invoked to motivate their development. Familiarity with existing performance models, theories, and software libraries becomes even more important in this brave new world, however, since a variety of different regimes must be pieced together. Decomposition by domain remains a useful framework in which to handle the integrated problem, as will be argued through real and proximately imaginable case studies.
This talk draws upon work with several collaborators in an NSF Multidisciplinary Computing Challenge project.