Achieving High Sustained Performance in an Unstructured Mesh CFD Application

W. D. Gropp, D. K. Kaushik, D. E. Keyes, and B. F. Smith

There are three aspects to achieving high sustained performance, measured in solutions per second, on large-scale CFD applications. The first is a scalable algorithm, in the sense that the convergence rate does not deteriorate with increasing resolution of the discretization. The second is good per-processor performance, which requires great care for sparse problems on contemporary cache-based microprocessors. The third is a scalable parallel implementation, in the sense that the time per iteration does not deteriorate as some measure of the computational work per iteration and the number of processors increase in proportion. The pseudo-transient Newton-Krylov-Schwarz (Psi-NKS) method is a reasonable means of achieving high performance on distributed hierarchical memory machines for general nonlinear PDE problems. Convergence rate can be controlled by either time-step size or the addition of a coarse grid. Per-processor performance is judged against the memory-bandwidth limit (a more realistic measure of achievable performance than peak floating-point for sparse problems). Finally, on any architecture with a sufficiently rich interconnection network, Psi-NKS leads to good per-iteration scalability, as argued from simple analytical models and verified by numerical experiments. We illustrate these claims within the context of an "applications grade" general purpose computational fluid dynamics code, which has been ported to several of the world's most powerful machines.
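
To make the memory-bandwidth limit concrete, the following is a minimal sketch of how such a bound might be estimated for a sparse matrix-vector product, the kernel that typically dominates Krylov iterations. It is an illustration under stated assumptions, not the model used by the authors: the function name, the compressed sparse row (CSR) storage layout, the 8-byte values and 4-byte indices, and the assumption that each vector entry moves through memory exactly once are all hypothetical choices made here.

```python
def spmv_bandwidth_bound(n_rows, n_nonzeros, bandwidth_bytes_per_s,
                         value_bytes=8, index_bytes=4):
    """Estimate an upper bound on flop/s for y = A @ x with A stored in CSR.

    Assumes (hypothetically) perfect cache reuse of x, so that every matrix
    and vector entry crosses the memory bus exactly once.
    """
    # Matrix traffic: one value and one column index per nonzero,
    # plus the row-pointer array of length n_rows + 1.
    matrix_bytes = (n_nonzeros * (value_bytes + index_bytes)
                    + (n_rows + 1) * index_bytes)
    # Vector traffic: read x once and write y once.
    vector_bytes = 2 * n_rows * value_bytes
    # One multiply and one add per stored nonzero.
    flops = 2 * n_nonzeros
    seconds = (matrix_bytes + vector_bytes) / bandwidth_bytes_per_s
    return flops / seconds


if __name__ == "__main__":
    # Illustrative inputs only; none of these figures come from the paper.
    bound = spmv_bandwidth_bound(n_rows=1_000_000,
                                 n_nonzeros=20_000_000,
                                 bandwidth_bytes_per_s=1.0e9)
    print(f"bandwidth-limited rate: {bound / 1e6:.1f} Mflop/s")
```

Under these assumptions the kernel performs roughly one floating-point operation for every six bytes moved, so the attainable rate is set by memory bandwidth rather than by the processor's peak floating-point rate.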