High-Order Discontinuous Galerkin Methods by GPU Metaprogramming

Andreas Kloeckner
Seminar

Creating scalable compute codes that reach peak performance on graphics processors is a challenge, aggravated by complicated and constantly changing hardware. In this talk, I will describe techniques and tools to tap the enormous performance potential of GPUs for discontinuous Galerkin finite element solvers. Particular emphasis will be on the advantages that high-order discretizations offer on modern SIMD-like architectures. I will explain a few of the design considerations and tricks that enabled sustained single-chip floating-point performance above 200 GFlop/s across a wide range of discretization parameters and equation types. I will introduce tools for run-time code generation and empirical optimization from a high-level language that were crucial to the present effort. With the infrastructure in place, further discussion will concern some potential applications and perspectives on how this technology might change requirements for algorithms that work alongside PDE solvers, such as time steppers and linear solvers.