Designing for Heterogeneity and Unreliability: Examples from Adaptive Mesh Refinement

Anshu Dubey
Seminar

High-performance scientific software has many unique and challenging characteristics. These codes typically consist of different stages of computation. The stages in turn use disparate algorithms and components that often put conflicting demands on the programming models and data structures. Serious performance and portability issues arise from the heterogeneity of both algorithms and platforms. One can exploit abstractions to achieve portability, but to do so the software must be designed to allow efficient interplay between the various abstractions. Additionally, because future platforms are expected to be unreliable, resiliency has become a critical element of software design. The structure of the application can be exploited to formulate a resiliency strategy that can be adjusted to different computing environments. I have recently been working on the design of a software framework and a customizable resiliency strategy in the context of adaptive mesh refinement (AMR). A presentation of this work will form the first part of my talk. In the second part, I will give a broad overview of my earlier work on the software engineering of FLASH, a multi-component, multi-physics community code.