Trace collection for simulation-driven co-design of exascale platforms and codes

PI Name: 
David Evensky
PI Email: 
evensky@sandia.gov
Institution: 
Sandia National Laboratories
Allocation Program: 
INCITE
Allocation Hours at ALCF: 
5 Million
Year: 
2011
Research Domain: 
Computer Science

A key problem facing application developers is that the exascale machines they are targeting will not be available for roughly another seven years. Developers must begin adapting their codes now in order to be ready for these future machines. At the same time, because an exascale machine will be so expensive to build and operate, tighter engineering margins will have to be applied to its design. Simple metrics such as the ratio of the floating-point computation rate to the communication rate will not be sufficient to specify machine requirements. Lower layers in the software stack will also have to adapt. Architecture simulation provides the tools that can bridge all of these areas and will be a key factor in enabling co-design of hardware, runtime components, and application code. Sandia has an ongoing NNSA-funded effort in the Structural Simulation Toolkit (SST) computer architecture simulator.

This work is being leveraged in successful responses to the ASCR Advanced Architectures and Critical Technologies for Exascale Computing and X-Stack Software Research calls, and we are currently participating in several Exascale Co-Design Center call responses. The coarse-grained component of SST (SST/macro) is capable of simulating the performance characteristics of very large systems (>10^6 cores) and a wide variety of interconnects. It can also be used to experiment with alternate programming models. It can be driven with trace files collected from running applications or with a skeleton application that mimics the control flow and behavior of the real application. For trace-driven simulation we use our DUMPI library to capture full MPI call signatures, PAPI counters, and, on supported platforms, generalized subroutine call tracing. The use of DUMPI allows us to replay application traces on different, simulated architectures and obtain high-fidelity predictions of system performance. The INCITE LCF platforms are ideally suited for collecting exceptionally high-quality traces that have low OS noise and knowable, predictable node placement. An INCITE allocation would permit us to collect data on both machines and cross-validate our results at very large scales. This sort of thorough, large-scale validation will be essential to ensure the accuracy of simulated results and will give us confidence that we can successfully co-design applications, runtimes, and systems for exascale computing.
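
The idea of replaying one recorded trace on different simulated architectures can be illustrated with a toy model. The sketch below is not the DUMPI trace format or the SST/macro API: the event tuples, the `replay` function, and the simple latency-plus-bandwidth network model are all assumptions made for illustration. Each rank's trace is a list of compute intervals and point-to-point messages, and the same trace is replayed under two hypothetical network configurations.

```python
from collections import defaultdict, deque


def replay(trace, latency, bandwidth, compute_scale=1.0):
    """Replay per-rank event lists and return each rank's simulated finish time.

    Hypothetical event format (not DUMPI):
      ("compute", seconds), ("send", dst_rank, nbytes), ("recv", src_rank)
    Network model (an assumption): transfer time = latency + nbytes / bandwidth.
    Sends are treated as non-blocking; receives block until the message arrives.
    """
    clock = [0.0] * len(trace)       # simulated time per rank
    pc = [0] * len(trace)            # index of the next event per rank
    in_flight = defaultdict(deque)   # (src, dst) -> arrival times in FIFO order

    progressed = True
    while progressed:
        progressed = False
        for rank, events in enumerate(trace):
            while pc[rank] < len(events):
                event = events[pc[rank]]
                if event[0] == "compute":
                    clock[rank] += event[1] * compute_scale
                elif event[0] == "send":
                    _, dst, nbytes = event
                    arrival = clock[rank] + latency + nbytes / bandwidth
                    in_flight[(rank, dst)].append(arrival)
                else:  # "recv"
                    queue = in_flight[(event[1], rank)]
                    if not queue:
                        break  # matching send not replayed yet; try other ranks
                    clock[rank] = max(clock[rank], queue.popleft())
                pc[rank] += 1
                progressed = True

    if any(pc[rank] < len(events) for rank, events in enumerate(trace)):
        raise ValueError("trace cannot complete: a recv has no matching send")
    return clock


# A two-rank ping-pong style trace with made-up durations and message sizes.
example_trace = [
    [("compute", 1.0), ("send", 1, 1000), ("recv", 1)],
    [("compute", 0.5), ("recv", 0), ("compute", 0.25), ("send", 0, 1000)],
]

# Same trace, two hypothetical interconnects.
slow = replay(example_trace, latency=0.5, bandwidth=1000.0)   # [4.25, 2.75]
fast = replay(example_trace, latency=0.25, bandwidth=2000.0)  # [2.75, 2.0]
print(slow, fast)
```

A real simulator models far more than this (topology, contention, collectives, node placement), which is why validating simulated results against traces from real machines at scale matters.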