Influence of the Memory Subsystem on Monte Carlo Code Performance

Paul Romano
Seminar

Recent studies by ANL/MCS have shown that Monte Carlo transport calculations of reactor problems are limited primarily by memory latency and bandwidth rather than by floating-point operations. This has important consequences for future algorithmic efforts: algorithms that require less memory at the cost of more computation should scale better. The present study takes a detailed look at how miss rates and latencies in a multi-level memory hierarchy can significantly affect the performance of a Monte Carlo code. Simulations of the Monte Carlo performance benchmark were run with the OpenMC Monte Carlo code, and hardware performance counters were collected using the Performance API (PAPI). The simulation results and an accompanying analysis suggest that, for light-water reactor depletion problems, the most important factor determining performance is the effective memory latency, which accounts for the characteristics of the L2 cache, L3 cache, and main memory. The performance observed on multi-socket NUMA architectures could be explained by the collected performance counters.

Bio:
Paul Romano is a senior nuclear engineer in Nuclear Data and Methods at the Knolls Atomic Power Laboratory (KAPL) and a research affiliate with the Massachusetts Institute of Technology (MIT). Prior to joining KAPL, he completed his doctoral studies in Nuclear Science and Engineering at MIT, where his thesis focused on the development of parallel algorithms for Monte Carlo simulations. Paul is a strong proponent of open-source software: he is the lead developer of the OpenMC Monte Carlo code and a founder and developer of PyNE, a third-party Python package for nuclear engineering.