Speedup over the baseline DCMESH code on a single Polaris node resulting from a series of code optimizations. Measurement was made using a single OpenMP thread for simplicity. Image: Taufeq Mohammed Razakh, University of Southern California
Simulating quantum systems and complex materials is among the most computationally demanding challenges in science. Such simulations are critical for advancing technologies in energy, electronics, and quantum information, but the calculations often scale beyond the limits of conventional methods. To prepare for the exascale era, researchers from the University of Southern California are leveraging ALCF resources to develop GPU-optimized algorithms for density functional theory (DFT) simulations and precision-aware strategies for electronic structure calculations. Central to both efforts is the team’s Divide-and-Conquer Maxwell-Ehrenfest Surface Hopping (DCMESH) framework, which enables scalable simulations of light-matter interactions.
For light-matter interactions, modeling electron dynamics in real time requires propagating wavefunctions with high fidelity across thousands of time steps, a process that can be prohibitively slow. DFT-based electronic structure codes encounter severe bottlenecks as they scale toward exascale, with matrix operations and floating-point precision choices heavily influencing both performance and accuracy. Overcoming these challenges requires not just more powerful supercomputers but also algorithmic innovations tailored to modern architectures.
For light-matter dynamics, the team optimized DCMESH kernels for GPUs, minimizing CPU-GPU data transfer using shadow dynamics and hierarchical offloading while accelerating the most compute-intensive linear algebra routines. These simulations were run on ALCF’s Polaris and on the Intel Data Center GPU Max Series hardware that powers Aurora. The team also systematically studied how different BLAS (Basic Linear Algebra Subprograms) precision modes affect performance and accuracy in DCMESH, identifying the optimal balance for large-scale DFT simulations.
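As a rough, hypothetical illustration of the data-residency idea behind such offloading (not the actual DCMESH code), the C/OpenMP sketch below maps the working arrays onto the GPU once and keeps them resident across a time-step loop, so host-device transfers occur only at the boundaries of the simulation. The array names, sizes, and the dense update kernel are illustrative assumptions.

/* Hypothetical sketch: keep simulation arrays resident on the GPU across
 * time steps with OpenMP target offload, so host-device transfers happen
 * once rather than every step. Names and sizes are illustrative, not
 * taken from DCMESH. Compile with an OpenMP offload-capable compiler. */
#include <stdio.h>
#include <stdlib.h>

#define N      512   /* matrix dimension (illustrative) */
#define NSTEPS 10    /* number of propagation steps (illustrative) */

int main(void) {
    float *A = malloc(sizeof(float) * N * N);  /* e.g., a propagator matrix */
    float *C = malloc(sizeof(float) * N * N);  /* e.g., state coefficients  */
    float *T = malloc(sizeof(float) * N * N);  /* device-side scratch       */
    for (long i = 0; i < (long)N * N; ++i) {
        A[i] = 1e-3f;
        C[i] = (i % N == i / N);  /* start from the identity */
    }

    /* Map data to the device once; it stays resident for all steps. */
    #pragma omp target data map(to: A[0:N*N]) map(tofrom: C[0:N*N]) \
                            map(alloc: T[0:N*N])
    for (int step = 0; step < NSTEPS; ++step) {
        /* T = A * C on the GPU; in a production code this would be a
         * vendor BLAS call whose precision is chosen per the study
         * cited below. */
        #pragma omp target teams distribute parallel for collapse(2)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (int k = 0; k < N; ++k)
                    acc += A[i * N + k] * C[k * N + j];
                T[i * N + j] = acc;
            }

        /* Copy the result back into C, still on the device. */
        #pragma omp target teams distribute parallel for
        for (long i = 0; i < (long)N * N; ++i)
            C[i] = T[i];
    }

    printf("C[0] = %g\n", C[0]);
    free(A); free(C); free(T);
    return 0;
}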
The team’s DCMESH implementation achieved speedups of up to 644x on NVIDIA A100 GPUs compared with CPU-based methods, enabling detailed simulations of ultrafast light-induced phenomena. On a single Intel Data Center GPU Max Series 1550, their use of mixed-precision BLAS routines delivered a 1.35x overall speedup, and up to 3.9x for large-scale operations, while maintaining accuracy in key outputs such as excited electron counts, current density, and kinetic energy. Weak-scaling tests on 256 Polaris nodes (1,024 GPUs) maintained parallel efficiency above 96 percent, demonstrating the framework’s scalability. These advances establish the foundation for even larger, extreme-scale simulations on Aurora and other exascale systems.
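The precision trade-off reported above can be illustrated with the kind of generic experiment a precision study would run. The C sketch below times the same matrix multiply with the standard CBLAS dgemm and sgemm routines and reports the drift of the single-precision result; the matrix size, data, and error metric are assumptions for illustration and are not the DCMESH benchmark.

/* Hedged sketch: time the same matrix multiply in double vs. single
 * precision with standard CBLAS calls and check how far the results drift.
 * A generic illustration of a precision study, not the DCMESH benchmark.
 * Assumes a CBLAS header and library (e.g., OpenBLAS): link with -lcblas
 * or -lopenblas. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <cblas.h>

#define N 2048  /* matrix dimension (illustrative) */

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    double *Ad = malloc(sizeof(double) * N * N), *Bd = malloc(sizeof(double) * N * N);
    double *Cd = calloc((size_t)N * N, sizeof(double));
    float  *As = malloc(sizeof(float) * N * N),  *Bs = malloc(sizeof(float) * N * N);
    float  *Cs = calloc((size_t)N * N, sizeof(float));

    /* Fill both precisions with the same pseudo-random values. */
    for (long i = 0; i < (long)N * N; ++i) {
        double v = (double)rand() / RAND_MAX;
        Ad[i] = v;        As[i] = (float)v;
        Bd[i] = 1.0 - v;  Bs[i] = (float)(1.0 - v);
    }

    double t0 = now_sec();
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, Ad, N, Bd, N, 0.0, Cd, N);
    double t_fp64 = now_sec() - t0;

    t0 = now_sec();
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0f, As, N, Bs, N, 0.0f, Cs, N);
    double t_fp32 = now_sec() - t0;

    /* Largest relative deviation of the single-precision result. */
    double max_rel = 0.0;
    for (long i = 0; i < (long)N * N; ++i) {
        double rel = fabs((double)Cs[i] - Cd[i]) / (fabs(Cd[i]) + 1e-30);
        if (rel > max_rel) max_rel = rel;
    }

    printf("fp64: %.3f s  fp32: %.3f s  speedup: %.2fx  max rel. error: %.2e\n",
           t_fp64, t_fp32, t_fp64 / t_fp32, max_rel);

    free(Ad); free(Bd); free(Cd); free(As); free(Bs); free(Cs);
    return 0;
}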
By advancing both GPU acceleration for quantum dynamics and precision-aware strategies for electronic structure calculations, this work lays critical groundwork for the efficient use of Aurora and other exascale systems. These innovations will help researchers probe fundamental processes in light-matter interactions, accelerate the discovery of new materials, and extend the scientific reach of quantum simulations. The team’s precision-aware approach is also broadly applicable to other HPC workloads dominated by linear algebra.
Razakh, T. M., T. Linker, Y. Luo, R. K. Kalia, K.-I. Nomura, and P. Vashishta. “Accelerating Quantum Light-Matter Dynamics on Graphics Processing Units,” 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (May 2024), IEEE. https://doi.org/10.1109/IPDPSW63119.2024.00176
Piroozan, N., S. J. Pennycook, T. M. Razakh, P. Caday, N. Kumar, and A. Nakano. “Impact of Varying BLAS Precision on DCMESH,” SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (November 2024), IEEE. https://doi.org/10.1109/SCW63240.2024.00187