As high-performance computing systems grow to exascale and beyond, optimizing MPI collective communication is increasingly critical. Collective performance relies heavily on selecting the appropriate algorithm to implement the desired communication. To improve the selection process, machine learning (ML) autotuners are a promising alternative to the outdated heuristics used by production MPI libraries. However, ML autotuners require significant training time, rendering them impractical on large-scale systems. I will present our novel ML autotuner design, which incorporates a custom active learning strategy and other optimizations to minimize training overhead. Additionally, I will describe our recent effort to develop new variable-radix algorithms that outperform the state of the art and enhance the potency of ML autotuning.
Bio: Mike Wilkins is a Ph.D. candidate at Northwestern University, co-advised by Dr. Peter Dinda and Dr. Nikos Hardavellas. He is also a visiting student at Argonne National Laboratory, co-advised by Dr. Yanfei Guo and Dr. Rajeev Thakur. Mike's research focuses on transparent optimizations for communication on highly parallel computer systems.
To add to calendar:
1. Enter your credentials.
2. Search for your seminar.
3. Click "Add to calendar."