Supercomputers and Parallel AI Training

Huihuo Zheng, ALCF

With large datasets, training a model on a single computer can take years. Training can be sped up dramatically by distributing the work across many processors in parallel. Trainees will be introduced to parallel training with Horovod, discuss parallel processing with MPI, and learn how to submit large-scale training jobs on ALCF systems.
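The core idea behind data-parallel frameworks such as Horovod is that each worker computes gradients on its own shard of the data, and the gradients are then averaged across all workers (an allreduce) before every worker applies the same update. The sketch below illustrates only that averaging pattern, using Python's standard-library multiprocessing in place of MPI ranks; the shard values and the toy gradient function are illustrative, not part of the Horovod API.

```python
from multiprocessing import Pool

def local_gradient(shard):
    # Toy "gradient": the mean of this worker's data shard.
    # In real training this would be a backward pass on a minibatch.
    return sum(shard) / len(shard)

def allreduce_average(gradients):
    # Average gradients across workers, analogous to the allreduce
    # that Horovod performs over MPI.
    return sum(gradients) / len(gradients)

if __name__ == "__main__":
    # Four simulated workers, each holding a different shard of the dataset.
    shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    with Pool(processes=len(shards)) as pool:
        grads = pool.map(local_gradient, shards)
    # Every worker would apply this same averaged gradient.
    print(allreduce_average(grads))  # → 4.5
```

In an actual Horovod job, the processes are MPI ranks launched with `mpirun`, and the allreduce happens inside the framework after each backward pass rather than in user code.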

About the Speaker

Huihuo Zheng is a computer scientist at the Argonne Leadership Computing Facility. His areas of interest are first-principles simulations of condensed matter systems, excited state properties of materials, strongly correlated electronic systems, and high-performance computing.