Supercomputers and Parallel AI Training

Huihuo Zheng, ALCF
Marieme Ngom, Argonne National Laboratory

With large datasets, training a model on a single computer can take years. Distributing the work across many processors makes training far faster. This session introduces parallel training with Horovod, discusses parallel processing with MPI, and covers submitting large-scale training jobs on ALCF systems.
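To illustrate the core idea behind Horovod-style data-parallel training, here is a minimal sketch in plain Python that simulates it without MPI: each "worker" computes gradients on its own data shard, and an allreduce-style average produces the identical update applied by every worker. The model, data, and function names are illustrative, not from Horovod's API.

```python
# Simulated data-parallel SGD: each "worker" computes gradients on its own
# data shard, then an allreduce averages the gradients before every update.
# This mimics the communication pattern Horovod runs over MPI or NCCL.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w*x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for MPI_Allreduce: every worker receives the global mean."""
    return sum(values) / len(values)

def train(shards, w=0.0, lr=0.01, steps=100):
    for _ in range(steps):
        # In real distributed training these run concurrently, one per rank.
        grads = [local_gradient(w, shard) for shard in shards]
        # Averaging keeps every worker's copy of w identical after the update.
        w -= lr * allreduce_mean(grads)
    return w

# Data generated from y = 3x, split round-robin across 4 simulated workers.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
print(round(train(shards), 3))  # converges toward w = 3.0
```

In real Horovod code, `allreduce_mean` is replaced by Horovod's distributed optimizer wrapper, and each rank holds only its own shard of the dataset.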

Targeted Adaptive Design

Targeted Adaptive Design (TAD) is a novel probabilistic machine learning model that aims to efficiently and autonomously find control parameters that yield a desired design. This talk will present TAD and its application to Atomic Layer Deposition (ALD).
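The talk covers TAD's actual formulation; as a loose illustration only (not the TAD algorithm), the generic adaptive-design loop it belongs to alternates between proposing control parameters, evaluating the process, and narrowing the search toward the target output. The `process` function and all parameter names below are hypothetical stand-ins.

```python
import random

# Generic adaptive-design loop (NOT the TAD algorithm itself): propose
# control parameters, evaluate the process, and shrink the search window
# around the best candidate until the output matches the design target.

def process(setpoint):
    """Hypothetical stand-in for an expensive experiment (e.g. an ALD run)."""
    return 1.8 * setpoint + 0.4  # relationship unknown to the optimizer

def adaptive_design(target, lo=0.0, hi=10.0, rounds=20, samples=8, seed=0):
    rng = random.Random(seed)
    best_x, best_err = None, float("inf")
    for _ in range(rounds):
        for _ in range(samples):
            x = rng.uniform(lo, hi)
            err = abs(process(x) - target)
            if err < best_err:
                best_x, best_err = x, err
        # Narrow the search window around the current best candidate.
        span = (hi - lo) / 4
        lo, hi = best_x - span, best_x + span
    return best_x

x = adaptive_design(target=5.0)
# process(x) is now close to the desired output value of 5.0
```

TAD replaces this naive shrinking search with a probabilistic model that chooses each new evaluation to maximize information about the target, which matters when every evaluation is a costly experiment.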

About the Speaker

Huihuo Zheng is a computer scientist at the Argonne Leadership Computing Facility. His areas of interest are first-principles simulations of condensed matter systems, excited state properties of materials, strongly correlated electronic systems, and high-performance computing.

Marieme Ngom is a postdoctoral researcher in the Mathematics and Computer Science (MCS) division of Argonne National Laboratory. Her research interests lie at the intersection of machine learning and dynamical systems modeling with applications in chemical engineering and material sciences. Ngom received her Ph.D. in Mathematics from the University of Illinois at Chicago (UIC) in 2018 and holds an MSc in Mathematics from the University of Paris-Saclay (formerly Paris XI) and an MEng in Computer Science and Applied Mathematics from the National Polytechnic Institute of Toulouse.