Scalable Parallel Machine Learning on High-Performance Computing Systems - Clustering and Reinforcement Learning

Weijian Zheng, Argonne National Laboratory

This talk will introduce my work on accelerating two machine learning applications on HPC systems. My parallel algorithm designs focus on the following factors: good scalability, optimized communication cost, a similar or improved convergence rate, and comparable optimization cost or solution quality. I will first present how to integrate a synchronization-reducing algorithm and sampling methods into a new parallel clustering algorithm. I will then present a framework for solving combinatorial optimization problems over large graphs using reinforcement learning, and discuss different parallelism strategies and performance optimization methods for the new framework.

Weijian Zheng is a postdoctoral appointee in the Data Science and Learning Division at Argonne National Laboratory. He earned his PhD in Computer Science from Purdue University in 2022. His research interests include parallel algorithm design, parallel machine learning algorithms, high-performance computing, and distributed computing.