Adaptive Parallelism Mapping in Dynamic Environments Using Machine Learning

Modern hardware platforms are parallel and diverse, ranging from mobile devices to data centers, and co-location of mainstream parallel applications is becoming increasingly common. The resulting resource contention can drastically degrade a program’s performance. In addition, the execution environment, composed of workloads and hardware resources, is dynamic and unpredictable. Efficiently matching program parallelism to machine parallelism under such uncertainty is hard. Mapping policies should anticipate these variations and make applications resilient to them. This talk proposes solutions for mapping parallel programs in dynamic environments. It employs predictive modelling techniques to adaptively map programs by determining the best degree of parallelism. When evaluated on highly dynamic executions, these solutions are shown to outperform default, state-of-the-art adaptive, and analytic approaches.

Next, I will introduce a transparent fault-tolerance approach for MPI that leverages the application checkpoint/restart mechanism used in scientific applications. I will then present a novel approach to optimizing applications running on heterogeneous systems. This work analyzes parallel codes and uses a machine learning model to decide the best data placement in the multi-level memory hierarchy of GPUs.

Argonne Leadership Computing Facility

08/20/2018, 5:30am CT