Probabilistic Machine Learning and Deep Learning for Modeling Compute Facilities and Applications in Physical Sciences

Sandeep Madireddy
Seminar

Machine learning approaches have demonstrated state-of-the-art performance in a wide range of commercial application domains; however, their success in scientific domains has been limited by several challenges, such as uncertainty, data scarcity, and model interpretability.

In the first part of the talk, I will describe two probabilistic machine learning approaches, a sensitivity-based Gaussian process and a conditional variational autoencoder (CVAE), developed to model application I/O performance and its variability as a function of application and file system characteristics on leadership-class systems. Then I will describe a concept-drift-aware predictive modeling approach that adapts the performance models to account for abrupt changes in HPC systems. This approach consists of online Bayesian changepoint detection and a moment-matching transformation that adapts the predictive model to the changepoint.
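
As a rough illustration of the moment-matching idea only, the sketch below shifts and rescales measurements taken before a changepoint so that their mean and standard deviation match those observed after it. The function name `moment_match`, the choice of the first two moments, and the sample values are all assumptions made for this example; the transformation used in the talk's work may differ.

```python
from statistics import mean, pstdev


def moment_match(old, new):
    """Shift and scale `old` so its mean and std match those of `new`.

    Hypothetical helper: matches only the first two moments.
    """
    mu_old, sd_old = mean(old), pstdev(old)
    mu_new, sd_new = mean(new), pstdev(new)
    if sd_old == 0:
        # Degenerate case: no spread to rescale, so only the mean is matched.
        return [mu_new for _ in old]
    scale = sd_new / sd_old
    return [mu_new + scale * (x - mu_old) for x in old]


# Made-up I/O throughput samples before and after a detected changepoint.
pre = [10.0, 12.0, 11.0, 13.0]
post = [20.0, 24.0, 22.0, 26.0]
adapted = moment_match(pre, post)
```

Under these assumptions, the adapted pre-changepoint samples could be reused alongside the post-changepoint samples when the predictive model is updated, rather than being discarded.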

In the second part of the talk, I will describe my work on deep learning for image data from the physical sciences. First, I will describe our approach to automatically segmenting 3D atom probe tomography data of Co- and Al-based superalloys, using deep learning-based edge detection that transfers knowledge learned from real-world images. Second, I will describe a deep learning-based approach to reducing compression artifacts in JPEG images of climate simulation outputs, which can transfer knowledge learned from previous simulations to dynamically enhance the data from a running simulation.

I will conclude the talk by touching on other ongoing work and discussing my future research directions in scientific machine learning.