Privacy-Preserving Federated Learning for Foundation Models

PI Kibaek Kim, Argonne National Laboratory
Co-PI Thomas Flynn, Brookhaven National Laboratory
Minseok Ryu, Arizona State University
Olivera Kotevska, Oak Ridge National Laboratory
Farzad Yousefian, Rutgers University, New Brunswick
Project Summary

Using DOE supercomputers, researchers are training powerful AI models that learn from text, images, and energy data across many institutions—without sharing sensitive data—to accelerate discovery, strengthen the power grid, and enable secure scientific collaboration.

Project Description

This project advances privacy-preserving federated learning (PPFL) to enable the training of large-scale foundation models (FMs) on sensitive, multimodal scientific data distributed across institutions. By leveraging the Department of Energy’s (DOE) high-performance computing (HPC) facilities—including Frontier, Aurora, Polaris, and Perlmutter—the research team will train FMs in four key areas: extracting knowledge from scientific text, interpreting high-resolution imaging data from DOE light sources, forecasting building energy consumption using national building datasets, and modeling electric grid operations through graph-based learning. These models will be developed without centralizing data, preserving privacy while enabling collaborative AI development across national laboratories and universities. 
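The federated setup described above can be illustrated with a minimal sketch of one training round in which each institution trains on its own data and only model updates are exchanged. This is an illustrative toy example using NumPy and a simple least-squares model, not the project's actual PPFL framework; the function names and hyperparameters are placeholders.

```python
# Minimal sketch of one federated training round (illustrative only).
# Each "site" keeps its data local and shares only model weights with
# the aggregation step; raw data never leaves the institution.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Hypothetical local training step: gradient descent on a least-squares loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient computed on local data only
        w -= lr * grad
    return w

def federated_round(global_weights, site_data):
    """One FedAvg-style round: average the locally trained weights."""
    local_weights = [local_update(global_weights, X, y) for X, y in site_data]
    return np.mean(local_weights, axis=0)

# Toy example with three simulated institutions.
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
w = np.zeros(4)
for _ in range(10):
    w = federated_round(w, sites)
```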

The project supports DOE’s mission by delivering AI capabilities that enhance energy resilience, scientific discovery, and secure collaboration. The PPFL framework will integrate scalable optimization and privacy-preserving mechanisms—such as adaptive compression, federated pruning, and differential privacy—to reduce communication costs and safeguard sensitive data. The outcomes will establish a foundation for collaboratively training foundation models across multiple DOE exascale systems, contributing to breakthroughs in imaging science, biomedical science, and grid modernization, while setting a precedent for secure, multi-institutional research at leadership computing scale.
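Two of the mechanisms named above can be sketched in a few lines: a differential-privacy step that clips each client update and adds Gaussian noise, and a top-k compression step that keeps only the largest-magnitude entries before communication. This is a hedged illustration under assumed placeholder settings (the clip norm, noise multiplier, and sparsity fraction are not values used by the project), not the framework's actual implementation.

```python
# Illustrative differential-privacy and compression steps applied to a
# client's model update before it is sent to the aggregator.
import numpy as np

def privatize(update, clip_norm=1.0, noise_mult=0.5, rng=None):
    """Clip the update to bound its sensitivity, then add Gaussian noise."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(scale=noise_mult * clip_norm, size=update.shape)

def top_k_compress(update, k_frac=0.1):
    """Keep only the largest-magnitude entries to reduce communication cost."""
    k = max(1, int(k_frac * update.size))
    sparse = np.zeros_like(update)
    idx = np.argpartition(np.abs(update), -k)[-k:]
    sparse[idx] = update[idx]
    return sparse

rng = np.random.default_rng(1)
raw_update = rng.normal(size=1000)
sent_update = top_k_compress(privatize(raw_update, rng=rng))
```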

Allocations