This project is inventing ways to train and run giant science AI models far faster and with much less energy—using smart compression on supercomputers—so they’re cheaper to build, easier to use, and don’t crowd out other critical research.
Large foundation models for science will face the same pre-training and inference challenges as state-of-the-art large language models (LLMs). First, the time and energy needed on exascale computing systems will drastically limit the number of pre-training attempts (to only one) and the ability to tune the model at this stage, leaving model corrections feasible only during post-training (alignment, fine-tuning). Second, the resources occupied by a single pre-training run will significantly reduce the availability of high-performance computing systems for mission-critical simulations and data analytics. Third, foundation models require significant resources to run inference, which will limit their broad deployment for science. Drastically reducing the computing and memory cost of foundation models for science will therefore have a critical impact on the feasibility, duration, and energy consumption of both pre-training and inference.
Leveraging the research team’s prior work on tensor-compressed pre-training, this project will design and develop a memory- and computing-efficient pre-training framework and use it to generate various resource-efficient foundation models. To this end, the research team will explore three novel contributions: (1) theoretical foundations and novel optimization of low-rank tensor-compressed pre-training for large-scale foundation models, (2) training acceleration via mixed-precision low-rank tensor optimization and customized tensorized automatic differentiation, and (3) graphics processing unit (GPU) optimization on leadership computing platforms for large-scale tensor-compressed pre-training across massive numbers of GPUs. This project aims to enable energy-efficient training and inference of extreme-scale foundation models for science, significantly reducing training time and energy cost.
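To make contribution (1) concrete, the sketch below illustrates the general idea behind a low-rank tensor-compressed linear layer using the tensor-train (TT) format: the dense weight matrix is replaced by a chain of small cores that are contracted with the activations on the fly, so the full weight is never materialized. This is a minimal illustrative sketch only; the specific factorization, mode shapes, ranks, initialization, and the class name TTLinear are assumptions for exposition, not the project's actual framework.

```python
# Illustrative sketch: a tensor-train (TT) compressed linear layer in PyTorch.
# All names, shapes, and rank choices here are hypothetical.
import math
import torch
import torch.nn as nn

class TTLinear(nn.Module):
    """Linear layer whose weight is stored as a chain of small TT cores.

    A dense weight of shape (prod(in_modes), prod(out_modes)) is replaced by
    cores G_k of shape (r_{k-1}, in_modes[k], out_modes[k], r_k), reducing the
    parameter count from prod(in_modes) * prod(out_modes) to the sum of the
    (much smaller) core sizes.
    """

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1
        self.in_modes = list(in_modes)
        self.cores = nn.ParameterList([
            nn.Parameter(0.02 * torch.randn(ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])
        self.bias = nn.Parameter(torch.zeros(math.prod(out_modes)))

    def forward(self, x):
        # x: (batch, prod(in_modes)). Sweep the cores left to right, contracting
        # one input mode at a time, without ever forming the dense weight matrix.
        batch = x.shape[0]
        t = x.reshape(batch, 1, 1, -1)  # (batch, out_so_far=1, rank=1, remaining inputs)
        for k, core in enumerate(self.cores):
            t = t.reshape(batch, t.shape[1], t.shape[2], self.in_modes[k], -1)
            # contract the current rank r and input mode i with core (r, i, o, s)
            t = torch.einsum('bprij,rios->bposj', t, core)
            t = t.reshape(batch, t.shape[1] * t.shape[2], t.shape[3], -1)
        return t.reshape(batch, -1) + self.bias

# Example: a 1024 x 4096 projection (about 4.2M dense weights) becomes three TT
# cores holding roughly 70K parameters at TT ranks (1, 16, 16, 1).
layer = TTLinear(in_modes=(8, 16, 8), out_modes=(16, 16, 16), ranks=(1, 16, 16, 1))
y = layer(torch.randn(4, 8 * 16 * 8))  # y has shape (4, 4096)
```

In a tensor-compressed pre-training setting of this kind, gradients are computed directly with respect to the small cores rather than a dense weight, which is the part of the computation that contributions (2) and (3) above target with mixed-precision tensor optimization, customized tensorized automatic differentiation, and GPU-level optimization.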
Building on the research team’s Year-1 progress, this ALCC allocation will be used to support the Year-2 research plan of a three-year research project. The main computing experiments include validation of the proposed methods and comparison against various pre-training baseline methods on public-domain LLMs and vision-language models.