Democratizing AI by Training Deployable Open-source Language Models

PI Abhinav Bhatele, University of Maryland
Co-PI Tom Goldstein, University of Maryland
Harshitha Menon, Lawrence Livermore National Laboratory
Figure: Weak scaling performance of AxoNN's (left) inter-layer parallelism compared with other frameworks on Summit (V100s) using GPT-style transformers [17, 85], and (right) intra-layer parallelism compared with other frameworks on Perlmutter (A100s) using UNets [55].

Project Summary

The team will use a sizeable INCITE allocation to scale the parallel training of deep learning models, explore efficient alternatives to transformer models for language modeling, and fine-tune the resulting models for downstream tasks.

Project Description

Artificial intelligence (AI), and deep learning (DL) in particular, is rapidly becoming pervasive in almost all areas of computer science, and is even being used to assist computational science modeling and simulations. At the forefront of this development are large language models (LLMs). The challenges the team seeks to address in this project stem from the fact that large models do not fit in the memory of a single CPU or GPU and take a long time to train. Scaling the training of large neural networks to extreme levels of parallelism requires parallelizing and optimizing several computational and communication motifs, including dense and sparse tensor computations and irregular communication patterns, while also mitigating load imbalance and ensuring fast filesystem access.
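One motif behind these challenges is intra-layer (tensor) parallelism: when a layer's weight matrix is too large for one device, it is sharded across devices and each device computes a partial result. The sketch below simulates this in NumPy on a single machine; the function name and shapes are illustrative assumptions, not AxoNN's actual API, and the final concatenation stands in for the all-gather a real distributed run would perform.

```python
import numpy as np

def sharded_linear(x, weight, num_devices):
    """Simulate intra-layer (tensor) parallelism for a linear layer:
    split the weight matrix column-wise across `num_devices` and
    concatenate the partial outputs (an all-gather in a real run)."""
    shards = np.array_split(weight, num_devices, axis=1)
    partial_outputs = [x @ w for w in shards]  # one matmul per "device"
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4 activation vectors
w = rng.standard_normal((8, 16))   # full weight matrix
y_parallel = sharded_linear(x, w, num_devices=4)
assert np.allclose(y_parallel, x @ w)  # matches the unsharded layer
```

Because each shard's matmul is independent, no communication is needed until the outputs are gathered, which is what makes this motif scale well when the per-device work stays large.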

The team will use a sizeable INCITE allocation across the three platforms (Polaris, Aurora, and Frontier) to advance research in three directions. First, scaling the parallel training of deep learning models to a large number of GPUs is non-trivial. They plan to use their framework, AxoNN, to analyze and optimize the performance and portability of training, fine-tuning, and inference. Second, they plan to explore efficient alternatives to transformer models for language modeling. They intend to train variants of modern language model architectures that are directly aimed at the usability constraints of smaller academic laboratories. The team is focused on variants with smaller memory footprints and adaptive compute capabilities at deployment, to enable more research and development in the fields of machine learning and NLP. Third, they propose to fine-tune the trained models for several downstream tasks. The team plans to utilize the trained models for several HPC-related tasks such as improving portability and studying performance explainability.
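The other parallelization strategy shown in the figure, inter-layer (pipeline) parallelism, assigns contiguous groups of layers to different devices and forwards activations stage to stage. The sketch below simulates this on one machine with NumPy; the stage assignment and layer shapes are illustrative assumptions rather than AxoNN's implementation, and the stage-to-stage handoff stands in for the point-to-point sends of a real pipeline.

```python
import numpy as np

def run_pipeline(x, layers, num_stages):
    """Simulate inter-layer (pipeline) parallelism: assign contiguous
    groups of layers to stages and pass activations from stage to stage
    (point-to-point sends in a real distributed run)."""
    stage_assignment = np.array_split(np.arange(len(layers)), num_stages)
    for stage in stage_assignment:
        for i in stage:
            x = np.tanh(x @ layers[i])  # each layer: matmul + nonlinearity
    return x

rng = np.random.default_rng(1)
layers = [rng.standard_normal((8, 8)) for _ in range(6)]
x = rng.standard_normal((2, 8))
y_pipeline = run_pipeline(x, layers, num_stages=3)

# Reference: run every layer sequentially on one "device".
y_ref = x
for w in layers:
    y_ref = np.tanh(y_ref @ w)
assert np.allclose(y_pipeline, y_ref)
```

In contrast to the intra-layer approach, only activations cross device boundaries here, so communication volume is small, but keeping all stages busy requires microbatching in practice.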

Allocations