DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models

Authors
Maurya, A., R. Underwood, M. M. Rafique, F. Cappello, and B. Nicolae
Publication Date
2024
Name of Publication Source
HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
Publisher
ACM
Conference Location
Pisa, Italy
DOI
10.1145/3625549.3658685