A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training

Authors
Singh, S., O. Ruwase, A. A. Awan, S. Rajbhandari, Y. He, and A. Bhatele
Publication Date
2023
Name of Publication Source
ICS '23: Proceedings of the 37th International Conference on Supercomputing
Publisher
ACM
Page Numbers
203-214
DOI
10.1145/3577193.3593704