Microsoft DeepSpeed lets a single GPU train a 40-billion-parameter neural net
Microsoft Releases AI Training Library ZeRO-3 Offload
DeepSpeed ZeRO-3 Offload
Microsoft DeepSpeed blog: https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html
The DeepSpeed team gave an overview of the release's features and benefits in a recent blog post. ZeRO-3 Offload increases the memory efficiency of distributed training for deep-learning models built on the PyTorch framework, and the team reports super-linear scaling across multiple GPUs. By offloading partitioned model parameters and optimizer states from GPU memory to CPU memory, each GPU can hold a larger share of the model, enabling models of up to 40 billion parameters to be trained on a single GPU. Adopting the DeepSpeed framework requires minimal refactoring of model code, and existing users can enable the new features simply by modifying a configuration file.
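To make the configuration change concrete, here is a minimal sketch of what enabling ZeRO-3 Offload in a PyTorch training script might look like. The zero_optimization keys follow the DeepSpeed documentation for ZeRO stage 3, but the toy model, batch sizes, and training loop are placeholder assumptions, and exact option names should be checked against the installed DeepSpeed version.

```python
# Minimal sketch (not the official example): a toy PyTorch model trained
# through DeepSpeed with ZeRO stage 3 and CPU offload enabled purely via
# the config dict. Launch with the `deepspeed` launcher, e.g.:
#   deepspeed train.py
import torch
import torch.nn as nn
import deepspeed

# Placeholder model; in practice this would be a multi-billion-parameter network.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # ZeRO-3: partition parameters, gradients, and optimizer states
        "offload_param": {"device": "cpu", "pin_memory": True},      # parameters -> CPU memory
        "offload_optimizer": {"device": "cpu", "pin_memory": True},  # optimizer states -> CPU memory
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that manages
# partitioning, offload, and mixed precision behind the usual train loop.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    x = torch.randn(8, 1024, device=model_engine.device).half()  # fp16 inputs to match config
    y = torch.randint(0, 10, (8,), device=model_engine.device)
    loss = nn.functional.cross_entropy(model_engine(x), y)
    model_engine.backward(loss)  # engine-managed backward (handles loss scaling)
    model_engine.step()          # optimizer step with CPU-offloaded optimizer states
```

Note that the training loop itself is unchanged from plain PyTorch apart from the engine's backward() and step() calls; the offload behavior comes entirely from the config dict, which is the "modify a config file" adoption path the blog post describes.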