Microsoft DeepSpeed lets a single GPU train a 40-billion-parameter neural net
Microsoft Releases AI Training Library ZeRO-3 Offload
DeepSpeed ZeRO-3 Offload
Microsoft DeepSpeed blog: https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html
The DeepSpeed team gave an overview of the release's features and benefits in a recent blog post. ZeRO-3 Offload increases the memory efficiency of distributed training for deep-learning models built on the PyTorch framework, and the team reports super-linear scaling across multiple GPUs. By offloading partitioned model parameters and optimizer states from GPU memory to CPU memory, each GPU can hold a larger share of the model, enabling models of up to 40 billion parameters to be trained on a single GPU. Adopting the DeepSpeed framework requires minimal refactoring of model code, and existing users can enable the new features simply by modifying a configuration file.
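To make the configuration change concrete, here is a minimal sketch of what enabling ZeRO-3 Offload in a PyTorch training script might look like. The zero_optimization keys follow the DeepSpeed documentation for ZeRO stage 3, but the toy model, batch sizes, and training loop are placeholder assumptions, and exact option names should be checked against the installed DeepSpeed version.

```python
# Minimal sketch (not the official example): a toy PyTorch model trained
# through DeepSpeed with ZeRO stage 3 and CPU offload enabled purely via
# the config dict. Launch with the `deepspeed` launcher, e.g.:
#   deepspeed train.py
import torch
import torch.nn as nn
import deepspeed

# Placeholder model; in practice this would be a multi-billion-parameter network.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # ZeRO-3: partition parameters, gradients, and optimizer states
        "offload_param": {"device": "cpu", "pin_memory": True},      # parameters -> CPU memory
        "offload_optimizer": {"device": "cpu", "pin_memory": True},  # optimizer states -> CPU memory
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that manages
# partitioning, offload, and mixed precision behind the usual train loop.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    x = torch.randn(8, 1024, device=model_engine.device).half()  # fp16 inputs to match config
    y = torch.randint(0, 10, (8,), device=model_engine.device)
    loss = nn.functional.cross_entropy(model_engine(x), y)
    model_engine.backward(loss)  # engine-managed backward (handles loss scaling)
    model_engine.step()          # optimizer step with CPU-offloaded optimizer states
```

Note that the training loop itself is unchanged from plain PyTorch apart from the engine's backward() and step() calls; the offload behavior comes entirely from the config dict, which is the "modify a config file" adoption path the blog post describes.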