MobileViT: an accurate, light-weight, mobile-friendly vision transformer
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
arXiv paper abstract https://arxiv.org/abs/2110.02178
arXiv PDF paper https://arxiv.org/pdf/2110.02178.pdf
Light-weight convolutional neural networks (CNNs) are the de-facto choice for mobile vision tasks. ... However, these networks are spatially local.
To learn global representations, self-attention-based vision transformers (ViTs) have been adopted. Unlike CNNs, ViTs are heavy-weight.
... introduce MobileViT, a light-weight and general-purpose vision transformer for mobile devices. ... presents ... transformers as convolutions.
... MobileViT significantly outperforms CNN- and ViT-based networks across different tasks and datasets.
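The "transformers as convolutions" idea can be sketched as follows: the MobileViT block unfolds a feature map into non-overlapping patches, runs self-attention across patches (giving a global receptive field while keeping the spatial layout), then folds the result back. Below is a minimal, hedged NumPy sketch of just that unfold–attend–fold step; the function names, the single-head identity-projection attention, and the patch size are illustrative simplifications, not the paper's exact implementation (which also includes convolutional and fusion layers).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (n_tokens, d). Single head with identity Q/K/V projections,
    # kept deliberately simple for illustration.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores) @ x

def mobilevit_global(x, p=2):
    # x: (H, W, d) feature map; p: patch height/width (H, W divisible by p).
    H, W, d = x.shape
    nh, nw = H // p, W // p
    # Unfold: group pixels by their position within each p*p patch,
    # yielding p*p sequences, each containing nh*nw patch tokens.
    patches = x.reshape(nh, p, nw, p, d).transpose(1, 3, 0, 2, 4)
    patches = patches.reshape(p * p, nh * nw, d)
    # Attend across patches at the same intra-patch position:
    # every token sees the whole image, at a fraction of full-ViT cost.
    out = np.stack([self_attention(seq) for seq in patches])
    # Fold the sequences back into the original spatial layout.
    out = out.reshape(p, p, nh, nw, d).transpose(2, 0, 3, 1, 4)
    return out.reshape(H, W, d)

feat = np.random.randn(8, 8, 16)
out = mobilevit_global(feat, p=2)
print(out.shape)  # (8, 8, 16)
```

Because the unfold/fold is a pure reshaping, the block preserves spatial resolution, so it can be dropped between ordinary convolutions, which is what lets the paper treat global attention as a convolution-like layer.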
On the ImageNet-1k dataset, MobileViT achieves a top-1 accuracy of 78.4% with about 6 million parameters, which is 3.2% and 6.2% more accurate than MobileNetv3 (CNN-based) and DeiT (ViT-based), respectively, for a similar number of parameters.
On the MS-COCO object detection task, MobileViT is 5.7% more accurate than MobileNetv3 for a similar number of parameters.
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
#ComputerVision #Transformers #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning