Vision transformer beats CNN on mobile devices for accuracy and speed with ElasticViT

morrislee
Mar 20, 2023

ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices

arXiv paper abstract https://arxiv.org/abs/2303.09730

arXiv PDF paper https://arxiv.org/pdf/2303.09730.pdf

... designing lightweight and low-latency ViT models for diverse mobile devices remains a big challenge.

... propose ElasticViT, a two-stage NAS approach that trains a high-quality ViT supernet over a very large search space that supports a wide range of mobile devices, and then searches an optimal sub-network (subnet) for direct deployment.
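The second stage (searching one trained supernet for a subnet that fits each device) can be illustrated with a toy example. This is a minimal sketch, not the authors' code. The two-knob search space (`depth`, `width`), the `toy_latency_ms` latency model with its per-device `ms_per_unit` factor, and the `toy_accuracy` proxy are all hypothetical. They stand in for a real on-device latency predictor and for evaluating subnets with weights inherited from the supernet.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Subnet:
    depth: int  # number of transformer blocks (hypothetical knob)
    width: int  # embedding dimension (hypothetical knob)


# Toy search space; the real ElasticViT search space is far larger.
SEARCH_SPACE: List[Subnet] = [
    Subnet(d, w) for d, w in itertools.product([4, 8, 12], [128, 192, 256])
]


def toy_latency_ms(net: Subnet, ms_per_unit: float) -> float:
    """Hypothetical latency model: grows with depth * width^2, scaled per device."""
    return ms_per_unit * net.depth * (net.width / 128) ** 2


def toy_accuracy(net: Subnet) -> float:
    """Hypothetical proxy for a subnet's accuracy with inherited supernet weights."""
    return 60.0 + 1.0 * net.depth + 0.05 * net.width


def search(ms_per_unit: float, budget_ms: float) -> Optional[Subnet]:
    """Stage 2: pick the most accurate subnet that fits the device's latency budget."""
    feasible = [n for n in SEARCH_SPACE if toy_latency_ms(n, ms_per_unit) <= budget_ms]
    return max(feasible, key=toy_accuracy, default=None)


# Same trained supernet, two hypothetical devices with a 20 ms budget.
print(search(ms_per_unit=1.0, budget_ms=20.0))  # Subnet(depth=12, width=128)
print(search(ms_per_unit=2.5, budget_ms=20.0))  # Subnet(depth=8, width=128)
```

Because the supernet is trained only once, the per-device cost is just this search step. In the toy numbers, the slower device receives a shallower subnet under the same budget.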

... Complexity-aware sampling limits the FLOPs difference among the subnets sampled across adjacent training steps, while covering different-sized subnets in the search space.
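The complexity-aware idea can be sketched under stated assumptions. Subnets are assumed to be grouped into discrete FLOPs levels (the level values below are hypothetical), and the level used at each training step is chosen by a random walk that moves at most one level per step. This keeps the FLOPs of consecutively sampled subnets close while still visiting every level over a long run. The function name `complexity_aware_schedule` is illustrative, and the paper's actual scheduling rule may differ.

```python
import random
from typing import List


def complexity_aware_schedule(levels: List[int], steps: int, seed: int = 0) -> List[int]:
    """Return the FLOPs level (MFLOPs) to sample from at each training step.

    Hypothetical sketch: a random walk over the sorted levels that moves down,
    stays, or moves up by at most one level per step.
    """
    rng = random.Random(seed)
    ordered = sorted(levels)
    idx = rng.randrange(len(ordered))
    schedule = []
    for _ in range(steps):
        schedule.append(ordered[idx])
        # Step to a neighboring level (or stay), clamped to the valid range.
        idx = min(max(idx + rng.choice((-1, 0, 1)), 0), len(ordered) - 1)
    return schedule


levels = [100, 200, 300, 400, 500, 600, 700, 800]  # hypothetical MFLOPs levels
sched = complexity_aware_schedule(levels, steps=5000, seed=1)
max_jump = max(abs(a - b) for a, b in zip(sched, sched[1:]))
print(max_jump <= 100)            # True: adjacent steps differ by at most one level
print(set(sched) == set(levels))  # checks that the walk reached every level
```

Each training step would then draw a concrete subnet whose FLOPs fall in the scheduled level.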

Performance-aware sampling further selects subnets that have good accuracy, which can reduce gradient conflicts and improve supernet quality.
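Performance-aware selection can be sketched as best-of-k sampling within the FLOPs level chosen for the current step. This is a simplification under assumptions, not the paper's procedure. `performance_aware_pick` is a hypothetical name, `score` stands in for whatever accuracy estimate is used, and the bucket contents and scores below are made up for illustration.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def performance_aware_pick(bucket: Sequence[T], score: Callable[[T], float],
                           k: int, rng: random.Random) -> T:
    """Draw up to k distinct candidates from one FLOPs bucket and keep the best.

    `score` is a hypothetical accuracy estimate for a subnet; k must be >= 1.
    """
    if k < 1 or not bucket:
        raise ValueError("need k >= 1 and a non-empty bucket")
    candidates = rng.sample(list(bucket), min(k, len(bucket)))
    return max(candidates, key=score)


# Hypothetical bucket of (subnet_name, estimated_top1) pairs at one FLOPs level.
bucket = [("subnet_a", 70.1), ("subnet_b", 72.4), ("subnet_c", 69.8)]
best = performance_aware_pick(bucket, score=lambda s: s[1], k=3, rng=random.Random(0))
print(best)  # ('subnet_b', 72.4)
```

Larger `k` biases training toward stronger subnets at the cost of more scoring work per step; `k=1` reduces to plain uniform sampling within the level.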

... discovered models, ElasticViT models, achieve top-1 accuracy ... without extra retraining, outperforming all prior CNNs and ViTs in terms of accuracy and latency.

... the first ViT models that surpass state-of-the-art CNNs with significantly lower latency on mobile devices.

Please like and share this post if you enjoyed it!

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #Transformers #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

