Real-time scene segmentation on mobile devices using feature pyramids with TopFormer

morrislee
Apr 14, 2022

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

arXiv paper abstract https://arxiv.org/abs/2204.05525v1

arXiv PDF paper https://arxiv.org/pdf/2204.05525v1.pdf

GitHub https://github.com/hustvl/TopFormer

Although vision transformers (ViTs) have achieved great success ... the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

... present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer).

... TopFormer takes Tokens from various scales as input to produce scale-aware semantic features, which are then injected into the corresponding tokens to augment the representation.
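
To make the idea concrete, here is a minimal pure-Python sketch of that flow: pool tokens from each scale to a common length, concatenate them, mix them globally, then split the result and add it back to each scale. It is a toy illustration only, not the authors' implementation. The function names, the 1-D token layout, the weight-free attention, and the plain additive injection are all assumptions made for brevity; the real model is in the GitHub repository linked above.

```python
import math


def avg_pool(tokens, out_len):
    """Average-pool a token sequence down to out_len tokens."""
    step = len(tokens) // out_len
    channels = len(tokens[0])
    return [
        [sum(tokens[i * step + k][c] for k in range(step)) / step
         for c in range(channels)]
        for i in range(out_len)
    ]


def self_attention(tokens):
    """Weight-free single-head self-attention (assumed stand-in for transformer blocks)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(dim)])
    return out


def upsample(tokens, out_len):
    """Nearest-neighbour upsampling of a token sequence to out_len tokens."""
    factor = out_len // len(tokens)
    return [list(tokens[i // factor]) for i in range(out_len)]


def token_pyramid_forward(pyramid):
    """pyramid: list of scales, each a list of tokens, each token a list of floats."""
    coarse_len = min(len(tokens) for tokens in pyramid)
    # 1. Bring every scale to the coarsest token count.
    pooled = [avg_pool(tokens, coarse_len) for tokens in pyramid]
    # 2. Concatenate the scales along the channel axis.
    fused = [sum((p[i] for p in pooled), []) for i in range(coarse_len)]
    # 3. Global mixing produces the scale-aware semantics.
    semantics = self_attention(fused)
    # 4. Split channels per scale, upsample, and inject (here: simple addition).
    outputs, offset = [], 0
    for tokens in pyramid:
        channels = len(tokens[0])
        part = [s[offset:offset + channels] for s in semantics]
        offset += channels
        up = upsample(part, len(tokens))
        outputs.append([[x + g for x, g in zip(tok, sem)]
                        for tok, sem in zip(tokens, up)])
    return outputs


# Three hypothetical scales: 8, 4, and 2 tokens with 2, 3, and 4 channels.
pyramid = [
    [[1.0, 1.0] for _ in range(8)],
    [[2.0, 2.0, 2.0] for _ in range(4)],
    [[3.0, 3.0, 3.0, 3.0] for _ in range(2)],
]
augmented = token_pyramid_forward(pyramid)
print([(len(t), len(t[0])) for t in augmented])  # [(8, 2), (4, 3), (2, 4)]
```

Running the expensive global step only on the pooled, coarsest tokens is what keeps a design like this cheap: the attention cost depends on the small token count, while the finer scales only pay for pooling and an element-wise add.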

... significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency.

... TopFormer achieves 5% higher accuracy in mIoU than MobileNetV3 with lower latency on an ARM-based mobile device.
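
For readers new to the metric, mIoU (mean intersection over union) averages, over classes, the overlap between predicted and ground-truth pixels divided by their union. Below is a minimal sketch on flat label lists. The function name and the toy labels are made up for illustration, and benchmark code usually accumulates a confusion matrix over a whole dataset rather than scoring one image at a time.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.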

... tiny version of TopFormer achieves real-time inference on an ARM-based mobile device with competitive results. ...


