Real-time scene segmentation on mobile devices using feature pyramids with TopFormer
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
arXiv paper abstract https://arxiv.org/abs/2204.05525v1
arXiv PDF paper https://arxiv.org/pdf/2204.05525v1.pdf
Although vision transformers (ViTs) have achieved great success ... the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.
... present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer).
... TopFormer takes Tokens from various scales as input to produce scale-aware semantic features, which are then injected into the corresponding tokens to augment the representation.
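To make the idea in the excerpt above concrete, here is a minimal sketch in plain Python of the data flow it describes: pool tokens from several scales to a common size, join them, compute shared semantics, then split those semantics per scale and inject them back. This is not the authors' implementation. The function names (`avg_pool`, `upsample_nearest`, `mix_tokens`, `token_pyramid_forward`) are hypothetical, the "add the global mean" step is only a stand-in for the transformer blocks, and plain addition is assumed as the injection step.

```python
# Feature maps are nested lists shaped H x W x C (rows, columns, channels).

def avg_pool(fmap, out_h, out_w):
    """Average-pool an H x W x C map down to out_h x out_w x C.
    Assumes H and W are exact multiples of the target size."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    sh, sw = h // out_h, w // out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [fmap[y][x]
                      for y in range(i * sh, (i + 1) * sh)
                      for x in range(j * sw, (j + 1) * sw)]
            row.append([sum(p[k] for p in window) / len(window) for k in range(c)])
        out.append(row)
    return out

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour upsampling of an H x W x C map."""
    h, w = len(fmap), len(fmap[0])
    return [[list(fmap[i * h // out_h][j * w // out_w]) for j in range(out_w)]
            for i in range(out_h)]

def mix_tokens(tokens):
    """Stand-in for the transformer (an assumption, not the paper's block):
    add the mean over all tokens to every token."""
    n, c = len(tokens), len(tokens[0])
    mean = [sum(t[k] for t in tokens) / n for k in range(c)]
    return [[t[k] + mean[k] for k in range(c)] for t in tokens]

def token_pyramid_forward(pyramid):
    """pyramid: list of H x W x C maps, ordered fine to coarse."""
    th, tw = len(pyramid[-1]), len(pyramid[-1][0])   # coarsest resolution
    pooled = [avg_pool(f, th, tw) for f in pyramid]  # bring every scale to th x tw
    widths = [len(f[0][0]) for f in pyramid]         # channels per scale
    # Concatenate channels of all scales at each position -> one token per position.
    tokens = [sum((p[i][j] for p in pooled), [])
              for i in range(th) for j in range(tw)]
    semantics = mix_tokens(tokens)                   # scale-aware semantics (stand-in)
    outputs, start = [], 0
    for fmap, c in zip(pyramid, widths):
        # Take back the channel slice that belongs to this scale.
        sem = [[semantics[i * tw + j][start:start + c] for j in range(tw)]
               for i in range(th)]
        h, w = len(fmap), len(fmap[0])
        up = upsample_nearest(sem, h, w)
        # Injection assumed here to be simple addition.
        outputs.append([[[a + b for a, b in zip(fmap[y][x], up[y][x])]
                         for x in range(w)] for y in range(h)])
        start += c
    return outputs

def constant_map(h, w, c, value):
    """Helper for the toy example: an H x W x C map filled with one value."""
    return [[[value] * c for _ in range(w)] for _ in range(h)]

# Toy pyramid: a 4x4 map with 1 channel and a 2x2 map with 2 channels.
pyramid = [constant_map(4, 4, 1, 1.0), constant_map(2, 2, 2, 2.0)]
outputs = token_pyramid_forward(pyramid)
```

Each output map keeps the resolution and channel count of its input scale, so the augmented tokens can feed a segmentation head at any of the original scales. The expensive step (the token mixer) only ever runs at the coarsest resolution, which is where the savings for mobile devices would come from in a design like this.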
... significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency.
... TopFormer achieves 5% higher accuracy in mIoU than MobileNetV3 with lower latency on an ARM-based mobile device.
... tiny version of TopFormer achieves real-time inference on an ARM-based mobile device with competitive results. ...
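The accuracy comparison quoted above is stated in mIoU (mean intersection over union). As a reminder of what that number measures, here is a small sketch in plain Python that computes it over flat lists of per-pixel class labels. The function name `mean_iou` and the toy labels are illustrative only, and benchmark toolkits typically accumulate counts over a whole dataset rather than a single image, so treat this as a simplified version.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel.
    Classes absent from both are skipped so they do not distort the mean.
    """
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example: six pixels, three classes.
target = [0, 0, 1, 1, 2, 2]
pred   = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU here is 1/3, 2/3 and 1/2, so the mean is about 0.5.
```

Because every class counts equally in the mean, a model that only gets the large, frequent classes right will still score poorly, which is why mIoU is the usual yardstick for semantic segmentation.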
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b
#ComputerVision #Segmentation #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning