Vision transformer for object detection beats state-of-the-art ResNet-50
Will Transformers Replace CNNs in Computer Vision?
In less than 5 minutes, you will learn how the Transformer architecture can be applied to computer vision, as described in a new paper on the Swin Transformer.
YouTube 9 min video https://www.youtube.com/watch?v=QcCJJOLCeJQ
Comparison of Swin Transformer with other object detection algorithms
Object Detection on COCO test-dev
Papers With Code https://paperswithcode.com/sota/object-detection-on-coco
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (Microsoft)
arXiv paper abstract https://arxiv.org/abs/2103.14030v1
arXiv PDF paper https://arxiv.org/pdf/2103.14030v1.pdf
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
...
This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification ... and dense prediction tasks such as object detection ... and semantic segmentation
...
Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones.
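To make the "linear computational complexity with respect to image size" claim concrete, here is a minimal sketch based on the cost estimates given in the paper for global self-attention versus window-based self-attention. The function names are my own, and the values used (a 56x56 token grid, 96 channels, 7x7 windows) are illustrative assumptions matching a typical first-stage setup, not measurements.

```python
def global_msa_flops(h, w, c):
    """Estimated cost of global self-attention over an h x w grid of tokens.

    The second term grows with the square of the token count (h * w).
    """
    tokens = h * w
    return 4 * tokens * c**2 + 2 * tokens**2 * c


def window_msa_flops(h, w, c, m=7):
    """Estimated cost of self-attention restricted to m x m local windows.

    With m fixed, both terms grow linearly with the token count (h * w).
    """
    tokens = h * w
    return 4 * tokens * c**2 + 2 * m**2 * tokens * c


if __name__ == "__main__":
    channels = 96  # assumed embedding width, for illustration only
    for side in (56, 112, 224):
        g = global_msa_flops(side, side, channels)
        win = window_msa_flops(side, side, channels)
        print(f"{side}x{side} tokens: global={g:.3e}  windowed={win:.3e}  ratio={g / win:.1f}")
```

Doubling the side of the token grid quadruples the windowed cost, while the global cost grows much faster because of its quadratic term. That gap is why a windowed design stays practical for high-resolution inputs used in detection and segmentation.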
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website