Object detection and segmentation with a simple transformer using scale-aware attention with SimPLR
SimPLR: A Simple and Plain Transformer for Object Detection and Segmentation
arXiv paper abstract https://arxiv.org/abs/2310.05920
arXiv PDF paper https://arxiv.org/pdf/2310.05920.pdf
The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors.
Despite considerable progress in removing handcrafted components using transformers, multi-scale feature maps remain a key factor for their empirical success, even with a plain backbone like the Vision Transformer (ViT).
... show that this reliance on feature pyramids is unnecessary, and that a transformer-based detector with scale-aware attention enables the plain detector "SimPLR", whose backbone and detection head both operate on single-scale features.
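The post does not describe how scale-aware attention works internally. Below is a minimal NumPy sketch of one plausible reading, which is an assumption and not SimPLR's actual implementation. Each attention head samples the same single-scale feature map over a window of a different size, so a single map can still serve small and large objects. The function names, the fixed per-head window sizes (`head_scales`) and the uniform sampling grid are all hypothetical. A trained detector would typically learn where to sample and would apply key/value projections; both are omitted here.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly sample feat (H, W, C) at continuous pixel coords (x, y); zero outside."""
    H, W, C = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    out = np.zeros(C)
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            w = (1 - abs(x - xi)) * (1 - abs(y - yi))  # bilinear weight of this neighbor
            if 0 <= xi < W and 0 <= yi < H:
                out += w * feat[yi, xi]
    return out

def scale_aware_attention(feat, query, center, head_scales, grid=2):
    """Hypothetical sketch: every head attends to ONE single-scale map,
    but over a square window whose side length is that head's scale (pixels).

    feat: (H, W, C) single-scale feature map (e.g. one ViT output map)
    query: (C,) query vector, split evenly across heads
    center: (cx, cy) reference point in pixel coordinates
    head_scales: ASSUMED fixed window size per head (a real model would learn this)
    """
    H, W, C = feat.shape
    n_heads = len(head_scales)
    assert C % n_heads == 0, "channels must split evenly across heads"
    d = C // n_heads
    # Uniform grid of relative offsets in [-0.5, 0.5], scaled per head below.
    ticks = (np.arange(grid) + 0.5) / grid - 0.5
    outputs = []
    for h, s in enumerate(head_scales):
        # Sample grid*grid points inside this head's window; keep this head's channels.
        samples = np.array([
            bilinear_sample(feat, center[0] + ox * s, center[1] + oy * s)[h * d:(h + 1) * d]
            for oy in ticks for ox in ticks
        ])  # (grid*grid, d)
        q = query[h * d:(h + 1) * d]
        logits = samples @ q / np.sqrt(d)          # scaled dot-product scores
        weights = np.exp(logits - logits.max())    # numerically stable softmax
        weights /= weights.sum()
        outputs.append(weights @ samples)          # (d,) weighted sum of samples
    return np.concatenate(outputs)                 # (C,) heads concatenated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((32, 32, 8))
    query = rng.standard_normal(8)
    # One head looks at a 4-pixel window, the other at a 16-pixel window.
    out = scale_aware_attention(feat, query, center=(16.0, 16.0), head_scales=[4, 16])
    print(out.shape)  # (8,)
```

The design choice being illustrated is that scale variation is handled by the attention windows rather than by building a feature pyramid, so the backbone output never needs to be resampled into several resolutions.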
The plain architecture allows SimPLR to effectively take advantage of self-supervised learning and scaling approaches with ViTs, yielding strong performance compared to multi-scale counterparts.
... SimPLR shows better performance than end-to-end detectors (Mask2Former) and plain-backbone detectors (ViTDet), while being consistently faster ...
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website