Segment scenes with unknown objects by enhancing the localization capabilities of CLIP with NACLIP
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
arXiv paper abstract https://arxiv.org/abs/2404.08181
arXiv PDF paper https://arxiv.org/pdf/2404.08181.pdf
... vision-language ... models, such as CLIP, have ... effectiveness in ... zero-shot image-level tasks ... work has investigated ... these models in open-vocabulary semantic segmentation (OVSS).
However, existing approaches often rely on impractical supervised pre-training or access to additional pre-trained networks.
... propose a strong baseline for training-free OVSS, termed Neighbour-Aware CLIP (NACLIP), representing a straightforward adaptation of CLIP tailored for this scenario.
... enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature.
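To make "localization of patches in the self-attention" concrete, here is a minimal sketch in plain Python. It assumes one simple way of enforcing locality: adding a Gaussian bonus, centred on each patch, to that patch's attention logits before the softmax. This is an illustrative assumption, not necessarily the exact formulation in the paper. The function names (`gaussian_bias`, `neighbour_aware_attention`) and the `sigma` parameter are hypothetical, and the all-zero logits stand in for real CLIP similarities.

```python
import math

def gaussian_bias(h, w, sigma=1.0):
    """Build an (h*w) x (h*w) matrix whose entry [i][j] is a Gaussian of the
    grid distance between patch i and patch j (1.0 when i == j)."""
    coords = [(r, c) for r in range(h) for c in range(w)]
    return [
        [math.exp(-((ri - rj) ** 2 + (ci - cj) ** 2) / (2 * sigma ** 2))
         for (rj, cj) in coords]
        for (ri, ci) in coords
    ]

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def neighbour_aware_attention(logits, h, w, sigma=1.0):
    """Add the Gaussian locality bias to the patch-to-patch logits,
    then normalise each row (assumed scheme, for illustration only)."""
    bias = gaussian_bias(h, w, sigma)
    return [
        softmax([l + b for l, b in zip(logit_row, bias_row)])
        for logit_row, bias_row in zip(logits, bias)
    ]

# Toy example: a 3x3 grid of patches with uninformative (all-zero) logits.
h, w = 3, 3
flat_logits = [[0.0] * (h * w) for _ in range(h * w)]
attn = neighbour_aware_attention(flat_logits, h, w, sigma=1.0)

centre = 4  # index of the middle patch in row-major order
print([round(a, 3) for a in attn[centre]])
```

With uninformative logits, the middle patch ends up attending most to itself, then to its direct neighbours, and least to the corners. A smaller `sigma` makes the attention more local, a larger one makes it closer to uniform.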
By ... choices favouring segmentation, ... improves performance without ... additional data, auxiliary pre-trained networks, or extensive hyperparameter tuning ...
... Experiments are performed on 8 popular semantic segmentation benchmarks, yielding state-of-the-art performance on most scenarios ...
If you enjoyed this post, please like and share it using the buttons at the bottom!
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website