Segment scenes using information from vision-language models without neural training with PnP-OVSS

morrislee
Nov 30, 2023

Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models

arXiv paper abstract https://arxiv.org/abs/2311.17095

arXiv PDF paper https://arxiv.org/pdf/2311.17095.pdf

From an enormous number of image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words.

... propose ... Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS) ... leverages a VLM with direct text-to-image cross-attention and an image-text matching loss to produce semantic segmentation.
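As a rough illustration of how per-class attention could become a segmentation, the sketch below assumes that a VLM has already produced one patch-level salience grid per class name, and it labels each patch with the class that scores highest there. The function name `label_patches`, the toy class names, and the numbers are all hypothetical; the paper's actual pipeline may differ in how the maps are extracted and post-processed.

```python
from typing import Dict, List

Grid = List[List[float]]  # salience per image patch, indexed [row][col]


def label_patches(class_maps: Dict[str, Grid]) -> List[List[str]]:
    """Assign each patch the class whose salience is highest at that patch.

    Assumption: every grid in class_maps has the same shape.
    """
    names = list(class_maps)
    first = class_maps[names[0]]
    rows, cols = len(first), len(first[0])
    return [
        [max(names, key=lambda n: class_maps[n][r][c]) for c in range(cols)]
        for r in range(rows)
    ]


# Toy 2x2 patch grids standing in for cross-attention maps (made-up values).
maps = {
    "cat": [[0.9, 0.2], [0.7, 0.1]],
    "grass": [[0.1, 0.8], [0.3, 0.6]],
}
labels = label_patches(maps)
```

In this toy input the left column of patches is labeled "cat" and the right column "grass", since each patch simply takes the class with the larger value.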

However, cross-attention alone tends to over-segment, whereas cross-attention plus GradCAM tends to under-segment.

To alleviate this issue, ... introduce Salience Dropout; by iteratively dropping patches that the model is most attentive to, ... are able to better resolve the entire extent of the segmentation mask.
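The idea of iteratively dropping the most-attended patches can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `salience_fn` is a hypothetical stand-in for a VLM call that returns one salience score per patch given the set of still-visible patches, and the number of rounds, the drop fraction, and the choice to sum scores across rounds are illustrative choices.

```python
from typing import Callable, List, Set


def salience_dropout(
    salience_fn: Callable[[Set[int]], List[float]],
    num_patches: int,
    rounds: int = 3,
    drop_fraction: float = 0.5,
) -> List[float]:
    """Accumulate salience while repeatedly hiding the most salient patches."""
    visible = set(range(num_patches))
    total = [0.0] * num_patches
    for _ in range(rounds):
        if not visible:
            break
        scores = salience_fn(visible)
        # Add this round's scores for the patches the model could still see.
        for i in visible:
            total[i] += scores[i]
        # Drop the top fraction so later rounds must attend elsewhere.
        k = max(1, int(len(visible) * drop_fraction))
        top = sorted(visible, key=lambda i: scores[i], reverse=True)[:k]
        visible -= set(top)
    return total


# Toy "model": attention is normalized by the strongest visible patch, so
# weaker object parts only stand out once the dominant patches are hidden.
strength = [0.9, 0.5, 0.3, 0.0]  # patch 3 is background (made-up values)


def toy_salience(visible: Set[int]) -> List[float]:
    peak = max(strength[i] for i in visible)
    if peak == 0:
        return [0.0] * len(strength)
    return [strength[i] / peak if i in visible else 0.0 for i in range(len(strength))]


single_pass = toy_salience(set(range(4)))
accumulated = salience_dropout(toy_salience, num_patches=4)
```

In this toy setup, patch 2 gets a low score in a single pass but a much higher accumulated score once patches 0 and 1 are dropped, while the background patch stays at zero. That mirrors the stated goal of recovering the full extent of the mask.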

... method does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set.

PnP-OVSS ... substantial improvements over a comparable baseline ... and even outperforms most baselines that conduct additional network training on top of pretrained VLMs.


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #Segmentation #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

