
News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware


morrislee

Segment scene using information from vision-language models without neural training with PnP-OVSS



Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models

arXiv paper abstract https://arxiv.org/abs/2311.17095





From an enormous amount of image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words


... propose ... Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS) ... leverages a VLM with direct text-to-image cross-attention and an image-text matching loss to produce semantic segmentation.
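To make "cross-attention to produce semantic segmentation" concrete, here is a minimal toy sketch. It assumes you already have one query vector per class word and one key vector per image patch, as a text-to-image cross-attention layer in a VLM would provide. Each class's attention map is rescaled so its peak is 1.0, and each patch gets the class with the highest score. The function names, the peak rescaling, and the `background_threshold` are illustrative assumptions, not the paper's exact recipe.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_to_image_attention(query: List[float], patch_keys: List[List[float]]) -> List[float]:
    """Scaled dot-product attention of one text-token query over all image-patch keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in patch_keys]
    return softmax(scores)


def segment_from_attention(class_queries: Dict[str, List[float]],
                           patch_keys: List[List[float]],
                           background_threshold: float = 0.5) -> List[str]:
    """Hypothetical helper: label each patch with the class word that attends to it most.

    Each class's map is rescaled so its peak is 1.0; patches whose best
    rescaled score falls below `background_threshold` become "background".
    """
    maps = {}
    for name, query in class_queries.items():
        attn = text_to_image_attention(query, patch_keys)
        peak = max(attn)
        maps[name] = [a / peak for a in attn]
    labels = []
    for p in range(len(patch_keys)):
        best = max(maps, key=lambda name: maps[name][p])
        labels.append(best if maps[best][p] >= background_threshold else "background")
    return labels


# Toy example: two class words, three patches (the third matches neither class).
queries = {"cat": [4.0, 0.0], "dog": [0.0, 4.0]}
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(segment_from_attention(queries, keys))
```

With these hand-picked vectors, the first patch is labeled "cat", the second "dog", and the third "background". A real model has many heads and layers, and the choice of which to read attention from is a design decision this sketch does not address.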


However, cross-attention alone tends to over-segment, whereas cross-attention plus GradCAM tends to under-segment.
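A common way to combine cross-attention with GradCAM is to weight each head's attention by the positive part of the gradient of the image-text matching score with respect to that attention, then average over heads. The sketch below assumes the gradients are already computed and passed in as plain numbers; in practice they would come from backpropagating the matching loss through the VLM. The function name is hypothetical, and the paper's exact weighting may differ.

```python
from typing import List


def gradcam_attention(attn_heads: List[List[float]],
                      grad_heads: List[List[float]]) -> List[float]:
    """Hypothetical GradCAM-style relevance map over image patches.

    attn_heads[h][p]: cross-attention of head h to patch p.
    grad_heads[h][p]: gradient of the image-text matching score w.r.t. that
    attention value (assumed precomputed by autodiff).
    """
    if len(attn_heads) != len(grad_heads):
        raise ValueError("attention and gradients must have the same number of heads")
    n_heads = len(attn_heads)
    n_patches = len(attn_heads[0])
    relevance = [0.0] * n_patches
    for attn_row, grad_row in zip(attn_heads, grad_heads):
        for p, (a, g) in enumerate(zip(attn_row, grad_row)):
            # Keep only attention that increases the matching score (ReLU on the gradient).
            relevance[p] += a * max(g, 0.0)
    # Average over heads.
    return [r / n_heads for r in relevance]


attn = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
grad = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0]]
print(gradcam_attention(attn, grad))
```

Here the result is about `[0.25, 0.2, 0.35]`. Because the ReLU zeroes every patch with a non-positive gradient, only the patches that most help the image-text match survive. This is one plausible reason such maps cover less of the object than raw attention does.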


To alleviate this issue, ... introduce Salience Dropout; by iteratively dropping patches that the model is most attentive to, ... are able to better resolve the entire extent of the segmentation mask.
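As read from the abstract, Salience Dropout can be sketched as a loop. Compute a saliency map, add it to a running total, hide the most salient visible patches, and repeat, so that weaker parts of the object get a chance to show up. The names `salience_dropout`, `saliency_fn`, `iterations`, and `drop_per_iter`, and the choice to sum the maps, are hypothetical; the paper's dropping schedule and aggregation may differ. The toy `saliency_fn` stands in for re-running the VLM with some patches dropped. It spreads a softmax over only the visible patches, so one dominant patch masks the others until it is removed.

```python
import math
from typing import Callable, List

# Hypothetical interface: given which patches are still visible, return one score per patch.
SaliencyFn = Callable[[List[bool]], List[float]]


def salience_dropout(saliency_fn: SaliencyFn, num_patches: int,
                     iterations: int, drop_per_iter: int) -> List[float]:
    """Accumulate saliency maps while iteratively dropping the most salient patches."""
    visible = [True] * num_patches
    accumulated = [0.0] * num_patches
    for _ in range(iterations):
        scores = saliency_fn(visible)
        candidates = [p for p in range(num_patches) if visible[p]]
        for p in candidates:
            accumulated[p] += scores[p]
        # Hide the patches the model currently attends to most.
        ranked = sorted(candidates, key=lambda p: scores[p], reverse=True)
        for p in ranked[:drop_per_iter]:
            visible[p] = False
        if not any(visible):
            break
    return accumulated


def toy_saliency(relevance: List[float]) -> SaliencyFn:
    """Toy stand-in for a VLM: softmax over the relevance of visible patches only."""
    def fn(visible: List[bool]) -> List[float]:
        idx = [p for p, v in enumerate(visible) if v]
        m = max(relevance[p] for p in idx)
        exps = {p: math.exp(relevance[p] - m) for p in idx}
        z = sum(exps.values())
        return [exps[p] / z if visible[p] else 0.0 for p in range(len(relevance))]
    return fn


fn = toy_saliency([5.0, 3.0, 3.0, 0.0])
print(salience_dropout(fn, 4, iterations=1, drop_per_iter=1))
print(salience_dropout(fn, 4, iterations=2, drop_per_iter=1))
```

With a single pass, patches 1 and 2 receive only about 0.11 each because patch 0 dominates. After patch 0 is dropped, the second pass raises them to roughly 0.59, while the weakly relevant patch 3 stays low. This mirrors the abstract's claim that dropping salient patches helps resolve the full extent of the mask.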


... method does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set.


PnP-OVSS ... substantial improvements over a comparable baseline ... and even outperforms most baselines that conduct additional network training on top of pretrained VLMs.



If you enjoyed this post, please like and share it using the buttons at the bottom!


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website


