Segment unknown objects using VLM to filter texts and enhance masks with CLIP as RNN

morrislee
Dec 14, 2023

CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

arXiv paper abstract https://arxiv.org/abs/2312.07661

arXiv PDF paper https://arxiv.org/pdf/2312.07661.pdf

Project page https://torrvision.com/clip_as_rnn

Existing open-vocabulary image segmentation methods require a fine-tuning step on mask annotations and/or image-text datasets. Mask labels are labor-intensive ...

... Without fine-tuning, VLMs trained under weak image-text supervision tend to make suboptimal mask predictions when there are text queries referring to non-existing concepts in the image.

... introduce a novel recurrent framework that progressively filters out irrelevant texts and enhances mask quality without training efforts.
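The recurrence can be pictured as a fixed-point loop over the set of text queries. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes a hypothetical `score_fn` that takes the currently active queries and returns one alignment score per query from a frozen segmenter, and hypothetical `threshold` and `max_steps` parameters. The surviving query set plays the role of the RNN hidden state, and the loop stops once a pass removes nothing.

```python
from typing import Callable, Dict, List

# Hypothetical interface (an assumption, not the paper's API): given the
# currently active text queries, a frozen segmenter returns one score per
# query. Scores may depend on the whole active set, which is what makes
# repeating the step meaningful.
ScoreFn = Callable[[List[str]], Dict[str, float]]


def recurrent_filter(queries: List[str], score_fn: ScoreFn,
                     threshold: float, max_steps: int = 10) -> List[str]:
    """Drop queries scoring below `threshold` until the set stops changing."""
    active = list(queries)
    for _ in range(max_steps):
        if not active:
            break
        scores = score_fn(active)            # one pass of the recurrent unit
        kept = [q for q in active if scores[q] >= threshold]
        if kept == active:                   # fixed point: nothing was filtered
            break
        active = kept                        # "hidden state" for the next step
    return active
```

Because `score_fn` sees only the surviving queries, removing an irrelevant query on one step can change the scores of the others on the next, so a query that passed early may still be filtered later.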

The recurrent unit is a two-stage segmenter built upon a VLM with frozen weights.
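A toy two-stage unit on frozen features may make the idea more concrete. Everything in this sketch is an illustrative assumption rather than the paper's implementation: the function name `two_stage_segment`, the temperature-scaled softmax over cosine similarities used for stage 1 mask proposals, and the mean-pooled patch features used in stage 2 as a stand-in for re-encoding the masked image with the VLM. No weights are updated; patch and text features are fixed inputs, mirroring the frozen-VLM setting.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def two_stage_segment(
    patch_feats: List[Vector],
    text_feats: Dict[str, Vector],
    temperature: float = 0.1,
    mask_threshold: float = 0.5,
) -> Tuple[Dict[str, List[int]], Dict[str, float]]:
    """Hypothetical two-stage unit on frozen features (toy sketch).

    Returns (masks, scores): masks maps each query to the patch indices it
    claims; scores maps each query to a mask-text alignment score.
    """
    queries = list(text_feats)
    if not queries:
        return {}, {}
    masks: Dict[str, List[int]] = {q: [] for q in queries}

    # Stage 1: mask proposals. Each patch is softly assigned across the
    # active queries (softmax over temperature-scaled cosine similarities);
    # a query's mask keeps the patches where its probability is high.
    for i, patch in enumerate(patch_feats):
        logits = [cosine(patch, text_feats[q]) / temperature for q in queries]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        for q, e in zip(queries, exps):
            if e / total > mask_threshold:
                masks[q].append(i)

    # Stage 2: mask scoring. Mean-pool the features inside each mask (a
    # stand-in for re-encoding the masked image) and compare with the text.
    scores: Dict[str, float] = {}
    for q in queries:
        if not masks[q]:
            scores[q] = 0.0  # no supporting patches for this query
            continue
        dim = len(patch_feats[0])
        pooled = [sum(patch_feats[i][d] for i in masks[q]) / len(masks[q])
                  for d in range(dim)]
        scores[q] = cosine(pooled, text_feats[q])
    return masks, scores
```

Since stage 1 normalizes over the active queries, dropping an irrelevant query can change the masks of the remaining ones, which is why applying such a unit repeatedly differs from applying it once.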

... model retains the VLM's broad vocabulary space and strengthens its segmentation capability.

... method outperforms not only the training-free counterparts, but also those fine-tuned with millions of additional data samples, and sets new state-of-the-art records for both zero-shot semantic and referring image segmentation tasks ...


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #Segmentation #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware
