Segment unknown objects by training on independent image-mask and image-text pairs with Uni-OVSeg

morrislee
Feb 15, 2024

Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision

arXiv paper abstract https://arxiv.org/abs/2402.08960

arXiv PDF paper https://arxiv.org/pdf/2402.08960.pdf

... open-vocabulary segmentation ... rely on image-mask-text triplets, yet this ... is labour-intensive

... liberate ... correspondence between masks and texts by using independent image-mask and image-text pairs, which can be easily collected respectively.

With this unpaired mask-text supervision, ... propose ... weakly-supervised open-vocabulary segmentation framework (Uni-OVSeg) that leverages confident pairs of mask predictions and entities in text descriptions.

Using the independent image-mask and image-text pairs, ... predict a set of binary masks and associate them with entities by resorting to the CLIP embedding space.
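
As a rough illustration of this matching step, the sketch below pairs each mask embedding with its most similar entity embedding by cosine similarity and keeps only the pairs above a confidence threshold. The toy vectors, the helper name `match_masks_to_entities`, and the threshold value are all assumptions made for illustration; in practice the embeddings would come from CLIP's image and text encoders, and the paper's actual matching procedure may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_masks_to_entities(mask_embs, entity_embs, threshold=0.5):
    """Hypothetical helper: for each mask, pick the most similar entity
    and keep the pair only if the similarity reaches the threshold."""
    pairs = []
    for i, mask_emb in enumerate(mask_embs):
        sims = [cosine(mask_emb, e) for e in entity_embs]
        j = max(range(len(sims)), key=sims.__getitem__)
        if sims[j] >= threshold:
            pairs.append((i, j, sims[j]))
    return pairs

# Toy stand-ins for CLIP embeddings (assumed values, not real CLIP output).
mask_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1], [0.5, 0.5, 0.7]]
entity_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

pairs = match_masks_to_entities(mask_embs, entity_embs, threshold=0.8)
for mask_idx, entity_idx, sim in pairs:
    print(f"mask {mask_idx} -> entity {entity_idx} (similarity {sim:.3f})")
```

In this toy run the third mask is ambiguous between the two entities, so it falls below the threshold and is left unmatched, which is the sense in which only confident pairs are kept.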

... using the large vision-language model (LVLM) to refine text descriptions and devise a multi-scale ensemble to stabilise the matching between masks and entities.
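
One simple way to picture a multi-scale ensemble is to average the mask-entity similarity scores computed at several image scales before choosing a match. The sketch below does exactly that on made-up numbers. The averaging rule, the helper name `ensemble_scores`, and the score values are assumptions for illustration only; the abstract does not say how the paper combines scales.

```python
def ensemble_scores(scores_per_scale):
    """Hypothetical helper: average similarity matrices over scales.

    scores_per_scale[s][i][j] is the similarity between mask i and
    entity j computed at scale s.
    """
    n_scales = len(scores_per_scale)
    n_masks = len(scores_per_scale[0])
    n_entities = len(scores_per_scale[0][0])
    return [
        [
            sum(scale[i][j] for scale in scores_per_scale) / n_scales
            for j in range(n_entities)
        ]
        for i in range(n_masks)
    ]

# Made-up similarities for 2 masks x 2 entities at 3 scales.
scores_per_scale = [
    [[0.9, 0.2], [0.4, 0.6]],
    [[0.7, 0.3], [0.6, 0.5]],  # at this scale alone, mask 1 would prefer entity 0
    [[0.8, 0.1], [0.2, 0.7]],
]

avg = ensemble_scores(scores_per_scale)
# Index of the best-scoring entity for each mask after averaging.
best = [max(range(len(row)), key=row.__getitem__) for row in avg]
print(best)
```

Averaging smooths out the single scale at which the second mask flips to the other entity, which is the kind of stabilising effect an ensemble is meant to provide.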

Compared to text-only weakly-supervised methods, ... Uni-OVSeg achieves substantial improvements ... and even surpasses fully-supervised methods ...

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #Segmentation #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
