Segment unknown objects by training on independent image-mask and image-text pairs with Uni-OVSeg
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
arXiv paper abstract https://arxiv.org/abs/2402.08960
arXiv PDF paper https://arxiv.org/pdf/2402.08960.pdf
... open-vocabulary segmentation ... rely on image-mask-text triplets, yet this ... is labour-intensive
... liberate ... correspondence between masks and texts by using independent image-mask and image-text pairs, which can be easily collected respectively.
With this unpaired mask-text supervision, ... propose ... weakly-supervised open-vocabulary segmentation framework (Uni-OVSeg) that leverages confident pairs of mask predictions and entities in text descriptions.
Using the independent image-mask and image-text pairs, ... predict a set of binary masks and associate them with entities by resorting to the CLIP embedding space.
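To make the mask-entity association concrete, here is a minimal sketch, assuming each predicted mask and each text entity has already been encoded into the same CLIP-like embedding space. The function names, the greedy best-mask rule and the `threshold` value are illustrative assumptions, not the paper's actual matching procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_masks_to_entities(mask_embs, entity_embs, threshold=0.5):
    """For each entity, pick its most similar mask and keep the pair
    only if the similarity reaches `threshold` (a hypothetical cutoff).
    Returns a list of (mask_index, entity_index, score) tuples."""
    if not mask_embs:
        return []
    pairs = []
    for j, entity in enumerate(entity_embs):
        scores = [cosine(mask, entity) for mask in mask_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:  # discard low-confidence pairs
            pairs.append((best, j, scores[best]))
    return pairs

# Toy 2-D embeddings standing in for real CLIP features (assumed data).
mask_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
entity_embs = [[0.9, 0.1], [-1.0, 0.0]]
pairs = match_masks_to_entities(mask_embs, entity_embs, threshold=0.5)
print(pairs)
```

In this toy case the first entity is paired with the first mask, while the second entity has no mask above the cutoff and is dropped, which is the sense in which only "confident pairs" are kept.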
... using the large vision-language model (LVLM) to refine text descriptions and devise a multi-scale ensemble to stabilise the matching between masks and entities.
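One plausible reading of a multi-scale ensemble is to compute the mask-entity similarity matrix at several input scales and average them before matching. The sketch below assumes exactly that; the function name and the plain averaging rule are hypothetical and may differ from what the paper does.

```python
def ensemble_scores(per_scale_scores):
    """Element-wise average of similarity matrices, one matrix per scale.
    Each matrix is a list of rows: rows index masks, columns index entities."""
    n_scales = len(per_scale_scores)
    rows = len(per_scale_scores[0])
    cols = len(per_scale_scores[0][0])
    return [
        [sum(s[i][j] for s in per_scale_scores) / n_scales for j in range(cols)]
        for i in range(rows)
    ]

# Toy similarity matrices for two scales (assumed values).
scale_a = [[0.9, 0.1], [0.2, 0.6]]
scale_b = [[0.5, 0.3], [0.4, 0.8]]
avg = ensemble_scores([scale_a, scale_b])
print(avg)
```

Averaging damps a score that is high at only one scale, so a match has to hold up across scales to stay confident.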
Compared to text-only weakly-supervised methods, ... Uni-OVSeg achieves substantial improvements ... and even surpasses fully-supervised methods ...
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website