Segment objects with only image-level labels, using CLIP and SAM to make segmentation seeds, by Yang et al.
Foundation Model Assisted Weakly Supervised Semantic Segmentation
arXiv abstract: https://arxiv.org/abs/2312.03585
arXiv PDF: https://arxiv.org/pdf/2312.03585.pdf
This work aims to leverage pre-trained foundation models, such as contrastive language-image pre-training (CLIP) and segment anything model (SAM), to address weakly supervised semantic segmentation (WSSS) using image-level labels.
... propose a coarse-to-fine framework based on CLIP and SAM for generating high-quality segmentation seeds.
... construct an image classification task and a seed segmentation task, which are jointly performed by CLIP ... A SAM-based seeding (SAMS) module is designed and applied to each task to produce either coarse or fine seed maps.
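A minimal sketch of how a SAM-based seeding (SAMS) step could combine CAMs with SAM mask proposals: each SAM mask is assigned the class whose CAM activation is highest inside the mask, if that score clears a threshold. The function name, threshold, and labeling convention here are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def sams_seed_map(cam, sam_masks, score_thresh=0.4, bg_label=0):
    """Hypothetical SAM-based seeding sketch (not the paper's exact SAMS).

    cam:       (C, H, W) class activation maps, values in [0, 1]
    sam_masks: list of (H, W) boolean masks proposed by SAM
    Returns an (H, W) integer seed map (0 = background, c+1 = class c).
    """
    num_classes, h, w = cam.shape
    seed = np.full((h, w), bg_label, dtype=np.int64)
    for mask in sam_masks:
        # Mean CAM score of each class over this mask's pixels
        scores = cam[:, mask].mean(axis=1)
        best = int(scores.argmax())
        if scores[best] >= score_thresh:
            seed[mask] = best + 1  # shift so 0 stays background
    return seed
```

Because SAM masks tend to follow object boundaries, snapping CAM evidence to whole masks is what turns blobby activations into sharper seed maps.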
... design a multi-label contrastive loss supervised by image-level labels and a CAM activation loss supervised by the generated coarse seed map. These losses are used to learn the segmentation-specific prompts.
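The two losses could look roughly like the sketch below: a multi-label binary cross-entropy over image-text similarities (supervised by the image-level multi-hot label vector), plus a term that pushes CAM activations up inside regions the coarse seed map assigns to each class. Both formulations, the temperature value, and the seed-map convention are my assumptions for illustration, not the paper's exact losses.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multi_label_contrastive_loss(img_emb, text_embs, labels, temperature=0.07):
    """Sketch: BCE on temperature-scaled cosine similarities between one
    image embedding and one text (prompt) embedding per class, supervised
    by the multi-hot image-level label vector."""
    img = img_emb / np.linalg.norm(img_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature  # (C,) one similarity per class
    probs = sigmoid(logits)
    eps = 1e-8
    bce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    return bce.mean()

def cam_activation_loss(cam, coarse_seed, num_classes):
    """Sketch: encourage high CAM activation for the class the coarse seed
    map assigns to each pixel (seed value c+1 for class c; 0 ignored)."""
    loss, count = 0.0, 0
    for c in range(num_classes):
        region = coarse_seed == c + 1
        if region.any():
            loss += (1.0 - cam[c][region]).mean()
            count += 1
    return loss / max(count, 1)
```

In the described framework these losses update only the learned prompts, with the CLIP backbone kept frozen.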
... input each image along with the learned segmentation-specific prompts into CLIP and
the SAMS module to produce high-quality segmentation seeds. These ... serve as pseudo labels to train an off-the-shelf segmentation network like other two-stage WSSS methods.
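When seeds become pseudo labels for a standard segmentation network, a common two-stage WSSS convention (an assumed detail here, not stated in the abstract) is to mark unreliable pixels with an ignore index so the cross-entropy loss skips them:

```python
import numpy as np

IGNORE_INDEX = 255  # common convention for "unlabeled" pixels in segmentation

def seeds_to_pseudo_labels(seed, confident):
    """Sketch: convert a seed map into pseudo labels for training an
    off-the-shelf segmentation network.

    seed:      (H, W) integer seed map (0 = background, c+1 = class c)
    confident: (H, W) boolean mask of pixels the seeding step trusts
    """
    pseudo = seed.copy()
    pseudo[~confident] = IGNORE_INDEX  # loss ignores these pixels
    return pseudo
```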
... method achieves state-of-the-art performance on PASCAL VOC 2012 and competitive results on MS COCO 2014.
Please like and share this post if you enjoyed it using the buttons at the bottom!
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website