Segment untrained objects by grouping visual features and enhancing descriptions with O3S
Multi-Modal Prototypes for Open-Set Semantic Segmentation
arXiv paper abstract https://arxiv.org/abs/2307.02003
arXiv PDF paper https://arxiv.org/pdf/2307.02003.pdf
In semantic segmentation, adapting a visual system to novel object categories at inference time has always been both valuable and challenging.
... existing methods rely on ... support examples as visual cues or class names as textual cues ... these ... two ... studied in isolation, neglecting the complementary intrinsic of low-level visual and high-level language
... define ... open-set semantic segmentation (O3S), which aims to learn seen and unseen semantics from both visual examples and textual names.
... extracts multi-modal prototypes for segmentation task, by first single modal self-enhancement and aggregation, then multi-modal complementary fusion.
... aggregate visual features into several tokens as visual prototypes, and enhance the class name with detailed descriptions for textual prototype generation. The two modalities are then fused to generate multi-modal prototypes
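To make the prototype idea above concrete, here is a minimal NumPy sketch. It is not the paper's implementation: plain k-means stands in for the visual token aggregation, a simple mean of pre-computed embeddings stands in for the description-enhanced textual prototype, and a weighted sum stands in for the fusion step. The function names, the `alpha` weight, and the assumption that text and visual embeddings already share one feature space are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def visual_prototypes(features, num_tokens=2, iters=10, seed=0):
    """Aggregate (N, D) support features into num_tokens prototypes.

    Assumption: plain k-means is used here as a stand-in for the
    aggregation module described in the abstract.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(features), size=num_tokens, replace=False)
    centers = features[picks]  # fancy indexing returns a copy
    for _ in range(iters):
        # Squared distance from every feature to every center: (N, K)
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for k in range(num_tokens):
            members = features[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers


def textual_prototype(name_emb, description_embs):
    """Average the class-name embedding with its description embeddings.

    Assumption: embeddings come from some text encoder that is not shown.
    """
    stacked = np.vstack([name_emb, description_embs])
    return l2_normalize(stacked.mean(axis=0))


def fuse(visual, text, alpha=0.5):
    """Blend each visual token with the textual prototype (hypothetical fusion)."""
    return l2_normalize(alpha * l2_normalize(visual) + (1.0 - alpha) * text)


def segment(pixel_features, class_prototypes):
    """Label each pixel with the class whose best prototype is most similar.

    pixel_features: (H, W, D); class_prototypes: list of (K, D) arrays.
    """
    pix = l2_normalize(pixel_features)
    scores = [(pix @ protos.T).max(axis=-1) for protos in class_prototypes]
    return np.stack(scores, axis=0).argmax(axis=0)
```

In use, one would build one fused prototype set per class from its support features and text embeddings, then pass the list to `segment` together with a feature map from the image to be labeled.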
... State-of-the-art results are achieved even on more detailed part-segmentation, Pascal-Animals, by only training on coarse-grained datasets ...
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website