Segment a scene from the words in its caption in one stage with PPMN
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding
arXiv paper abstract https://arxiv.org/abs/2208.05647v1
arXiv PDF paper https://arxiv.org/pdf/2208.05647v1.pdf
Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image.
... two-stage approach first extracts segmentation region proposals ... then conducts coarse region-phrase matching to ground the candidate regions for each noun phrase.
However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage ... as well as complicated strategies designed for things and stuff
... To alleviate ... drawbacks, ... propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals
... model can exploit sufficient and finer cross-modal semantic correspondence from the supervision of densely annotated pixel-phrase pairs
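To make the idea of matching phrases directly to pixels concrete, here is a minimal sketch. It assumes each phrase and each pixel has already been embedded into the same feature space, and it scores every pixel against every phrase with a dot product followed by a sigmoid. The function name, the 0.5 threshold, and the toy vectors are illustrative assumptions, not the paper's actual architecture.

```python
import math

def pixel_phrase_match(phrase_embs, pixel_feats, threshold=0.5):
    """Return one binary mask (a flat list over pixels) per phrase.

    phrase_embs: list of N phrase vectors, each of length C (hypothetical input)
    pixel_feats: list of P pixel vectors, each of length C (hypothetical input)
    """
    masks = []
    for phrase in phrase_embs:
        mask = []
        for pixel in pixel_feats:
            # Similarity between this phrase and this pixel (dot product).
            score = sum(p * q for p, q in zip(phrase, pixel))
            # Squash the score to (0, 1) so it reads as a match probability.
            prob = 1.0 / (1.0 + math.exp(-score))
            mask.append(1 if prob > threshold else 0)
        masks.append(mask)
    return masks

# Toy example: 2 phrases, 4 pixels (a flattened 2x2 image), 2 channels.
phrases = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[2.0, -2.0], [3.0, -1.0], [-2.0, 2.0], [-1.0, 3.0]]
masks = pixel_phrase_match(phrases, pixels)
print(masks)  # [[1, 1, 0, 0], [0, 0, 1, 1]]
```

Because every pixel is scored against every phrase, no region proposals are needed, and things and stuff are handled the same way.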
... method achieves new state-of-the-art performance on the PNG benchmark with 4.0 absolute Average Recall gains.
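For readers unfamiliar with the metric, the sketch below shows one way Average Recall can be computed. It assumes the metric is the area under the recall-versus-IoU-threshold curve, where a phrase counts as recalled at a threshold if the IoU between its predicted and ground-truth masks reaches that threshold. The function name, the 101 evenly spaced thresholds, and the trapezoidal integration are assumptions for illustration, not the benchmark's official evaluation code.

```python
def average_recall(ious, num_thresholds=101):
    """Area under the recall-vs-IoU-threshold curve (trapezoidal rule).

    ious: one IoU value in [0, 1] per grounded phrase (hypothetical input).
    """
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Recall at threshold t: fraction of phrases whose IoU is at least t.
    recalls = [sum(1 for v in ious if v >= t) / len(ious) for t in thresholds]
    area = 0.0
    for i in range(1, num_thresholds):
        width = thresholds[i] - thresholds[i - 1]
        area += 0.5 * (recalls[i] + recalls[i - 1]) * width
    return area

# Perfect masks give a recall of 1 at every threshold.
perfect = average_recall([1.0, 1.0, 1.0])
# Better-overlapping predictions yield a higher score.
better = average_recall([0.9, 0.8])
worse = average_recall([0.3, 0.2])
```

Under this definition, a gain of 4.0 points means the recall curve sits higher on average across IoU thresholds.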