Segment a scene using the words in its caption in one stage with PPMN

morrislee
Aug 17, 2022

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

arXiv paper abstract https://arxiv.org/abs/2208.05647v1

arXiv PDF paper https://arxiv.org/pdf/2208.05647v1.pdf

GitHub https://github.com/dzh19990407/ppmn

Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image.

... two-stage approach first extracts segmentation region proposals ... then conducts coarse region-phrase matching to ground the candidate regions for each noun phrase.

However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage ... as well as complicated strategies designed for things and stuff ...

... To alleviate ... drawbacks, ... propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals.

... model can exploit sufficient and finer cross-modal semantic correspondence from the supervision of densely annotated pixel-phrase pairs ...
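
To make the idea of matching phrases to pixels instead of region proposals concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name pixel_phrase_masks, the dot-product similarity, the sigmoid, and the 0.5 threshold are all assumptions chosen for illustration, and the toy features stand in for whatever the real visual and language encoders would produce.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pixel_phrase_masks(phrase_embs, pixel_feats, threshold=0.5):
    """Hypothetical helper: one binary mask per phrase.

    phrase_embs: list of N phrase vectors, each of length C
    pixel_feats: H x W grid of pixel vectors, each of length C
    Returns N masks, each an H x W grid of 0/1 values.
    """
    masks = []
    for phrase in phrase_embs:
        mask = []
        for row in pixel_feats:
            mask_row = []
            for pixel in row:
                # Similarity between this phrase and this pixel (assumed dot product).
                score = sum(p * q for p, q in zip(phrase, pixel))
                # A pixel belongs to the phrase if its matching probability is high.
                mask_row.append(1 if sigmoid(score) > threshold else 0)
            mask.append(mask_row)
        masks.append(mask)
    return masks


# Toy 2 x 2 image with 2-dimensional features: the top row responds to
# channel 0 and the bottom row to channel 1.
pixel_feats = [
    [[4.0, -4.0], [4.0, -4.0]],
    [[-4.0, 4.0], [-4.0, 4.0]],
]
# Two toy phrase embeddings, one aligned with each channel.
phrase_embs = [[1.0, 0.0], [0.0, 1.0]]

masks = pixel_phrase_masks(phrase_embs, pixel_feats)
print(masks)
```

In this toy case the first phrase selects the top row and the second phrase selects the bottom row, with no region proposals involved at any step.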

... method achieves new state-of-the-art performance on the PNG benchmark with 4.0 absolute Average Recall gains.
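
For readers unfamiliar with the metric, the sketch below shows one way an Average Recall score can be computed. It assumes Average Recall is the area under the curve of recall versus IoU threshold, where each noun phrase counts as recalled if its predicted mask reaches the threshold. The function name average_recall, the 101 evenly spaced thresholds, and the trapezoidal integration are illustrative assumptions; the official PNG evaluation code may differ in these details.

```python
def average_recall(ious, num_thresholds=101):
    """Hypothetical helper: area under the recall-vs-IoU-threshold curve.

    ious: one IoU value in [0, 1] per grounded noun phrase.
    """
    # Evenly spaced IoU thresholds from 0.0 to 1.0 (an assumption).
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Recall at a threshold: fraction of phrases whose IoU reaches it.
    recalls = [sum(iou >= t for iou in ious) / len(ious) for t in thresholds]
    # Integrate the curve with the trapezoidal rule.
    step = 1.0 / (num_thresholds - 1)
    return sum(
        (recalls[i] + recalls[i + 1]) / 2 * step
        for i in range(num_thresholds - 1)
    )


# Toy IoU values, made up for illustration only.
print(average_recall([0.9, 0.8, 0.4]))
```

Under this definition, a model whose masks overlap the ground truth more closely keeps its recall high at stricter thresholds and so scores a larger area.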

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #Segmentation #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware
