Improved segmenting of objects in a video that are mentioned in a text query with ReferFormer
Language as Queries for Referring Video Object Segmentation
arXiv paper abstract https://arxiv.org/abs/2201.00487v1
arXiv PDF paper https://arxiv.org/pdf/2201.00487v1.pdf
Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred by a language expression in all video frames.
... propose a simple and unified framework built upon Transformer, termed ReferFormer.
It views the language as queries and directly attends to the most relevant regions in the video frames.
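To illustrate what "attending to the most relevant regions" means, here is a minimal sketch of scaled dot-product attention in pure Python. It assumes a single language-derived query vector and a handful of region feature vectors from one frame, all of which are made-up toy values. ReferFormer itself uses a Transformer decoder with learned, language-conditioned queries, so this is not its implementation. It only shows how a query ends up weighting the regions it matches best.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query: List[float], regions: List[List[float]]) -> List[float]:
    """Attention weights of one (hypothetical) language query over one frame's regions."""
    d = len(query)
    scores = [sum(q * r for q, r in zip(query, region)) / math.sqrt(d) for region in regions]
    return softmax(scores)

def attended_feature(query: List[float], regions: List[List[float]]) -> List[float]:
    """Weighted sum of region features, dominated by the regions most similar to the query."""
    weights = attention_weights(query, regions)
    dim = len(regions[0])
    return [sum(w * region[i] for w, region in zip(weights, regions)) for i in range(dim)]

# Toy example: a query embedding for a phrase such as "the red car" (hypothetical)
# and three region features from a single frame.
query = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = attention_weights(query, regions)
print(weights)  # the first region receives the largest weight
print(attended_feature(query, regions))
```

The region whose features align most closely with the query gets the largest weight. This is the intuition behind letting the language expression act as the query instead of searching over every object in the frame.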
... all the queries are obligated to find the referred objects only.
... The object tracking is achieved naturally by linking the corresponding queries across frames.
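Linking queries across frames can be sketched as follows. This rests on two assumptions that do not come from the post. First, query slot n keeps the same identity in every frame. Second, the referred object's track is the slot with the highest average confidence across frames. Given those, no separate tracker is needed: the track is simply the per-frame predictions at one query index. The scores and mask labels below are hypothetical placeholders.

```python
from typing import List, TypeVar

T = TypeVar("T")

def select_referred_query(scores: List[List[float]]) -> int:
    """scores[t][n]: confidence that query n is the referred object in frame t.
    Returns the query index with the highest mean score over all frames."""
    num_frames = len(scores)
    num_queries = len(scores[0])
    means = [sum(scores[t][n] for t in range(num_frames)) / num_frames
             for n in range(num_queries)]
    return max(range(num_queries), key=lambda n: means[n])

def link_across_frames(per_frame_preds: List[List[T]], query_index: int) -> List[T]:
    """The track is just the prediction at the same query index in each frame."""
    return [frame_preds[query_index] for frame_preds in per_frame_preds]

# Hypothetical scores for 3 frames x 3 queries.
scores = [[0.1, 0.8, 0.3],
          [0.2, 0.7, 0.6],
          [0.1, 0.9, 0.2]]
# Hypothetical mask labels standing in for per-query segmentation masks.
masks = [["f0_q0", "f0_q1", "f0_q2"],
         ["f1_q0", "f1_q1", "f1_q2"],
         ["f2_q0", "f2_q1", "f2_q2"]]

best = select_referred_query(scores)
print(best)
print(link_across_frames(masks, best))
```

Averaging over frames is one simple way to make the choice robust to a single frame where the object is occluded or blurred.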
... On Ref-Youtube-VOS, ReferFormer ... exceeds the previous state-of-the-art performance by 8.4 points. ...
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website