Video correspondence by self-supervised learning with less cost by using spatial then time with Li
Spatial-then-Temporal Self-Supervised Learning for Video Correspondence
arXiv paper abstract https://arxiv.org/abs/2209.07778v1
arXiv PDF paper https://arxiv.org/pdf/2209.07778v1.pdf
Learning temporal correspondence from unlabeled videos is of vital importance in computer vision, and has been tackled by different kinds of self-supervised pretext tasks.
... propose a spatial-then-temporal pretext task to address the training data cost problem.
... use contrastive learning from unlabeled still image data to obtain appearance-sensitive features.
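To make the first step concrete, here is a minimal sketch of a contrastive objective on still-image embeddings. The excerpt does not say which contrastive loss the paper uses, so the InfoNCE form, the `temperature` value, and the function name `info_nce_loss` are all assumptions for illustration. NumPy stands in for a deep learning framework to keep the sketch self-contained.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss (assumed form): row i of z_a is the positive for row i of z_b."""
    # L2-normalize so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; every other entry in the row is a negative.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 32))                     # embeddings of 8 images
positive = anchor + 0.05 * rng.normal(size=(8, 32))   # embeddings of augmented views
unrelated = rng.normal(size=(8, 32))                  # embeddings of other images

loss_matched = info_nce_loss(anchor, positive)
loss_unrelated = info_nce_loss(anchor, unrelated)
print(loss_matched, loss_unrelated)
```

The loss is low when two views of the same image map to nearby embeddings and high when they do not, which is what pushes the features toward being appearance-sensitive.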
... switch to unlabeled video data and learn motion-sensitive features by reconstructing frames.
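The second step can be sketched as follows, assuming the common formulation in which a target frame is rebuilt by copying pixels from a reference frame through a softmax affinity between per-pixel features. The excerpt does not give the paper's exact reconstruction scheme, so `reconstruct_frame`, `reconstruction_loss`, the temperature, and the toy data are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct_frame(feat_ref, feat_tgt, pix_ref, temperature=0.05):
    """feat_*: (P, C) per-pixel features; pix_ref: (P, 3) reference pixel colors."""
    # Each target pixel attends over all reference pixels.
    affinity = softmax(feat_tgt @ feat_ref.T / temperature, axis=1)  # (P, P)
    return affinity @ pix_ref                                        # (P, 3)

def reconstruction_loss(feat_ref, feat_tgt, pix_ref, pix_tgt):
    recon = reconstruct_frame(feat_ref, feat_tgt, pix_ref)
    return float(np.mean((recon - pix_tgt) ** 2))

rng = np.random.default_rng(0)
num_pixels, channels = 16, 64
feat_ref = rng.normal(size=(num_pixels, channels))
feat_ref /= np.linalg.norm(feat_ref, axis=1, keepdims=True)
pix_ref = rng.uniform(size=(num_pixels, 3))

# Toy "motion": the target frame is a shuffled copy of the reference frame.
perm = rng.permutation(num_pixels)
pix_tgt = pix_ref[perm]

feat_good = feat_ref[perm]                            # features that track the motion
feat_random = rng.normal(size=(num_pixels, channels))
feat_random /= np.linalg.norm(feat_random, axis=1, keepdims=True)

loss_good = reconstruction_loss(feat_ref, feat_good, pix_ref, pix_tgt)
loss_random = reconstruction_loss(feat_ref, feat_random, pix_ref, pix_tgt)
print(loss_good, loss_random)
```

Features that follow each pixel to its new position reconstruct the target almost perfectly, so minimizing this loss rewards motion-sensitive features without any labels.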
... propose a global correlation distillation loss to retain the appearance sensitivity learned in the first step, as well as a local correlation distillation loss in a pyramid structure to combat temporal discontinuity.
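One way to picture the global correlation distillation loss, as a sketch only: a frozen copy of the step-one network acts as a teacher, and the network being trained on video is penalized when its frame-to-frame correlation map drifts from the teacher's. The excerpt names the loss but gives no formula, so the softmax correlation map, the KL divergence, and every name below are assumptions. The local pyramid variant is not sketched here.

```python
import numpy as np

def correlation_map(feat_a, feat_b, temperature=0.05):
    """Row-wise softmax over cosine similarities between two frames' features."""
    feat_a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    feat_b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = feat_a @ feat_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    prob = np.exp(logits)
    return prob / prob.sum(axis=1, keepdims=True)

def global_correlation_distillation(teacher_a, teacher_b, student_a, student_b, eps=1e-8):
    """KL divergence (assumed) from the teacher's correlation map to the student's."""
    p_teacher = correlation_map(teacher_a, teacher_b)
    p_student = correlation_map(student_a, student_b)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)), axis=1)
    return float(kl.mean())

rng = np.random.default_rng(0)
teacher_a = rng.normal(size=(16, 32))   # teacher features, frame A
teacher_b = rng.normal(size=(16, 32))   # teacher features, frame B

# A student that still matches the teacher, and one that has drifted.
loss_same = global_correlation_distillation(teacher_a, teacher_b, teacher_a, teacher_b)
drift_a = teacher_a + 0.5 * rng.normal(size=(16, 32))
drift_b = teacher_b + 0.5 * rng.normal(size=(16, 32))
loss_drift = global_correlation_distillation(teacher_a, teacher_b, drift_a, drift_b)
print(loss_same, loss_drift)
```

The loss is zero while the student's correlations match the teacher's and grows as they drift apart, which is the sense in which it retains the appearance sensitivity learned in the first step.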
... method surpasses the state-of-the-art self-supervised methods on a series of correspondence-based tasks ...