Survey of transformers for video
Video Transformers: A Survey
arXiv paper abstract https://arxiv.org/abs/2201.05991v1
arXiv PDF paper https://arxiv.org/pdf/2201.05991v1.pdf
... Transformers a promising tool for solving video-related tasks, but some adaptations are required.
... In this survey ... analyse and summarize the main contributions and trends for adapting Transformers to model video data.
... delve into how videos are embedded and tokenized, finding a very widespread use of large CNN backbones to reduce dimensionality and a predominance of patches and frames as tokens.
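Here is a minimal pure-Python sketch of the two dominant token types, frames and patches. It assumes a video is stored as nested lists indexed video[t][y][x][c] (frame, row, column, channel). The helper names are hypothetical. `pooled_frame_tokens` only stands in for a large CNN backbone: it global-average-pools each frame to its channel means, whereas a real backbone would produce learned features.

```python
# Toy sketch, assuming video[t][y][x][c] nested lists (not tensors).

def frame_tokens(video):
    """Frames as tokens: flatten each H x W x C frame into one vector."""
    return [[v for row in frame for pixel in row for v in pixel] for frame in video]

def patch_tokens(video, p):
    """Patches as tokens: one vector per non-overlapping p x p patch of every frame."""
    tokens = []
    for frame in video:
        h, w = len(frame), len(frame[0])
        assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
        for y0 in range(0, h, p):
            for x0 in range(0, w, p):
                tokens.append([v
                               for y in range(y0, y0 + p)
                               for x in range(x0, x0 + p)
                               for v in frame[y][x]])
    return tokens

def pooled_frame_tokens(video):
    """Hypothetical stand-in for a CNN backbone: average each frame down to C values."""
    out = []
    for frame in video:
        h, w, c = len(frame), len(frame[0]), len(frame[0][0])
        out.append([sum(frame[y][x][k] for y in range(h) for x in range(w)) / (h * w)
                    for k in range(c)])
    return out

# Toy video: 4 frames of 8 x 8 RGB "pixels".
T, H, W, C = 4, 8, 8, 3
video = [[[[float(t + y + x + c) for c in range(C)] for x in range(W)]
          for y in range(H)] for t in range(T)]

print(len(frame_tokens(video)), len(frame_tokens(video)[0]))          # 4 192
print(len(patch_tokens(video, 4)), len(patch_tokens(video, 4)[0]))    # 16 48
print(len(pooled_frame_tokens(video)), len(pooled_frame_tokens(video)[0]))  # 4 3
```

The counts show the trade-off:

- Patch tokens make the sequence longer (16 tokens instead of 4) but each token smaller.
- A backbone keeps one token per frame but shrinks its dimensionality, in this toy case from 192 values to 3.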
... study how the Transformer layer has been tweaked to handle longer sequences, generally by reducing the number of tokens in a single attention operation.
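Factorized (divided) space-time attention is one well-known way to reduce the tokens per attention operation; it is used, for example, in TimeSformer. Whether it matches the specific variants the survey catalogs is an assumption here.

The toy sketch below compares two schemes:

- **Joint attention:** all T·S tokens attend to each other in a single operation.
- **Divided attention:** first temporal attention over T tokens for each spatial position, then spatial attention over S tokens for each frame.

It leaves out learned Q/K/V projections, multiple heads and residual connections. It returns the number of query-key scores each scheme computes.

```python
import math

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens
    (simplified: no learned projections). Returns (outputs, number of scores)."""
    d = len(tokens[0])
    outputs, n_scores = [], 0
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        n_scores += len(scores)
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens)) / z
                        for i in range(d)])
    return outputs, n_scores

def joint_space_time(grid):
    """grid[t][s] is a token vector. One attention over all T*S tokens: (T*S)^2 scores."""
    S = len(grid[0])
    flat = [tok for frame in grid for tok in frame]
    out, n = self_attention(flat)
    return [out[t * S:(t + 1) * S] for t in range(len(grid))], n

def divided_space_time(grid):
    """Temporal attention per spatial position, then spatial attention per frame.
    Each operation sees only T or S tokens: S*T^2 + T*S^2 scores in total."""
    T, S = len(grid), len(grid[0])
    total = 0
    temporal = [[None] * S for _ in range(T)]
    for s in range(S):
        out, n = self_attention([grid[t][s] for t in range(T)])
        total += n
        for t in range(T):
            temporal[t][s] = out[t]
    result = []
    for t in range(T):
        out, n = self_attention(temporal[t])
        total += n
        result.append(out)
    return result, total

# Toy sizes: 4 frames, 9 patches per frame, 8-dimensional tokens.
T, S, D = 4, 9, 8
grid = [[[math.sin(t + s + i) for i in range(D)] for s in range(S)] for t in range(T)]
_, joint_n = joint_space_time(grid)
_, divided_n = divided_space_time(grid)
print(joint_n, divided_n)  # 1296 468
```

The saving grows with input size. Take 8 frames of 14 × 14 patches (S = 196):

- Joint attention computes (8 · 196)² = 2,458,624 scores per layer.
- The divided scheme computes 196 · 8² + 8 · 196² = 319,872.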
... explore how other modalities are integrated with video.
... conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D CNN counterparts with equivalent FLOPs and no significant parameter increase.
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website