Survey of transformers for video

morrislee
Jan 20, 2022

Video Transformers: A Survey

arXiv paper abstract https://arxiv.org/abs/2201.05991v1

arXiv PDF paper https://arxiv.org/pdf/2201.05991v1.pdf

... Transformers are a promising tool for solving video-related tasks, but some adaptations are required.

... In this survey ... analyse and summarize the main contributions and trends for adapting Transformers to model video data.

... delve into how videos are embedded and tokenized, finding a very widespread use of large CNN backbones to reduce dimensionality and a predominance of patches and frames as tokens.
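
To make "patches as tokens" concrete, here is a minimal sketch (not taken from the paper) that cuts a raw video array into non-overlapping space-time patches and flattens each one into a token. The function name `video_to_patch_tokens` and the sizes `patch=16` and `tube=2` are assumptions chosen for illustration. It skips the CNN backbone and the learned linear projection that real models typically apply.

```python
import numpy as np

def video_to_patch_tokens(video, patch=16, tube=2):
    """Split a (T, H, W, C) video into flattened space-time patches.

    Hypothetical helper for illustration: each token covers `tube` frames
    and a `patch` x `patch` pixel window.
    """
    T, H, W, C = video.shape
    assert T % tube == 0 and H % patch == 0 and W % patch == 0
    t, h, w = T // tube, H // patch, W // patch
    # Expose the patch grid as separate axes, then group each patch's pixels.
    x = video.reshape(t, tube, h, patch, w, patch, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (t, h, w, tube, patch, patch, C)
    return x.reshape(t * h * w, tube * patch * patch * C)

# 8 RGB frames of 224 x 224 pixels (assumed clip size).
video = np.zeros((8, 224, 224, 3), dtype=np.float32)
tokens = video_to_patch_tokens(video)
print(tokens.shape)  # (784, 1536): 4 * 14 * 14 tokens, 2 * 16 * 16 * 3 values each
```

Setting `tube=1` gives per-frame 2D patches instead, and using one token per frame would shrink the sequence to just 8 tokens for this clip.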

... study how the Transformer layer has been tweaked to handle longer sequences, generally by reducing the number of tokens in a single attention operation.
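
A rough sketch of why this matters: self-attention compares every token with every other token in the same operation, so the cost grows with the square of the number of tokens involved. The snippet below counts query-key pairs for joint space-time attention versus one possible restriction, attending within each frame and then across time at each spatial position. The function names and the clip size (8 frames, 196 patches per frame) are assumptions for illustration, not figures from the survey.

```python
def joint_attention_pairs(frames, patches_per_frame):
    """Query-key pairs when all space-time tokens attend to each other."""
    n = frames * patches_per_frame
    return n * n

def factorized_attention_pairs(frames, patches_per_frame):
    """Pairs for spatial attention per frame plus temporal attention per position."""
    spatial = frames * patches_per_frame ** 2
    temporal = patches_per_frame * frames ** 2
    return spatial + temporal

T, S = 8, 196  # assumed clip: 8 frames, 14 x 14 patches per frame
joint = joint_attention_pairs(T, S)
factorized = factorized_attention_pairs(T, S)
print(joint)       # 2458624
print(factorized)  # 319872
print(round(joint / factorized, 2))  # 7.69
```

The gap widens as clips get longer, since the joint count grows quadratically in the total number of tokens.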

... explore how other modalities are integrated with video and conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D CNN counterparts with equivalent FLOPs and no significant parameter increase.


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

#ComputerVision #Transformers #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

