
News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware

morrislee

Transformers for action recognition 40 times faster by focusing attention in time


An Image is Worth 16x16 Words, What is a Video Worth?

arXiv paper abstract https://arxiv.org/abs/2103.13915

... significantly reducing the number of frames required for inference. Our approach relies on a temporal transformer that applies global attention over video frames, and thus better exploits the salient information in each frame. Therefore our approach is very input efficient, and can achieve SotA results (on Kinetics dataset) with a fraction of the data (frames per video), computation and latency. Specifically on Kinetics-400, we reach 78.8 top-1 accuracy with x30 less frames per video, and x40 faster inference than the current leading method.
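
To make the idea of global attention over video frames concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes each sampled frame has already been turned into an embedding vector, uses those embeddings directly as queries, keys and values (no learned projections, a single head), and averages the attended frames into one clip vector. The function names, the 16 frames and the 8-dimensional embeddings are hypothetical choices for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frame_embeddings):
    """Single-head global self-attention across frames.

    frame_embeddings: list of T vectors (one per sampled frame), each of
    length D. Assumption: queries, keys and values are the embeddings
    themselves, with no learned projection matrices.
    Returns (outputs, weights), where weights[i][j] is how much frame i
    attends to frame j.
    """
    dim = len(frame_embeddings[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in frame_embeddings:
        # Every frame is scored against every other frame (global attention).
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in frame_embeddings
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all frame vectors for this query frame.
        outputs.append([
            sum(w * value[d] for w, value in zip(row, frame_embeddings))
            for d in range(dim)
        ])
    return outputs, weights


def video_representation(frame_embeddings):
    """Average the attended frame vectors into one clip-level vector.

    Assumption: mean pooling is used here only to keep the sketch short.
    """
    outputs, _ = temporal_attention(frame_embeddings)
    count = len(outputs)
    return [sum(vec[d] for vec in outputs) / count for d in range(len(outputs[0]))]


# Hypothetical input: 16 sampled frames, each an 8-dimensional embedding.
random.seed(0)
frames = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
clip_vector = video_representation(frames)
_, attention = temporal_attention(frames)
print(len(clip_vector), len(attention), len(attention[0]))  # prints: 8 16 16
```

The attention matrix has one row and one column per frame, so its cost grows with the square of the number of frames, which is why it stays cheap when only a few frames per video are sampled.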



