
News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware


morrislee

Classification of long videos using state-space with ViS4mer


Long Movie Clip Classification with State-Space Video Models



Most modern video recognition ... operate on short video clips (e.g., 5-10s in length).


... challenging to ... long movie understanding tasks, which typically require sophisticated long-range temporal reasoning capabilities.


... video transformers ... address this ... by ... long-range temporal self-attention. However, ... quadratic cost of self-attention ...
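To make the quadratic cost concrete, here is a minimal sketch that only counts the query-key score entries in one full self-attention map. The function names are hypothetical and chosen for illustration; the sketch does not measure ViS4mer or any real model.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Number of query-key score entries in one full self-attention map.

    Illustrative assumption: every token attends to every token, so the
    score matrix has num_tokens * num_tokens entries.
    """
    return num_tokens * num_tokens


def growth_factor(short_len: int, long_len: int) -> float:
    """How much the score matrix grows when the sequence gets longer."""
    return attention_pair_count(long_len) / attention_pair_count(short_len)


if __name__ == "__main__":
    # Doubling the number of tokens quadruples the score matrix.
    for n in (1_000, 2_000, 4_000):
        print(n, attention_pair_count(n))
```

Doubling the token count multiplies the score matrix by four, which is why attention over very long token sequences becomes expensive in both time and memory.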


... propose ViS4mer ... uses a standard Transformer encoder for short-range spatiotemporal feature extraction, and a multi-scale temporal S4 decoder for subsequent long-range temporal reasoning.
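For intuition about the state-space side, the sketch below runs a generic discrete linear state-space recurrence, x_k = A x_{k-1} + B u_k and y_k = C x_k, over a one-dimensional input sequence. It is an assumption-laden toy: the state matrix is taken to be diagonal, the function name `ssm_scan` and all parameter values are hypothetical, and it is not the S4 parameterization or the ViS4mer decoder described in the paper.

```python
from typing import List, Sequence


def ssm_scan(
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    inputs: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space model over a 1-D sequence.

    a, b, c hold the diagonal of A and the entries of B and C (assumed
    shapes, for illustration only). Each step updates a fixed-size state,
    so the cost grows linearly with the sequence length.
    """
    state = [0.0] * len(a)
    outputs: List[float] = []
    for u in inputs:
        # x_k = A x_{k-1} + B u_k, with A diagonal
        state = [a_i * x_i + b_i * u for a_i, b_i, x_i in zip(a, b, state)]
        # y_k = C x_k
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


if __name__ == "__main__":
    # An impulse input decays geometrically through a single state.
    print(ssm_scan([0.5], [1.0], [1.0], [1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```

The state carries information forward from earlier steps without comparing every pair of positions, which is the property that makes state-space layers attractive for long sequences.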


... ViS4mer is 2.63x faster and requires 8x less GPU memory than the corresponding pure self-attention-based model.


... ViS4mer achieves state-of-the-art results in 7 out of 9 long-form movie video classification tasks on the LVU benchmark. ...





Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website


