top of page

News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware

As an Amazon Associate I earn

from qualifying purchases

Writer's picturemorrislee

Identify human actions on objects and their locations after training on image captions

Identify human actions on objects and their locations after training on image captions


Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions



... introduce the task of weakly supervised learning for detecting human and object interactions in videos.


... system does not know what types of human-object interactions are present in a video or the actual spatiotemporal location of the human and the object.


... introduce a contrastive weakly supervised training loss that aims to jointly associate spatiotemporal regions in a video with an action and object vocabulary and encourage temporal continuity of the visual appearance of moving objects as a form of self-supervision.


To train our model, we introduce a dataset comprising over 6.5k videos with human-object interaction annotations that have been semi-automatically curated from sentence captions associated with the videos.


We demonstrate improved performance over weakly supervised baselines adapted to our task on our video dataset.



Please like and share this post if you enjoyed it using the buttons at the bottom!


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website


21 views0 comments

Comments


ClickBank paid link

bottom of page