Better identify the events and participants in an image with CLIP-Event

morrislee
Jan 14, 2022

CLIP-Event: Connecting Text and Images with Event Structures

arXiv paper abstract https://arxiv.org/abs/2201.05078

arXiv PDF paper https://arxiv.org/pdf/2201.05078.pdf

... vision-language pretraining models primarily focus on understanding objects in images or entities in text, they often ignore the alignment at the level of events and their argument structures.

... propose a contrastive learning framework to enforce vision-language pretraining models to comprehend events and associated argument (participant) roles.
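To make the idea of contrastive learning concrete, here is a minimal sketch of a generic contrastive objective in plain Python. It is an illustration only, not the loss used in the paper: the function names, the cosine-similarity scoring, and the temperature value of 0.07 are all assumptions made for this example. The image embedding is scored against one correct event description and several wrong ones, and the loss is small only when the correct description scores highest.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(image_vec, positive_vec, negative_vecs, temperature=0.07):
    """Softmax cross-entropy over [positive, negatives] similarity scores.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    scores = [cosine(image_vec, positive_vec)]
    scores += [cosine(image_vec, neg) for neg in negative_vecs]
    logits = [s / temperature for s in scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability assigned to the positive description (index 0).
    return log_denominator - logits[0]


# Toy embeddings: the image matches the first description, not the second.
image = [1.0, 0.0]
matching_text = [1.0, 0.0]
mismatched_text = [0.0, 1.0]

loss_when_aligned = contrastive_loss(image, matching_text, [mismatched_text])
loss_when_misaligned = contrastive_loss(image, mismatched_text, [matching_text])
```

In this toy setup the loss is close to zero when the matching description is treated as the positive and much larger when the mismatched one is, which is the signal that pushes image and text embeddings of the same event together.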

... take advantage of text information extraction technologies to obtain event structural knowledge, and utilize multiple prompt functions to contrast difficult negative descriptions by manipulating event structures.
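As a rough sketch of what "manipulating event structures" could look like, the Python below builds difficult negative descriptions from a single extracted event by either changing the event type or swapping which participant fills which role. Everything here is hypothetical and chosen for illustration: the `Event` class, the text template in `describe`, and the example event types and roles are assumptions, not the paper's actual prompt functions or ontology.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    event_type: str            # e.g. "arrest" (illustrative label)
    arguments: Dict[str, str]  # role name -> participant, e.g. {"agent": "police"}


def describe(event: Event) -> str:
    """Turn an event structure into a text description (assumed template)."""
    roles = ", ".join(f"{role}: {entity}" for role, entity in event.arguments.items())
    return f"An image of {event.event_type}. {roles}."


def event_type_negatives(event: Event, other_types: List[str]) -> List[Event]:
    """Negatives that keep the participants but change the event type."""
    return [Event(t, dict(event.arguments)) for t in other_types if t != event.event_type]


def role_swap_negatives(event: Event) -> List[Event]:
    """Negatives that keep the event type but swap participants between two roles."""
    roles = list(event.arguments)
    negatives = []
    for i in range(len(roles)):
        for j in range(i + 1, len(roles)):
            swapped = dict(event.arguments)
            swapped[roles[i]], swapped[roles[j]] = swapped[roles[j]], swapped[roles[i]]
            # Skip swaps that change nothing (both roles filled by the same entity).
            if swapped != event.arguments:
                negatives.append(Event(event.event_type, swapped))
    return negatives


positive = Event("arrest", {"agent": "police", "detainee": "protester"})
type_negs = event_type_negatives(positive, ["arrest", "transport", "attack"])
role_negs = role_swap_negatives(positive)

positive_text = describe(positive)
negative_texts = [describe(e) for e in type_negs + role_negs]
```

The role-swapped description mentions exactly the same participants as the positive one, so a model cannot reject it by object recognition alone. It has to work out who is doing what to whom, which is what makes such negatives difficult.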

... zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction on Multimedia Event Extraction ...



