Better identify the events and participants in an image with CLIP-Event
CLIP-Event: Connecting Text and Images with Event Structures
arXiv paper abstract https://arxiv.org/abs/2201.05078
arXiv PDF paper https://arxiv.org/pdf/2201.05078.pdf
... vision-language pretraining models primarily focus on understanding objects in images or entities in text, they often ignore the alignment at the level of events and their argument structures.
... propose a contrastive learning framework to enforce vision-language pretraining models to comprehend events and associated argument (participant) roles.
... take advantage of text information extraction technologies to obtain event structural knowledge, and utilize multiple prompt functions to contrast difficult negative descriptions by manipulating event structures.
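To make the idea of "difficult negative descriptions" concrete, here is a minimal sketch, not the authors' implementation. It assumes an event is a plain dictionary with a type and role-to-participant arguments, builds negatives by swapping the event type or by assigning participants to the wrong roles, and scores them with a generic InfoNCE-style contrastive loss. All function names, the sentence template, and the similarity scores are hypothetical; the paper's actual prompt functions and training objective may differ.

```python
import math


def describe(event):
    """Render an event structure as a template sentence (hypothetical prompt function)."""
    args = ", ".join(f"{role}: {filler}" for role, filler in event["arguments"].items())
    return f"An image of {event['type']} ({args})"


def swap_event_type(event, other_types):
    """Hard negative: keep the arguments but replace the event type with a different one."""
    candidates = [t for t in other_types if t != event["type"]]
    return {"type": candidates[0], "arguments": dict(event["arguments"])}


def rotate_roles(event):
    """Hard negative: keep the event type but assign participants to the wrong roles.

    With fewer than two arguments the rotation changes nothing, so callers
    should only use this on events with at least two participants.
    """
    roles = list(event["arguments"])
    fillers = list(event["arguments"].values())
    rotated = fillers[1:] + fillers[:1]
    return {"type": event["type"], "arguments": dict(zip(roles, rotated))}


def contrastive_loss(pos_score, neg_scores, temperature=0.07):
    """InfoNCE-style loss: negative log-softmax of the positive score among all scores."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    event = {"type": "arrest", "arguments": {"agent": "police", "detainee": "protester"}}
    print("positive:", describe(event))
    print("negative:", describe(swap_event_type(event, ["arrest", "transport"])))
    print("negative:", describe(rotate_roles(event)))
    # Made-up image-text similarity scores, for illustration only.
    print("loss:", contrastive_loss(0.9, [0.2, 0.4]))
```

In a real system the scores would come from an image encoder and a text encoder rather than being typed in by hand; the point of the sketch is only that the negatives differ from the positive in event structure, not in the objects mentioned.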
... zero-shot CLIP-Event outperforms the state-of-the-art supervised model in argument extraction on Multimedia Event Extraction ...
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website
#ComputerVision #ObjectDetection #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning