
News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware


morrislee

Better image captioning and question answering using weakly supervised training



SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

arXiv paper abstract https://arxiv.org/abs/2108.10904



... Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.


However, the requirement for expensive annotations ... limits the scalability of existing approaches


... relax these constraints and present a minimalist pretraining framework, named Simple Visual Language Model (SimVLM).


... by exploiting large-scale weak supervision, and is trained end-to-end with a single prefix language modeling objective.


Without utilizing extra data or task-specific customization, the resulting model significantly outperforms previous pretraining methods and achieves new state-of-the-art results on a wide range of discriminative and generative vision-language benchmarks, including VQA (+3.74% vqa-score), NLVR2 (+1.17% accuracy), SNLI-VE (+1.37% accuracy) and image captioning tasks (+10.1% average CIDEr score).


... demonstrate that SimVLM acquires strong generalization and transfer ability, enabling zero-shot behavior including open-ended visual question answering and cross-modality transfer.
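The "single prefix language modeling objective" mentioned in the abstract can be pictured with a small sketch. This is only an illustration of the general prefix-LM attention pattern, not SimVLM's actual implementation: tokens in the prefix (e.g. image patches plus the start of a caption) attend to each other bidirectionally, while the remaining tokens attend causally and are the ones the model learns to predict. The function name and shapes below are hypothetical.

```python
# Minimal sketch of a prefix-LM attention mask (illustrative, not the
# paper's code). mask[i][j] is True when position i may attend to
# position j: every position sees the full prefix bidirectionally,
# and positions after the prefix attend causally.
def prefix_lm_mask(seq_len, prefix_len):
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if j < prefix_len:       # the prefix is visible to everyone
                mask[i][j] = True
            elif j <= i:             # causal attention past the prefix
                mask[i][j] = True
    return mask

mask = prefix_lm_mask(seq_len=5, prefix_len=2)
# Prefix positions 0 and 1 see each other; position 3 sees 0-3 but not 4.
```

With this mask, the training loss is an ordinary next-token cross-entropy computed only on the tokens after the prefix, which is what lets one objective cover both understanding (encode the prefix) and generation (decode the rest).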



Please like and share this post if you enjoyed it using the buttons at the bottom!


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website




