morrislee

Get image matching text plus image, also get descriptions of images


ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision



arXiv paper abstract https://arxiv.org/abs/2102.05918


... In this paper, we leverage a noisy dataset of over one billion image alt-text pairs

... A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss. We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
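The contrastive objective described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the two encoders have already produced one embedding per image and per text, with row i of each matrix coming from the same pair. The function names, the fixed `temperature` value, and the choice to sum the two directional terms are assumptions made for illustration, not details taken from the abstract.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(image_emb, text_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_emb, text_emb: arrays of shape (batch, dim); row i of each
    is assumed to come from the same image/alt-text pair.
    temperature: hypothetical fixed value chosen for this sketch.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    # logits[i, j] = similarity between image i and text j.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])
    # Image-to-text: each image should pick its own text within its row.
    i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each text should pick its own image within its column.
    t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return i2t + t2i
```

Matched pairs sit on the diagonal of the similarity matrix, and every other entry in the same row or column acts as a negative, so the loss drops as paired embeddings move closer together than unpaired ones.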

... The aligned visual and language representations also set new state-of-the-art results on Flickr30K and MSCOCO benchmarks, even when compared with more sophisticated cross-attention models. The representations also enable cross-modality search with complex text and text + image queries.
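Because both modalities share one embedding space, a search like the one mentioned above can be sketched as a nearest-neighbor lookup. The sketch below assumes a text + image query is formed by adding the two normalized embeddings; that combination rule, the function names, and the `top_k` parameter are illustrative assumptions, not details stated in the abstract.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def combine_query(image_emb, text_emb):
    """Build a text + image query by summing unit-length embeddings (assumed rule)."""
    return l2_normalize(l2_normalize(image_emb) + l2_normalize(text_emb))


def search(query_emb, gallery_emb, top_k=3):
    """Return indices and cosine scores of the top_k gallery items for one query.

    query_emb: array of shape (dim,), from either encoder or combine_query.
    gallery_emb: array of shape (num_items, dim) of precomputed image embeddings.
    """
    scores = l2_normalize(gallery_emb) @ l2_normalize(query_emb)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```

A text-only query would pass the text encoder's output straight to `search`, while a text + image query would pass the result of `combine_query`; the gallery embeddings stay the same in both cases.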



