Find images that match a text-plus-image query, and retrieve descriptions of images

morrislee
May 12, 2021

ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Google AI Blog https://ai.googleblog.com/2021/05/align-scaling-up-visual-and-vision.html

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

arXiv paper abstract https://arxiv.org/abs/2102.05918

arXiv PDF paper https://arxiv.org/pdf/2102.05918.pdf

... In this paper, we leverage a noisy dataset of over one billion image alt-text pairs ...

... A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss. We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
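To make the contrastive loss in the excerpt above concrete, here is a minimal sketch in plain Python. It assumes the two encoders have already produced one embedding per image and per text, and it sums an image-to-text and a text-to-image softmax loss over the batch. The function names, the fixed `temperature` of 0.1, and the toy 2-D embeddings are illustrative assumptions, not the paper's code.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy_diagonal(logits):
    """Mean softmax cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric contrastive loss: matched pairs share the same batch index.

    temperature is a fixed, assumed value here (hypothetical choice).
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = scaled similarity between image i and text j
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t)

# Toy batch: each image embedding points the same way as its paired text.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Same embeddings, but the texts are paired with the wrong images.
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is small when each image is most similar to its own text and large when the pairing is wrong, which is the signal that pulls matching image and text embeddings together during training.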

... The aligned visual and language representations also set new state-of-the-art results on Flickr30K and MSCOCO benchmarks, even when compared with more sophisticated cross-attention models. The representations also enable cross-modality search with complex text and text + image queries.
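As a rough illustration of the text + image search mentioned above, the sketch below ranks a tiny index by cosine similarity. It assumes images and text already live in one shared embedding space and that a combined query is formed by simply adding the normalized image and text embeddings. The `search` and `combine` helpers, the index entries, and all vectors are made-up values for illustration, not outputs of the real model.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def combine(image_emb, text_emb):
    """Assumed way to build a text + image query: add the unit-length embeddings."""
    return [a + b for a, b in zip(l2_normalize(image_emb), l2_normalize(text_emb))]

def search(query_emb, index, top_k=1):
    """Return the names of the top_k index entries most similar to the query."""
    ranked = sorted(index, key=lambda name: cosine(query_emb, index[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical image embeddings; the axes loosely mean (car, red, blue).
index = {
    "red_car": [1.0, 1.0, 0.0],
    "blue_car": [1.0, 0.0, 1.0],
    "red_rose": [0.0, 1.0, 0.2],
}
query_image = [1.0, 0.0, 0.9]  # hypothetical embedding of a blue car photo
query_text = [0.0, 1.0, 0.0]   # hypothetical embedding of the word "red"

print(search(query_image, index))                       # image-only query
print(search(combine(query_image, query_text), index))  # image + "red"
```

In this toy setup the image alone retrieves the blue car, while adding the text embedding for "red" moves the query toward the red car.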


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Website with my other posts by category https://morrislee1234.wixsite.com/website

#ComputerVision #Captioning #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware
