
News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware


morrislee

Untrained object detection using over 100 times less data by flexible captioning with OTTER


Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation



... Previous works, such as CLIP, use InfoNCE loss to train a model to predict the pairing between images and text captions.
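For readers who have not seen it written out, here is a minimal NumPy sketch of a symmetric InfoNCE loss over a batch of paired image and text embeddings. It illustrates the general technique only and is not taken from the CLIP or OTTER code; the function names and the temperature value of 0.07 are assumptions made for this example.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def info_nce_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: arrays of shape (batch, dim); row i of each is a pair.
    # The temperature default is an assumed value for illustration.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Cosine similarity of every image with every caption, scaled by temperature.
    logits = image_emb @ text_emb.T / temperature

    # The "correct" caption for image i is caption i, i.e. the diagonal.
    idx = np.arange(logits.shape[0])
    loss_image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_image_to_text + loss_text_to_image)
```

Note that the target here is a hard one-to-one pairing: every other caption in the batch is treated as a negative for a given image, even if it happens to describe that image well.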


CLIP, however, is data hungry and requires more than 400M image-text pairs for training.


The inefficiency can be partially attributed to the fact that the image-text pairs are noisy.


... propose OTTER (Optimal TransporT distillation for Efficient zero-shot Recognition), which uses online entropic optimal transport to find a soft image-text match as labels for contrastive learning.
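To give a feel for what a soft image-text match from entropic optimal transport can look like, here is a small NumPy sketch using Sinkhorn iterations with uniform marginals. It is a generic illustration, not the authors' exact formulation: the function name, the `epsilon` and `n_iters` values, and the final row normalization are all assumptions made for this example.

```python
import numpy as np


def sinkhorn_soft_targets(similarity, epsilon=0.1, n_iters=50):
    # similarity: (n_images, n_texts) matrix, higher means a better match.
    # epsilon is the entropic regularization strength (assumed value).
    n, m = similarity.shape

    # Gibbs kernel. Subtracting the global max only rescales the kernel,
    # which the scaling vectors absorb, so the transport plan is unchanged.
    kernel = np.exp((similarity - similarity.max()) / epsilon)

    # Uniform marginals: each image and each caption carries equal mass.
    row_mass = np.full(n, 1.0 / n)
    col_mass = np.full(m, 1.0 / m)

    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = row_mass / (kernel @ v)
        v = col_mass / (kernel.T @ u)

    plan = u[:, None] * kernel * v[None, :]

    # Normalize each row so it can serve as a soft label distribution
    # over captions for that image.
    return plan / plan.sum(axis=1, keepdims=True)
```

In a contrastive setup, rows of this matrix could stand in for the one-hot targets, so that a caption which plausibly describes several images in the batch is no longer penalized as a pure negative.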


Based on pretrained image and text encoders, models trained with OTTER achieve strong performance with only 3M image-text pairs.


... Over 42 evaluations on 7 different dataset/architecture settings x 6 metrics, OTTER outperforms (32) or ties (2) all baselines in 34 of them.





Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

