Untrained (zero-shot) image recognition using over 100 times less data by flexible captioning with OTTER
Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation
arXiv paper abstract https://arxiv.org/abs/2112.09445v2
arXiv PDF paper https://arxiv.org/pdf/2112.09445v2.pdf
... Previous works, such as CLIP, use InfoNCE loss to train a model to predict the pairing between images and text captions.
CLIP, however, is data-hungry and requires more than 400M image-text pairs for training.
The inefficiency can be partially attributed to the fact that the image-text pairs are noisy.
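To make the setup concrete, here is a minimal PyTorch sketch of a CLIP-style InfoNCE objective, in which each image in a batch must pick out its own caption (and vice versa); the function and parameter names are illustrative assumptions, not code from CLIP or OTTER.

```python
# Minimal sketch of a CLIP-style InfoNCE contrastive loss (illustrative only).
import torch
import torch.nn.functional as F

def infonce_loss(image_emb, text_emb, temperature=0.07):
    # Normalize embeddings so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity logits: entry (i, j) compares image i with caption j.
    logits = image_emb @ text_emb.t() / temperature

    # Hard labels: the i-th image is assumed to match the i-th caption.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```

The hard identity labels are exactly what noisy image-text pairs undermine: a caption may describe another image in the batch almost as well as its own.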
... propose OTTER (Optimal TransporT distillation for Efficient zero-shot Recognition), which uses online entropic optimal transport to find a soft image-text match as labels for contrastive learning.
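A rough sketch of the key ingredient, assuming a standard Sinkhorn implementation of entropic optimal transport: the batch similarity matrix is turned into a soft matching (a transport plan), and that soft matching replaces the hard one-to-one labels in the contrastive cross-entropy. The epsilon value, iteration count, and function names below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of entropic optimal transport (Sinkhorn) soft targets for contrastive learning.
import torch
import torch.nn.functional as F

@torch.no_grad()  # soft targets are computed online, without gradients
def sinkhorn_soft_targets(image_emb, text_emb, epsilon=0.05, n_iters=50):
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Cost is low when an image and a caption are similar.
    cost = 1.0 - image_emb @ text_emb.t()

    # Entropic OT kernel with uniform marginals over the batch.
    K = torch.exp(-cost / epsilon)
    n, m = K.shape
    r = torch.full((n,), 1.0 / n, device=K.device)
    c = torch.full((m,), 1.0 / m, device=K.device)

    u = torch.ones_like(r)
    for _ in range(n_iters):
        # Alternate scalings so row and column sums match the marginals.
        v = c / (K.t() @ u)
        u = r / (K @ v)

    # Transport plan; each row renormalized into a soft label distribution.
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)
    return plan / plan.sum(dim=1, keepdim=True)

def soft_contrastive_loss(image_emb, text_emb, soft_targets, temperature=0.07):
    # Cross-entropy against the soft OT matching instead of the identity labels.
    logits = F.normalize(image_emb, dim=-1) @ F.normalize(text_emb, dim=-1).t() / temperature
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()
```

In a training loop, the soft targets would be recomputed for every batch (hence "online") and then used in place of the identity labels in the InfoNCE loss above; this is a sketch under those assumptions rather than the authors' exact recipe.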
Based on pretrained image and text encoders, models trained with OTTER achieve strong performance with only 3M image-text pairs.
... Across 42 evaluations (7 different dataset/architecture settings × 6 metrics), OTTER outperforms (32) or ties (2) all baselines in 34 of them.
Please like and share this post if you enjoyed it using the buttons at the bottom!
Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website