Survey of computer vision backbones on many tasks
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
arXiv paper abstract
arXiv PDF paper
Neural-network-based computer vision systems are typically built on a backbone ... it is difficult for practitioners to make informed decisions about which backbone to choose.
... benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more.
... sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weaknesses of existing approaches through a comprehensive analysis conducted on more than 1500 training runs.
While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, ... find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models ...
Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, ... find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets.
... release the raw results of ... experiments along with code that allows researchers to put their own backbones through the gauntlet ...
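The paper's analysis spans many backbones and many tasks, so some way of summarizing per-task scores is needed. One simple option is average rank: rank the backbones within each task, then average each backbone's rank across tasks. The sketch below shows that idea. It is a generic aggregation method and not necessarily the protocol the authors use. The function name `average_ranks` and the input layout (`scores[task][backbone]`, higher is better) are hypothetical and are not taken from the released code.

```python
from statistics import mean


def average_ranks(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Hypothetical helper: mean rank per backbone across tasks (1 = best).

    scores[task][backbone] holds a metric where higher is better.
    Tied scores within a task share the average of the ranks they span.
    Assumes every backbone is evaluated on every task; otherwise the
    averages are taken over different task subsets and are not comparable.
    """
    ranks: dict[str, list[float]] = {}
    for by_backbone in scores.values():
        # Best score first; sorted() is stable, so ties keep input order.
        ordered = sorted(by_backbone.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j over the run of backbones tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
            for k in range(i, j + 1):
                ranks.setdefault(ordered[k][0], []).append(tied_rank)
            i = j + 1
    return {name: mean(r) for name, r in ranks.items()}
```

For example, `average_ranks({"task_1": {"backbone_a": 0.9, "backbone_b": 0.8, "backbone_c": 0.8}, "task_2": {"backbone_a": 0.5, "backbone_b": 0.7, "backbone_c": 0.6}})` ranks `backbone_b` best on average even though `backbone_a` wins `task_1`. The task and backbone names and the numbers are placeholders, not results from the paper. Average rank ignores how large the score gaps are, which is one reason a benchmark report would usually also show per-task results.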