Detect unseen objects and label them hierarchically by generating image descriptions for training with DetCLIPv3
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
arXiv paper abstract https://arxiv.org/abs/2404.09216
arXiv PDF paper https://arxiv.org/pdf/2404.09216.pdf
... introduce DetCLIPv3, a high-performing detector that excels not only at both open-vocabulary object detection, but also generating hierarchical labels for detected objects.
DetCLIPv3 ... three core designs: 1. Versatile model architecture ... derive a robust open-set detection ... with generation ability via the integration of a caption head.
2. High information density data: ... auto-annotation ... leveraging visual large language model to refine captions for ... image-text pairs, providing ... labels to enhance the training.
3. Efficient training strategy: ... employ a pre-training stage with low-resolution inputs ... enables the object captioner to ... learn ... visual concepts from ... image-text paired data.
This is followed by a fine-tuning stage that leverages a small number of high-resolution samples to further enhance detection performance.
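The two-stage strategy above can be sketched as a simple schedule: many low-resolution samples first, then a small number of high-resolution ones. This is only an illustration of the idea, not the paper's configuration. The `Stage` class, the stage names, the resolutions, and the sample counts are all hypothetical placeholders, and the cost formula is a rough proxy assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage. All field values used below are hypothetical."""
    name: str
    resolution: int   # input side length in pixels (placeholder value)
    num_samples: int  # samples seen during the stage (placeholder value)

    def relative_cost(self) -> int:
        # Rough proxy (an assumption): cost grows with pixels per image
        # multiplied by the number of samples processed.
        return self.resolution ** 2 * self.num_samples


def build_schedule() -> list[Stage]:
    # Stage 1: low-resolution pre-training on a large image-text set.
    # Stage 2: fine-tuning on a small number of high-resolution samples.
    return [
        Stage("pretrain_low_res", resolution=320, num_samples=1_000_000),
        Stage("finetune_high_res", resolution=1024, num_samples=50_000),
    ]


schedule = build_schedule()
for stage in schedule:
    print(stage.name, stage.resolution, stage.num_samples, stage.relative_cost())
```

The point of the split is that most of the data is seen while each image is cheap to process, and the expensive high-resolution inputs are reserved for a short final stage.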
... DetCLIPv3 demonstrates superior open-vocabulary detection ... also ... state-of-the-art ... in dense captioning task on VG dataset, showcasing its strong generative capability.
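To make "hierarchical labels for detected objects" concrete, here is a minimal sketch of how such output could be represented and queried. It assumes each detection carries a label path ordered from coarse to fine. The `Detection` type, the `group_by_level` helper, and the example labels are hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format
    label_path: tuple[str, ...]             # coarse -> fine, e.g. ("animal", "dog")


def group_by_level(dets: list[Detection], level: int) -> dict[str, list[Detection]]:
    """Group detections by the label at the given depth of the hierarchy.

    If a path is shorter than the requested depth, its finest label is used.
    """
    groups: dict[str, list[Detection]] = defaultdict(list)
    for det in dets:
        key = det.label_path[min(level, len(det.label_path) - 1)]
        groups[key].append(det)
    return dict(groups)


# Made-up example detections for illustration only.
detections = [
    Detection((10, 20, 110, 220), ("animal", "dog", "corgi")),
    Detection((150, 40, 300, 260), ("animal", "cat")),
    Detection((0, 300, 400, 480), ("vehicle", "car")),
]

coarse = group_by_level(detections, level=0)
fine = group_by_level(detections, level=2)
```

Grouping at level 0 gives the coarse categories, while deeper levels fall back to the finest label available for each object.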