Segment 3D scenes with unknown objects using NeRF and ranking with CLIP foundation model: OV-NeRF
OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding
arXiv paper abstract https://arxiv.org/abs/2402.04648
arXiv PDF paper https://arxiv.org/pdf/2402.04648.pdf
The development of Neural Radiance Fields (NeRFs) has provided ... open-vocabulary 3D semantic perception ... However ... methods that extract semantics ... from Contrastive Language-Image Pretraining (CLIP) for ... learning encounter difficulties
... propose OV-NeRF, which exploits the potential of pre-trained vision and language foundation models to enhance semantic field learning through proposed single-view and cross-view strategies.
First, from the single-view perspective, ... introduce Region Semantic Ranking (RSR) regularization by leveraging 2D mask proposals derived from SAM to rectify the noisy semantics of each training view
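As a rough illustration of the region-level idea behind RSR, the sketch below takes noisy per-pixel class scores (standing in for CLIP relevancy) and a region id per pixel (standing in for a SAM mask proposal), ranks the classes by their mean score inside each region, and assigns the top-ranked class to every pixel of that region. The function name, the flat pixel layout, and the use of a plain mean are assumptions made for illustration only; the paper's actual regularization may differ.

```python
from collections import defaultdict


def region_rank_labels(relevancy, region_ids):
    """Toy region-level label rectification (illustrative, not the paper's code).

    relevancy:  list of per-pixel score lists, one score per class
                (assumed stand-in for CLIP relevancy).
    region_ids: list of per-pixel region ids; -1 marks a pixel that no
                mask proposal covers (assumed stand-in for SAM masks).
    Returns one class index per pixel.
    """
    num_classes = len(relevancy[0])
    sums = defaultdict(lambda: [0.0] * num_classes)
    counts = defaultdict(int)

    # Accumulate class scores over the pixels of each region.
    for scores, rid in zip(relevancy, region_ids):
        if rid < 0:
            continue
        counts[rid] += 1
        for c, s in enumerate(scores):
            sums[rid][c] += s

    # Rank classes per region by mean score and keep the top one.
    top_class = {
        rid: max(range(num_classes), key=lambda c: sums[rid][c] / counts[rid])
        for rid in sums
    }

    # Pixels inside a region take the region's class; others keep their own argmax.
    labels = []
    for scores, rid in zip(relevancy, region_ids):
        if rid in top_class:
            labels.append(top_class[rid])
        else:
            labels.append(max(range(num_classes), key=lambda c: scores[c]))
    return labels


# Four pixels, two classes; the third pixel is noisy but lies in region 0.
relevancy = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
region_ids = [0, 0, 0, -1]
print(region_rank_labels(relevancy, region_ids))  # [0, 0, 0, 1]
```

In this toy case the third pixel would be labeled class 1 by a per-pixel argmax, but the region-level ranking overrides it with class 0, the class that dominates its region.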
... Second, from the cross-view perspective, ... propose a Cross-view Self-enhancement (CSE) strategy to address the challenge raised by view-inconsistent semantics.
Rather than invariably utilizing the 2D inconsistent semantics from CLIP, CSE leverages the 3D consistent semantics generated from the well-trained semantic field itself for semantic field training, aiming to ... enhance overall semantic consistency across different views.
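One way to picture the switch that CSE describes is a supervision selector: early in training the targets come from CLIP-derived 2D labels, and once the semantic field is considered well trained, the targets come from labels rendered by the field itself. The sketch below is only a minimal illustration under that reading; the function name, the step-based warm-up threshold, and the label containers are hypothetical and are not taken from the paper.

```python
def select_supervision(step, warmup_steps, clip_labels, rendered_labels):
    """Pick the semantic targets for one training view (illustrative only).

    step:            current training iteration.
    warmup_steps:    hypothetical threshold after which the field is
                     treated as well trained.
    clip_labels:     view-inconsistent 2D labels derived from CLIP.
    rendered_labels: 3D-consistent labels rendered from the semantic field.
    """
    if step < warmup_steps:
        # The field is not yet reliable, so supervise with the 2D labels.
        return clip_labels
    # Afterwards, supervise the field with its own cross-view-consistent output.
    return rendered_labels


# Usage with placeholder label maps for a single view.
clip_view = [0, 1, 1, 0]
field_view = [0, 0, 1, 0]
early = select_supervision(100, 1000, clip_view, field_view)
late = select_supervision(5000, 1000, clip_view, field_view)
```

A hard step threshold is the simplest possible schedule; how and when the actual method switches or mixes the two label sources is not stated in the excerpt above.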
... OV-NeRF outperforms current state-of-the-art methods ... approach exhibits consistent superior results across various CLIP configurations, further verifying its robustness.