One model for 2D and 3D perception using positional encodings of 2D and 3D tokens with ODIN
ODIN: A Single Model for 2D and 3D Perception
arXiv paper abstract https://arxiv.org/abs/2401.02416
arXiv PDF paper https://arxiv.org/pdf/2401.02416.pdf
Project page https://odin-seg.github.io
... models on contemporary 3D perception ... consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images.
They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead.
... propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion.
... model differentiates 2D and 3D feature operations through the positional encodings of the tokens involved.
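To make the alternating fusion idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names (`encode`, `attend`, `fuse_2d`, `fuse_3d`, `alternate`), the sinusoidal encoding, and the weight-free single-head attention are all assumptions chosen for illustration. The only point it tries to show is that the same attention routine can act as a 2D or a 3D operation depending on which positional encodings (pixel coordinates or 3D points) and which token groups (one view or all views) it is given.

```python
import math

def encode(coords, dim):
    """Toy sinusoidal encoding of a 2D pixel or 3D point into `dim` values (assumed form)."""
    out, k = [], 0
    while len(out) < dim:
        freq = 2.0 ** k
        for c in coords:
            out.append(math.sin(freq * c))
            out.append(math.cos(freq * c))
        k += 1
    return out[:dim]

def attend(feats, pos_enc):
    """Self-attention with no learned weights: queries/keys are feature + encoding, values are features."""
    dim = len(feats[0])
    qk = [[f + p for f, p in zip(feat, pe)] for feat, pe in zip(feats, pos_enc)]
    scale = math.sqrt(dim)
    out = []
    for q in qk:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in qk]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, feats)) / total
                    for d in range(dim)])
    return out

def fuse_2d(views, pixels):
    """Within-view fusion: tokens attend only inside their own image, using 2D pixel encodings."""
    dim = len(views[0][0])
    return [attend(view, [encode(px, dim) for px in view_px])
            for view, view_px in zip(views, pixels)]

def fuse_3d(views, points):
    """Cross-view fusion: tokens from all views attend jointly, using 3D point encodings."""
    dim = len(views[0][0])
    flat_feats = [tok for view in views for tok in view]
    flat_pe = [encode(p, dim) for view_pts in points for p in view_pts]
    fused = attend(flat_feats, flat_pe)
    out, start = [], 0
    for view in views:  # restore the per-view grouping
        out.append(fused[start:start + len(view)])
        start += len(view)
    return out

def alternate(views, pixels, points, num_blocks=2):
    """Alternate 2D within-view and 3D cross-view fusion (hypothetical block count)."""
    for _ in range(num_blocks):
        views = fuse_2d(views, pixels)
        views = fuse_3d(views, points)
    return views

# Two views, two tokens each, 4-dimensional one-hot features (made-up data).
views = [[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
         [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]]
pixels = [[(0, 0), (0, 1)], [(0, 0), (0, 1)]]            # per-view pixel coordinates
points = [[(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)],
          [(0.1, 0.0, 1.0), (0.2, 0.0, 1.0)]]            # unprojected 3D coordinates
fused = alternate(views, pixels, points)
```

In this sketch, `fuse_2d` never mixes information between views, while `fuse_3d` does. A single RGB image without depth could be passed through `fuse_2d` alone.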
... ODIN achieves state-of-the-art performance on ... 3D instance segmentation benchmarks, and competitive performance on ScanNet, S3DIS and COCO.
It outperforms all previous works by a wide margin when the sensed 3D point cloud is used in place of the point cloud sampled from 3D mesh ...