One model for 2D and 3D perception using positional encodings of 2D and 3D tokens with ODIN

morrislee
Jan 5, 2024

ODIN: A Single Model for 2D and 3D Perception

arXiv paper abstract https://arxiv.org/abs/2401.02416

arXiv PDF paper https://arxiv.org/pdf/2401.02416.pdf

... models on contemporary 3D perception ... consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images.

They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead.

... propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion.
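To make the alternation concrete, here is a minimal sketch of which tokens may exchange information in each kind of stage. It is not ODIN's code. The token list, the function names, and the use of k-nearest neighbors in 3D for the cross-view stage are all assumptions made for illustration; the abstract only says that the model alternates between 2D within-view and 3D cross-view fusion.

```python
import math

# Hypothetical tokens: (view_id, 3D position in metres). Values are made up.
tokens = [
    (0, (0.0, 0.0, 1.0)),
    (0, (1.0, 0.0, 1.0)),
    (1, (0.1, 0.0, 1.0)),
    (1, (5.0, 0.0, 1.0)),
]

def within_view_neighbors(tokens):
    """2D stage (assumed): a token only sees tokens from its own view."""
    return [
        [j for j, (view_j, _) in enumerate(tokens) if view_j == view_i]
        for view_i, _ in tokens
    ]

def cross_view_neighbors(tokens, k=2):
    """3D stage (assumed): a token sees its k nearest tokens in 3D, from any view."""
    result = []
    for _, pos_i in tokens:
        order = sorted(range(len(tokens)),
                       key=lambda j: math.dist(pos_i, tokens[j][1]))
        result.append(sorted(order[:k]))
    return result

within = within_view_neighbors(tokens)
cross = cross_view_neighbors(tokens)

# An alternating schedule (the number of stages here is arbitrary).
for stage in ["2D within-view", "3D cross-view", "2D within-view", "3D cross-view"]:
    neighbors = within if stage.startswith("2D") else cross
    print(stage, neighbors)
```

In this toy setup, token 0 (view 0) and token 2 (view 1) are close in 3D, so they are linked only in the cross-view stage, which is the kind of multiview sharing the within-view stage cannot provide.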

... model differentiates 2D and 3D feature operations through the positional encodings of the tokens involved
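As a rough illustration of how one token can carry either a 2D or a 3D positional encoding, the sketch below lifts a pixel to a 3D point with the standard pinhole camera model and encodes both coordinate sets. The sinusoidal encoding is a stand-in chosen for simplicity and is not claimed to be what ODIN uses; the camera intrinsics, image size, and all function names are hypothetical.

```python
import math

def sinusoidal_encoding(coords, num_freqs=4):
    """Encode a coordinate tuple of any dimension as sin/cos features."""
    features = []
    for c in coords:
        for k in range(num_freqs):
            freq = 2.0 ** k
            features.append(math.sin(freq * c))
            features.append(math.cos(freq * c))
    return features

def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth value to a camera-frame 3D point (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical token: pixel (420, 240) in a 640x480 image, 2.0 m away.
u, v, depth = 420.0, 240.0, 2.0
fx = fy = 500.0          # assumed focal lengths in pixels
cx, cy = 320.0, 240.0    # assumed principal point

# 2D stage: encode normalized pixel coordinates.
pe_2d = sinusoidal_encoding((u / 640.0, v / 480.0))

# 3D stage: encode the unprojected 3D point for the same token.
point = unproject(u, v, depth, fx, fy, cx, cy)
pe_3d = sinusoidal_encoding(point)

print(len(pe_2d), len(pe_3d))
```

The token's features stay the same in both cases; only the coordinates fed to the encoding change, which is the sense in which the positional encoding tells a layer whether it is doing a 2D or a 3D operation. A plain RGB image without depth would simply use the 2D path.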

... ODIN achieves state-of-the-art performance on ... 3D instance segmentation benchmarks, and competitive performance on ScanNet, S3DIS and COCO.

It outperforms all previous works by a wide margin when the sensed 3D point cloud is used in place of the point cloud sampled from 3D mesh ...


Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #3D #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning

News to help your R&D in artificial intelligence, machine learning, robotics, computer vision, smart hardware
