Predicting 3D object detection boxes directly from image and point cloud data using multi-modal features with CMT
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
arXiv paper abstract https://arxiv.org/abs/2301.01283
arXiv PDF paper https://arxiv.org/pdf/2301.01283.pdf
... propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection.
Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes.
The spatial alignment of multi-modal tokens is performed by encoding the 3D points into multi-modal features.
The core design of CMT is quite simple while its performance is impressive.
It achieves 74.1% NDS (state-of-the-art with single model) on nuScenes test set while maintaining faster inference speed.
Moreover, CMT has a strong robustness even if the LiDAR is missing ...
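The abstract says spatial alignment comes from encoding 3D points into the features of both modalities. As a rough illustration of that idea (not the authors' code), the sketch below attaches a set of 3D points to one image token and one point cloud token, so both can be described by coordinates in 3D space. All function names, the camera intrinsics, the bird's-eye-view grid layout, and the sampled depths and heights are assumptions made for this example. A real detector would also apply the camera-to-vehicle extrinsics and pass the coordinates through a learned embedding, which is left out here.

```python
# Illustrative sketch only: every name and number below is an assumption,
# not taken from the CMT paper or its released code.

def pixel_to_ray_points(u, v, fx, fy, cx, cy, depths):
    """Unproject pixel (u, v) into 3D camera-frame points, one per sampled depth."""
    return [((u - cx) / fx * d, (v - cy) / fy * d, d) for d in depths]


def bev_cell_to_pillar_points(ix, iy, cell_size, x_min, y_min, heights):
    """Return 3D points at the centre of BEV cell (ix, iy), one per sampled height."""
    x = x_min + (ix + 0.5) * cell_size
    y = y_min + (iy + 0.5) * cell_size
    return [(x, y, z) for z in heights]


def flatten(points):
    """Concatenate (x, y, z) tuples into one flat coordinate vector."""
    return [c for p in points for c in p]


# Hypothetical sampling choices and camera parameters.
depths = [1.0, 2.0, 4.0]
heights = [-1.0, 0.0, 1.0]

# For simplicity the camera frame is treated as the shared frame here;
# a real pipeline would first transform these points with the extrinsics.
image_token_coords = flatten(
    pixel_to_ray_points(320.0, 240.0, 500.0, 500.0, 320.0, 240.0, depths)
)
lidar_token_coords = flatten(
    bev_cell_to_pillar_points(0, 0, 0.5, -50.0, -50.0, heights)
)

# Both tokens now carry a coordinate vector of the same length, which a
# learned layer (omitted) could turn into a position embedding.
print(len(image_token_coords), len(lidar_token_coords))
```

Because both kinds of token end up with a 3D coordinate vector of the same size, a transformer can in principle relate them without first converting the image into another view, which is the point the abstract makes about skipping explicit view transformation.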