Better depth from a monocular image by reasoning globally and locally with MonoViT

morrislee
Aug 12, 2022

MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer

arXiv paper abstract https://arxiv.org/abs/2208.03543v1

arXiv PDF paper https://arxiv.org/pdf/2208.03543v1.pdf

GitHub https://github.com/zxcqlf/monovit

Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training.

... However, their limited receptive field constrains existing network architectures to reason only locally, dampening the effectiveness of the self-supervised paradigm.
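To make the "limited receptive field" point concrete, here is a minimal sketch of the standard receptive-field arithmetic for a stack of convolution layers. The function name `receptive_field` and the example layer stacks are illustrative assumptions, not taken from the paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked convolutions.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Hypothetical helper for illustration only.
    """
    rf = 1    # a single output unit initially "sees" one input pixel
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Four 3x3 convolutions with stride 1 see only a 9-pixel-wide window.
print(receptive_field([(3, 1)] * 4))
# Striding widens the window, but it remains bounded by the depth of the stack.
print(receptive_field([(3, 2)] * 3))
```

However many such layers are stacked, each output still depends on a bounded window of the input, which is the locality constraint the abstract refers to.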

... propose MonoViT, a brand-new framework combining the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation.

By combining plain convolutions with Transformer blocks, ... model can reason locally and globally, yielding depth prediction at a higher level of detail and accuracy, allowing MonoViT to achieve state-of-the-art performance on the established KITTI dataset ...
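The idea of pairing convolutions with Transformer blocks can be illustrated with a toy 1D example. This is only a sketch under simplifying assumptions (scalar tokens, a fixed 3-tap kernel, single-head attention with identity projections) and is not the MonoViT architecture; the names `local_conv`, `global_attention`, and `hybrid_block` are hypothetical.

```python
import math


def local_conv(x, kernel=(0.25, 0.5, 0.25)):
    """3-tap convolution with zero padding: each output sees only its neighbours."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def global_attention(x):
    """Single-head self-attention over scalar tokens (query = key = value = x).

    Each output is a softmax-weighted mix of ALL positions.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def hybrid_block(x):
    """Local convolution, then global attention, joined by a residual connection."""
    local = local_conv(x)
    mixed = global_attention(local)
    return [l + g for l, g in zip(local, mixed)]


signal = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
changed = signal[:-1] + [2.0]  # perturb only the last, far-away position

conv_a, conv_b = local_conv(signal), local_conv(changed)
hyb_a, hyb_b = hybrid_block(signal), hybrid_block(changed)

# The convolution alone cannot see the distant change at position 0 ...
print("conv output at 0 unchanged:", conv_a[0] == conv_b[0])
# ... while the hybrid block's output at position 0 does react to it.
print("hybrid output at 0 changed:", hyb_a[0] != hyb_b[0])
```

The convolution keeps fine local detail, while the attention step lets every position draw on the whole input, which is the local-plus-global behaviour the abstract describes.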

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact

Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

#ComputerVision #3D #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
